INTRODUCTION

Fixed effects estimators are commonly used with panel data to address omitted variable concerns (Alimov, 2015; Ding, Fan, & Lin, 2018; Gooris & Peeters, 2016; Meyer & Sinani, 2009). The problem is that they may yield biased estimates when dynamic endogeneity is present. Dynamic endogeneity arises when the current value of an independent variable is affected by past values of the dependent variable, violating the assumption of strict exogeneity in fixed effects estimators (Arellano & Bond, 1991; Nickell, 1981). A strict exogeneity assumption requires that current observations of the independent variables be completely independent of the previous values of the dependent variable (Wintoki, Linck, & Netter, 2012). In international business research, where the internationalization process is seen as dynamic, this assumption is not always satisfied (Buckley, 1990; Johanson & Vahlne, 1977; Welch & Luostarinen, 1988; Yang, Li, & Delios, 2015). In such cases, the widely used fixed effects estimator can generate biased estimates and lead to invalid conclusions.

We analyze 80 quantitative articles using regression analysis that the Journal of International Business Studies (JIBS) published between 2015 and 2017.1 We find that dynamic endogeneity problems are common, especially in some research domains. The hypothesized relationships in 58 of the studies (73%) suggest possible dynamic endogeneity problems, yet that possibility is not clearly stated in 46 of them. Among the 12 papers in which there is any mention at all of a potential problem, just three applied a dynamic panel model and only one of those used the estimator in a rigorous way by reporting the results of tests for autoregression.

When dealing with panel data, the fixed effects approach is not “state of the art” if there is dynamic endogeneity. A more rigorous approach is clearly needed. As well as in international business, dynamic panel models have been widely used in finance (Cremers, Litov, & Sepe, 2017; Hoechle, Schmid, Walter, & Yermack, 2012), and we also see them used in some recent articles published in the Strategic Management Journal (Girod & Whittington, 2017; Oehmichen, Schrapp, & Wolff, 2017).

In the remainder of this article we suggest ways of dealing with dynamic endogeneity. First we explain why the commonly used fixed effects estimator is biased when there is dynamic endogeneity. We then introduce a well-developed dynamic model for panel data as well as the difference generalized method of moments (GMM) estimator, the level GMM estimator, and the system GMM estimator. We use simulations to evaluate the bias introduced by using a fixed effects estimator and then compare the bias, efficiency, and power of the fixed effects estimator with that of the system GMM estimator. To illustrate the proper use of a GMM estimator, we conduct an empirical study of the relationship between the foreign experience of a firm’s top management team and board of directors and the firm’s level of internationalization. We also compare the ability of five econometric models to handle different sources of endogeneity. As with other Journal of International Business Studies editorials, the goal of this one is to develop more rigorous methods for dealing with endogeneity problems in International Business (Bello, Leung, Radebaugh, Tung, & van Witteloostuijn, 2009; Chang, van Witteloostuijn, & Eden, 2010; Lindner, Puck, & Verbeke, 2020; Meyer, van Witteloostuijn, & Beugelsdijk, 2017; Reeb, Sakakibara, & Mahmood, 2012; Zellmer-Bruhn, Caligiuri, & Thomas, 2016). Similar articles have appeared in the Strategic Management Journal (Bettis, Gambardella, Helfat, & Mitchell, 2014; Semadeni, Withers, & Certo, 2014; Wolfolds & Siegel, 2019).

There are several types of endogeneity that crop up in management research: omitted variables, measurement error, simultaneity, and non-random sample selection. The instrumental variable approach is widely used to address these different types of endogeneity (Wooldridge, 2002). The difference-in-differences technique is gaining in popularity, but using it depends on finding a natural experiment (see Chen, Crossland, & Huang, 2016; Fracassi & Tate, 2012). Heckman modeling is widely used to address sample selection bias (Heckman, 1976). Even though endogeneity problems have been in the limelight (Alimov, 2015; Bu & Wagner, 2016; Gooris & Peeters, 2016), dynamic endogeneity remains insufficiently addressed.

As noted above, dynamic endogeneity problems occur when the current value of an independent variable is affected by past values of the dependent variable. According to Wooldridge (2002: 51), “Simultaneity arises when the explanatory variable is determined simultaneously along with the dependent variable.” Dynamic endogeneity can be considered a type of simultaneity problem (Wooldridge, 2002).

Previous studies investigating the performance of the system GMM estimator (Wintoki et al., 2012; Wooldridge, 2002) suggest that it is a better way to deal with dynamic endogeneity than a fixed effects estimator. The GMM estimator does, however, have limitations, although they have barely been discussed in the IB literature. Our simulation shows the advantages and disadvantages of this estimator compared with a fixed effects one, then we use an empirical example to demonstrate a rigorous way to use a GMM estimator. We also discuss methods such as Heckman two-stage modeling, instrumental variables, and difference-in-differences (DID) estimation, and how to use them.

CONTENT ANALYSIS OF PUBLISHED IB RESEARCH

We looked at 80 empirical articles published by JIBS between 2015 and 2017 that use regression analysis and, on the basis of their independent and dependent variables, assigned them to specific International Business research domains (see Table 1). We ascertained whether a study’s hypothesized relationships were at risk of dynamic endogeneity. As the table shows, only a small percentage of articles in each domain handled the risk of endogeneity by appropriately using a GMM estimator. This suggests a general lack of attention across the board to dynamic endogeneity in international business research. For example, in domain one – activities, strategies, structures and decision-making processes of multinational enterprises – only five of the 30 articles (17%) with hypothesized relationships at risk of dynamic endogeneity clearly draw attention to that risk. In only one was a GMM estimator used to handle it. The problem was prevalent in domains one, two, three and five, less so in domains four and six.

Table 1 Empirical papers published in JIBS (2015 to 2017)

We used the following heuristics to identify the articles at risk of dynamic endogeneity problems.2 While the general criterion was whether the values of the independent variables in the current period might be affected by the values of dependent variables in previous periods, studies where the independent variables were time-invariant or changed very slowly were assumed to face little risk of dynamic endogeneity. Such independent variables can be historical facts, national culture or institutions, a firm’s early experiences, and so on. For example, Gao, Wang, and Che (2018) examined the effects of historical conflicts between China and Japan (e.g., the Sino-Japanese War) on the location choices of Japanese investors in China. Such historical facts do not change over time. Likewise, dynamic endogeneity is not a problem when studying the effects of national culture on corporate governance (Griffin, Guedhami, Kwok, Li, & Shao, 2017) because culture is relatively stable. Similarly, a paper on the effects of MNEs’ initial entry locations on their subsequent location choices (Stallkamp, Pinkham, Schotter, & Buchel, 2018) is unlikely to be at risk of dynamic endogeneity because the independent variable – initial entry location – happens only once.

A second case is where the independent variables are at a higher level than the dependent variables (e.g., the independent variables are at a country or region level while the dependent variables are at the firm level). Firm-level dependent variables are generally unlikely to affect country-level or region-level independent variables so the risk of dynamic endogeneity is low. Research on the impact on dividend payments of an international political crisis – such as the one caused by Iran’s nuclear program – is one example (Huang, Wu, Yu, & Zhang, 2015), since dividend payments are unlikely to have much influence on an international political crisis. There are, however, at least two exceptions. One is when the higher-level independent variable is an aggregate of lower-level variables. For example, in a study of the effects of country-level accounting conservatism on the pricing of IPOs (Boulton, Smart, & Zutter, 2017), the national conservatism construct was based on each firm’s speed in recognizing bad news relative to good news. Such a construct could be affected by a firm’s IPO underpricing. Another exception is when country-level factors (e.g., tax policy) can affect firm outcomes (e.g., costs), but those outcomes can also drive firms to take countermeasures (e.g., relocating their headquarters) (Witt & Lewin, 2007).

If a study’s independent and dependent variables are at the same level (e.g., both at the firm level) and the independent variables are time-variant (e.g., a firm’s R&D expenditures), then one must apply theory and logic to make a judgement about the risk of dynamic endogeneity. For instance, a study examining how a shared language between headquarters and a subsidiary affects the flow of knowledge between these two entities (e.g., Reiche, Harzing, & Pudelko, 2017) is at risk of dynamic endogeneity because a high level of knowledge transfer to the subsidiary is likely to lead headquarters to choose as subsidiary manager someone who speaks its language. Correct categorization clearly depends on the theoretical foundations of the relationships between the variables. That to some extent explains why studies dealing with topics in domains one, two, three and five of Table 1 are at greater risk of dynamic endogeneity. Since dynamic endogeneity occurs when firm-level dependent variables in previous periods affect firm-level independent variables in the current period, this is less likely to be the case in domains four and six, where the independent variables are more likely to be regional or national factors such as culture and institutions, and hence not easily influenced by lower-level dependent variables.

Some of the articles that we identified as being at risk of dynamic endogeneity problems did discuss that risk, but others did not. For example, it is not mentioned in the previously described article on the relationship between shared language and knowledge transfer (Reiche et al., 2017). Another article (Kingsley & Graham, 2017), which examined how information voids in emerging countries – measured by analyst coverage of domestic firms and government transparency – affect the amount of inward foreign direct investment (FDI) received by that country, is at risk of dynamic endogeneity because inward FDI can bring knowledge and ideas which in turn can increase the host country government’s transparency and bring more analyst coverage as well. The authors are clearly aware of this and address the possibility of reverse causality: “We lag all independent variables by 1 year to reduce the risk that our results are driven by reverse causation” (Kingsley & Graham, 2017: 338). However, they do not use a GMM estimator. Their dynamic linear panel model simply controls for the lagged dependent variable. As will be explained below, that treatment can often lead to a biased estimation. Belderbos, Lokshin and Sadowski (2015), on the other hand, used a GMM estimator in their article on how a firm’s foreign research and development affects its profitability. Dynamic endogeneity may arise here because profits can be used to finance foreign R&D investments (Un & Cuervo-Cazurra, 2008). Although Belderbos and his colleagues use a GMM estimator, they do not discuss the possibility of second-order serial correlation or potential over-identification. By contrast, Miletkov, Poulsen and Wintoki (2017) is a good example of using the GMM estimator correctly. They examine the relationship between a firm’s level of international sales and its hiring of foreign independent directors. 
Such a relationship could be endogenous because foreign independent directors can boost a firm’s foreign sales by identifying opportunities in international markets. To handle this potential endogeneity, not only do the authors apply a GMM estimator and compare it with other estimates, but they also report several statistical tests of the GMM estimate, namely the results of autoregression (AR) tests for serial correlation and that of Hansen’s J test for over-identification.

In sum, dynamic endogeneity has not received adequate attention in empirical international business research, although the severity of the problem varies across IB domains depending, for instance, on whether the independent variables are regional-level or country-level factors such as culture and institutions. Why is the commonly used fixed effects estimator biased, and how can the GMM estimator better address this?

A DYNAMIC PANEL MODEL AND THE GMM ESTIMATOR

Many studies have shown that traditional fixed effects estimation can help reduce bias arising from omitted variables (Alimov, 2015; Peterson, Arregle, & Martin, 2012; Wooldridge, 2010). It can, however, also lead to biased estimates if the underlying economic process is dynamic. Mathematically, in a panel data model with the fixed effects estimator,

$$y_{it} = \alpha + \beta X_{it} + u_{i} + \varepsilon_{it} \left( {t = 1, 2, \ldots , T} \right),$$
(1)

where \(u_{i}\) is the time-invariant unobserved effect of an individual observation, and \(\varepsilon_{it}\) denotes the idiosyncratic errors. The strictly exogenous assumption \(E\left( {\varepsilon_{it} |X_{ir} ,u_{i} } \right) = 0\) for every time t and r belonging to 1, 2, …, T must be satisfied to obtain an unbiased estimate of \(\beta\) (Wooldridge, 2002). For every t and r in the sample period, the expected value of the idiosyncratic error given the explanatory variables in all periods and the unobserved effect is 0. However, the dependent variable \(y_{it}\) will affect the future value of the independent variable \(X_{it + 1}\) in many cases. Here \(X_{it + 1}\) is a vector that may include one or multiple independent variables that are affected by \(y_{it}\). For example, a firm’s current international performance might normally be expected to influence the composition of the management team. Mathematically, \(X_{it + 1}\) is a function of \(y_{it}\), which is correlated with \(\varepsilon_{it}\) when a dynamic relationship exists. Therefore, \(\varepsilon_{it}\) and \(X_{it + 1}\) must be correlated. The idiosyncratic error term at time t is correlated with the explanatory variable \(X_{it + 1}\) at time t + 1. So the strictly exogenous assumption fails. The fixed effects estimator will always be biased when such a dynamic relationship exists.
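
To see where the bias comes from, it may help to write out the within (demeaning) transformation that the fixed effects estimator applies to Eq. (1). What follows is only a sketch under the simplifying assumption of a balanced panel; the bar and double-dot notation is introduced here for illustration.

$$\ddot{y}_{it} = \beta \ddot{X}_{it} + \ddot{\varepsilon}_{it} , \qquad \ddot{X}_{it} = X_{it} - \bar{X}_{i} , \qquad \ddot{\varepsilon}_{it} = \varepsilon_{it} - \bar{\varepsilon}_{i} , \qquad \bar{\varepsilon}_{i} = \frac{1}{T}\mathop \sum \limits_{r = 1}^{T} \varepsilon_{ir} ,$$

with \(\ddot{y}_{it}\) and \(\bar{X}_{i}\) defined analogously. The transformation removes \(u_{i}\), but the demeaned regressor \(\ddot{X}_{it}\) contains every \(X_{ir}\) through the firm mean \(\bar{X}_{i}\), and the demeaned error \(\ddot{\varepsilon}_{it}\) contains every \(\varepsilon_{ir}\) through \(\bar{\varepsilon}_{i}\). If \(X_{it + 1}\) is correlated with \(\varepsilon_{it}\), then \(\ddot{X}_{it}\) and \(\ddot{\varepsilon}_{it}\) are correlated even when \(X_{it}\) and \(\varepsilon_{it}\) are not, so least squares on the demeaned data does not recover \(\beta\). Because the offending terms enter through averages over T periods, this reasoning also suggests that the bias should shrink as T grows.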

Consider now a dynamic panel model for individual \(i\) at time \(t\) of the form

$$y_{it} = \alpha + \mathop \sum \limits_{s} \rho_{s} y_{it - s} + \beta X_{it} + u_{i} + \varepsilon_{it} \left( {t = 2, \ldots , T} \right).$$
(2)

Here s = 1, 2, …, and \(y_{it - 1} , y_{it - 2} , \ldots\) are the lagged values of the dependent variable that affect \(X_{it}\). In Eq. (2), \(\rho_{s}\) captures the autocorrelation in y, and \(\beta\) measures the effect of X on y. Unlike Eq. (1), Eq. (2) includes lagged terms of the dependent variable. Such a dynamic panel model requires a sequentially exogenous condition \(E(\varepsilon_{it} |X_{it} , \ldots ,X_{i1} ,u_{i} ) = 0\) to obtain unbiased and consistent estimates. This condition requires only that the error term \(\varepsilon_{it}\) have a zero mean conditional on the current and past values of \(X\); it allows the future values of the independent variable X to be correlated with \(\varepsilon_{it}\). A dynamic endogenous relationship of the kind described above therefore satisfies sequential exogeneity, and this dynamic panel model is applicable when dynamic endogeneity may be present.

The system GMM estimator proposed by Blundell and Bond (1998) is commonly used in dynamic panel models. Other methods include the estimators proposed by Anderson and Hsiao (1982) and Holtz-Eakin, Newey and Rosen (1988). A system GMM estimator is built upon two other estimators called the difference GMM (D-GMM) estimator and the level-GMM estimator.

The key aspect of the D-GMM approach proposed by Arellano and Bond (1991) is to estimate a difference equation

$$\Delta y_{it} = \mathop \sum \limits_{s} \rho_{s} \Delta y_{it - s} + \beta \Delta X_{it} + \Delta \varepsilon_{it} .$$
(3)

Consider the first term \(\Delta y_{it - 1}\) in the lagged and differenced dependent variable \(\Delta y_{it - s}\). \(\Delta y_{it - 1}\) is endogenous due to the included term \(y_{it - 1}\), which is correlated with the error term \(\Delta \varepsilon_{it}\). However, the lagged values \(y_{it - 2} , y_{it - 3} , \ldots , y_{i1}\) and the lagged and differenced values \(\Delta y_{it - 2} , \Delta y_{it - 3} , \ldots , \Delta y_{i2}\) can be used as instrumental variables for the endogenous \(\Delta y_{it - 1}\). If \(\varepsilon_{it}\) is not serially correlated, then the error term \(\Delta \varepsilon_{it}\) will not be correlated with those instrumental variables. The instruments for other variables in the \(\Delta y_{it - s}\) series could be found using the same method. Using lagged dependent variables as instruments could be an advantage, as valid external instruments are normally difficult to construct. As Wintoki et al. (2012: 582) state, “Thus, an important aspect of the methodology is that it relies on a set of ‘internal’ instruments contained within the panel itself…. This eliminates the need for external instruments.” The differenced and lagged values of the dependent variable satisfy the relevance and exogeneity conditions, and therefore are valid instrumental variables (IVs).
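
The idea of instrumenting the differenced lag with an earlier level can be illustrated with a small simulation. The sketch below is not the full D-GMM estimator: it assumes a model with a single lag and no X, uses only \(y_{it - 2}\) as the instrument (a just-identified estimator in the spirit of Anderson and Hsiao, 1982), and compares it with the within estimator. The function names, the parameter values (\(\rho\) = 0.5, unit-variance errors, 2000 units, 8 periods) and the burn-in length are all assumptions made for illustration.

```python
import random


def simulate_ar1_panel(n_units, n_periods, rho, rng, burn_in=50):
    """Simulate y_it = rho * y_i,t-1 + u_i + e_it; returns one list per unit."""
    panel = []
    for _ in range(n_units):
        u = rng.gauss(0.0, 1.0)      # unit-specific unobserved effect
        y = u / (1.0 - rho)          # start at the unit's long-run mean
        series = []
        for t in range(burn_in + n_periods):
            y = rho * y + u + rng.gauss(0.0, 1.0)
            if t >= burn_in:         # keep only the periods after burn-in
                series.append(y)
        panel.append(series)
    return panel


def within_estimate(panel):
    """Fixed effects (within) regression of y_t on y_t-1."""
    num = den = 0.0
    for s in panel:
        cur, lag = s[1:], s[:-1]
        mean_cur, mean_lag = sum(cur) / len(cur), sum(lag) / len(lag)
        for c, l in zip(cur, lag):
            num += (l - mean_lag) * (c - mean_cur)
            den += (l - mean_lag) ** 2
    return num / den


def differenced_iv_estimate(panel):
    """IV on the differenced equation, instrumenting dy_t-1 with y_t-2."""
    num = den = 0.0
    for s in panel:
        for t in range(2, len(s)):
            z = s[t - 2]                     # instrument: lagged level
            num += z * (s[t] - s[t - 1])     # z * dy_t
            den += z * (s[t - 1] - s[t - 2])  # z * dy_t-1
    return num / den


panel = simulate_ar1_panel(2000, 8, rho=0.5, rng=random.Random(1))
print("within estimate of rho:", round(within_estimate(panel), 3))
print("differenced IV estimate of rho:", round(differenced_iv_estimate(panel), 3))
```

In a short panel such as this one, the within estimate would be expected to fall well below the true value of 0.5, while the instrumented estimate should lie much closer to it, at the cost of a larger sampling variance.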

However, the D-GMM estimator has several limitations. The weak IV problem is one of them. Since variables lagged by k periods (k = 1, 2, …) are being used as IVs, the correlation between the endogenous variable and the IVs is weak when k is large. Weak IVs may lead to poor performance with finite (in practice relatively small) samples. The IVs’ lag periods must therefore be limited, instead of using all past lags, to relieve the weak IV issue when estimating regressions. A second limitation is when the autocorrelation in y that is captured by \(\rho_{s}\) is close to 1. In such cases, \(y_{it}\) nearly follows a random walk and \(\Delta y_{it}\) is close to 0. The correlation between \(\Delta y_{it}\) and \(\Delta y_{it - s}\) is therefore weak, another weak IV problem. A third limitation is that the D-GMM estimator cannot estimate the coefficients of the time-invariant factors captured by \(u_{i}\) because that term will cancel out during the differencing procedure.

Arellano and Bover (1995) addressed the latter two shortcomings by directly estimating the original-level Eq. (2). The resulting estimator is called a level-GMM estimator. They used the lagged and differenced variables \(\Delta y_{it - s - 1} ,\Delta y_{it - s - 2} , \ldots ,\Delta y_{i2}\) as IVs for \(y_{it - s}\). However, that requires the additional assumption that the IVs are also uncorrelated with the individual unobserved effect \(u_{i}\). The changes in y among different individuals should not vary significantly. As Wintoki et al. (2012) argue, this assumption is reasonable only over a short time interval and if the individual-level unobserved effects can be assumed not to change substantially over time.

Blundell and Bond (1998) combined Eqs. (2) and (3) and obtained a system GMM estimator, which is today one of the most commonly used in dynamic panel modeling. The system GMM estimator is more efficient than estimating only the difference equation or the level equation. It does, however, impose serious orthogonality conditions, because the assumptions for both Eq. (2) (the level equation) and Eq. (3) (the difference equation) must be satisfied. The statistics of the following three tests should be reported when using the system GMM estimator.

First, the validity of the instruments in the difference equation depends on the assumption that the error term \(\varepsilon_{it}\) is serially uncorrelated. The first order differenced residuals \(\Delta \varepsilon_{it}\) and \(\Delta \varepsilon_{it - 1}\) are correlated by construction, but the second-order correlation between \(\Delta \varepsilon_{it}\) and \(\Delta \varepsilon_{it - 2}\) should be 0. The AR(1) and AR(2) test statistics for the differenced error terms should be reported, with the AR(2) test showing no second-order serial correlation. Researchers should expect a large p value for the AR(2) test. Omitting the autocorrelation test on the differenced residuals may cast doubt on whether the assumption involved in estimating the difference equation is satisfied.
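
The pattern that the AR(1) and AR(2) tests look for can be verified directly. As a sketch, assume in addition that \(\varepsilon_{it}\) is homoskedastic with variance \(\sigma_{\varepsilon}^{2}\) (a symbol introduced here for illustration) and serially uncorrelated. Then

$$\begin{aligned} {\text{Cov}}\left( {\Delta \varepsilon_{it} ,\Delta \varepsilon_{it - 1} } \right) & = {\text{Cov}}\left( {\varepsilon_{it} - \varepsilon_{it - 1} ,\;\varepsilon_{it - 1} - \varepsilon_{it - 2} } \right) = - \sigma_{\varepsilon}^{2} , \\ {\text{Cov}}\left( {\Delta \varepsilon_{it} ,\Delta \varepsilon_{it - 2} } \right) & = {\text{Cov}}\left( {\varepsilon_{it} - \varepsilon_{it - 1} ,\;\varepsilon_{it - 2} - \varepsilon_{it - 3} } \right) = 0. \\ \end{aligned}$$

The first covariance is non-zero because both differences contain \(\varepsilon_{it - 1}\); the second is zero because the two differences share no common term. A non-zero second-order correlation in the differenced residuals would therefore point to serial correlation in \(\varepsilon_{it}\) itself.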

A 2017 JIBS editorial suggests not focusing on significance cutoffs (Meyer et al., 2017). In keeping with that, let us say only that a large p value is ideal for the three tests. Researchers obtaining a small p should question the validity of the system GMM estimator.3

Second, the instruments themselves must be valid. If a panel dataset covers only two periods, there is of course no lagged differenced dependent variable (\(\Delta y_{it - 2}\)) available as an instrument for the endogenous variable (\(\Delta y_{it - 1}\)) and researchers must resort to another valid instrumental variable. If a panel dataset covers three periods, the number of instrumental variables equals the number of endogenous variables (assuming there are no additional instruments), making the model exactly identified. If the panel dataset covers more than three periods, which is common in international business studies, then the instruments will always outnumber the endogenous variables and the Hansen J over-identification test can be used to test the validity of the instruments. The null hypothesis is that the instruments used in the difference equation are exogenous. Researchers should expect a large p value in the Hansen J test, so that this null hypothesis cannot be rejected. Such a finding adds credibility to a system GMM estimation.
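
The counting argument above can be made concrete with a few lines of code. This is a minimal sketch that assumes a single lagged dependent variable, no additional regressors or external instruments, and the use of every available lagged level \(y_{i1} , \ldots , y_{it - 2}\) as an instrument in each differenced period; the function names are hypothetical.

```python
def difference_instrument_count(n_periods):
    """Number of lagged-level instruments when all available lags are used.

    The differenced equation exists for t = 3, ..., T, and in period t the
    levels y_i1, ..., y_i,t-2 are available, i.e. t - 2 instruments.
    """
    return sum(t - 2 for t in range(3, n_periods + 1))


def overidentifying_restrictions(n_periods, n_endogenous=1):
    """Instruments in excess of the endogenous variables (0 = exactly identified)."""
    return difference_instrument_count(n_periods) - n_endogenous


for T in (2, 3, 4, 10):
    print(T, difference_instrument_count(T), overidentifying_restrictions(T))
```

Under these assumptions there are no instruments with two periods, exactly one with three periods, and 36 with ten periods. The rapid growth of the count with T is one reason the number of lags used as instruments is often restricted in practice.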

Finally, the system GMM estimator makes an additional exogeneity assumption: that any correlation between the endogenous variables and the unobserved fixed effects will remain constant over time. Eichenbaum, Hansen and Singleton (1988) have suggested that this assumption can be tested by a difference-in-Hansen’s J test of exogeneity, which also yields a J-statistic. The null hypothesis is that the subsets of instruments in the level equation are exogenous. Omitting this test may cast doubt on whether the assumptions underlying the estimation of the level equation have been satisfied. Researchers should also expect a large p value coming out of this difference-in-Hansen’s J test, such that the null hypothesis that the instruments in the level equation are exogenous cannot be rejected. Such findings suggest that the estimate given by the system GMM method is credible. Otherwise, the additional subsets used in the level equation are not exogenous, making the results estimated from the system GMM model less reliable.

Researchers may use the D-GMM estimator or other specifications to check the robustness of the results of a system GMM model. When the tests fail, the estimation yielded by the system GMM procedure may not be reliable. Moreover, a system GMM estimator has only weak power when the true relationship between the dependent variable and the independent variable is weak. A simulation will show the scale of this problem and then compare the performance of a fixed effects estimator with that of a GMM estimator.

TWO SIMULATION TESTS

Two sets of simulations illustrate these situations. In both, dynamic endogeneity is assumed to be the only econometric issue of interest. The first simulation will treat panel datasets with different effect sizes for the focal relationship, different dynamic relationship magnitudes, and different panel structures, to illustrate how each factor impacts the bias in a fixed effects estimator. The second one follows previously published estimator evaluations (Certo, Busenbark, Woo, & Semadeni, 2016; Hayashi, 2000; Semadeni et al., 2014) and compares the bias, efficiency, and power of the system GMM estimator with those of a comparable fixed effects estimator. The code used in the data generation process and the regressions is presented in “Appendix A”.

Bias in Fixed Effects Estimators

In this simulation, different datasets will illustrate how the effect size of the focal relationship, the magnitude of the dynamic relationship, and the panel structure affect the bias in a fixed effects estimator.

Setup

Consider a benchmark dataset of 1000 firms and 10 periods (N = 1000, T = 10). Each observation consists of two variables, an independent variable x and a dependent variable y. For each firm, the initial points \(x_{i0}\) and \(y_{i0}\) are drawn from the standard normal distribution N(0,1) with a moderate correlation of 0.3, a value typically observed between dependent and independent variables in international business studies.

Subsequent values of x and y are generated through the following equations:

$$x_{it} = 0.5 x_{it - 1} + \alpha y_{it - 1} + \varepsilon_{it}^{x}$$
(4)
$$y_{it} = \beta x_{it} + u_{i} + \varepsilon_{it}^{y}$$
(5)

where a normal distribution \(N\left( {0, 0.1} \right)\) describes \(\varepsilon_{it}^{x}\), \(u_{i}\) and \(\varepsilon_{it}^{y}\). That is, all three variables are normally distributed with a mean of 0 and a variance of 0.1. In this case, \(u_{i}\) represents the firm’s unobserved characteristics, and \(\varepsilon_{it}^{x}\) and \(\varepsilon_{it}^{y}\) are idiosyncratic error terms. The dynamic relationship between the dependent variable y and the independent variable x is characterized by two parameters. \(\beta\) captures the effect of x on y, which will be called the “focal effect”. \(\alpha\) captures the marginal effect of y in the last period on x – the “dynamic effect”. A large \(\alpha\) indicates that the dynamic relationship through which the dependent variable y predicts the independent variable x is of considerable importance. The performance of the estimators may depend on the signs and magnitudes of \(\alpha\) and \(\beta\).

In this simulation the size of \(\beta\) was set at a high value (1), a moderate one (0.1) and a low one (0.01) to investigate how that size affects the performance of the estimators. \(\alpha\) was also set as large (0.1) and small (0.01). For the sake of simplicity, both \(\alpha\) and \(\beta\) were assumed to be larger than zero. Besides \(\alpha\) and \(\beta\), different panel structures were also tested. One way to do this is to keep the panel width (N/T) the same but enlarge (or shrink) the number of observations. So a larger panel was created with N = 2000 and T = 20, and another with N = 500 and T = 5. Alternatively, one can vary the panel structure while keeping the number of observations constant. We tested a panel with a larger number of firms (N = 2000 and T = 5) and one with more time periods (N = 100 and T = 100) and applied six combinations of \(\alpha\) and \(\beta\) to each of these four panels. We ran 300 simulations and fitted a fixed effects model, with the seed set at 1 to make the results replicable. We clustered standard errors at the firm level following the practice in international business research. The bias in each setting was calculated as the difference between the estimated value and the true value. The absolute value of the bias under each setting is reported in Table 2.
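
The data generating process in Eqs. (4) and (5) and the fixed effects fit can be sketched in a few lines. The code below is not the code from Appendix A; it is an independent illustration that uses only the Python standard library, implements the fixed effects estimator as a within (demeaning) regression of y on x, and does not compute clustered standard errors. The function names and the default of 20 replications (chosen for speed, rather than the 300 used in the text) are assumptions.

```python
import math
import random


def simulate_panel(n_firms, n_periods, alpha, beta, rng):
    """Generate (firm, x, y) rows following Eqs. (4) and (5)."""
    sd = math.sqrt(0.1)  # the text specifies a variance of 0.1
    rows = []
    for firm in range(n_firms):
        u = rng.gauss(0.0, sd)  # firm-level unobserved effect u_i
        # Initial points: standard normal draws with correlation 0.3.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x_prev = z1
        y_prev = 0.3 * z1 + math.sqrt(1.0 - 0.3 ** 2) * z2
        for _ in range(n_periods):
            x = 0.5 * x_prev + alpha * y_prev + rng.gauss(0.0, sd)  # Eq. (4)
            y = beta * x + u + rng.gauss(0.0, sd)                   # Eq. (5)
            rows.append((firm, x, y))
            x_prev, y_prev = x, y
    return rows


def fixed_effects_slope(rows):
    """Within estimator: demean x and y by firm, then regress y on x."""
    sums = {}
    for firm, x, y in rows:
        s = sums.setdefault(firm, [0.0, 0.0, 0])
        s[0] += x
        s[1] += y
        s[2] += 1
    sxy = sxx = 0.0
    for firm, x, y in rows:
        sum_x, sum_y, n = sums[firm]
        xd, yd = x - sum_x / n, y - sum_y / n
        sxy += xd * yd
        sxx += xd * xd
    return sxy / sxx


def mean_bias(alpha, beta, n_firms=1000, n_periods=10, reps=20, seed=1):
    """Average of (estimate - true beta) over repeated simulated panels."""
    rng = random.Random(seed)
    estimates = [
        fixed_effects_slope(simulate_panel(n_firms, n_periods, alpha, beta, rng))
        for _ in range(reps)
    ]
    return sum(estimates) / reps - beta


print("mean bias, alpha=0.1, beta=0.1:", round(mean_bias(0.1, 0.1), 4))
print("mean bias, alpha=0.0, beta=0.1:", round(mean_bias(0.0, 0.1), 4))
```

Setting `alpha=0.0` switches the feedback from y to x off, so x is strictly exogenous and the within estimate should be close to the true \(\beta\); this gives a simple baseline against which to read the bias obtained with a positive \(\alpha\). Because the sketch differs from Appendix A in its details, its numbers need not match those in Table 2.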

Table 2 Magnitude of bias in fixed effects estimators

Results

With a positive focal effect \(\beta\) and a positive dynamic effect \(\alpha\), the results show that a fixed effects model always produces a biased estimate, with the magnitude of the bias depending on several factors. First, the bias is large when the focal effect \(\beta\) is large. Second, it decreases with the number of time periods T. Datasets with a large N and small or modest T are common in international business research, especially that conducted at the firm level (Belderbos et al., 2015; Miletkov et al., 2017), so the magnitude of the bias in the fixed effects estimator is normally large when there is dynamic endogeneity. Third, holding other factors constant, the size of the bias does not depend on the size of the dynamic relationship \(\alpha\), on the number of observations, or on the number of firms.

Combining these observations results in four categories, “large focal effect, large dynamic effect”, “large focal effect, small dynamic effect”, “small focal effect, large dynamic effect” and “small focal effect, small dynamic effect”. The relationship between the magnitude of the bias when using a fixed effects estimator and the time periods in a panel is shown in Figure 1. There is a clear pattern. The magnitude of the bias decreases with an increasing number of time periods, and the slope of the decline is greater the larger the focal effect \(\beta\).

Figure 1 Magnitude of bias in fixed effects estimators

A Comparison of a Fixed Effects Estimator and a System GMM Estimator

The same datasets and basic setup were used to compare the bias, efficiency and power of a fixed effects estimator with a system GMM estimator, based on the work of Certo et al. (2016), Hayashi (2000) and Semadeni et al. (2014). Bias was measured by the estimated value minus the true value and by its absolute value. The root mean square error (RMSE) of the coefficient of x was used to measure efficiency. Following Semadeni et al. (2014), we measured power by the percentage of 300 estimates with a p value smaller than 0.05. The simulation results are reported in Table 3.
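
The three performance measures can be computed from the replications as follows. This is a minimal sketch: the function name and the input format (one coefficient estimate and one p value per replication) are assumptions, and the absolute bias is taken here as the absolute value of the mean bias, which is one possible reading of the description above.

```python
import math


def summarize_replications(estimates, p_values, true_value, level=0.05):
    """Bias, RMSE and power of an estimator across simulation replications."""
    n = len(estimates)
    bias = sum(estimates) / n - true_value  # mean estimate minus true value
    rmse = math.sqrt(sum((b - true_value) ** 2 for b in estimates) / n)
    power = sum(1 for p in p_values if p < level) / n  # share with p < level
    return {"bias": bias, "abs_bias": abs(bias), "rmse": rmse, "power": power}


# Toy input with two replications, for illustration only.
print(summarize_replications([0.1, 0.3], [0.01, 0.20], true_value=0.1))
```

With 300 replications per setting, `estimates` and `p_values` would each hold 300 entries, and `power` would be the proportion reported as a percentage in the text.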

Table 3 Comparison of fixed effects estimators and system GMM estimators

Results

The results show that regardless of the size of the focal and dynamic effects or of the structure of the panel, the fixed effects estimator is always severely biased, whereas the system GMM estimator is not. For example, in the benchmark sample where the true magnitude of the focal effect was 0.1, the size of the bias in the fixed effects estimator was 0.134, while that of the system GMM estimator was only 0.001, more than a hundred times smaller. When comparing the simulation results with dynamic relationships of different sizes, a smaller dynamic effect did not alleviate the bias in the fixed effects estimator. For example, in the first row of case 1, the size of the bias in the fixed effects estimator with a small dynamic effect was 0.154, which is even larger than that with a large dynamic effect. The fixed effects estimator is apparently always biased if a dynamic relationship exists, though one should exercise caution when interpreting these results due to the high probability of type I error. Still, the system GMM estimator has a considerably smaller bias under all circumstances.

The system GMM estimator is also more efficient than the fixed effects estimator, as evidenced by its smaller RMSE. In terms of statistical power, the two estimators yield similar percentages of significant results when the focal effect is large or medium-sized. When the focal effect is small, however, the system GMM estimator produces a smaller percentage of significant results, so the probability of type II error increases. For example, in case 1 (small effect), the system GMM estimator yields only 21.7% significant results (p < 0.05) for the large dynamic relationship dataset, and 22.7% for the small dynamic relationship one. Thus, it would be wrong to conclude that no relationship exists simply because the estimated coefficient is not significant: the weak statistical power of the GMM estimator may fail to capture a small effect.

Finally, data structure does not alter these findings. Whether the panel is long or wide, the fixed effects estimator is always biased, and the system GMM estimator remains the less biased and more efficient of the two.

These simulation results show that the fixed effects estimator is biased if the current values of the independent variable are affected by the dependent variable. In these simulated samples, the fixed effects estimator overestimates the relationship if the effect is of medium or small size, increasing the probability of type I error. By contrast, the system GMM estimator shows less bias in all circumstances, although it may suffer from low statistical power when the focal effect is small. Figure 2 shows the relationship between the power of a system GMM estimator and the number of time periods, given the level of focal and dynamic effects. When the focal effect is large, the GMM’s power is not a concern, but when the focal effect is small the power of a system GMM estimator is low, and it drops as the number of time periods increases.

Figure 2 Power of system GMM estimators.
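The mechanism behind these results can be illustrated with a deliberately simplified simulation. It is a sketch under stated assumptions and not the simulation design behind Table 3: the data-generating process, the parameter values (a focal effect of 0.1, a feedback coefficient of 0.8, 3,000 firms, five periods), and all function names are hypothetical. The second estimator is not a system GMM estimator either; it is the simplest just-identified relative of it, which first-differences the model and uses the lagged level of the regressor as an internal instrument.

```python
import random

def simulate_panel(n_firms, n_periods, beta, rho, burn_in=20, seed=0):
    """Simulate y_it = beta*x_it + eta_i + e_it, where x_it = rho*y_i,t-1 + u_it."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_firms):
        eta = rng.gauss(0.0, 1.0)                    # time-invariant firm effect
        y_prev, xs, ys = 0.0, [], []
        for t in range(burn_in + n_periods):
            x = rho * y_prev + rng.gauss(0.0, 1.0)   # feedback from past y to current x
            y = beta * x + eta + rng.gauss(0.0, 1.0)
            if t >= burn_in:                         # discard start-up periods
                xs.append(x)
                ys.append(y)
            y_prev = y
        panel.append((xs, ys))
    return panel

def within_estimator(panel):
    """Fixed effects (within) slope: OLS on firm-demeaned x and y."""
    num = den = 0.0
    for xs, ys in panel:
        x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
        for x, y in zip(xs, ys):
            num += (x - x_bar) * (y - y_bar)
            den += (x - x_bar) ** 2
    return num / den

def differenced_iv_estimator(panel):
    """First-difference the model and instrument dx_t with the lagged level x_{t-1}."""
    num = den = 0.0
    for xs, ys in panel:
        for t in range(1, len(xs)):
            instrument = xs[t - 1]                   # uncorrelated with e_t and e_{t-1}
            num += instrument * (ys[t] - ys[t - 1])
            den += instrument * (xs[t] - xs[t - 1])
    return num / den

TRUE_BETA = 0.1
panel = simulate_panel(n_firms=3000, n_periods=5, beta=TRUE_BETA, rho=0.8)
fe_bias = within_estimator(panel) - TRUE_BETA
iv_bias = differenced_iv_estimator(panel) - TRUE_BETA
print(f"fixed effects bias:  {fe_bias:+.3f}")
print(f"differenced IV bias: {iv_bias:+.3f}")
```

Demeaning mixes every period's error into each transformed observation, so a regressor that reacts to past outcomes becomes correlated with the transformed error; the lagged level is not, which is why it can serve as an instrument. The sign and size of the fixed effects bias depend on the data-generating process, so the direction seen in this toy example need not match the one reported in Table 3.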

RECOMMENDATIONS FOR IB RESEARCH WITH AN EMPIRICAL EXAMPLE

Our review of recent empirical research has shown that most reported studies at risk of dynamic endogeneity either do not use the GMM estimator or do not report the results in a rigorous way. We now use an empirical study of the relationship between the international experience of a firm’s managers and directors and the firm’s international expansion to show how the GMM estimator can be applied in IB research.

The hypothesis to be tested is that the international experience of a firm’s top managers and board members facilitates its internationalization. This might be because knowledge of foreign markets or the international social ties built up by managers with international experience can reduce the transaction costs involved in accessing complementary assets such as local labor, finance and distribution channels (Verbeke, 2008, 2013; Verbeke & Yuan, 2010), thus facilitating internationalization (Verbeke, Zargarzadeh, & Osiyevskyy, 2014; Zhou, Wu, & Luo, 2007). Indeed, prior studies have found a significant positive relationship between the international experience of managers and firm internationalization (Carpenter & Fredrickson, 2001; Reuber & Fischer, 1997). Those studies may have suffered from dynamic endogeneity problems as the current values of the independent variables (the managers’ current international experience) depend at least in part on the previous values of the dependent variables (the firms’ internationalization in previous periods).

Data

Internationalization data come from Exhibit 21 of the 10-K filed with the US Securities and Exchange Commission. Biographical information on top managers and board directors was extracted from the BoardEx database, which provides the nationality, educational background, and working experience of top managers and board directors of more than 10,000 US public companies. Because BoardEx’s coverage of US public companies is limited before 2000, we use data from 2000 onward. We first matched data on 5,198 companies from BoardEx with Compustat data using CIK, CUSIP, and TICKER as firm identifiers, and then performed string matching on firm names. The Compustat database was then matched with Exhibit 21 of each firm’s 10-K using its CIK code. This three-way matching ultimately generated a sample of 3,150 unique firms covering the period from 2000 to 2013.
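A matching step of this kind might be sketched as follows. This is an illustration only: the record layout, the field names (`board_id`, `gvkey`, `cik`, `cusip`, `ticker`, `name`), the sample firms, and the list of legal suffixes are hypothetical, and we assume here that name matching serves as a fallback when no identifier matches, with exact matching on cleaned names standing in for whatever string-matching rule was actually applied.

```python
import re

LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "company", "ltd", "llc", "plc"}

def normalize_name(name):
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(tok for tok in cleaned.split() if tok not in LEGAL_SUFFIXES)

def match_firms(boardex_rows, compustat_rows, id_keys=("cik", "cusip", "ticker")):
    """Match on identifiers in order of preference, then fall back to cleaned names."""
    indexes = {
        key: {row[key]: row for row in compustat_rows if row.get(key)} for key in id_keys
    }
    by_name = {normalize_name(row["name"]): row for row in compustat_rows}
    matches = []
    for row in boardex_rows:
        hit, matched_on = None, None
        for key in id_keys:
            if row.get(key) and row[key] in indexes[key]:
                hit, matched_on = indexes[key][row[key]], key
                break
        if hit is None:
            hit = by_name.get(normalize_name(row["name"]))
            matched_on = "name" if hit is not None else None
        if hit is not None:
            matches.append((row["board_id"], hit["gvkey"], matched_on))
    return matches

# Hypothetical records; none of these values come from BoardEx or Compustat.
boardex = [
    {"board_id": 1, "name": "Acme Corp.", "cik": "0000001", "cusip": None, "ticker": "ACM"},
    {"board_id": 2, "name": "Globex, Inc.", "cik": None, "cusip": None, "ticker": None},
    {"board_id": 3, "name": "Initech LLC", "cik": None, "cusip": None, "ticker": None},
]
compustat = [
    {"gvkey": "100", "name": "ACME CORPORATION", "cik": "0000001", "cusip": "000000001", "ticker": "ACM"},
    {"gvkey": "200", "name": "GLOBEX INC", "cik": "0000002", "cusip": "000000002", "ticker": "GLX"},
]
matches = match_firms(boardex, compustat)
print(matches)
```

Recording which key produced each match makes it possible to inspect name-only matches by hand, since these are the ones most prone to error.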

Variables

The dependent variable, lnCountries, is the logarithm of one plus the number of foreign countries in which a firm operates (cf. Mihov & Naranjo, 2019; Verbeke & Forootan, 2012).

The independent variable, foreign experience, is the number of a firm’s directors and managers with a foreign nationality, foreign education, or foreign working experience, divided by the total number of directors and managers.

We include the following control variables. Firm size is the natural logarithm of a firm’s total assets. Size might influence internationalization because it can be regarded as a firm-specific advantage. Large firms are usually successful in their home country (Filatotchev & Piesse, 2009; Verbeke et al., 2014). Firm age is the logarithm of the number of years since a firm was founded. A firm’s stage in the organization life cycle tends to affect its internationalization (Autio, Sapienza, & Almeida, 2000; Filatotchev & Piesse, 2009). Financial liquidity is the ratio of cash flow to total assets. Abundant internal funding can relieve one possible constraint on internationalization (Arndt, Buch, & Mattes, 2012; Carpenter & Fredrickson, 2001). Conversely, leverage, measured as total liabilities divided by total assets, can be expected to constrain internationalization (Filatotchev & Piesse, 2009).

There is evidence that firms with a strong technology position or strong brand names choose foreign direct investment over exporting or licensing (Buckley & Casson, 1976; Dunning, 1988; Grøgaard & Verbeke, 2012; Hennart, 2009; Hennart & Park, 1994; Verbeke & Hillemann, 2013), so advertising intensity and R&D intensity were also included in the analyses. Advertising intensity is measured by the ratio of advertising expenditures over sales and R&D intensity as the ratio of research and development spending over sales.

We also included data on managers. Male ratio is the proportion of males among managers and directors. Average tenure is the average number of years managers and directors had worked for the firm. Board size is the number of board directors. Board independence is the proportion of outside directors on the board. Outside directors can help a firm understand foreign markets (Majocchi & Strange, 2012; Tihanyi, Johnson, Hoskisson, & Hitt, 2003). Year dummies were also included. All these independent variables were lagged 1 year. The estimations were therefore designed to capture the impact of current foreign experience on next year’s level of internationalization.
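The construction of the two key variables and the one-year lag can be sketched as follows. The record layout, function names, and sample values are hypothetical, and we assume the natural logarithm for lnCountries. Lagging is done by looking up the same firm's record from the previous calendar year rather than by shifting rows, so that gaps in a firm's series do not silently pair non-adjacent years.

```python
import math

def make_firm_year(firm, year, n_foreign_countries, n_with_foreign_experience, n_people):
    """Build one firm-year record with the dependent and the key independent variable."""
    return {
        "firm": firm,
        "year": year,
        "lnCountries": math.log(1 + n_foreign_countries),
        "foreign_experience": n_with_foreign_experience / n_people,
    }

def lag_regressors(rows, regressors, lag=1):
    """Pair each firm-year outcome with regressors observed `lag` years earlier."""
    by_key = {(r["firm"], r["year"]): r for r in rows}
    lagged = []
    for row in rows:
        prior = by_key.get((row["firm"], row["year"] - lag))
        if prior is None:                     # no record for the earlier year: drop
            continue
        record = {"firm": row["firm"], "year": row["year"], "lnCountries": row["lnCountries"]}
        for name in regressors:
            record[f"{name}_lag{lag}"] = prior[name]
        lagged.append(record)
    return lagged

# Hypothetical firm-years: firm A is observed twice, firm B only once.
rows = [
    make_firm_year("A", 2000, n_foreign_countries=0, n_with_foreign_experience=2, n_people=10),
    make_firm_year("A", 2001, n_foreign_countries=3, n_with_foreign_experience=3, n_people=10),
    make_firm_year("B", 2001, n_foreign_countries=5, n_with_foreign_experience=1, n_people=8),
]
sample = lag_regressors(rows, ["foreign_experience"])
print(sample)
```

Only firm A's 2001 observation survives, paired with its 2000 value of foreign experience; firm B is dropped because it has no earlier year.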

Table 4 presents descriptive statistics and the correlation matrix of the variables. The correlation coefficient between a firm’s foreign experience and its internationalization level (lnCountries) is 0.233.

Table 4 Descriptive statistics and correlations

It is the dynamic nature of the relationship between the independent and dependent variables that makes this a good example. The dynamic endogeneity – the international experience of the directors and managers is to some extent a function of a firm’s internationalization – calls for a dynamic panel model. The first step is to decide what lags in the dependent variable might capture the most information from the past. If the model does not include all influences from past internationalization on a firm’s present level of internationalization (lnCountries), the equation will be mis-specified. Wintoki et al. (2012) propose an easy way to select the number of lags that capture how past internationalization affects present internationalization. It involves running a pooled OLS regression and detecting how many lags of the dependent variable are statistically significant.

$$\mathit{lnCountries}_{it} = \alpha + \sum_{j = 1}^{s} \beta_{j}\, \mathit{lnCountries}_{it - j} + \gamma X_{it} + \varepsilon_{it},$$

where \(\beta_{1}, \ldots, \beta_{s}\) are the coefficients on lags 1 to \(s\) of lnCountries, and \(X_{it}\) contains the control variables.

Model 1 of Table 5 shows that the first two lags of lnCountries significantly impact the current value of lnCountries, while the older lags do not. Therefore, lagging lnCountries by two periods is sufficient to capture the persistence of a firm’s internationalization. Older lags can then be used as instruments in GMM estimation. Model 2 drops the first two lags and incorporates the older lags only. The significance of the third to fifth lags indicates that older lags include relevant information about current internationalization, thus validating their role as instruments for the first two lags.
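The lag selection procedure can be sketched on simulated data. The example below is an illustration under simplifying assumptions and does not reproduce Table 5: the series follows a hypothetical AR(2) process with coefficients 0.5 and 0.2, there are no control variables or firm effects, and the t statistics use conventional (non-clustered) standard errors. The helper names `invert`, `ols`, `simulate_ar2_panel`, and `lag_design` are our own.

```python
import math
import random

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def ols(X, y):
    """Pooled OLS via the normal equations; returns coefficients and t statistics."""
    n, k = len(y), len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    coef = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(c * v for c, v in zip(coef, row)) for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    t_stats = [coef[a] / math.sqrt(sigma2 * inv[a][a]) for a in range(k)]
    return coef, t_stats

def simulate_ar2_panel(n_firms=300, n_periods=12, b1=0.5, b2=0.2, burn_in=30, seed=1):
    """Each firm's series follows y_t = b1*y_{t-1} + b2*y_{t-2} + e_t."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_firms):
        y = [0.0, 0.0]
        for _ in range(burn_in + n_periods):
            y.append(b1 * y[-1] + b2 * y[-2] + rng.gauss(0.0, 1.0))
        panel.append(y[-n_periods:])
    return panel

def lag_design(panel, max_lag):
    """Stack firm-years into rows of [1, y_{t-1}, ..., y_{t-max_lag}] with outcome y_t."""
    X, y = [], []
    for series in panel:
        for t in range(max_lag, len(series)):
            X.append([1.0] + [series[t - j] for j in range(1, max_lag + 1)])
            y.append(series[t])
    return X, y

X, y = lag_design(simulate_ar2_panel(), max_lag=4)
coef, t_stats = ols(X, y)
for j in range(1, 5):
    print(f"lag {j}: coefficient = {coef[j]:+.3f}, t = {t_stats[j]:+.2f}")
```

One would retain the lags whose coefficients are clearly significant and treat the older ones as candidate instruments. In applied work the same regression would include the controls in \(X_{it}\) and standard errors clustered by firm.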

Table 5 Determinants of internationalization, lag selection procedure (DV = lnCountries)

The independent variable serves as an instrument for all the endogenous variables in the GMM estimates. Other exogenous variables, such as firm age, can also be added as instruments. The system GMM model allows different types of fixed effects to be included to capture time-invariant heterogeneity. We compare the results of a fixed effects, a dynamic OLS, and a system GMM model to show the advantage of the GMM model in handling dynamic endogeneity. The code to implement a system GMM estimator is described in “Appendix B”.

Model 1 of Table 6 shows the results of a traditional fixed effects model. The significant and positive coefficients of the foreign experience terms are consistent with the results of previous studies. Time dummies in the regression model could be viewed as additional regressors that control for the time trends in, for example, the macroeconomic environment. Model 2, the dynamic OLS model, includes firm internationalization lagged one and two periods. The coefficient of foreign experience falls from 0.132 to 0.103, indicating that its effect on internationalization is partly absorbed by lagged internationalization. The significant coefficients of the lagged internationalization terms indicate that current internationalization is correlated with previous internationalization.

Table 6 Determinants of internationalization (DV = lnCountries)

Model 2 of Table 6 ignores firm heterogeneity. Dynamic panel models are better at controlling for firm heterogeneity, so we estimated dynamic panel models with internationalization lagged one and two periods.

Past internationalization and the other firm variables served as instruments for all the endogenous variables in the system GMM estimation. The year dummies in the GMM regressions were assumed to be exogenous as Wintoki et al. (2012) propose. All the other regressors are endogenous, and their lagged values were used as instruments. In Model 3 of Table 6 the coefficient of the foreign experience term is smaller than that estimated in the fixed effects model and has lower statistical significance. The difference, which may be attributed to the dynamic relationship between foreign experience and internationalization, suggests that the fixed effects model overstates the real impact of foreign experience on firm internationalization.

When conducting a system GMM estimation, it is important to report the results of several statistical tests. One can use an AR(2) test to check for second-order serial correlation in the differenced residuals. A rejection casts doubt on the assumption underlying the estimation of the difference equation. With a panel dataset of more than three periods, the Hansen J test of over-identifying restrictions should also be used to test the validity of the instruments. The null hypothesis is that the subsets of instruments in the level equation are exogenous. A rejection casts doubt on the assumption underlying the estimation of internationalization levels. In this example the AR(2) test yielded a p value of 0.224, indicating no evidence of second-order serial correlation in the differenced residuals. The Hansen test of over-identification, with a p value of 0.32, implies that the hypothesis that the instruments are exogenous cannot be rejected. The difference-in-Hansen test yielded a p value of 0.747, so the assumption that the additional subset of instruments used in the system GMM model (lagged difference instruments) is exogenous also cannot be rejected. These test results support the use of a system GMM estimator in our case.
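The reporting logic for these specification tests can be summarized in a few lines. The sketch below applies the p values reported for our example; the function name and the 0.05 threshold are assumptions made for illustration, and a different significance level could be substituted.

```python
def check_gmm_diagnostics(ar2_p, hansen_p, diff_hansen_p, alpha=0.05):
    """Report whether each specification test rejects its null hypothesis at level alpha."""
    tests = {
        "AR(2): no second-order serial correlation in differenced residuals": ar2_p,
        "Hansen J: over-identifying restrictions are valid": hansen_p,
        "Difference-in-Hansen: additional level-equation instruments are exogenous": diff_hansen_p,
    }
    return {null: ("not rejected" if p > alpha else "rejected") for null, p in tests.items()}

# p values reported for the empirical example in this section.
report = check_gmm_diagnostics(ar2_p=0.224, hansen_p=0.32, diff_hansen_p=0.747)
for null, outcome in report.items():
    print(f"{null}: {outcome}")
```

A "not rejected" outcome means only that the data provide no evidence against the null hypothesis; it does not prove that the instruments are valid.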

An unanswered question is whether the dynamic relationship really exists – whether a firm’s internationalization indeed affects the foreign experience of its management and board. In Table 7 the current value of foreign experience is regressed against the value of internationalization (lnCountries). Model 1 uses a fixed effects specification. It suggests that the relationship is significant and positive. In order to determine the number of lags needed in the dynamic model we conducted regressions similar to those reported in Table 5. They showed that three lags could capture the persistence of foreign experience. Models 2 and 3 re-estimate the dynamic regressions. The coefficients of lnCountries change little under these two specifications, indicating the presence of a dynamic relationship, and showing that the fixed effects model (Model 1) in Table 6 yields a biased estimate. Hence the need to estimate a dynamic panel model when studying the impact of foreign experience on firm internationalization.

Table 7 Coefficients of models confirming the existence of a dynamic relationship (DV = Foreign experience)

SUMMARY AND RECOMMENDATIONS

Our review of 80 empirical articles shows that prior research has not appropriately dealt with dynamic endogeneity. This affects, to a varying degree, research in all six domains of international business. We explain why the fixed effects estimator is biased in the presence of dynamic endogeneity, and introduce a dynamic panel model using a generalized method of moments estimator. Our simulations show the characteristics of different estimators and indicate that the magnitude of the bias depends on the effect size of the focal relationship and decreases with the number of time periods in the data panel. They also confirm that the presence of a dynamic relationship causes any fixed effects estimator to overestimate the key coefficient. A system GMM estimator outperforms fixed effects estimators in both bias and efficiency, regardless of the magnitude of the focal and dynamic relationships. It does, though, exhibit weak statistical power when the focal effect is small. Type II error occurs more frequently, with a system GMM estimator being more likely to fail to capture a relationship which is actually significant.

A GMM estimator is in general preferable to a fixed effects estimator when there is dynamic endogeneity. However, if the key regression coefficient in a GMM model is not significant, the results must be interpreted with caution because of the low statistical power of the system GMM estimator. Insignificant results may be due to small sample size. With a larger sample, however, the conclusion that there is indeed no significant relationship may be justified.

Table 8 provides a non-technical summary on how to conduct dynamic GMM estimation. There is a two-step procedure. Researchers must first judge whether there is a logical link between the current values of some independent variables and the past values of the dependent variable. Such judgements should be based on the study’s theoretical underpinnings and on logical reasoning, but some heuristics have been suggested (see the content analysis section above). One can also check whether the independent variables are stable over time (Boellis, Mariotti, Minichilli, & Piscitello, 2016). If so, the risk of dynamic endogeneity is low. It is also possible to regress the current values of the independent variables of interest on the lagged dependent variables. The significance of the regression coefficients can help identify whether dynamic endogeneity exists.
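The last of these checks, regressing the current value of an independent variable on the lagged dependent variable, can be sketched on simulated data. The data-generating process, parameter values, and function names below are hypothetical; the regression is pooled, has no controls, and uses conventional standard errors, so it is a rough screening device rather than a formal test.

```python
import math
import random

def simulate_panel(n_firms, n_periods, rho, beta=0.1, burn_in=20, seed=0):
    """Simulate y_it = beta*x_it + eta_i + e_it, where x_it = rho*y_i,t-1 + u_it."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_firms):
        eta = rng.gauss(0.0, 1.0)                    # time-invariant firm effect
        y_prev, xs, ys = 0.0, [], []
        for t in range(burn_in + n_periods):
            x = rho * y_prev + rng.gauss(0.0, 1.0)   # rho = 0 means no feedback
            y = beta * x + eta + rng.gauss(0.0, 1.0)
            if t >= burn_in:
                xs.append(x)
                ys.append(y)
            y_prev = y
        panel.append((xs, ys))
    return panel

def slope_and_t(regressor, outcome):
    """Bivariate OLS with an intercept; returns (slope, t statistic)."""
    n = len(regressor)
    mean_r, mean_o = sum(regressor) / n, sum(outcome) / n
    s_rr = sum((r - mean_r) ** 2 for r in regressor)
    s_ro = sum((r - mean_r) * (o - mean_o) for r, o in zip(regressor, outcome))
    slope = s_ro / s_rr
    intercept = mean_o - slope * mean_r
    rss = sum((o - intercept - slope * r) ** 2 for r, o in zip(regressor, outcome))
    return slope, slope / math.sqrt(rss / (n - 2) / s_rr)

def feedback_check(panel):
    """Regress the current regressor x_it on the lagged outcome y_i,t-1."""
    lagged_y, current_x = [], []
    for xs, ys in panel:
        for t in range(1, len(xs)):
            lagged_y.append(ys[t - 1])
            current_x.append(xs[t])
    return slope_and_t(lagged_y, current_x)

no_feedback = feedback_check(simulate_panel(n_firms=1000, n_periods=6, rho=0.0))
with_feedback = feedback_check(simulate_panel(n_firms=1000, n_periods=6, rho=0.5))
print(f"rho = 0.0: slope = {no_feedback[0]:+.3f}, t = {no_feedback[1]:+.2f}")
print(f"rho = 0.5: slope = {with_feedback[0]:+.3f}, t = {with_feedback[1]:+.2f}")
```

A large, significant slope on the lagged dependent variable signals that the regressor responds to past outcomes, which is the situation in which a dynamic panel model is warranted.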

Table 8 Checklist for addressing dynamic endogeneity

The use of a system GMM is subject to some restrictions. First, the panel must have at least three time periods; otherwise no internal instruments can be used. Second, the persistence of the dependent variable matters for the choice of GMM estimator: when the dependent variable is highly persistent (a first-order autoregression coefficient near one), lagged levels are weak instruments for first differences, and the system GMM is preferred to the difference GMM (Arellano & Bond, 1995). If the system GMM can be used, a researcher can run pooled OLS regressions including various lags of the dependent variable and select the best number of lags, as is shown in Table 5. The code presented in “Appendix B” can then be used.

When using a system GMM estimator, the results of three statistical tests should be reported: AR(1) and AR(2) tests for the differenced error terms; Hansen’s J over-identification test; and a difference-in-Hansen’s J test for exogeneity (Arellano & Bond, 1991; Eichenbaum, Hansen, & Singleton, 1988). The effect size of the estimation should also be discussed (Meyer et al., 2017). For example, the estimates of a fixed effects model can be compared with those of a dynamic panel model to obtain an estimate of the impact of dynamic endogeneity. If there is little change in effect sizes, is it because of low statistical power? JIBS data access and research transparency (DART) guidelines (Beugelsdijk, van Witteloostuijn, & Meyer, 2020) require that this be clearly reported.

A fixed effects model, a Heckman model, a difference-in-differences model, a model with instrumental variables and a dynamic panel model are compared in Table 9. Fixed effects models are widely used in management research. They are easy to implement and effective in controlling for time-invariant omitted variables (Greene, 1997). However, they have some limitations. They cannot handle omitted variable bias arising from time-varying variables. For example, CEO personality traits can affect firm strategies such as internationalization (Agnihotri & Bhattacharya, 2019; Oesterle, Elosge, & Elosge, 2016), but it is not possible to include all these traits in a regression because of the difficulty of eliciting them through a questionnaire (Hambrick & Mason, 1984). A fixed effects model cannot deal effectively with such time-varying, unobservable factors; nor, as this study has shown, can it deal with dynamic endogeneity, measurement error, simultaneity, and selection problems.

Table 9 Advantages and disadvantages of different approaches depending on the source of endogeneity

Heckman two-stage models do a good job tackling sample selection and self-selection bias, but are less useful for other sources of endogeneity (Certo et al., 2016; Heckman, 1976). A non-random sample, due perhaps to truncation or censoring, normally leads to a biased estimate, but Heckman’s two-stage regression technique deals with it well (Hamilton & Nickerson, 2003; Heckman, 1976). Heckman modeling is therefore widely applied in IB studies. A researcher may be interested, for instance, in the relationship between a firm’s investment overseas and its level of risk (Tong & Reuer, 2007). Sampling on MNEs may lead to selection bias because some unobservable characteristics may influence which firms invest overseas. By including firms with no foreign subsidiaries one can use Heckman modeling to correct for selection bias, but this technique does not solve endogeneity problems caused by omitted variables, measurement error, simultaneity or dynamic endogeneity. Moreover, it requires that the error term be normally distributed, an assumption that may not be satisfied in IB research.

Difference-in-differences (DID) is powerful and effective for dealing with time-varying omitted variables, but it requires defining a natural experiment (or quasi-experiment), which is often very difficult. Mithani (2017), for example, looked at how a cyclone affected corporate philanthropic contributions by foreign firms in India (the treatment group) and by domestic firms (the control group). The exogeneity of the shock is crucial, and a real exogenous shock relevant to a particular research question is very difficult to find. Firms may anticipate an upcoming shock and make adjustments accordingly and that will lead to biased estimates. Also, the DID method requires that differences between the treatment group and control group were constant before the shock (Roberts & Whited, 2013).

IB researchers have also used the instrumental variables approach, which is powerful and effective in dealing with most sources of endogeneity, including dynamic endogeneity. Hennart, Majocchi and Forlani (2019), for example, use the regional divorce rate to handle potential endogeneity between a firm’s share of family members in management and foreign sales. One difficulty is that truly exogenous instrumental variables are hard to find and there must be strong theoretical justification for their exogeneity. Moreover, they must be sufficiently correlated with the endogenous regressor (Semadeni et al., 2014). Weak instruments lead to unreliable approximations to the distributions of instrumental variable estimators and to distorted hypothesis testing.

Finally, dynamic panel modeling is specifically designed to solve dynamic endogeneity problems. There is no need to find external (exogenous) variables to use as instruments as the technique relies on internal ones. The technique may, however, have low statistical power. Moreover, the validity of the system GMM estimator is built on several statistical tests which may fail. In that case valid conclusions cannot be drawn. In particular, a small p value for an AR(2) test may suggest that the dynamic completeness of the model’s dependent variable is questionable. Including more lagged terms in the dynamic panel model may help. If either the Hansen J over-identification test or the difference-in-Hansen’s J test fails, the instruments used in the difference or level equations may not be exogenous. Researchers may try adjusting the lags of the internal instrument set, but they may be forced to resort to testing other external variables as instruments. These problems have been insufficiently discussed in the IB literature. Moreover, dynamic panel models cannot handle other endogeneity problems, such as those caused by omitted variables, measurement errors, or nonrandom samples. To handle sample selection, researchers can first run a Heckman two-stage model and then a dynamic panel model to account for dynamic endogeneity.

CONCLUSION

We have introduced GMM estimators and how they should be used to deal with dynamic endogeneity. Our simulations show that GMM estimators tend to be less biased than fixed effects estimators when dynamic relationships exist. We provide an example of how to apply a GMM in empirical research as well as a non-technical checklist with a step-by-step guide for applying dynamic panel models. We also look at five econometric models and see how effectively they deal with different sources of endogeneity.

NOTES

  1. Article details are given in the online appendix. Two of the co-authors independently coded each of the 80 articles and then reached a consensus about whether the relationships hypothesized in each case posed a risk of dynamic endogeneity (91% agreement), whether the authors acknowledge its potential presence (99% agreement), whether they use GMM (100% agreement), and whether they do it rigorously (100% agreement).

  2. The papers for illustration are not limited to publications between 2015 and 2017.

  3. Similar views hold for Hansen tests discussed below. The AR and Hansen tests follow the classic hypothesis testing approach. When the test yields a small p value, researchers conclude that the null hypothesis is rejected. When the p value is large, the data fail to reject the null hypothesis. This is the case of “absence of evidence”. Some approaches such as Bayesian statistics may provide support for the null hypothesis because Bayesian model validation can claim the acceptance with a certain confidence given prior knowledge. That serves as “evidence of absence”. As a practical guide, the AR and Hansen tests are the classic statistical tests for a system GMM estimator.