Distance and Border Effects in International Trade: A Comparison of Estimation Methods

This paper compares various estimation methods commonly used for gravity models of international trade. The authors first discuss different structural and consistent estimation techniques, their underlying assumptions and their impact on estimated coefficients. They then estimate the gravity model for global bilateral trade flows using various empirical methodologies, focusing on a comparison of the distance and border effects across estimation techniques. For the border effects, they take into account adjacency effects as well as the distinction between intra-regional and inter-regional trade. Their findings confirm the significantly negative distance effect and the significantly positive adjacency effect across estimation methods, although the size of the effects varies substantially across methods. The border effects by global region are much more sensitive – both in size and direction – to the applied estimation method. Although all estimation methods have their own merits and drawbacks, the authors provide some guidelines for future empirical research based on their findings. (Published in the Special Issue Distance and Border Effects in Economics) JEL: F10, F14

There are several possible explanations for these contrasting results. For example, Brun et al. (2005) argue that changes in infrastructure are responsible for the decline of the distance effect. According to Felbermayr and Kohler (2006), the non-decreasing distance effect can be explained by the failure to take into account the extensive margin of trade. Berthelon and Freund (2008), in turn, show that the increase in the overall distance coefficient is due to changes in distance coefficients across industries. They explore two possible reasons for these changes: first, in some industries, goods have become more substitutable; second, trade costs have changed as well. The authors argue that the first phenomenon is the more important one. Finally, Behar and Venables (2009) argue that the non-decreasing coefficient of distance over time is due to increased trade at shorter distances relative to that at longer distances.
The empirical literature on border effects was inspired by the seminal work of McCallum (1995), who shows that Canadian provinces trade up to 22 times more with each other than with US states (the so-called home bias). This finding was initially confirmed by Helliwell and McCallum (1995) and Helliwell (1996). Similarly, Wei (1996) finds that OECD countries trade about 2.5 times more internally than with otherwise identical foreign countries. Helliwell (1997) points to an even larger border effect, although it is only half as large for countries sharing a common border and a common language. Following a similar approach to Wei (1996) and Helliwell (1997), Nitsch (2000) finds that domestic trade within a European Union country is seven to ten times larger than trade with another European Union country. Finally, most of these studies observe a trade-increasing effect for adjacent countries.
These findings, and in particular the findings of McCallum (1995), were revisited by Anderson and van Wincoop (2003) who show that the large border effects mainly come from omitting a multilateral resistance term in McCallum's specification. Anderson and van Wincoop (2003) show that the inclusion of the multilateral resistance term considerably reduces McCallum's ratio of inter-provincial trade to province-state trade from 16.4 to 10.7. Additionally, they find that borders reduce trade between the US and Canada by 44% and among other industrialized countries by 29%.
Interestingly, the home bias exists not only at the international level, but also at the intra-national level. According to Wolf (2000), trade between US states is about three times lower than trade within states. Additionally, adjacent states trade 2.6 times more with each other. Strikingly, the distance coefficient is similar to the coefficients found for international trade.
More recently, Coughlin and Novy (2012) compare trade between and within individual US states with trade between states and foreign countries. They find that a state's border generates a larger trade barrier than an international US border. One possible explanation is related to Hillberry and Hummels (2008) who find that trade within the US is heavily concentrated at the local level: trade within a single ZIP code is on average three times larger than trade with partners outside the ZIP code. Hillberry and Hummels (2008) explain their finding by co-location of producers in supply chains to exploit informational spillovers, minimize transportation costs and facilitate just-in-time production. According to Coughlin and Novy (2012), producers also concentrate in order to benefit from external economies of scale in the presence of intermediate goods and associated agglomeration effects (Rossi-Hansberg 2005), as well as from the hub-and-spoke distribution systems and wholesale shipments (Hillberry and Hummels 2003): the domestic border effect then reflects the local concentration of economic activity rather than trade barriers associated with crossing a state border.
The literature shows that the sensitivity of the distance and border effects in trade has been tested for various countries, regions and periods. So far, however, the sensitivity of these effects to the applied estimation method has not been tested in a consistent manner. This paper aims to fill this gap. The remainder of the paper is organized as follows. In Section 2 we discuss the main econometric approaches commonly or recently followed in the gravity literature. In Section 3 we present the data and our empirical approach. Section 4 discusses the results from applying the various econometric techniques to measure distance and border effects. Section 5 presents some robustness checks. Section 6 concludes.
2 The Econometrics of Gravity

2.1 The general gravity equation

While the earliest implementation of the gravity model in international trade was an intuitive copy of its counterpart in physics (Tinbergen 1962), most theoretical trade models now derive an aggregate bilateral demand system that can be written very similarly to the physical gravity equation. We can write the gravity equation in general terms as:

X_ij = G S_i M_j f_ij, (1)

where X_ij denotes nominal exports from country i to j, G is a constant, S_i and M_j are the capabilities of the exporter and importer respectively, and f_ij is a function of the impact of trade barriers on bilateral trade flows. Hence, exports from i to j increase in exporter and importer economic capabilities (e.g. GDP) and decrease in bilateral trade barriers (e.g. geographical distance, tariffs, non-tariff barriers). Using homothetic budget shares and general equilibrium market clearing under fairly general conditions, one can derive a structural basis for (1) (Costinot and Rodríguez-Clare 2014). Following the model of Anderson and van Wincoop (2003) for instance:

X_ij = (Y_i E_j / Y_W) (f_ij / (P_i P_j))^(1-σ), (2)

where Y_i is gross output of exporter i, E_j is the total expenditure on goods in j, Y_W is world output, σ is the elasticity of substitution, and P_i and P_j are multilateral trade resistance (MTR) terms, capturing equilibrium price levels for the exporting and importing country respectively.¹

¹ Note that we do not suggest time dimensions in this specification, mainly for two reasons. First, the bulk of theory in the gravity literature derives static and cross-sectional models from long-run equilibrium conditions. Second, most empirics are performed in a panel setting, but in a static way: exploiting the panel dimension allows for time-invariant regressors (such as distance and borders) to infer causation of the model with respect to predicted trade flows. There is a growing literature on "true" dynamic panel models in international trade: see for instance Harris and Matyas (2004), Harris et al. (2009) and Baltagi et al. (2014), which we abstract from here.

Log-linearizing the model and OLS
The parametric specification of the empirical gravity model is assumed to follow the exponential functional form:

Y = exp(Xβ) η, (3)

where X is a matrix of regressors with rows x_ij, β is a vector of coefficients to be estimated, and η is a vector of idiosyncratic error terms such that E(η|X) = 1. Traditional estimation of the gravity model log-linearizes (3) so that:

y = Xβ + ε, (4)

where y = ln(Y) and ε = ln(η). This transformation is often applied in empirical trade research as it linearizes the estimation equation, allowing for estimators that are straightforward to implement, such as OLS, to estimate the parameters of interest β. However, this transformation and subsequent estimation with OLS generate a few important issues related to (i) heteroscedasticity, (ii) zero trade flows and (iii) the weights of individual observations in optimally fitting the model. We describe these issues in more detail below.
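To make the log-linearized estimation in (4) concrete, the following minimal pure-Python sketch (hypothetical toy numbers, a single regressor) fits the model by OLS via the textbook closed-form slope and intercept; real applications use many regressors and a statistical package:

```python
# Hypothetical log-linearized gravity data: y = ln(X_ij), x = ln(Dist_ij)
log_dist = [6.2, 7.1, 7.9, 8.4, 9.0]
log_flow = [10.1, 9.0, 8.3, 7.6, 7.1]

# OLS slope and intercept for y = a + b*x + e via the closed form
n = len(log_dist)
mx = sum(log_dist) / n
my = sum(log_flow) / n
b = sum((x - mx) * (y - my) for x, y in zip(log_dist, log_flow)) \
    / sum((x - mx) ** 2 for x in log_dist)
a = my - b * mx

print(round(b, 3))  # prints -1.073: the estimated distance elasticity
```

With a full regressor matrix the same estimate solves the normal equations (X'X)β̂ = X'y.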
Heteroscedasticity First, the validity of the model depends on the orthogonality of ε with respect to the regressors X. When estimating (4), one assumes (among other things) that there is no information in the noise, so that:

Var(ε_i|X) = σ², for all i,

where σ > 0 is the standard deviation of the errors. However, if the error term is heteroscedastic, the variance of the error term is not constant (σ_i ≠ σ for some i).
One potential cause of heteroscedasticity is omitted variable bias. If the model is misspecified due to omitted variables, or due to the exclusion of a (non-)linear combination of regressors that is correlated with the error term (violating strict exogeneity), the residuals can display a heteroscedastic pattern.
If heteroscedasticity is moderate, we can transform the estimation equation or use robust methods to correct the standard errors, such as White (1980) standard errors. Alternatively, Weighted Least Squares (WLS) can be used to account for the heteroscedasticity problem and produce an efficient estimator. However, deriving the correct weighting matrix through iteration can be a tedious task.
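As an illustration of the weighting idea, a minimal WLS sketch (hypothetical toy numbers, and assuming the error variances are known rather than iteratively estimated) weights each observation by the inverse of its assumed variance:

```python
# Hypothetical data where the last two observations are assumed noisier
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.7, 10.5]
var = [0.1, 0.1, 0.1, 1.0, 1.0]      # assumed error variances
w = [1.0 / v for v in var]           # WLS weight: inverse variance

# Weighted means, then the weighted slope and intercept
sw = sum(w)
mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
my = sum(wi * yi for wi, yi in zip(w, y)) / sw
b = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) \
    / sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
a = my - b * mx

print(round(b, 3))  # prints 2.048: the precise observations dominate the fit
```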
Heteroscedasticity does not affect the unbiasedness of the OLS estimator, but it does affect its efficiency: OLS no longer minimizes the variance of the estimator. It also affects the estimated p-values and, to a lesser extent, confidence and prediction intervals. The estimated standard errors are biased, and the bias can go either way. Therefore, other estimators can be more efficient.
Zero trade flows Second, note that ln(x) is undefined for all x ≤ 0. Estimating the gravity model in the linearized form (4) then leads to dropping all observations with zero trade flows (and any other non-strictly-positive observations in variables that are subsequently logged). Many trade models actually predict a large fraction of zeros in the bilateral trade matrix, such as models with fixed costs of exporting (e.g. Melitz 2003; Helpman et al. 2008) or models of Bertrand competition (e.g. Eaton and Kortum 2002; Bernard et al. 2003). Running the estimation procedure on only positive values then biases the estimated coefficients, as these "structural" zero trade flows contain valuable information on the equilibrium structure of trade.
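The mechanical loss of observations is easy to see (hypothetical flow values):

```python
import math

flows = [120.0, 0.0, 35.5, 0.0, 980.2]   # hypothetical bilateral flows in levels

# Log-linearization silently drops every non-positive observation
kept = [math.log(v) for v in flows if v > 0]

print(len(flows), len(kept))   # prints: 5 3
```

If the dropped zeros are disproportionately the most distant country pairs, the coefficients estimated on the surviving observations are biased.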
Santos Silva and Tenreyro (2006) advocate a Poisson Pseudo Maximum Likelihood (PPML) estimator to deal with both heteroscedasticity and zero trade flows simultaneously, and we describe the PPML below. However, PPML does not account for these structural zeros. Some solutions have been proposed, such as selection models with a 2-stage estimation procedure, where the first stage estimates the amount of zeros in the system and the second stage subsequently estimates the bilateral trade values. While Helpman et al. (2008) use a Heckman-type selection model that is derived from theory and accounts for firm heterogeneity, alternatives such as Zero Inflated models deliver biased results.²

Weights of observations and the loss function Third, different estimators use different weighting schemes and potentially different loss functions in obtaining the optimal fit of the model to produce parameter estimates. Parameter estimates in OLS are obtained by minimizing the Sum of Squared Residuals (SSR), so that:

β̂ = argmin_β (y - Xβ)'(y - Xβ), (5)

where β̂ is the estimate of β that minimizes the objective function and (y - Xβ)² denotes the quadratic loss function. The first-order conditions of the optimization problem are:

X'(y - Xβ) = 0,

so that β̂ = (X'X)^(-1) X'y. There is a unique minimum if X has full rank (or equivalently, in the absence of perfect multicollinearity, another assumption of the OLS estimator).³ The two main candidates for the loss function are the quadratic loss function and the absolute value loss function. OLS uses the quadratic loss function (or least squared errors), which puts a larger weight on larger residuals, and mainly so for three reasons. First, this loss function generates a unique and stable solution for (5); least absolute deviations regression is more robust to outliers, but it can generate multiple and unstable solutions. Second, as we estimate a conditional mean, the loss function is quadratic by construction. Alternative loss functions result in the estimation of other moments of the data: for instance, a linear loss function results in an estimator equal to a quantile (e.g. the median) of the posterior distribution, and an "all-or-nothing" loss function results in the estimation of the mode of the posterior distribution. Third, the quadratic loss function is symmetric, so that positive and negative deviations from the estimated parameter are allocated the same weight.

² Since the Helpman et al. (2008) procedure can only be performed on a small subset of countries (in order to be computationally able to use fixed effects in a non-linear setting), we do not present the results of those estimations here. In addition, since the count data alternatives of Negative Binomial and Zero Inflated models are biased, we do not go into further details on their estimation in this paper.

³ From the second-order conditions, this is indeed a minimum: the Hessian of the SSR is 2X'X, which is positive definite when X has full rank.
As an aside, one note on the Maximum Likelihood estimation of (4). In the linear model and under homoscedasticity of the error terms, the first-order conditions with respect to β of the objective to be optimized under least squares and under Maximum Likelihood (ML) coincide. In the linear model, the log-likelihood function

ℓ(β|X) = -(n/2) ln(2π) - (n/2) ln(σ²) - (1/(2σ²)) (y - Xβ)'(y - Xβ)

is the objective function to be maximized. The first-order conditions ∂ℓ/∂β = 0 then yield β̂_ML = (X'X)^(-1) X'y = β̂_OLS.

Non-linear least squares
Instead of log-linearizing (3), we can estimate the coefficients from the model in its original exponential form. Using non-linear least squares (NLS) and minimizing the SSR, the objective to estimate the parameters of the model becomes:

β̂ = argmin_β (Y - exp(Xβ))'(Y - exp(Xβ)), (6)

with a system of first-order conditions:

Σ_ij [Y_ij - exp(x_ij β)] exp(x_ij β) x_ij = 0. (7)

The first factor (Y - exp(Xβ)) is the model to be estimated and the factor exp(Xβ)X represents the weight of each observation in minimizing the errors.
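A minimal NLS sketch for a one-regressor exponential model, solved with Gauss-Newton steps (noise-free toy data generated from known parameters, so the estimates recover them exactly; the starting values are an arbitrary rough guess):

```python
import math

# Toy data generated exactly from X = exp(2.0 - 0.5 * d)
d = [0.0, 1.0, 2.0, 3.0, 4.0]
X = [math.exp(2.0 - 0.5 * di) for di in d]

a, b = 1.5, -0.4                      # rough starting guess
for _ in range(50):                   # Gauss-Newton iterations
    mu = [math.exp(a + b * di) for di in d]          # fitted values exp(Xb)
    r = [Xi - mi for Xi, mi in zip(X, mu)]           # residuals
    # Jacobian columns: dmu/da = mu and dmu/db = mu*d, i.e. the exp(Xb)X weights
    j11 = sum(m * m for m in mu)
    j12 = sum(m * m * di for m, di in zip(mu, d))
    j22 = sum((m * di) ** 2 for m, di in zip(mu, d))
    g1 = sum(m * ri for m, ri in zip(mu, r))
    g2 = sum(m * di * ri for m, di, ri in zip(mu, d, r))
    det = j11 * j22 - j12 * j12
    a += (j22 * g1 - j12 * g2) / det
    b += (j11 * g2 - j12 * g1) / det

print(round(a, 3), round(b, 3))   # prints 2.0 -0.5
```

The gradient terms g1 and g2 carry the exp(Xβ)X factor from (7): observations with a large fitted value dominate the fit.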
Some authors (Frankel and Wei 1993; Frankel 1997b; Anderson and van Wincoop 2003) have proposed using the NLS method in estimating the gravity equation. This objective gives more weight to observations where exp(Xβ) is large, so that countries with larger S_i and M_j, for instance, get more weight. There is economic intuition for this weighting scheme, as countries with higher GDP tend to report more accurately and should therefore get more weight in estimating the model. This holds not only for GDP, but also for other variables in this dimension, such as the MTR and the distance function. However, Santos Silva and Tenreyro (2006) note that (i) this does not address the heteroscedasticity problem and (ii) these observations also have the largest variance, so more noise is added to the model, which lowers the efficiency of the estimation and leads to larger standard errors.
Generally, the estimator could be efficient if one calculates the appropriate weights. This is largely the optimization method Anderson and van Wincoop (2003) propose for their gravity model in (2). The problem is that the weights have to be calculated together with the MTR for each country pair relative to all other pairs, making this method very cumbersome. To deal with this issue, alternative non-parametric methods have been proposed (e.g. Robinson 1987). However, empirical researchers prefer easily implementable methods and have proceeded to use OLS with a "twist", or alternatively the PPML method.⁴

Least Squares Dummy Variables and Taylor approximation
This twist avoids calculating the NLS procedure by instead accounting for the non-linear MTR through exporter and importer fixed effects, which can easily be implemented in an OLS estimation. We label this approach Least Squares Dummy Variables (LSDV). Anderson and van Wincoop (2003) use the fixed effects approach in one of their procedures, and it is the standard procedure for most empirical researchers using the gravity model.
One drawback of the fixed effects approach is that the system of structural non-linear market clearing conditions of Anderson and van Wincoop (2003) is evaded, so that general equilibrium comparative statics are not possible (Baier and Bergstrand 2009). Additionally, and often more practically, using fixed effects renders identification of variables of interest in those dimensions impossible. One way to accommodate both problems is to approximate the non-linear system by a first-order Taylor approximation, as proposed by Baier and Bergstrand (2009). This allows accounting for the MTR up to an approximation, while keeping the dimensions of identification free as the researcher sees fit. We label this method as Baier and Bergstrand (BB). The model then replaces each bilateral trade-cost variable by its MTR-adjusted counterpart; for distance:

ln Dist_ij - Σ_k θ_k ln Dist_kj - Σ_m θ_m ln Dist_im + Σ_k Σ_m θ_k θ_m ln Dist_km (8)

is a first-order Taylor approximation of the non-linear ln Dist_ij term, where θ_i is country i's GDP share in world GDP. The Taylor approximation has to be performed for each bilateral variable of the distance function separately.

Pseudo Maximum Likelihood Estimators
Another approach is to simplify the first-order conditions of the non-linear system in (7) in order to ease estimation, and so to approximate the objective, hence the name "pseudo" (or "quasi"). Since the first-order conditions determine the values of the maximum/minimum, one can start from there, rather than from the objective function to be optimized. We discuss the Poisson Pseudo Maximum Likelihood (PPML) and Gamma Pseudo Maximum Likelihood (GPML) next. The Poisson model is given by:

Pr(Y = y|X) = e^(-λ) λ^y / y!, (9)

for Y ≥ 0, and where λ = exp(Xβ). The ML optimization is given by:

max_β Σ_ij [Y_ij x_ij β - exp(x_ij β)], (10)

with first-order conditions:

Σ_ij [Y_ij - exp(x_ij β)] x_ij = 0. (11)

Santos Silva and Tenreyro (2006) propose to replace the weighting factor of the original NLS in (7) by an assumption on its behavior: exp(Xβ)X ≡ X, i.e. assuming that weights are proportional to the value of their observations. This satisfies the Poisson model assumption of the conditional mean being proportional to the conditional variance. This is the only assumption taken from the Poisson model, but it happens to coincide with the first-order conditions of the Poisson ML, as seen in (11). If this assumption does not hold in reality, however, standard errors are too small. Note that the starting point of the analysis is NLS. Therefore, no distributional assumptions are made on the model parameters (e.g. the dependent variable does not have to be Poisson distributed), nor is there any other relationship to other count models such as Negative Binomial and Zero-Inflated models.
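A minimal sketch of PPML as a solver for the first-order conditions in (11), using Newton steps on hypothetical toy data that include a zero flow (which PPML keeps); a real application would use a GLM routine with a log link and robust standard errors:

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]      # e.g. log distance
y = [12.0, 7.0, 4.0, 0.0, 1.0]     # export values in levels; one flow is zero

a, b = math.log(sum(y) / len(y)), 0.0    # start at the overall mean
for _ in range(25):                      # Newton steps on the FOC
    mu = [math.exp(a + b * xi) for xi in x]
    g1 = sum(yi - mi for yi, mi in zip(y, mu))                # sum (y - mu)
    g2 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))  # sum (y - mu) x
    h11 = sum(mu)
    h12 = sum(mi * xi for mi, xi in zip(mu, x))
    h22 = sum(mi * xi * xi for mi, xi in zip(mu, x))
    det = h11 * h22 - h12 * h12
    a += (h22 * g1 - h12 * g2) / det
    b += (h11 * g2 - h12 * g1) / det

print(round(b, 3))   # a negative "distance" coefficient, zeros retained
```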
Notice the difference with (7): each observation's deviation now receives the same weight, x_ij, rather than emphasizing those for which exp(Xβ) is large as in NLS. These are not the true first-order conditions of the log-likelihood function of the original problem in (7), and are therefore "pseudo", but they are easier to calculate as the second factor is simplified. Furthermore, Gourieroux et al. (1984) show that these estimators are consistent and asymptotically normal.
Another PML estimator is the Gamma Pseudo ML (GPML), given by:

f(Y|X) = Y^(k-1) e^(-Y/ζ) / (ζ^k Γ(k)), (12)

for Y ≥ 0, where ζ > 0 is the scale parameter, k is a shape parameter and Γ(·) is the Gamma function. In the PPML, we assume equi-dispersion, so that the variance of the model is proportional to the mean: V(Y|X) ∝ E(Y|X). In the GPML, we assume that the dispersion grows as the observations grow, following V(Y|X) ∝ E(Y|X)². This leads again to a higher weighting of large deviations, as in OLS and NLS. Both PPML and GPML return consistent estimates; depending on the variance proportionality, one is more efficient than the other. The first-order conditions are now given by:

Σ_ij [Y_ij - exp(x_ij β)] [exp(x_ij β)]^(-1) x_ij = 0, (13)

which is very close to the first-order conditions of NLS, but with different weights. 5

Discussion
Table 1 compares the differences across estimation procedures in terms of weighting schemes, the treatment of zeros and heteroscedasticity. First, we consider the differences in weighting schemes. On the one hand, OLS puts more weight on large deviations through its quadratic loss function. On the other hand, we can nest the non-linear models as follows. We can write the first-order conditions of NLS, PPML and GPML in vector form as:

Σ_ij [Y_ij - exp(x_ij β)] [exp(x_ij β)]^k x_ij = 0, (14)

where k = 1 for NLS, k = 0 for PPML and k = -1 for GPML, and [exp(Xβ)]^k X represents the weighting scheme of the errors to be minimized. Hence, from (14) it is immediately visible that NLS gives deviations from the mean a weight proportional to that mean, PPML puts equal weights on all deviations, and GPML allocates weights that are inversely proportional to the mean. Depending on the location of the outliers, this can lead to an upward or downward adjustment of the estimated coefficient. For instance, consider a toy example with one single outlier in trade-distance space (e.g. one very distant island that trades with only one other country in the world). It is easy to imagine the downward sloping regression line in this space. Suppose first that this island trades very little with its partner (so that the observation lies in the north-west of the trade-distance space). Then this outlier will pull the regression line towards itself, leading to a more negative coefficient on distance, and more so for the NLS scheme than for the other weighting schemes.

⁵ See also Head and Mayer (2014), who compare the first-order conditions of OLS, Poisson true ML and Gamma true ML. Their takeaway is that GPML resembles the first-order conditions of OLS, with GPML looking at percentage deviations of the errors where OLS looks at log deviations. Here we compare the NLS to the GPML directly; taking logs of the NLS leads to the OLS comparison in Head and Mayer (2014).
However, if this island trades a lot with its partner (now lying in the north-east of the trade-distance space), it will again "pull" the coefficient towards itself, but now leading to a less negative coefficient. Considering that various observations can lie in various locations, it becomes clear that the net impact on the size of the estimated coefficient cannot be predicted a priori.
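The nesting in (14) can be made concrete. With one regressor and no intercept, each estimator solves the same first-order condition with a different k; the sketch below does so by bisection on hypothetical toy data with one upward outlier at the largest distance:

```python
import math

x = [1.0, 2.0, 3.0, 4.0]
y = [0.40, 0.14, 0.05, 0.10]   # last point lies well above the exp(-x) trend

def foc(b, k):
    """First-order condition: sum (y - exp(bx)) * exp(bx)^k * x."""
    return sum((yi - math.exp(b * xi)) * math.exp(b * xi) ** k * xi
               for xi, yi in zip(x, y))

def solve(k, lo=-3.0, hi=0.0):
    """Bisection; for this toy data foc(lo, k) > 0 > foc(hi, k)."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if foc(mid, k) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

b_nls, b_ppml, b_gpml = solve(k=1), solve(k=0), solve(k=-1)
print(round(b_nls, 2), round(b_ppml, 2), round(b_gpml, 2))
```

On this data the three estimates differ noticeably, and the GPML estimate is pulled most towards the high-trade outlier (the least negative slope), in line with the weighting argument above.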
Depending on the setting, researchers might then consider which scheme best represents the observed data. For instance, if one assumes that large GDP countries better report GDP values (i.e. with less measurement error), one can opt for the NLS method, so that these countries get more weight in estimating the GDP coefficient of the gravity model. At the same time, the researcher has to be aware that this choice skews the point estimate towards the impact of large GDP countries on bilateral trade flows. Furthermore, this choice simultaneously affects the estimation of all variables, which could involve contrasting interests in the direction of the weighting scheme.
Second, the log-linearized methods of OLS or LSDV/BB do not include zero trade flows. If these zeros are structural (e.g. Helpman et al. 2008), the missing flows contain important information on the presence and intensity of exports. While PPML and GPML include zeros in the non-linear estimation, these methods do not structurally account for them. The method described in Helpman et al. (2008) applies a structural and consistent 2-stage estimation method, but its application in typical packages such as Stata (especially for larger samples) is not straightforward, as convergence is hard to achieve. Moreover, finding the required exclusion variable for the first stage is arguably difficult; Helpman et al. (2008) use religion as an exclusion variable in their paper.
Third, heteroscedasticity of the error term can lead to inefficient estimates, as pointed out by Santos Silva and Tenreyro (2006). This is dealt with in the other estimation methods.

Data and Empirical Approach
We use the CEPII/BACI database (Gaulier and Zignago 2010) to create bilateral trade flows at the country level.⁶ This is a cleaned and "mirrored"⁷ version of the UN Comtrade database that records product-level trade at the Harmonised System 6-digit level for almost all countries and economic entities in the world. We convert export values to thousands of current US dollars. We aggregate the product-level trade flows to yearly country-level trade flows, drop some accounting aggregates and make countries that split or merged during the period consistent over time. We extract countries' nominal GDP in thousands of current US dollars from the website of the World Bank.
We obtain indicators such as distance, adjacency, common language and colonial ties from the GeoDist database at CEPII as presented by Mayer and Zignago (2011). Distance is recorded as the great circle distance in kilometers between the most populated cities in i and j. Adjacency is a dummy variable that takes value 1 if i and j share a common country border and 0 otherwise. Language is a dummy that takes value 1 if both countries share an official language. Colony is a dummy variable that takes value 1 if i and j ever shared a colonial tie.
We use RTA dummies from Jose De Sousa's website, which are set to 1 if both countries are in any form of regional trade agreement and 0 otherwise.⁸ We collect WTO accession dates from the WTO website to create a duration database of WTO membership. We then generate a dummy for each country equal to 1 if that country is a member of the WTO in that year. 9 Finally, we allocate countries to one of eight global regions. 10 The resulting database covers the years from 1998 to 2011, has 208 countries or economic entities and 602,784 recorded export flows.

⁶ http://www.cepii.com/anglaisgraph/bdd/baci.htm

⁷ In order to mitigate the potential problem of non-reporting in the data, Gaulier and Zignago (2010) fill in missing export values with the transpose of the trade matrix, leading to an increase of around 10% in observed values in the dataset.

⁸ http://jdesousa.univ.free.fr/data.htm
We then estimate the gravity specification given by (2) using different estimation specifications. We apply both linear and non-linear methods. As is common in the gravity literature, we assume that the distance function is linear in observables, so that:

ln f_ij = ln Dist_ij + Adjacency_ij + Language_ij + Colony_ij + RTA_ij + WTO_i + WTO_j.

Distance coefficient across estimation methods
We first analyze the sensitivity of the distance coefficient to the applied estimation techniques. We estimate a series of cross-section gravity models (one for each year between 1998 and 2011), and subsequently plot the coefficients for each year in Figure 1.
The X-axis reports the years 1998 to 2011; the Y-axis plots the estimated coefficients by estimation method. The model specifications are ordinary least squares (OLS), Least Squares Dummy Variables, i.e. OLS with fixed effects (LSDV), the Baier and Bergstrand (2009) Taylor approximation method (BB), Poisson Pseudo-Maximum Likelihood (PPML) and Gamma Pseudo-Maximum Likelihood (GPML). The estimated coefficients are reported after correcting for covariates such as GDP and the other distance-function variables explained above. To evade potential endogeneity, and following Baier and Bergstrand (2009), we use simple averages θ_i ≡ 1/N, where N is the number of countries in the sample. A few remarks on the results follow.
First, all methods confirm the negative and significant effect of distance on export flows. We also confirm that the estimated coefficients are fairly stable over time: the difference of the coefficients in 1998 and 2011 is statistically insignificant at the 1% level for all the estimation methods. However, given the short time span of our data set, we cannot interpret this as evidence for or against the "death of distance" hypothesis described in Section 1.
Second, there is substantial variation in the magnitude of the distance effect across estimation procedures: while the linear procedures OLS, LSDV and BB produce results that are around -1.0 to -1.2, the estimated coefficients using PPML are around -0.5 (in line with the findings of Santos Silva and Tenreyro (2006) and subsequent PPML estimations). The estimated coefficients using GPML are around -1.7.
Third, we see that the LSDV and the BB methods almost completely coincide, in line with the suggestion that the first-order Taylor approximation of the MTR is a good approximation. Hence, the BB method appears to be a valid alternative to the LSDV method in identifying the distance coefficient, with additional benefits of speed of computation and additional freedom in dimensions of identification.
Fourth, comparing the estimated size (in absolute value) of the distance coefficients, we obtain the following ranking: GPML > LSDV/BB/OLS > PPML. This is in line with the results of Egger and Staub (2016), who obtain the same ranking of methods for their first quartile of distance coefficients.

Border effects across estimation methods
Next, we compare the border effect on international trade. As indicated in Section 1, the border effect in trade refers to adjacent countries on the one hand, and to border effects within versus between regions on the other. The former effect is measured by adjacency dummies in the gravity specification. For the latter effect, we want to test whether trade is larger within global regions relative to trade across these regions, and whether this border effect is heterogeneous across regions.
Figure 1: Estimation of the distance coefficient. Notes: estimates of the gravity specification (2), controlling for the observable variables in the distance function: ln(GDP exporter), ln(GDP importer), ln(distance), adjacency, common official language, colonial ties and RTAs. We also control for WTO membership status of the exporter and importer. Exporter and importer fixed effects are used in the LSDV, PPML and GPML models. The distance coefficients are all significant at the 0.1% level across all years and estimation methods (not reported).

We define global regions as continents plus some economically coherent regions. In particular, we consider Europe, North America, South America, the Association of Southeast Asian Nations (ASEAN), Asia (excl. ASEAN), North Africa, Sub-Saharan Africa and the Pacific. This division is arguably arbitrary, but estimations across continents, and alternatively across other global trade blocks, lead to very similar results (not reported). We use dummies for within versus between global region trade: each region dummy equals 1 if the observed trade flow is an intra-regional one, and zero otherwise. We pool the data across all years and add year dummies to capture global trends. Table 2 presents the results. The interpretation of the coefficients is as follows: the estimated coefficients represent the average impact of the variables of interest on export values from i to j, holding all other variables constant and setting all dummies to zero (including the regional dummies). For instance, consider column (1): a 1% increase in the GDP of the exporting country leads to a 1.1% increase in exports from i to j on average, holding all other variables constant.
Our main interest is in the distance, adjacency and world-region coefficients. First, note that the distance coefficients are roughly of the same magnitude as the cross-sectional estimates in Figure 1. However, once we correct for intra-regional flows, a discrepancy appears among the OLS, LSDV and BB coefficients, and the GPML coefficient is smaller than before in absolute value.
Second, the adjacency coefficients are positive and significant at the 5% level in all settings except in the BB model. Moreover, there is considerable variation in the estimated size of the coefficient, ranging between 0.46 and 1.1. For the PPML estimate of adjacency, the interpretation is that switching the dummy from 0 to 1 increases exports from i to j by exp(0.46) − 1 = 58%, keeping all other variables constant. For the GPML coefficient, this even amounts to exp(1.050) − 1 = 186%. Note that the adjacency dummy mostly applies when both countries are in the same global region, so both coefficients have to be interpreted jointly. The large variation of the estimated coefficients across methods, however, shows that precise evaluations of trade barriers (e.g. for policy analysis) have to be interpreted with caution.
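The percentage interpretation of a dummy coefficient in an exponential or log-linear model follows from exp(β) − 1. A minimal sketch of this back-of-the-envelope conversion, using the PPML and GPML adjacency estimates quoted above:

```python
import math

def dummy_pct_effect(beta):
    """Percentage change in exports when a dummy switches from 0 to 1."""
    return (math.exp(beta) - 1.0) * 100.0

print(round(dummy_pct_effect(0.46)))   # PPML adjacency estimate -> 58
print(round(dummy_pct_effect(1.050)))  # GPML adjacency estimate -> 186
```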
Third, there is a large amount of variation across the estimates of the region dummies. Consider for example the LSDV estimates in column (2). The negative coefficient for the Europe dummy suggests that trade between countries within Europe is 40% lower (exp(−0.511) − 1 = −40%) than if both partners were in different trade blocks. This indicates that European countries are relatively "open": on average, they export relatively more outside Europe than within it, compared to the other global regions. Similar reasoning holds for the other regions. North America is relatively more open than Europe, as is the ASEAN trade block. Conversely, the Pacific is the most "closed" region, while South America and North Africa also trade relatively more within their own regions. Note that we control for adjacency when estimating the border effects: this captures the idea that intra-continental trade is larger for countries that also share a common border. More strikingly, estimated openness varies substantially in size across estimation methods, and every regional dummy is insignificant at the 5% level in at least one specification, except that of the Pacific. Hence, contrary to our findings for the distance effect, the border effect by global region appears to be much more sensitive to the selected estimation methodology. Finally, it is interesting to note that intra-ASEAN trade is fairly low: trade involves relatively more exports to countries outside the region, while the goal of the ASEAN union is to promote intra-regional trade and create an internal market. Conditional on our results and estimation methods, estimation of the gravity equation indicates that this policy goal has not been achieved yet.

Notes to Table 2: in column (3), all bilateral variables are first-order Taylor approximated. Country FE denotes exporter and importer fixed effects. Adjusted R² and/or the Bayesian Information Criterion (BIC) are given where possible.
Robust standard errors, clustered at the continent level, are in parentheses. Significance levels: 5% (*), 1% (**) and 0.1% (***).

Robustness Checks
We check for several potential sources of misspecification. First, we draw the residual versus fitted values plot for the LSDV method in Figure 2. Since we deal with so many observations, traditional scatter plots are not very informative. Instead, we propose a local polynomial smoother plot to represent the underlying scatter. The local polynomial has two added advantages for residual analysis: i) the polynomial smoother is an indicator of potential non-linearities or other patterns in the residuals, and ii) the 95% confidence intervals of the smoother are a convenient way to detect potential heteroscedasticity: if the width of the confidence intervals is irregular, this indicates non-constant variance of the error terms. We rerun the original specification of Section 4, but now for the year 2005 only, to avoid potential autocorrelation. The plot shows i) a clear structure in the residuals that resembles a third-degree polynomial, and ii) potential heteroscedasticity in the left tail of the distribution of the fitted values. We are confident that heteroscedasticity is not affecting our results in any major way (we also checked the pattern of heteroscedasticity using PPML, with almost identical results), and focus on the non-linear pattern of the residuals. Second, to see where this structure comes from, we plot the residuals against each regressor. The residual plots of all regressors look fine, except that the plot against distance reveals the same structure as the residuals versus fitted plot. It might be that the log-linear specification of the distance function is simply incorrect, and we merely assumed it following the bulk of the gravity literature. We therefore rerun the model with a second- and third-order polynomial for distance, which reduces the pattern in the residual plots but completely disrupts the gravity estimates.
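As a sketch of this diagnostic, the following simulates residuals with a cubic pattern in the fitted values and applies a lowess (local polynomial) smoother; it assumes numpy and statsmodels are available, and the data are purely illustrative:

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(0)
fitted = rng.normal(size=5000)                               # placeholder fitted values
resid = 0.1 * fitted**3 + rng.normal(scale=0.5, size=5000)   # cubic pattern plus noise

# lowess returns an (n, 2) array of (fitted, smoothed residual) pairs,
# sorted by the fitted values.
smooth = lowess(resid, fitted, frac=0.3)
```

In practice one would plot `smooth` over the raw scatter with a 95% confidence band; a smoother that hugs the y = 0 line with bands of constant width indicates no remaining structure, while the cubic shape here mimics the pattern we find in the LSDV residuals.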
We also check whether the non-linear pattern occurs within global regions, to verify that it is not an artifact of pooling the global sample. The same residual pattern recurs in each isolated subsample. We therefore prefer to stick with the mainstream of the literature and use the log-linear distance function.
However, the correct specification of the distance function is an interesting topic in its own right.
Third, we check for potential multicollinearity between distance and the border effects, since this might also drive misspecification. The variance inflation factors (VIFs) of all variables in the model (excluding the fixed effects) lie between 1 and 1.5, whereas values above 5 or 6 would indicate potential multicollinearity problems. We therefore also rule out this potential source of misspecification.

Figure 2: Local polynomial smooth of the residuals versus fitted values (LSDV, 2005). Notes: a local polynomial smoother represents the underlying scatter plot. The gray band indicates the 95% confidence interval; the y = 0 line is indicated in red.

Conclusion
This paper compared the distance and border effects on global bilateral trade flows using various econometric techniques. We clearly confirm the negative distance effect, which appears to be stable over time; its magnitude, however, varies across estimation methods. For the border effect we distinguish between adjacency effects and regional block effects. In order to measure the border effects correctly, we simultaneously control for both border effects as well as for distance. The adjacency or contiguity effect is significantly positive under all estimation methods, except in the Baier and Bergstrand (2009) method, where it is insignificant. Hence the direction of both the distance and adjacency effects appears to be in line with the literature, but their magnitudes vary substantially across estimation methods. Finally, the regional block effects are subject to a large amount of variation, both across regions and across estimation methods.
Our results point to heterogeneity across estimation methods when assessing distance and border effects in international trade. However, they do not favor a particular estimation method, as all methods have merits and shortcomings. Rather, researchers should be aware of the impact of the selected method, which depends inter alia on the features of the data used in the analysis (e.g. the impact of outliers). Nevertheless, we are able to provide some guidelines for future empirical research.
First, the use of fixed effects is preferable. If the variable of interest is identified in the bilateral dimension (e.g. tariffs, non-tariff trade barriers, etc.), we suggest using OLS with exporter and importer fixed effects. These fixed effects or dummies capture unobserved heterogeneity, including the multilateral resistance terms pointed out by e.g. Anderson and van Wincoop (2003) and Behar and Nelson (2014). If the variable of interest varies only in the country dimension, it cannot be identified with exporter and importer fixed effects. In that case we suggest the Baier and Bergstrand (2009) Taylor approximation approach, which approximates the trade cost variables, corrected for multilateral resistance, while leaving the country dimension open for identification.
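The LSDV estimator with exporter and importer fixed effects can be sketched as below, assuming a long-format DataFrame of bilateral flows; all column names and data are illustrative, not from the paper's data set:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 400
df = pd.DataFrame({
    "ln_exports": rng.normal(size=n),
    "ln_distance": rng.normal(size=n),
    "adjacency": rng.integers(0, 2, size=n),
    "exporter": rng.choice(list("ABCDE"), size=n),
    "importer": rng.choice(list("FGHIJ"), size=n),
})

# C(exporter) and C(importer) expand into dummy columns: the fixed effects
# that absorb multilateral resistance in the Anderson-van Wincoop sense.
fit = smf.ols(
    "ln_exports ~ ln_distance + adjacency + C(exporter) + C(importer)",
    data=df,
).fit()
```

Any regressor that varies only by exporter or only by importer (e.g. GDP) is collinear with these dummies and drops out, which is exactly the identification problem noted above.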
Second, in the presence of structural zeros (such as in Helpman et al. (2008)), log-linearizing the gravity equation leads to a biased estimation sample: all zero flows drop out, as ln(x) is undefined for x ≤ 0. Earlier suggestions, such as Tobit estimation or adding a constant to all trade flows, have been shown to generate biased coefficients (Santos Silva and Tenreyro 2006). Therefore, non-linear estimation methods such as PPML are preferred in the presence of structural zeros. Note also that the two-stage Heckman-type selection model proposed by Helpman et al. (2008) faces two practical drawbacks: (i) finding a convincing exclusion variable for the first stage, and (ii) the procedure only converges under severe sample size restrictions in STATA. This is probably why the procedure is not widely implemented in the empirical gravity literature.
Third, pseudo-maximum likelihood methods are linked to the gravity model through the resemblance of their first-order conditions; no further distributional assumptions on the data are required. Alternative methods such as zero-inflated Poisson estimation, by contrast, are not related to the gravity model in this way and generate biased results.
In the end, we suggest that researchers provide a side-by-side presentation of the fixed effects (or Baier and Bergstrand 2009) and PPML approaches. Note also that we follow the theoretical gravity literature and estimate a cross-sectional gravity equation. If a researcher deems panel estimation appropriate, we propose estimating the model using LSDV with exporter-time and importer-time fixed effects. If the bilateral variable of interest is also time-varying, one can include country-pair fixed effects. We are aware that this dimensionality causes potential convergence problems for panel PPML in STATA (e.g. the xtpqml command).