
This paper evaluates the performance of alternative estimators of the gravity equation when zero trade flows result from economically-based data-generating processes with heteroscedastic residuals and potentially-omitted variables. In a standard Monte Carlo analysis, the paper finds that this combination can create seriously biased estimates in gravity models with the frequencies of zeros commonly observed in real-world data, and that Poisson Pseudo-Maximum-Likelihood models can be important in solving this problem. Standard threshold-Tobit estimators perform well in a Tobit-based data-generating process only if the analysis deals with the heteroscedasticity problem. When the data are generated by a Heckman sample selection model, the Zero-Inflated Poisson model appears to have the lowest bias. When the data are generated by a Helpman, Melitz, and Rubinstein-type model with heterogeneous firms, a Zero-Inflated Poisson estimator including firm numbers appears to provide the best results. Testing on real-world data for total trade throws up additional puzzles: estimates from the truncated Poisson Pseudo-Maximum-Likelihood and Poisson Pseudo-Maximum-Likelihood estimators are very similar, and those from the Zero-Inflated Poisson and truncated Poisson Pseudo-Maximum-Likelihood estimators are identical. Repeating the Monte Carlo analysis taking into account the high frequency of very small predicted trade flows in real-world data reconciles these findings and leads to specific recommendations for estimators.

I. Introduction
The gravity model is now enormously popular for analysis of a wide range of trade questions. This popularity is due to its apparently good performance in representing trade flows and to the strong theoretical foundations provided in papers such as Anderson (1979) and Anderson and van Wincoop (2003). Until recently, the gravity equation was almost invariably estimated using data sets converted into logarithms and truncated to contain only positive trade flows. The problem of zero trade flows was rarely emphasized, partly because trade theories were silent on their causes, partly because of a lack of recognition of their frequency, and partly because of the convenience of the log-linear estimator. Recently, there has been growing recognition among trade economists that zero trade flows do not occur randomly and that using truncated samples in logarithms may yield misleading results. Melitz (2003) and myriad subsequent studies have added firm-level foundations to earlier models of zero trade flows, such as those based on the Tobin (1958) and Heckman (1979) models. Santos Silva and Tenreyro (SST) (2006) raised an important additional concern by pointing to the possibility of serious bias from combinations of heteroscedasticity and nonlinearity in the gravity model. Helpman, Melitz and Rubinstein (HMR) (2008) pointed to another potential source of bias in coefficient estimates: failure to account for the increase in demand for exports as the number of exporting firms increases.
Like SST (2006), we believe that at least part of the process of identifying the right estimator must involve analysis of generated data whose statistical properties are known.
In this paper we evaluate different estimators of the gravity equation in the presence of pervasive zero trade flows and under different patterns of heteroscedasticity. Because we are unsure of the process generating real-world data, we test alternative estimators on data sets generated using three different economic models in which the probability of a nonzero trade flow and the volume of trade when trade is nonzero are jointly determined: the threshold-Tobit model of Eaton and Tamura (1994); the Heckman model; and an extended Heckman model of the type used by HMR (2008). [Footnote 1: As shown in Annex A, zero trade flows account for more than 40% of the possible bilateral trade flows in country-level data and more than 60% in U.S. 10-digit product-level export data, and the probability of zero trade flows is strongly linked to the explanatory variables in gravity models. The frequency of zero trade flows is much higher at the more disaggregated levels of product and firm trade that are increasingly being used for analysis.]
Each of the three data-generating processes used in our Monte Carlo simulations provides a plausible economic representation of observed trade flows, and can be parameterized to represent the frequency of zero trade flows observed in real trade data.
The resulting data sets differ from those used by SST (2006) in containing true zero observations, and in being based on economic models that jointly determine whether to trade and, if so, the volume of trade. They are not constant-elasticity models because constant-elasticity models cannot generate finite probability mass at zero trade. Instead of moving from a constant-elasticity model to a model containing zeros by transforming the latent variables from constant-elasticity models with a negative-binomial distribution as in SST (2011), we use limited-dependent variable models whose properties have been thoroughly explored in the literature.
The paper is organized as follows. In the next section we provide a brief review of the relevant literature. Section III discusses the econometric problems associated with zero trade flows. Potential approaches to estimating the gravity equation in the presence of frequent zero trade flows are considered in Section IV. Section V covers the DGPs of our Monte Carlo simulations and the simulation results. Section VI provides an empirical implementation of the gravity estimation using country-level trade data and Section VII discusses reconciling Monte Carlo and empirical findings. Finally, the paper concludes in Section VIII.

Eaton and Tamura (1994) appear to have been the first to provide a systematic approach to estimation of the gravity equation as a limited-dependent model. Their model builds on the Tobin (1958) model, where non-limit observations arise when the latent dependent variable (including the error term) crosses a threshold value. Another popular approach to estimation uses the Heckman (1979) model to deal with the presence of zero trade flows (see, for example, Francois and Manchin (2007) and Lederman and Ozden (2007)). The third approach that we consider builds on the Heckman model to take into account the extensive margin of firm involvement in trade (HMR 2008). [Footnote 2: Note that in the Tobit model the effect of the coefficient estimate on the explanatory variable x cannot be interpreted as the effect of x on the dependent variable y. Rather, as McDonald and Moffitt (1980) correctly point out, it is the effect of x on two components of y: "(1) the change in y for those observations above the limit, weighted by the probability of being above the limit; and (2) the change in the probability of being above the limit, weighted by the expected value of y if above" (pp. 318-319).]

II. Review of Relevant Literature on Estimating the Gravity Model
In an important and influential article, SST (2006) emphasized the potential econometric problems resulting from a combination of heteroscedastic residuals and zero trade flows. Using Monte Carlo simulations, they showed that traditional estimators may yield severely biased parameter estimates, and identified the Poisson Pseudo-Maximum-Likelihood (PPML) estimator as the preferred estimator for dealing with these problems.
The data sets used for the main experiments in their 2006 paper included no zeros (SST 2011, p. 220), suggesting that their important findings about the bias and variance of standard estimators arise from the combination of nonlinearity and heteroscedasticity highlighted in the paper, rather than from success in dealing with the presence of zero observations. Their paper did include some experiments with zeros generated by rounding errors, but this is a process quite different from the economic processes generating zeros, such as those assumed when applying limited-dependent variable estimators.
The tractable and apparently robust PPML estimator proposed by SST is now widely used in estimation of gravity models (see, for example, Shepherd 2010; Anderson and Yotov 2012). SST show why the normal equations used to solve the PPML estimator should make it more robust than OLS-based estimators when the function is nonlinear and the errors are heteroscedastic, and show using Monte Carlo simulations that its coefficients are much less subject to bias in this situation. Based on their evidence, the case for the PPML estimator seems extremely strong for analysis of nonlinear relationships in models where zero values are infrequent. However, because the tests in their 2006 paper were based on samples containing no zero values, the results obtained cannot be expected to provide a reliable guide to the robustness of the PPML estimator in the presence of economically-determined zero values.
Perhaps a more fundamental problem for reliance on the PPML estimator applied to censored samples (including the zeros) is that the move from the traditional, discrete version of the Poisson distribution to the continuous version needed for the gravity equation removes the ability to have positive probability mass at the limit observation (Ilienko 2013, p. 2). While a discrete version of the Poisson distribution could adequately capture the presence of large shares of limit observations of the type identified by Tobin (1958), the continuous version of this distribution is unable to adequately represent these cases. This problem appears worse than the common concern that empirical distributions may be overdispersed relative to the Poisson, in which the mean and the variance are equal. Especially in cases where large fractions of a sample are zero trade flows that are retained in the analysis, the continuous Poisson specification appears likely to be seriously mis-specified. [Footnote 3: Heteroscedasticity in nonlinear models with relatively few zero observations arises in many important applications such as consumption, investment and money demand systems, and representative models for consumer demand, firm costs and profits. The results of SST (2006) appear to deserve much more attention for estimation in this context.]
The Zero-inflated Poisson estimator developed by Lambert (1992) and applied to the gravity model by Burger, van Oordt and Linders (2009) might be one way to retain the advantages of the Poisson estimator while allowing for substantial numbers of zeros.
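For reference, the zero-inflated structure introduced by Lambert (1992) mixes a point mass at zero with a Poisson density, so that zeros can arise from two distinct processes. A standard statement of the model (our summary, not taken verbatim from the paper) is:

```latex
% Zero-inflated Poisson (Lambert 1992): with probability \pi_i the outcome
% is a structural zero; otherwise it is a draw from a Poisson with mean
% \lambda_i, typically \lambda_i = \exp(x_i\beta) in gravity applications.
P(y_i = 0) = \pi_i + (1 - \pi_i)\, e^{-\lambda_i}, \qquad
P(y_i = k) = (1 - \pi_i)\, \frac{e^{-\lambda_i}\lambda_i^{k}}{k!},
\quad k = 1, 2, \ldots
```

The inflation probability πi is typically modeled with a logit in the same or related covariates, which is what allows the estimator to absorb large shares of zeros without distorting the Poisson part.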
On the theoretical front, there have also been major recent developments. HMR (2008) provide a heterogeneous-firm trade model that both allows for frequent and asymmetric patterns of zeros in bilateral trade data and introduces a number-of-firms variable, correlated with the probability of non-zero trade, that affects trade volume through a love-of-variety effect. They extend the Heckman model so that the standard gravity explanatory variables determine not only the probability of positive bilateral trade (i.e. the probability that the most productive firm finds it profitable to export to a foreign destination) but also the fraction of firms that are productive enough to enter foreign markets when positive trade occurs. This overcomes an omitted-variable problem different from that created by the limited-dependent feature of the data.

III. Intuition on the Econometric Problems Associated with Zero Flows
Since Tobin's famous (1958) paper, it has been known that zero values of the dependent variable can create potentially large limited-dependent variable biases in parameter estimates, even in linear models, if the estimator used does not allow for this feature of the data generating process. Heckman (1979) generalized Tobin's approach to estimation in the presence of this problem, casting it in the context of estimation using samples with nonrandom sample selection. A nonlinear version of Heckman's formulation of the problem is presented in a two-equation context:

(1) y*1i = f(xi, β) + u1i

(2) y*2i = f(zi, α) + u2i

where xi and zi are vectors of exogenous regressors, β and α are vectors of parameters, and E(uji) = 0. In Heckman's formulation, equation (1) is a behavioral equation and equation (2) is a sample selection rule that determines whether observations in equation (1) are observed or not. In our context, this model implies that the trade flow y*1i is observed when y*2i is positive, and a zero trade flow is recorded otherwise. A key problem for estimation of equation (1) in the presence of sample selection is that its residuals no longer have the properties assumed in standard regression theory, particularly a zero mean. In this situation, standard regression procedures result in biased estimates of the coefficient vector β because they omit relevant explanatory variables.
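A minimal simulation sketch (ours, not the paper's code) illustrates the point: when selection into the observed sample depends on an error correlated with u1i, the residuals of the selected sample no longer have mean zero conditional on the regressors, and a naive regression is biased. All names and parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
x = rng.normal(size=n)

# Correlated errors for the behavioral and selection equations (rho = 0.5)
rho = 0.5
u1 = rng.normal(size=n)
u2 = rho * u1 + np.sqrt(1 - rho**2) * rng.normal(size=n)

y_latent = 1.0 + 1.0 * x + u1        # behavioral equation, true slope = 1
selected = (0.5 + 1.0 * x + u2) > 0  # sample selection rule

def ols_slope(xv, yv):
    """Simple OLS slope of yv on xv (with intercept)."""
    xc = xv - xv.mean()
    return xc @ (yv - yv.mean()) / (xc @ xc)

print("full sample slope:    ", ols_slope(x, y_latent))                      # ~1.0
print("selected sample slope:", ols_slope(x[selected], y_latent[selected]))  # < 1.0
```

The downward bias in the second slope arises exactly through the omitted inverse-Mills-ratio term discussed in the text: E(u1 | selected, x) declines in x.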
The Tobit model is a special case of the Heckman model in which the right hand sides of (1) and (2) are identical (Heckman 1979). In this case, a simple diagram (Figure 1) provides important insights into the key problems arising in applying standard approaches to estimation. Our version of the figure includes both the nonlinearity and heteroscedasticity of the gravity equation emphasized by SST (2006) and the limited-dependent variable problem. Figure 1 shows the relationship between a latent variable, y*, and an explanatory variable, x, with the non-stochastic relationship represented by the solid curve, and individual data points by the bullets. The hollow bullets correspond to any value of y* less than or equal to zero. In the censored regression case, the residuals associated with these low values of the latent variable will be replaced by positive residuals that lead to a zero value of the dependent variable, a change represented by the dashed arrows in the diagram. With this model, using a sample containing the zero values-for which E(u)>0-biases the slope coefficient and results in an estimate something like the dashed curve (Greene 1981).
Another insight from careful examination of Figure 1 is the difference between the case of censoring shown in the diagram and the case of truncation where all of the observations at the limit (zero in this case) are discarded from the sample. In the censoring case, the error terms on all limit observations but those where y* was precisely zero have been increased. In the case of truncation, observations with negative errors, like the rightmost hollow bullet in the diagram, are more likely to be excluded from the truncated sample and this creates a situation where E(ui)>0. Intuitively, the transformation of the error terms associated with the censored sample seems likely to be greater than that associated with the truncated sample. This suggests that moving from a truncated estimation procedure to one which retains the zero values without changing the estimation procedure may result in greater bias.
The diagram shows another feature of the residuals from application of a regression estimator designed for non-censored problems. Such estimators are likely to find residuals that are both large and serially dependent at both ends of the regression line-note the consistently positive apparent residuals relative to the dashed line in Figure 1 for observations near zero and near the largest observations in the sample. The estimated regression line is likely to be strongly influenced by such extreme observations, particularly when using ordinary least squares (Beggs 1988). However, the implications of this are likely to vary considerably between estimators, given the different weighting implied by the normal equations for the different estimators (SST 2006).
With a little more imagination, Figure 1 can also help visualize a different problem associated with heteroscedasticity even in the absence of the nonlinearity problem identified by SST-a problem associated with mis-measurement of the probability of observations at the limit. Following the notation of Terza (1985), the log-likelihood function for a sample generated by a Tobit-type process is given by

ln L = Σi [ di·ln f(yi) + (1 − di)·ln F(0) ]

where di is an indicator taking the value 1 for a non-limit observation; f(yi) is the density function for non-limit observation i; and F(0) is the distribution function giving the probability of a zero observation. If the variance relevant to assessing the value of the distribution function is incorrectly specified-as it would be with a model assuming homoscedasticity-this is clearly likely to change the realized values of the distribution function for limit observations, causing potentially-serious bias in parameter estimates, as pointed out by SST (2014) in their assessment of the HMR (2008) estimator.
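As a concrete sketch of this likelihood (our illustration, assuming normal errors and censoring at zero; not the paper's code):

```python
import numpy as np
from scipy.stats import norm

def tobit_loglik(beta, sigma, x, y):
    """Censored log-likelihood: sum of d_i*ln f(y_i) + (1-d_i)*ln F(0)
    for y = max(0, x @ beta + u), u ~ N(0, sigma^2)."""
    mu = x @ beta
    d = y > 0                                           # non-limit indicator d_i
    ll_pos = norm.logpdf(y[d], loc=mu[d], scale=sigma)  # density terms ln f(y_i)
    ll_zero = norm.logcdf(-mu[~d] / sigma)              # ln F(0) = ln P(x*beta + u <= 0)
    return ll_pos.sum() + ll_zero.sum()
```

Mis-specifying sigma (for instance, imposing a single constant scale when the true scale varies across observations) changes the F(0) terms directly, which is the channel for the bias described above.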

IV. Potential Approaches to Estimating Gravity Models with Zeros
In estimating the gravity model when there are many zero observations, some key questions must be confronted:

i. Which functional form to use for the explanatory variables?

ii. Whether to truncate or censor the zero observations?

iii. What estimator to use?
There seems to be widespread agreement that the gravity model involves relationships between variables that are nonlinear in levels, with a constant-elasticity functional form almost always used. This implies a relationship of the form

yi = exp(xiβ) + εi

where yi represents bilateral trade; xi is a vector of explanatory variables (some of which may be linear, some in logarithms and some dummy variables) for observation i; β is a vector of coefficients; and εi is an error term whose variance, unlike those of equations (1) and (2), need not be constant across observations. As noted by SST, the gravity model has traditionally been estimated after taking logarithms, which allows estimation using linear regression techniques. However, this specification will only involve homoscedastic residuals if εi = exp(xiβ)·vi, where vi is distributed independently of xi with a zero mean and constant variance. Of course, if this is the true model, the constant-elasticity model estimated in levels will have heteroscedastic errors.
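The SST point can be made precise under the lognormal error assumption used later in this paper's simulations; the derivation below is our summary of that logic:

```latex
% Log-linearizing the levels model y_i = \exp(x_i\beta)\,\eta_i gives
\ln y_i = x_i\beta + \ln\eta_i .
% If \eta_i \mid x_i is lognormal with mean 1 and variance \sigma_i^2, then
\ln\eta_i \mid x_i \sim N\!\left(-\tfrac{1}{2}\ln(1+\sigma_i^2),\ \ln(1+\sigma_i^2)\right),
\qquad
E[\ln\eta_i \mid x_i] = -\tfrac{1}{2}\ln\!\left(1+\sigma_i^2\right).
```

Unless σi² is constant across observations, the conditional mean of the log error varies with xi, so OLS on the log-linearized equation is inconsistent even though the levels model is correctly specified.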
The use of the logarithmic transformation for the dependent variable creates another immediate difficulty when trade is zero, since the log of zero is undefined. The most common response to this problem has been to truncate the sample by deleting the observations with zero trade. This is inefficient, since it ignores the information in the limit observations, and may lead to bias as noted in the previous section. Many studies have replaced the value of imports by the value of imports plus one, allowing the log of the zero values to take a zero value. Others have estimated in levels using nonlinear estimation techniques which allow the zero values to be retained. However, as we have seen from Figure 1, retaining the zero observations without using an estimator that accounts for the limited-dependent nature of the model may lead to more serious bias than simply truncating the sample. What seems to be needed is an approach to estimation that systematically takes into account the information in the limit observations. Once the decision to include the limit observations is taken, a number of other decisions must be confronted. We find it useful to lay out the choices with the decision tree in Figure 2.
At the first stage of the decision tree, analysts must decide between parametric models and semi-parametric models. Semi-parametric models (Chay and Powell 2001) avoid specifying a distribution for the residuals, sometimes at the expense of computational efficiency, and estimate parameters using methods such as Powell's (1984) Censored Least Absolute Deviations (CLAD) model. While such models have been extended to deal with nonlinear models (Berg 1998), it appears that such applications have been infrequent to date. Certainly, most of the focus in estimation of gravity models has been on the parametric branch of the decision tree.

Figure 2. Choosing an estimator for the gravity model with limited-dependent trade
If a parametric approach is taken, the first decision required is whether to adopt a Tobit/Heckman-type model (Amemiya 1984) or a Two-Part model (Jones 2000). The Two-Part model has the desirable feature of allowing the sample selection and the behavioral equations to be estimated independently (Duan et al. 1983). However, this simplification comes at the expense of assuming that these decisions are taken independently, something that seems implausible in a world where decisions on whether to export, and on how much to ship should they choose to export, are taken by individual firms based on the profitability of exports of their particular products to particular markets. In most cases, it seems to us that the variable of interest is the latent variable for the desired level of trade, y*, for which the Tobit/Heckman approach seems the most suited. A key argument for the Two-Part model has been a belief that the performance of the sample-selection models is irretrievably compromised by statistical problems, and particularly multicollinearity. Leung and Yu (1996) show that these problems may be less of a problem for practical implementation than was thought based on earlier studies such as Manning, Duan and Rogers (1987), which included insufficient variation in the exogenous variables to mitigate the multicollinearity between these variables and the inverse Mills ratio added to the behavioral equation when estimating with the two-step estimator of the Heckman (1979) sample selection model. Based on a detailed review of the literature on the Heckman correction for sample selection bias, Puhani (2000) found that the full-information maximum likelihood estimator of Heckman's model generally gives better results than either the two-step Heckman estimator or the Two-Part model, although the Two-Part model is more robust to multicollinearity problems than the other standard estimators. Clearly, these results suggest that the consistency of the data with the assumptions of the Tobin/Heckman models should be examined carefully.
Whichever parametric estimator is chosen, an assumption about the distribution of the residuals must be made. The most common approach is to assume that the residuals are distributed normally. However, alternative assumptions are sometimes used, including the Poisson distribution highlighted by SST (2006) or the Gamma distribution that they also examined. The decision about which distribution to use need not be based solely on judgments about the actual distribution of the residuals. The essence of SST's recommendation of the PPML estimator was that it is more robust to heteroscedasticity than an estimator based on the normal distribution, even when the residuals are actually normally distributed.
The first part of the Two-Part approach is the use of a qualitative-dependent model such as Probit to determine whether a particular bilateral trade flow will be zero or positive.
The second part is to estimate the relationship between trade values and explanatory variables using only the truncated sample of observations with positive trade (Leung and Yu 1996, p198). Potential estimators for this stage include the standard approach of OLS in logarithms; the nonlinear least squares (NLS) model used by Frankel and Wei (1993); and the PPML and Gamma Pseudo-Maximum Likelihood (GPML) estimators discussed by SST. Estimation of the bias of truncated single-equation estimators will therefore help assess the suitability of Two-Part models.
Under the Tobit/Heckman limited-dependent branches of the decision tree, a decision must be made about whether to estimate using two-step estimators of the type proposed by Heckman (1979) or a maximum likelihood approach (see Tobin (1958), Puhani (2000) and Jones (2000)). One concern with the Heckman two-step estimator is that the variable that adjusts for the probability of sample selection (the inverse Mills ratio for the particular observation; Heckman 1979) may be close to a linear function of the other explanatory variables, resulting in multicollinearity problems (Puhani 2000). A second concern is that this approach introduces heteroscedasticity into the residuals. An alternative, nonlinear, approach to estimating Heckman's second-step equation is provided by Wales and Woodland (1980, p. 461). We do not consider this estimator because it performed poorly in their simulations.
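A compact sketch of the two-step procedure discussed here (ours, using statsmodels; the variable names are illustrative): a first-stage Probit yields the inverse Mills ratio, which is then added as a regressor in the second stage.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

def heckman_two_step(y, X, Z, selected):
    """y: outcome, observed where selected is True; X, Z: regressor
    matrices (including constants) for the behavioral and selection
    equations respectively."""
    probit = sm.Probit(selected.astype(int), Z).fit(disp=0)  # step 1: selection
    index = Z @ probit.params
    mills = norm.pdf(index) / norm.cdf(index)                # inverse Mills ratio
    X_aug = np.column_stack([X[selected], mills[selected]])  # step 2 regressors
    return sm.OLS(y[selected], X_aug).fit()
```

The multicollinearity concern arises because the Mills ratio is close to linear in its index over much of its range, so when Z contains the same variables as X the added regressor is nearly a linear combination of the others.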
The performance of the Heckman-based models appears to depend heavily on whether both equations (1) and (2) are active, and whether there are at least some variables included in equation (2) but excluded from equation (1). Leung and Yu (1996) concluded that this estimator outperformed other estimators for limited-dependent variables when some variables are excluded from the behavioral equation.

V. Monte Carlo Simulations
In this section of the paper, we provide a description of the approach we used to create zero observations with frequencies similar to those observed in real-world data and to test the ability of different estimators to retrieve the parameters of these data generating processes.
We also followed the approach of SST and of Westerlund and Wilhelmsson (2011) to creating heteroscedastic errors-that is, we posited a range of different types of heteroscedasticity, and tested their implications for the performance of different estimators.
Because we know the true parameters, we focus on the bias and imprecision of the coefficient estimates, rather than the goodness of fit measures emphasized by Gomez-Herrera (2013).
We adopted the following specification of the gravity equation from SST (2006):

(6) yi = exp(xiβ) + εi = exp(xiβ)·ηi = exp(β0 + β1x1i + β2x2i)·ηi

where ηi ≡ 1 + [εi/exp(xiβ)]; x1 is a binary dummy, designed to mimic variables such as border dummies, that equals 1 with probability 0.4; x2 is a standard normal variable intended to mimic the behavior of continuous explanatory variables such as the log of distance or income; and the data are randomly generated using β0 = 0, β1 = β2 = 1. Following SST, we assumed that ηi is log-normally distributed with mean 1 and variance σi².
To assess the sensitivity of the different estimators to different patterns of heteroskedasticity, we used Santos Silva and Tenreyro's four cases for σi². Case 1 involves an error term that is homoscedastic when the model is estimated in levels; Case 3 is homoscedastic when estimated in logarithms; Case 2 is an intermediate case between 1 and 3; and Case 4 represents a situation in which the variance of the residual is related to the level of a subset of the explanatory variables, as well as to the expected value of the dependent variable. To help interpret such a large number of results, we also present, in Appendix Table 4, measures of the average absolute bias for each coefficient across the four cases and the average absolute bias across both coefficients, together with averages of their standard errors.
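A sketch of this data-generating step (our code, not the paper's Stata implementation). The exact variance functions below follow the four SST (2006) cases as we understand them and should be read as assumptions, since the text above describes the cases only verbally:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
b0, b1, b2 = 0.0, 1.0, 1.0

x1 = (rng.uniform(size=n) < 0.4).astype(float)   # border-type dummy
x2 = rng.normal(size=n)                          # log-distance-type regressor
mu = np.exp(b0 + b1 * x1 + b2 * x2)              # exp(x_i beta)

def eta(sigma2):
    """Lognormal draws with mean 1 and variance sigma2 (elementwise)."""
    s2 = np.log1p(sigma2)                        # Var = exp(s2) - 1 when mean is 1
    return np.exp(rng.normal(-s2 / 2, np.sqrt(s2)))

cases = {                          # assumed SST-style variance patterns
    1: mu**-2,                     # V[y|x] = 1: homoscedastic in levels
    2: mu**-1,                     # V[y|x] = exp(x*beta)
    3: np.ones(n),                 # V[y|x] = exp(x*beta)^2: homoscedastic in logs
    4: mu**-1 + np.exp(x2),        # variance tied to a regressor as well
}
y = {case: mu * eta(s2) for case, s2 in cases.items()}
```

The helper draws a lognormal with unit mean by setting the log-scale variance to ln(1 + σ²), which is what keeps E[ηi] = 1 across all four cases.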
We consider two broad types of data-generating-process (DGP): one based on the Eaton-Tamura model and another based on the Heckman model. We also examine a special case of the Heckman DGP, including an additional variable for the number of products traded, proposed by HMR (2008), whose robustness to heteroscedasticity bias does not appear to have been examined on data sets with known parameters, although SST (2014) raise concerns about potentially serious bias. These DGPs have solid economic foundations and can generate data sets with the required fractions of zero trade values. We considered using the SST (2011) DGP but its economic underpinnings and statistical properties were less obvious to us than was the case for the three more conventional limited-dependent variable DGPs that we used and, in any event, its properties have already been examined using Monte Carlo techniques. We investigated the Two-Step Method of Moments estimator of Xiong and Chen (forthcoming), but did not find encouraging results with our sample.

V.1. ET-Tobit data generating process (DGP)
For the threshold-Tobit model, we ensured that a significant number of observations would have zero values by adding a negative intercept term, -k, in the levels version of the data-generating equation, and then transforming all realizations of the latent variable with a value below zero into zero values. We refer to this DGP as the ET-Tobit DGP because it is the model underlying the Eaton and Tamura (1994) estimator. It has the interpretation of introducing a threshold that must be exceeded before trade actually occurs. It differs fundamentally from the rounding approach used by SST (2006) and the approach of setting observations to zero randomly or for particular groups used by Martinez-Zarzoso (2013).
The data generating process for these initial simulations was equation (6) augmented with the negative intercept term, -k, that defines the ET-Tobit model:

(7) yi = max[exp(xiβ)·ηi − k, 0]

Within these samples, we found that a value for k of 1 provides numbers of zero trade values consistent with the 40-50 percent of zero values frequently observed in analyses of total bilateral trade. As shown in Appendix Table 1, a k value of 1.5 generated higher shares of zeros. The analysis was performed in Stata 9.2, using double precision to minimize numerical errors, with samples of 1000 observations, replicated 10,000 times as in SST (2006).
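The censoring step itself is one line; the self-contained sketch below (ours, using the Case 3 error pattern for brevity) confirms that k = 1 produces zero shares in the stated range:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x1 = (rng.uniform(size=n) < 0.4).astype(float)
x2 = rng.normal(size=n)
mu = np.exp(x1 + x2)                     # exp(x_i beta), beta0=0, beta1=beta2=1

s2 = np.log(2.0)                         # Case 3: sigma_i^2 = 1 -> log-variance ln 2
eta = np.exp(rng.normal(-s2 / 2, np.sqrt(s2), size=n))  # lognormal, mean 1, var 1

k = 1.0
y = np.maximum(mu * eta - k, 0.0)        # ET-Tobit: censor latent values below zero
print("share of zeros:", (y == 0).mean())  # roughly 0.45-0.5 for k = 1
```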

Semi-Parametric Estimators
Given the pervasive uncertainty about the distribution of the errors, and the relatively poor results obtained using standard limited-dependent variable estimators, it seemed important to investigate the performance of the semi-parametric, Least-Absolute-Deviations (LAD) approach proposed by Powell (1984). Although this model has not, to our knowledge, been applied to gravity models, Paarsch (1984) found the censored version of this model to give satisfactory results in large samples with censored data. Because the estimator for this model in Stata requires linearity, we examined the model in logarithms to make an initial assessment of its suitability. We considered first the truncated LAD estimator based only on the positive observations, and then turned to Powell's (1984) censored LAD estimator.
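For concreteness, the truncated variant amounts to a median regression in logs on the positive observations; a rough sketch using statsmodels (our illustration, not the paper's code) is:

```python
import numpy as np
import statsmodels.api as sm

def truncated_lad(y, X):
    """Truncated LAD: median regression of log trade on the regressors,
    using only the positive observations. Powell's censored LAD would
    instead keep the zeros and censor the fitted index at zero."""
    pos = y > 0
    exog = sm.add_constant(X[pos])
    return sm.QuantReg(np.log(y[pos]), exog).fit(q=0.5)   # median regression
```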
The results are presented in Table 1.
From the results in Table 1, it appears that the Truncated LAD estimator has quite small bias in Case (3), when the heteroscedasticity is consistent with the functional form adopted for estimation. However, it appears to be very vulnerable to heteroscedasticity. In cases (1), (2) and (4), the bias of the Truncated LAD was typically 20 to 30 percent. The Censored LAD estimator performed worse than the truncated estimator in all cases using k=1, with bias of 0.5 or greater. With k=1.5, the bias averaged 70 percent in cases (1) to (3). Given its overall poor performance we did not pursue the LAD estimator further.

Standard Single-Equation Estimators
In this section, we first considered the performance of eight parametric estimators that do not account for the limited-dependent nature of the DGP. Despite their failure to account for the limited-dependent nature of the data-generating process, they might plausibly be the best estimators of the key underlying parameters, just as truncated OLS was widely believed to be the best estimator for limited-dependent variable models for the reasons outlined in Manning, Duan and Rogers (1987). The specific estimators considered were: (i) truncated OLS in logarithms; (ii) censored OLS in logarithms; (iii) truncated nonlinear least squares (NLS); (iv) censored NLS; (v) the censored (full-sample) PPML estimator; (vi) the truncated PPML estimator; (vii) the Gamma Pseudo-Maximum-Likelihood (GPML) estimator; and (viii) the Negative Binomial Maximum Likelihood estimator (NBML). We report the results for k=1 in Table 2. Summary results reporting the average absolute bias and standard errors across the four cases, and across the two coefficients, are included in Appendix Table 4.
An important feature of the results for the truncated OLS-in-logarithms model is its apparently strong sensitivity to heteroscedasticity. In Case 3, where the error distribution is homoscedastic in the log-linear model, the bias in the estimates is around 6 percent for both coefficients. However, when we move to cases involving errors that are heteroscedastic in the log-linear equation, the bias of the estimated coefficient on the normally-distributed variable (β2) rises to around 20 percent in cases 1, 2 and 4. For this estimator, unlike many others, the biases in the coefficients are generally similar for the dummy variable, x1 and the normally distributed explanatory variable, x2. The censored OLS model in logarithms (with 0.1 added to allow inclusion of the zeros) produces results that are generally inferior to those obtained from the truncated OLS estimator. Except in Case 4, the biases were smaller for the coefficient estimate on x2 than on x1.
Truncated NLS is the levels counterpart to the traditional OLS-in-logs estimator and the second stage of a two-part estimator. It has lower bias than its logarithmic counterpart in cases 1 and 2, but is inferior in cases 3 and 4. In Case 3, the bias of this estimator is 40 percent for β2, around seven times the bias of truncated OLS in logs. Perhaps the best thing that can be said for the truncated NLS estimator is that it is consistently less biased than the censored NLS regression model. The superiority of the truncated OLS and NLS models over their censored regression counterparts reinforces our intuition that just solving the "zero problem" and adding the zero-valued observations to the sample is likely quite an unhelpful strategy when the data are generated by a process that determines the levels of observed trade and the probability of zero trade together.
The PPML estimator in levels yielded estimates that were strongly biased in all cases. Because this equation was estimated with the dependent variable in levels, the underlying error structure is consistent with the estimator in Case 1; even so, the bias in the estimate of β1 was 0.28 and that of β2 was 0.25. The bias was very similar for each of the other cases. This confirms SST's hypothesis that the PPML estimator is robust to heteroscedasticity in the residuals while demonstrating its potential vulnerability to limited-dependent variable bias.
The truncated PPML estimator has much lower bias than the PPML estimator in all cases, again reinforcing our belief that truncated estimators are to be preferred over censored estimators. The truncated PPML, in fact, outperforms truncated OLS in logs in all cases considered, including Case 3 where heteroscedasticity is not a problem for the OLS estimator.
The GPML estimator has very large bias in almost all cases, reinforcing the SST conclusion that this estimator has little role to play in estimation of the gravity model. The NBML estimator is inferior to the two preferred estimators-truncated OLS in logs and truncated PPML-in all except Case 1 with k=1. This good performance seems likely to be an outlier, since the NBML is strongly biased in Case 1 when k=1.5 (see Appendix Table 2).
When the k=1.5 sample is considered (see Appendix Table 2), the superior performance of truncated PPML relative to full-sample, censored PPML is again evident.
It has by far the lowest mean absolute bias across the four cases considered and is superior to its nearest competitors (the two OLS in logs models) even in Case 3, where the pattern of heteroscedasticity is consistent with the model in logarithms. This result suggests a case for use of the truncated PPML, at least in exploratory analyses. It retains the robustness to heteroscedasticity emphasized by SST (2006), but reduces the vulnerability to the limited-dependent variable bias emphasized in the earlier literature originated by Tobin (1958).
The summary results in Appendix Table 4 give the average absolute bias and standard errors across the four cases and again highlight the superiority of truncated PPML in terms of bias. For both coefficients, it has by far the lowest average absolute bias-less than half that of other estimators in almost every case, and on average across the two coefficients. The standard errors associated with this estimator are almost always among the lowest-although OLS in logs and the NBML have lower standard errors around their highly biased estimates.

Limited-Dependent Variable Estimators
In this section, we considered eight estimators designed specifically for estimation with limited-dependent variables. The estimators considered are: (i) the Eaton-Tamura model with the dependent variable in levels; (ii) the Eaton-Tamura model in logs; (iii) a Tobit-Poisson model like that of Terza (1985); (iv) the maximum likelihood version of the Heckman model estimated in logs (see Amemiya (1984)); (v) the Heckman (1979) two-step (2SLS) estimator in logs; (vi) the maximum likelihood version of the Heckman model in levels; (vii) the Heckman 2SLS estimator in levels; and (viii) the Zero-Inflated Poisson (ZIP) estimator. Key results for cases with k = 1 are presented in Table 3, while results for k=1.5 are presented in Appendix Table 3. The Eaton-Tamura Tobit estimator in levels has quite low bias for both coefficients-around three percent-in Case 1. However, it exhibited much larger bias in all other cases. The same model in logarithms suffers from quite large bias (25 percent) in cases 1 and 2 but around 6 percent in Case 3, where the pattern of heteroscedasticity is consistent with the estimator. The average bias across cases was much lower for the log version than for the version in levels. The poor result for the ET estimator in levels is somewhat surprising given that the ET-Tobit model is the data-generating process, and that the ET-Tobit model in levels might be expected to avoid the bias associated with the combination of heteroscedasticity and log-linearization.
The performance of the Poisson-Tobit estimator is much worse than the ET-Tobit in all cases. It yields large bias-20 percent or higher in all but one case-and large standard errors in all cases. Because this model lacks the additive intercept that is central to the ET-Tobit data generating process, it suffers from similar problems to the censored PPML estimator.
The Heckman ML estimator in logs performed poorly in virtually every case. The bias of this estimator was smallest in case (3), where the form of heteroscedasticity is consistent with the estimator, but was still 14 percent for each coefficient. The Heckman 2SLS estimator in logs was worse in all cases, while the Heckman ML in levels was sharply inferior in cases 3 and 4. The Heckman 2SLS estimator in levels had the lowest average absolute bias of any of the limited-dependent variable estimators, although this bias was over 16 percent for β1 in all four cases. The ZIP estimator was uniformly inferior to the Heckman 2SLS estimator.
We had expected the ET-Tobit models to outperform alternative models such as Heckman given that the DGP in this case is ET-Tobit. One potential approach to improving the performance of the ET-Tobit model might be to adjust for heteroscedasticity. In Table 4, we report results including the adjustments proposed by Maddala (1985), in which the error variance for the log-linear model is specified as a parametric function of the regressors, with parameters γ and δ estimated using nonlinear least squares; for the linear-in-levels model, an exponential error-variance process is specified. These models should capture heteroscedasticity of the types considered in Cases 1 to 3. The resulting estimates were reasonably satisfactory for Cases 1 and 3 using estimators in logs and for Case 1 using estimators in levels. When the true restriction that β0=0 was imposed, the bias in β1 and β2 fell to very low levels. This suggests that it may be important in estimation of the heteroscedasticity-corrected model to use restrictions on the parameter values when prior information or prior testing suggests that these are valid.
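To make the idea concrete, a heteroscedasticity-corrected censored likelihood can be written down directly. The sketch below is ours, and the variance process sigma_i = exp(gamma + delta*(x_i @ beta)) is an assumed illustrative form, not the paper's exact specification:

```python
import numpy as np
from scipy.stats import norm

def het_tobit_loglik(params, x, y, k=1.0):
    """ET-Tobit-style likelihood with censoring at zero and an
    observation-specific scale; params = (beta..., gamma, delta)."""
    beta, gamma, delta = params[:-2], params[-2], params[-1]
    mu = np.exp(x @ beta) - k                    # latent mean with ET-Tobit threshold
    sigma = np.exp(gamma + delta * (x @ beta))   # assumed variance process
    d = y > 0
    ll_pos = norm.logpdf(y[d], loc=mu[d], scale=sigma[d])
    ll_zero = norm.logcdf(-mu[~d] / sigma[~d])
    return ll_pos.sum() + ll_zero.sum()
```

Maximizing this over (β, γ, δ), or imposing valid restrictions such as β0 = 0, is the kind of correction the results in Table 4 are assessing.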
A feature of the results for this DGP is that the limited-dependent variable models were, in almost all cases, outperformed in terms of both bias and standard error by the truncated PPML estimator. This suggests that a two-part approach to estimation might be best if the underlying DGP appears to be of the ET-Tobit type, with a first stage used to assess whether trade occurs or not, and the PPML estimator applied in the second stage.
This is clearly an imperfect strategy given that the truncated PPML estimator has an average bias across the two coefficients of 7 percent, but the average bias of even the best limited-dependent estimator-the Heckman 2SLS in levels-is around 50 percent higher.
The superiority of the truncated PPML estimator over all other estimators appears to be robust to the frequency of zeros in the sample; it also arises in the sample with k=1.5. It appears worthwhile to pursue limited-dependent-variable estimation only if an appropriate correction for heteroscedasticity can be made, perhaps by following a general-to-specific approach to estimation in order to introduce as many justified restrictions as possible.

V.2. Heckman sample selection DGPs
In this section, we examine the performance of the estimators when the data are generated using Heckman Sample Selection models. For this purpose, we first generate the data with frequent zeros based on a standard Heckman sample selection framework. We then construct HMR-type data sets with two sources of omitted variable bias-sample selection and omission of the number of firms engaged in exporting.

The Standard Heckman DGP
For the standard Heckman DGPs, we assume that the selection equation contains at least one variable that is excluded from the behavioral equation. To generate the data for this test, we use equation (6) for the behavioral equation, but add the following sample selection equation:

(8) y2i = exp(ziα)·η*i ≡ exp(ziα) + u*2i

where ziα ≡ α0 + α1x1i + α2x2i + α4x4i; η*i ≡ 1 + [u*2i/exp(ziα)]; x4 is a variable included in the selection equation but excluded from the behavioral equation; and the correlation between ln(η*i) and ln(ηi) is set at 0.5. The dependent variable in (6) is observed when y2i > 1.
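A sketch of this DGP (our code; the α values are illustrative assumptions, and for brevity the log errors are standard normal rather than case-specific):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x1 = (rng.uniform(size=n) < 0.4).astype(float)
x2 = rng.normal(size=n)
x4 = rng.normal(size=n)                    # excluded from the behavioral equation

rho = 0.5                                  # corr(ln eta, ln eta*)
cov = [[1.0, rho], [rho, 1.0]]
ln_eta, ln_eta_star = rng.multivariate_normal([0, 0], cov, size=n).T

a0, a1, a2, a4 = 0.5, 1.0, 1.0, 1.0        # illustrative alpha values
y2 = np.exp(a0 + a1 * x1 + a2 * x2 + a4 * x4 + ln_eta_star)   # selection eq. (8)
y1 = np.exp(1.0 * x1 + 1.0 * x2 + ln_eta)                     # behavioral eq. (6)
y1_obs = np.where(y2 > 1, y1, 0.0)         # observed only when y2 > 1
print("share of zeros:", (y1_obs == 0).mean())   # roughly 0.3 with these values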
The results for standard estimators with this DGP are presented in Table 5, while those based on limited-dependent estimators are presented in Table 6. Results for average bias and standard errors across the four cases are summarized in Appendix Table 4. Zero trade flows account for approximately 30% of the observations in this sample.
In Table 5, the Poisson estimator outperforms the standard log-linear estimators in most cases.
Specifically, in cases 1, 2 and 3 the bias on the estimate for β1 is about 14%. The bias is smaller in Case 4 but the standard error is substantially larger. The bias on the (frequently more important) estimate of β2 is substantially lower than for β1. The truncated PPML performs better than the censored PPML in almost all cases, reinforcing our finding using the ET-Tobit DGP. As in the case of the Tobit DGP, we find that as the number of zeros increases, so do the bias and standard error of the parameter estimates generated by the PPML estimator. The GPML and NBML estimators suffer from very large bias in almost all cases.
The standard errors associated with these estimators are also large. This finding reinforces our earlier findings-and those of SST-that these estimators do not appear to perform well for estimation of the gravity model.
When we turn to limited-dependent variable estimators, we find that the ET-Tobit estimators are frequently seriously biased. In some cases, such as the ET-Tobit estimator in levels in Case 1, this seems related to the mismatch between the data generating process and the more restricted estimator. In Case 3 with this estimator in logs, the bias is again large even though the error process is homoscedastic. The Heckman estimators generally have lower bias and standard errors than the ET-Tobit estimators, which is not surprising given that the data were generated using a Heckman process. Perhaps somewhat surprisingly, however, the Heckman estimators in logs performed better than those in levels, despite the problems associated with the log transformation. The Poisson-Tobit estimator in levels performed reasonably well, although the Heckman Maximum Likelihood estimator had lower average bias and smaller standard errors.

[Footnote 9: Note that ln(η*i) and ln(ηi) correspond exactly to the two error terms u1i and u2i in Heckman (1979), respectively. The correlation between ln(η*i) and ln(ηi) is assumed to be 0.5 in order to rule out the case of independence between the two error terms, in which case the standard least squares estimator may be used on the selected subsample without introducing bias.]

[Footnote 10: The simulation results with higher percentages of zero observations are available upon request from the authors.]
As shown in Appendix Table 4, the Zero-Inflated Poisson (ZIP) estimator has, on average, substantially lower bias than any of the other estimators considered. The standard errors of this estimator are slightly larger than for the Heckman ML estimator in logs, the estimator with the second-lowest average absolute bias, suggesting that it is important to have a large sample when considering the use of the ZIP estimator. The Heckman ML estimator has lower bias than the ZIP for categorical explanatory variables, but greater bias for the normally-distributed variable. The Heckman 2SLS estimator in logs may also be worth considering if convergence problems with the Heckman ML prove too difficult, although its average bias across the four cases is more than twice as high as that for the ZIP estimator. By contrast with the ET-Tobit DGP, the ZIP estimator has clearly lower bias than the truncated PPML model both on average and in all but one case.

The HMR Data Generating Process
To capture the essence of the DGP à la HMR we need a firm-number variable that is nonnegative and positively correlated with the probability of nonzero trade. It should also be correlated with the economic mass variable, and take the value zero when there is no trade. We generate synthetic data sets with our four different types of heteroscedasticity for Monte-Carlo analysis using the following two equations and the procedures laid out in more detail in Annex B:

(10.1) y1i = exp(xiβ)·ηi ≡ exp(β0 + β1x1 + β2x2 + β3x3) + εi

(10.2) y2i = exp(ziα)·η*i ≡ exp(α0 + α1x1 + α2x2 + α4x4) + u*2i

where x3 is the firm-number variable, constructed to be correlated with x2; and all other variables are as defined in equations (6) and (9). Equation (10.2) is the selection equation which determines whether there is nonzero trade. Specifically, if y2i > 1 then y1i is positive.
Again, as in the standard Heckman sample selection model, we set the correlation between ln(ηi) and ln(η*i) equal to 0.5. Variable x3 in the behavioral equation mimics the behavior of the firm heterogeneity variable in the HMR theoretical framework. It is-for want of better information-assumed to be uniformly distributed. As in the standard Heckman case, x4 is excluded from the behavioral equation.
Finally, we look at some simulation results when the data are generated à la HMR.
The first source of potential omitted-variable bias comes from the standard sample selection problem: similar factors determine the volume of trade (y1i) and the probability (Prob(y2i>1)) that trade occurs between two countries. The second source of omitted variable bias results from exclusion of a variable for the number of exporting firms (x3).
Like the firm number proxy used by HMR, x3 is uncorrelated with the error term in the trade equation.
From Table 7 and Appendix Table 4, it is clear that the truncated PPML estimator has the lowest average absolute bias of the standard estimators. The truncated OLS estimator in logs has the next lowest average absolute bias, and its standard errors are lower than for the truncated PPML, but the average absolute bias is four times as large as for truncated PPML. The NLS estimators perform extremely poorly, as do the GPML and NBML standard estimators.
The coefficient on x2 is of particular interest when the DGP is based on a Heckman sample selection with omitted-variable bias. Since, by construction, Cov(x2, x3) = 0.5 > 0, we expect upward bias in the estimated coefficient on x2 when omitted-variable bias is not controlled for.
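This expectation follows from the textbook omitted-variable bias formula; in the simple single-regressor case (our summary, abstracting from the other regressors) it reads:

```latex
% Omitted-variable bias: regressing on x_2 while omitting x_3 from the
% behavioral equation (10.1) gives
\operatorname{plim}\hat{\beta}_2
  = \beta_2 + \beta_3\,\frac{\operatorname{Cov}(x_2,x_3)}{\operatorname{Var}(x_2)} .
```

With β3 > 0 and Cov(x2, x3) > 0, the omission pushes the estimate upward.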
From Table 8 and Appendix Table 4, the ZIP estimator including the firm-number variable again appears to be useful. One concern with all of the Poisson-based estimators is that they have somewhat higher standard errors than the HMR estimator.

VI. Empirical Estimation
While we are very conscious of the importance of the multilateral resistance terms emphasized by Head and Mayer (2013), and hence concerned about the empirical validity of traditional gravity models, we consider the implications of different estimators for parameter estimates from both traditional (including GDP as explanatory variables) and Anderson-van Wincoop (using country fixed effects) specifications. The results for these analyses are presented in Tables 9 and 10. Since the specification with country fixed effects controls for the multilateral resistance terms, the results in Table 10 are our preferred ones.
The results in Table 9 are very informative about the coefficients on GDP and the log of distance. The truncated PPML has very similar, but slightly smaller, coefficients than the censored PPML. The ZIP estimator, which performed well in the Monte-Carlo tests with an HMR-based DGP, yields coefficient estimates above those of the other Poisson estimators, but still well below those for traditional estimators. The HMR estimator is an outlier in the other direction, with much lower coefficients on GDP, and somewhat lower coefficients on distance, than the PPML-based estimators.
The results for the Anderson-van Wincoop formulation are presented in Table 10, together with estimates from a model that controls for firm heterogeneity à la Helpman, Melitz and Rubinstein. The results for the log of distance variable are very similar with this model to those for the traditional gravity model. The PPML model results for this coefficient are clustered around -0.8, while those for the traditional model are around -1.4. The HMR model is again an outlier, with a coefficient of -1.14.

VII. Reconciling Monte Carlo and Empirical Findings
The contrast between the empirical results and those from the Monte Carlo analysis raises the question of whether the distribution of the composite driving variable assumed in the simulations matches that found in real-world data. To assess the comparability of the density and distribution functions of this variable and the log standard normal, we rescaled this composite driving variable to have the same mean as the log standard normal. We found that the distribution of this variable was well-represented by a log-normal distribution with mean -5.728896 and standard deviation 3.523514, rather than 0 and 1 as assumed with the log standard normal. To provide a visual comparison of the two distributions, we compare their cumulative distribution functions (CDFs) in Figure 3. The figure reveals that the empirically-based distribution has a much higher fraction of the explanatory variable near zero than the standard normal variable.

Figure 3. CDFs for the Log Standard Normal and the Empirical Distribution
Note: Generated using MS EXCEL with 50000 data points per distribution at a spacing of 0.0001.
A focus on a couple of data points in Figure 3 highlights the stark differences between the two distributions. With the empirical distribution, 80 (90) percent of the observations are below 0.06 (0.27). With the log standard normal distribution, only 0.3 (11) percent of the observations lie below these values. This concentration of very small values is consistent with the poor performance of the NBML estimator and with the concern raised by SST (2006) about its parameter estimates depending upon the units of measurement. [Footnote 19: We had intended to compare their density functions but it proved to be impossible to plot these on the same scale.]
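The percentile comparison can be reproduced directly from the reported parameters (our sketch; the lognormal parameters are those quoted above):

```python
import numpy as np
from scipy.stats import lognorm

# scipy's lognorm uses s = log-sd and scale = exp(log-mean)
emp = lognorm(s=3.523514, scale=np.exp(-5.728896))  # empirically fitted distribution
std = lognorm(s=1.0, scale=1.0)                     # log standard normal

for q in (0.8, 0.9):
    print(f"{q:.0%} quantile: empirical {emp.ppf(q):.3f}, "
          f"log std normal {std.ppf(q):.3f}")
print("P(X < 0.06): empirical", emp.cdf(0.06), "log std normal", std.cdf(0.06))
```

Evaluating these expressions gives quantiles of roughly 0.06 and 0.30 for the empirical fit, and a probability of about 0.3 percent below 0.06 for the log standard normal, matching the figures in the text.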

VIII. Conclusions
The purpose of this paper was to evaluate the performance of different estimators of the gravity equation when zero trade flows are frequent and economically determined.

Notes to tables: (1) a, b and c denote the 10%, 5% and 1% levels of significance, respectively. (2) For the HMR estimator, a polynomial of degree 3 in z (i.e. z, z² and z³) is included in order to control for firm heterogeneity. Specifically, we first estimate the Probit equation predicting whether or not country j and country k trade with each other. From the Probit equation estimates we obtain the predicted probability of exports from j to k. For exporter j and importer k, the inverse Mills ratio is equal to the ratio of the density to the cdf of the unit-normal distribution evaluated at the predicted Probit index.

Patterns of Zeros in Trade Data
Since the purpose of this paper is to investigate the performance of different estimators when trade is characterized by frequent zero flows, it is important to know what patterns of zeros are observed in real trade data. In particular, we are interested in how frequently zeros appear and whether they appear to be distributed independently of the variables that determine levels of nonzero trade flows. We look at the frequency of zeros using two samples of trade data at different levels of disaggregation. The first is the SST (2006) data set of 18,360 observations on bilateral aggregate trade flows for 136 exporters and 135 importers in 1990. The second data set, developed under the leadership of Robert Feenstra and available from the NBER website, is for product-level US exports to more than 100 countries for more than 9,000 ten-digit HS classifications. [Footnote 20: See for example Feenstra, Romalis and Schott (2002). The link to the data is: http://cid.econ.ucdavis.edu/data/sasstata/usxss.html] We use the average of U.S. exports from 2002 to 2006 to identify consistently zero trade flows. The percentage of zero trade flows in the SST (2006) data set is 47.6%. Only 40% of the sample involves trade in both directions, while 7.4% of the sample involves trade flows in only one direction. Table A.1, which shows in detail the features of the country-level trade data, reveals that smaller and poorer economies export to a much smaller number of destinations than other countries. Twenty-six such countries have zero trade with over 100 countries. Figure 1 presents a histogram and a kernel density for the percentages of zeros in the exports of the 136 exporters. [Notes to Figure 1: The kernel density estimate shown by the solid line is calculated using the Epanechnikov kernel function with a bandwidth equal to 0.0822.] It shows that 26 countries of the sample-all of which are high-income economies-have zero exports to under 8% of their possible foreign markets. However, many other countries have zero exports to between 40% and 90% of their potential trading partners.
In U.S. product-level exports at the 10-digit level, zero trade flows are also very frequent. This is despite the fact that the U.S. is the largest economy in the world and one of the largest exporters, so the frequency of zeros is likely much lower than in the exports of other countries at this level of disaggregation. [Footnote 21: Note that the average presence of zero exports for the period 2002-2006 is less than the presence of zero exports we would expect to observe using annual export data.] Zero observations represent 63% of the total 1,346,513 (149 destinations * 9037 products) possible export flows. Figure 2 provides a histogram and a kernel density for the percentages of zeros in U.S. exports in the 9037 10-digit H.S. classifications. [Notes to Figure 2: (1) The kernel density estimate is calculated using the Epanechnikov kernel function with a bandwidth equal to 0.025. (2) The total number of 10-digit products the U.S. exports is 9037.] It shows that for 7702 of the 9037 H.S. classifications (i.e. 85%), the U.S. had zero exports to 40% or more of the 149 foreign destinations for the period 2002-2006.

Table A.2 presents the patterns of zero trade flows in both samples by groups of exporters' GDP, by importers' GDP, and by groups of bilateral distances. These are standard explanatory variables for bilateral trade in gravity regressions. For both samples, the GDP of exporters and importers and the bilateral distance are clearly important determinants of the likelihood of zero bilateral trade. On average, larger exporters and larger importers have positive exports to a larger number of foreign markets, and countries appear less likely to export to more distant markets. [Notes to Table A.2: (1) The numbers are the percentages of zero trade flows in the total number of possible trade/export flows.]

To shed more light on the determinants of positive bilateral trade flows we turned to regression techniques. For the aggregate country-level data set we ran a Probit regression to predict nonzero trade flows. For US exports, we used a linear probability model (LPM) because STATA does not allow us to include the 9037 product-specific dummies needed to control for product-specific factors in the disaggregated model. [Footnote 22: We also ran a Probit regression for each of the 9037 10-digit products and on average the results are quantitatively and qualitatively the same. These results are available upon request.] Table 3 shows that variables such as GDP, GDPC (per capita GDP) and bilateral distance are important and statistically significant determinants of the likelihood of positive bilateral trade. A common border is found to reduce the probability of positive trade. This surprising result was also obtained by HMR (2008) in their first-stage Probit regressions. Potential explanations include: (a) that countries sharing borders are more likely to be involved in conflicts that disrupt their bilateral trade, and (b) that the dummy variable on contiguity is strongly negatively correlated with bilateral distance, and may be confounding the effect of sharing a common border. As a rough tie-breaker, we ran the same regressions excluding the bilateral distance variable, and found that the contiguity variable became positive and statistically significant. [Footnote 23: The strong correlation between bilateral distance and the common border dummy may also explain why the common border dummy is not found to have a robust and significant positive effect on the volume of trade once trade takes place in the empirical results presented in Section VII. Results for both the Probit and the LPM models excluding the bilateral distance variables involved positive and statistically significant effects of a common border on the probability of positive bilateral trade. These results are available from the authors upon request.] [Notes to Table 3: (1) Probit regression results report the coefficient estimates, with z-statistics in parentheses. (2) LPM is the Linear Probability Model; t-statistics computed using robust standard errors with allowance for clustering on exporter-importer pairs are in parentheses.]

The detailed analysis of the patterns of zeros in both aggregate and very disaggregated trade data suggests that-as might be expected from economic theory-zero trade flows are not randomly determined. They are the result of fundamental economic determinants such as the inability of any exporter from a particular country to meet the cost threshold for positive trade. The resulting relationships between the decision to trade and the resulting volume of trade are likely to have implications for estimation strategies.

Generation of the HMR DGP for Monte Carlo Simulations
Step 1 Generate a uniformly distributed random variable x3 with 1000 observations. Sort x3 from lowest to highest value.
Create an indicator variable named order that is equal to 1, 2, …, 1000, so order is strongly correlated with x3. This indicator variable will be used to merge x3 with the data set of y2 and other variables.
Save this data file under file name Uniform.
Step 2 Generate random variables x1, x2, y2 with correlation matrix [1, 0, 0\0, 1, 0.5\0, 0.5, 1]. In other words, the correlation between x1 and x2 and x1 and y2 is zero while the correlation between x2 and y2, which is the dependent variable of the equation that determines the sample selection rule, is 0.5. By construction, all of these variables are normally distributed with mean zero and variance 1. x1 then is transformed into a dummy variable with 40% of zeros. Specifically, for x1 with values below the 40th percentile x1 is replaced by 0 and for x1 with values above the 40th percentile x1 is replaced by 1.
Sort the data by y2 from lowest to highest values. The order of the other variables changes accordingly and their correlations remain intact.
Create an indicator variable order that is equal to 1, 2…1000. So variable order is strongly correlated with y2. This indicator variable will be used to merge with the data set of x3 in the Uniform file we saved in Step 1.
Step 3 Merge the variables x1, x2, y2 and x4 generated above with the data file Uniform containing x3, using the indicator variable order. Note that the correlation between x2 and y2 remains intact.
Transform the normally distributed variable y2~N(0,1) so that y2 has 25% of its values less than zero. Note that all the correlations remain intact as the transformation only shifts the distribution of y2 to the right.
Replace the values of x3 below the 25 th percentile by order with zeros. This means that the observations with zero trade always have zero firms.
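The three steps can be condensed into a few lines of array manipulation; the sketch below is our rendering of the procedure (rank-matching replaces the sort-and-merge on order, and x4 is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Step 1: uniform firm-number variable, sorted ascending
x3 = np.sort(rng.uniform(size=n))

# Step 2: (x1, x2, y2) with corr(x1,x2) = corr(x1,y2) = 0, corr(x2,y2) = 0.5
cov = [[1, 0, 0], [0, 1, 0.5], [0, 0.5, 1]]
x1, x2, y2 = rng.multivariate_normal([0, 0, 0], cov, size=n).T
x1 = (x1 > np.quantile(x1, 0.4)).astype(float)   # dummy with 40% zeros

# Step 3: merge x3 by the rank of y2, shift y2 so 25% of values are below
# zero, and zero out x3 for the non-trading observations
rank = np.argsort(np.argsort(y2))    # 0..n-1, smallest y2 first
x3 = x3[rank]                        # rank-matching: high y2 -> high x3
y2 = y2 - np.quantile(y2, 0.25)      # 25% of observations now below zero
x3[rank < n // 4] = 0.0              # zero trade always has zero firms
```

Rank-matching preserves the monotone association between x3 and y2 that the sort-and-merge in Steps 1-3 is designed to create, while the final line enforces the requirement that observations with zero trade have zero firms.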