Individual and Time Effects in Nonlinear Panel Models with Large N, T

We derive fixed effects estimators of parameters and average partial effects in (possibly dynamic) nonlinear panel data models with individual and time effects. They cover logit, probit, ordered probit, Poisson and Tobit models that are important for many empirical applications in micro and macroeconomics. Our estimators use analytical and jackknife bias corrections to deal with the incidental parameter problem, and are asymptotically unbiased under asymptotic sequences where $N/T$ converges to a constant. We develop inference methods and show that they perform well in numerical examples.


Introduction
Fixed effects estimators of nonlinear panel data models can be severely biased because of the incidental parameter problem (Neyman and Scott, 1948). A growing literature, surveyed in Arellano and Hahn (2007), shows that the leading term of an asymptotic expansion of the bias, as both the cross-sectional dimension N and the time-series dimension T of the panel grow, can be characterized and corrected. In models with individual effects, the leading bias term is of order 1/T and comes from the estimation of the individual effects. This result, however, does not apply to models with individual and time effects, where both of these effects are treated as parameters to be estimated. In this paper we show that the estimation of the time effects causes an additional incidental parameter bias of order 1/N. Thus, if N and T are similarly large, the bias produced by the estimation of the time effects is of similar order of magnitude to the bias produced by the estimation of the individual effects, and both biases need to be corrected. We provide the corresponding analytical and jackknife bias corrections.
The asymptotic approximation to the fixed effects estimators that lets the two dimensions of the panel grow with the sample size is motivated by the recent availability of long panels and other large pseudo-panel data structures where the indices might not correspond to individuals and time periods. Examples of these data sets include traditional microeconomic panel surveys with a long history of data, such as the PSID and NLSY, international cross-country panels such as the Penn World Table, …

We focus on semi-parametric models with log-likelihood functions that are concave in all parameters, and where each individual effect $\alpha_i$ and time effect $\gamma_t$ enter the log-likelihood for observation $(i, t)$ additively as $\alpha_i + \gamma_t$. This is the most common specification for the individual and time effects in linear models and is also a natural specification in the nonlinear models that we consider. Imposing concavity of the log-likelihood function greatly facilitates showing consistency in our setting, where the dimension of the parameter space grows with the sample size. The most popular limited dependent variable models, including logit, probit, ordered probit, Tobit and Poisson models, have concave log-likelihood functions, possibly after reparametrization (Olsen, 1978, and Pratt, 1981). We note here that the general expansion that we derive in Appendix B does not impose additivity and concavity, but we use these restrictions to apply the expansion to fixed effects estimators. The models that we consider are semi-parametric because the joint distribution of the explanatory variables and the unobserved effects is left unspecified. The explanatory variables can be either strictly exogenous or predetermined.
We derive bias expansions and corrections for fixed effects estimators of common parameters $\beta$ and average partial effects (APEs). The vector $\beta$ includes all the unknown parameters that enter the log-likelihood function other than the individual and time effects, such as index coefficients in a probit model. The APEs are functions of the data, the common parameters, and the individual and time effects in nonlinear models. We find that the properties of the fixed effects estimators of $\beta$ and the APEs are different. For $\beta$, the order of the bias is $1/T + 1/N$, which is the same order as the rate of convergence $1/\sqrt{NT}$ under sequences where $N/T$ converges to a constant. For the APEs, we uncover that the incidental parameter problem is asymptotically negligible because the order of the bias, $1/N + 1/T$, is smaller than the rate of convergence, $1/\sqrt{N} + 1/\sqrt{T}$, which is slower than for model parameters. To the best of our knowledge, this rate result is new for fixed effects estimators of average partial effects in nonlinear panel models with individual and time effects.¹ In numerical examples we find that the bias corrections, while not necessary to center the asymptotic distribution of APE estimators, do improve their finite-sample properties, especially in dynamic models.
The bias correction eliminates the bias terms of orders 1/T and 1/N from the fixed effects estimators. We consider two methods to implement the correction: an analytical bias correction similar to Hahn and Newey (2004) and Hahn and Kuersteiner (2011), and a suitable modification of the split panel jackknife of Dhaene and Jochmans (2015).² However, the theory of these previous papers does not cover the models that we consider because, in addition to not allowing for time effects, it assumes either identical distribution or stationarity over time for the processes of the observed variables, conditional on the unobserved effects. These assumptions are violated in our models due to the presence of the time effects, so we need to adjust the asymptotic theory accordingly. The individual and time effects introduce strong correlation in both dimensions of the panel. Conditional on the unobserved effects, we impose cross-sectional independence and weak time-serial dependence, and we allow for heterogeneity in both dimensions.
Simulation evidence indicates that our corrections improve the estimation and inference performance of the fixed effects estimators of parameters and average effects. The analytical corrections dominate the jackknife corrections in a probit model for sample sizes that are relevant for empirical practice. In the online supplement (see Appendix E), we illustrate the corrections with an empirical application on the relationship between competition and innovation using a panel of U.K. industries, following Aghion et al. (2005). We find that the inverted-U pattern relationship found by Aghion et al. is robust to relaxing the strict exogeneity assumption of competition with respect to the innovation process and to the inclusion of innovation dynamics. We also uncover substantial positive state dependence in the innovation process.

¹ Galvao and Kato (2014) also found slow rates of convergence for fixed effects estimators in linear models with individual effects under misspecification. Fernández-Val and Lee (2013) pointed out this issue in nonlinear models with only individual effects.
² A similar split panel jackknife bias correction method was outlined in Hu (2002).
Literature review. The Neyman and Scott incidental parameter problem has been extensively discussed in the econometric literature; see, for example, Heckman (1981), Lancaster (2000), and Greene (2004). There is also a vast literature that shows how to tackle the problem in specific models under asymptotic sequences where T is fixed and N grows to infinity. However, there are results, e.g. from Honoré and Tamer (2006), Chamberlain (2010), and Chernozhukov et al. (2013), showing that model parameters and APEs are not point identified in important nonlinear panel data models under fixed-T asymptotic sequences, implying that no fixed-T consistent point estimators exist in these models. A recent response to the incidental parameter problem is to adopt an alternative asymptotic approximation where both N and T grow with the sample size. Under these large-T sequences, the fixed effects estimator is consistent but has bias in the asymptotic distribution. This asymptotic bias is the large-T version of the incidental parameter problem and has motivated the development of bias corrections. Examples of papers that use this approximation include Phillips and Moon (1999), Hahn and Kuersteiner (2002), Lancaster (2002), Woutersen (2002), Alvarez and Arellano (2003), Hahn and Newey (2004), Carro (2007), Bonhomme (2009), Fernández-Val (2009), Kuersteiner (2011), Fernández-Val and Vella (2011) and Kato et al. (2012). This previous work, however, does not cover models with time effects.³ Our contribution to this literature is to extend the large-T bias corrections to models with two-way unobserved effects, such as the individual and time effects commonly included in linear models.
The large-T panel literature on models with both individual and time effects is sparse. Pesaran (2006), Bai (2009) and Moon and Weidner (forthcoming, 2015b) study linear regression models with interactive individual and time fixed effects. The fixed effects estimators in these models also have asymptotic bias of order 1/T + 1/N, but the methods used to derive this bias rely on linearity and therefore cannot be applied to the nonlinear models that we consider. Hahn and Moon (2006) consider bias corrected fixed effects estimators in panel linear autoregressive models with additive individual and time effects. Regarding nonlinear models, there is independent and contemporaneous work by Charbonneau (2012, 2014), which extends the conditional fixed effects estimators to logit and Poisson models with individual and time effects. She differences out the individual and time effects by conditioning on sufficient statistics. The conditional approach completely eliminates the asymptotic bias coming from the estimation of the incidental parameters, but it does not permit estimation of average partial effects and has not been developed for models with predetermined regressors. We instead consider estimators of model parameters and average partial effects in nonlinear models with predetermined regressors. The two approaches can therefore be considered complementary.

Outline of the paper. The rest of the paper is organized as follows. Section 2 introduces the model and fixed effects estimators. Section 3 describes the bias corrections to deal with the incidental parameter problem and illustrates how the bias corrections work through an example. Section 4 provides the asymptotic theory. Section 5 presents Monte Carlo results. The Appendix collects the proofs of the main results, and an online supplement to the paper contains additional technical derivations, numerical examples, and an empirical application (Fernández-Val and Weidner, 2015b) (see Appendix E).

Model
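The data consist of $N \times T$ observations $\{(Y_{it}, X_{it}')' : 1 \le i \le N,\ 1 \le t \le T\}$, for a scalar outcome variable of interest $Y_{it}$ and a vector of explanatory variables $X_{it}$. We assume that the outcome for individual $i$ at time $t$ is generated by the sequential process

$$Y_{it} \mid X_i^t, \alpha, \gamma, \beta \sim f_Y(\cdot \mid X_{it}, \alpha_i, \gamma_t, \beta), \quad (i = 1, \ldots, N;\ t = 1, \ldots, T),$$

where $X_i^t = (X_{i1}, \ldots, X_{it})$, $\alpha = (\alpha_1, \ldots, \alpha_N)$, $\gamma = (\gamma_1, \ldots, \gamma_T)$, $f_Y$ is a known probability function, and $\beta$ is a finite-dimensional parameter vector. The variables $\alpha_i$ and $\gamma_t$ are unobserved individual and time effects that in economic applications capture individual heterogeneity and aggregate shocks, respectively. The model is semiparametric because we do not specify the distribution of these effects nor their relationship with the explanatory variables.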
The conditional distribution f Y represents the parametric part of the model. The vector X it contains predetermined variables with respect to Y it . Note that X it can include lags of Y it to accommodate dynamic models.
We consider two running examples throughout the analysis.

Example 1 (Binary Response Model). Let $Y_{it}$ be a binary outcome and $F$ be a cumulative distribution function, e.g. the standard normal or standard logistic distribution. We can model the conditional distribution of $Y_{it}$ using the single-index specification with individual and time effects

$$f_Y(y \mid X_{it}, \alpha_i, \gamma_t, \beta) = F(X_{it}'\beta + \alpha_i + \gamma_t)^{y}\,\big[1 - F(X_{it}'\beta + \alpha_i + \gamma_t)\big]^{1-y}, \quad y \in \{0, 1\}.$$

In a labor economics application, $Y$ can be an indicator for female labor force participation and $X$ can include fertility indicators and other socio-economic characteristics.
Example 2 (Poisson Model). Let $Y_{it}$ be a non-negative integer-valued outcome, and $f(\cdot; \lambda)$ be the probability mass function of a Poisson random variable with mean $\lambda > 0$. We can model the conditional distribution of $Y_{it}$ using the single-index specification with individual and time effects

$$f_Y(y \mid X_{it}, \alpha_i, \gamma_t, \beta) = f\big(y; \exp[X_{it}'\beta + \alpha_i + \gamma_t]\big), \quad y \in \{0, 1, 2, \ldots\}.$$

In an industrial organization application, $Y$ can be the number of patents that a firm produces and $X$ can include investment in R&D and other firm characteristics.
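For estimation, we adopt a fixed effects approach, treating the realization of the unobserved individual and time effects as parameters to be estimated. We collect all these effects in the vector $\phi_{NT} = (\alpha_1, \ldots, \alpha_N, \gamma_1, \ldots, \gamma_T)'$. The model parameter $\beta$ usually includes regression coefficients of interest, while the vector $\phi_{NT}$ is treated as a nuisance parameter. The true values of the parameters, denoted by $\beta^0$ and $\phi^0_{NT} = (\alpha^0_1, \ldots, \alpha^0_N, \gamma^0_1, \ldots, \gamma^0_T)'$, are the solution to the population conditional maximum likelihood problem

$$\max_{(\beta, \phi_{NT}) \in \mathbb{R}^{\dim\beta + \dim\phi_{NT}}} \mathbb{E}_\phi\big[\mathcal{L}_{NT}(\beta, \phi_{NT})\big], \qquad \mathcal{L}_{NT}(\beta, \phi_{NT}) := (NT)^{-1/2}\Big\{\sum_{i,t} \log f_Y(Y_{it} \mid X_{it}, \alpha_i, \gamma_t, \beta) - b\,(v_{NT}'\phi_{NT})^2/2\Big\}, \tag{2.1}$$

where $\mathbb{E}_\phi$ denotes the expectation with respect to the distribution of the data conditional on the unobserved effects, $b > 0$ is an arbitrary constant, $v_{NT} = (1_N', -1_T')'$, and $1_N$ and $1_T$ denote vectors of ones of dimensions $N$ and $T$. The second term of $\mathcal{L}_{NT}$ is a penalty that imposes the normalization needed to identify $\phi_{NT}$ when the scalar individual and time effects enter the log-likelihood additively as $\alpha_i + \gamma_t$: adding a constant to every $\alpha_i$ and subtracting it from every $\gamma_t$ leaves $\alpha_i + \gamma_t$ unchanged. To eliminate this ambiguity, we normalize $\phi^0_{NT}$ to satisfy $v_{NT}'\phi^0_{NT} = 0$, i.e. $\sum_i \alpha^0_i = \sum_t \gamma^0_t$. The penalty produces a maximizer of $\mathcal{L}_{NT}$ that is automatically normalized. We could equivalently impose $v_{NT}'\phi_{NT} = 0$ as a constraint, but for technical reasons we prefer to work with an unconstrained optimization problem. There are other possible normalizations for $\phi_{NT}$, such as $\alpha_1 = 0$. The model parameter $\beta$ is invariant to the choice of normalization, so our asymptotic results on the estimator of $\beta$ do not depend on it. Our choice is convenient for certain intermediate results that involve the incidental parameter $\phi_{NT}$, its score vector and its Hessian matrix. The pre-factor $(NT)^{-1/2}$ in $\mathcal{L}_{NT}(\beta, \phi_{NT})$ is just a rescaling.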
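A minimal sketch of the normalization, assuming $v_{NT} = (1_N', -1_T')'$ so that $v_{NT}'\phi_{NT} = \sum_i \alpha_i - \sum_t \gamma_t$. The helper `normalize_effects` is hypothetical (not part of the paper's software); it shows that shifting every $\alpha_i$ down and every $\gamma_t$ up by the same constant enforces the normalization without changing any index $\alpha_i + \gamma_t$.

```python
def normalize_effects(alpha, gamma):
    """Hypothetical helper: return (alpha - c, gamma + c) with c chosen so that
    sum(alpha - c) == sum(gamma + c), i.e. v' phi = 0 for v = (1_N', -1_T')'."""
    n, t = len(alpha), len(gamma)
    # sum(alpha) - n c - sum(gamma) - t c = 0  =>  c = (sum(alpha) - sum(gamma)) / (n + t)
    c = (sum(alpha) - sum(gamma)) / (n + t)
    return [a - c for a in alpha], [g + c for g in gamma]


alpha = [1.0, 2.0, 3.0]
gamma = [0.5, -0.5]
alpha_n, gamma_n = normalize_effects(alpha, gamma)
print(sum(alpha_n) - sum(gamma_n))  # numerically zero
# Every index alpha_i + gamma_t is unaffected by the normalization.
print(all(abs((a + g) - (an + gn)) < 1e-12
          for a, an in zip(alpha, alpha_n)
          for g, gn in zip(gamma, gamma_n)))
```

Any other normalization, such as $\alpha_1 = 0$, would be obtained the same way with a different choice of the constant.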
Other quantities of interest involve averages over the data and unobserved effects,

$$\delta^0_{NT} = \mathbb{E}\big[\Delta_{NT}(\beta^0, \phi^0_{NT})\big], \qquad \Delta_{NT}(\beta, \phi_{NT}) = (NT)^{-1}\sum_{i,t} \Delta(X_{it}, \beta, \alpha_i, \gamma_t), \tag{2.2}$$

where $\mathbb{E}$ denotes the expectation with respect to the joint distribution of the data and the unobserved effects, provided that the expectation exists. $\delta^0_{NT}$ is indexed by $N$ and $T$ because the distribution of $(X_{it}, \alpha_i, \gamma_t)$ can be heterogeneous across $i$ and/or $t$; see Section 4.2. These averages include average partial effects (APEs), which are often the ultimate quantities of interest in nonlinear models. The APEs are invariant to the choice of normalization for $\phi_{NT}$ if $\alpha_i$ and $\gamma_t$ enter $\Delta(X_{it}, \beta, \alpha_i, \gamma_t)$ as $\alpha_i + \gamma_t$. Some examples of partial effects that satisfy this condition are the following.

Example 1 (Binary Response Model). If $X_{it,k}$, the $k$th element of $X_{it}$, is binary, its partial effect on the conditional probability of $Y_{it}$ is

$$\Delta(X_{it}, \beta, \alpha_i, \gamma_t) = F(\beta_k + X_{it,-k}'\beta_{-k} + \alpha_i + \gamma_t) - F(X_{it,-k}'\beta_{-k} + \alpha_i + \gamma_t), \tag{2.3}$$

where $\beta_k$ is the $k$th element of $\beta$, and $X_{it,-k}$ and $\beta_{-k}$ include all elements of $X_{it}$ and $\beta$ except for the $k$th element. If $X_{it,k}$ is continuous and $F$ is differentiable, the partial effect of $X_{it,k}$ on the conditional probability of $Y_{it}$ is

$$\Delta(X_{it}, \beta, \alpha_i, \gamma_t) = \beta_k\,\partial F(X_{it}'\beta + \alpha_i + \gamma_t), \tag{2.4}$$

where $\partial F$ is the derivative of $F$.

Example 2 (Poisson Model). If $X_{it}$ includes $Z_{it}$ and some known transformation $H(Z_{it})$ with coefficients $\beta_k$ and $\beta_j$, the partial effect of $Z_{it}$ on the conditional expectation of $Y_{it}$ is

$$\Delta(X_{it}, \beta, \alpha_i, \gamma_t) = \big[\beta_k + \beta_j\,\partial H(Z_{it})\big]\exp(X_{it}'\beta + \alpha_i + \gamma_t). \tag{2.5}$$

In Appendix B we derive asymptotic expansions that apply to general models with multiple unobserved effects. In order to use these expansions to obtain the asymptotic distribution of the panel fixed effects estimators, we need to derive the properties of the expected Hessian of the incidental parameters, a matrix with increasing dimension, and to show the consistency of the estimator of the incidental parameter vector. The additive specification $\alpha_i + \gamma_t$ is useful to characterize the Hessian, and we impose strict concavity of the objective function to show consistency.
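The partial effects of Examples 1 and 2 and their sample average over $i$ and $t$ can be sketched in Python as follows. This is an illustration under stated assumptions: $F$ is the standard normal CDF (probit), the panel is stored as nested lists of regressor tuples, and the Poisson transformation $H(Z) = Z^2$ and all numbers are hypothetical. The last computation checks numerically that the APE does not change when a constant is added to every $\alpha_i$ and subtracted from every $\gamma_t$.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # probit link: F = PHI.cdf, dF = PHI.pdf


def linear_index(x_it, beta, a_i, g_t):
    """X_it' beta + alpha_i + gamma_t."""
    return sum(b * v for b, v in zip(beta, x_it)) + a_i + g_t


def average_effect(delta, x, alpha, gamma):
    """Sample analog of (2.2): (NT)^{-1} sum_{i,t} Delta(X_it, beta, alpha_i, gamma_t)."""
    values = [delta(x[i][t], alpha[i], gamma[t])
              for i in range(len(alpha)) for t in range(len(gamma))]
    return sum(values) / len(values)


def probit_continuous_pe(beta, k):
    """beta_k * dF(X_it' beta + alpha_i + gamma_t) for a continuous X_it,k."""
    return lambda x_it, a, g: beta[k] * PHI.pdf(linear_index(x_it, beta, a, g))


def probit_binary_pe(beta, k):
    """F(index with X_it,k = 1) - F(index with X_it,k = 0) for a binary X_it,k."""
    def delta(x_it, a, g):
        x_on, x_off = list(x_it), list(x_it)
        x_on[k], x_off[k] = 1.0, 0.0
        return (PHI.cdf(linear_index(x_on, beta, a, g))
                - PHI.cdf(linear_index(x_off, beta, a, g)))
    return delta


def poisson_pe(beta, k, j):
    """[beta_k + beta_j dH(Z_it)] exp(index), with the hypothetical choice
    H(Z) = Z**2 (so dH(Z) = 2Z); Z_it is X_it[k] and H(Z_it) is X_it[j]."""
    def delta(x_it, a, g):
        return (beta[k] + beta[j] * 2.0 * x_it[k]) * math.exp(linear_index(x_it, beta, a, g))
    return delta


# Toy 2 x 3 panel with two regressors per observation (hypothetical numbers).
x = [[(0.2, 1.0), (-0.4, 0.0), (1.1, 1.0)],
     [(0.0, 0.0), (0.5, 1.0), (-1.2, 0.0)]]
beta = [0.8, -0.3]
alpha, gamma = [0.1, -0.2], [0.0, 0.3, -0.1]
ape_cont = average_effect(probit_continuous_pe(beta, 0), x, alpha, gamma)
ape_bin = average_effect(probit_binary_pe(beta, 1), x, alpha, gamma)
# Shifting alpha up and gamma down by the same constant leaves the APE unchanged.
ape_shifted = average_effect(probit_continuous_pe(beta, 0), x,
                             [a + 0.7 for a in alpha], [g - 0.7 for g in gamma])
print(ape_cont, ape_bin, ape_shifted)
```

The invariance check works because every partial effect above depends on $\alpha_i$ and $\gamma_t$ only through the index $\alpha_i + \gamma_t$.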

Fixed effects estimators
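We estimate the parameters by solving the sample analog of problem (2.1), i.e.

$$\max_{(\beta, \phi_{NT}) \in \mathbb{R}^{\dim\beta + \dim\phi_{NT}}} \mathcal{L}_{NT}(\beta, \phi_{NT}). \tag{2.6}$$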
As in the population case, we shall impose conditions guaranteeing that the solution to this maximization problem exists and is unique with probability approaching one as $N$ and $T$ become large. For computational purposes, we note that the solution to program (2.6) for $\beta$ is the same as the solution to the program that imposes $v_{NT}'\phi_{NT} = 0$ directly as a constraint in the optimization, and is invariant to the normalization. In our numerical examples we impose either $\alpha_1 = 0$ or $\gamma_1 = 0$ directly by dropping the first individual or time effect. This constrained program has good computational properties because its objective function is concave and smooth in all the parameters. We have developed the commands probitfe and logitfe in Stata to implement the methods of the paper for probit and logit models (Cruz-González et al., 2015).⁵ When N and T are large, e.g., N > 2,000 and T > 50, we recommend using optimization routines that exploit the sparsity of the design matrix of the model to speed up computation, such as the package Speedglm in R (Enea, 2012). For a probit model with N = 2,000 and T = 52, Speedglm computes the fixed effects estimator in less than 2 minutes on a 2 × 2.66 GHz 6-Core Intel Xeon processor, more than 7.5 times faster than our Stata command probitfe and more than 30 times faster than the R command glm.⁶

To analyze the statistical properties of the estimator of $\beta$ it is convenient to first concentrate out the nuisance parameter $\phi_{NT}$. For given $\beta$, we define the optimal $\hat\phi_{NT}(\beta)$ as

$$\hat\phi_{NT}(\beta) = \arg\max_{\phi_{NT} \in \mathbb{R}^{\dim\phi_{NT}}} \mathcal{L}_{NT}(\beta, \phi_{NT}). \tag{2.7}$$

The fixed effects estimators of $\beta$ and $\phi_{NT}$ are

$$\hat\beta_{NT} = \arg\max_{\beta \in \mathbb{R}^{\dim\beta}} \mathcal{L}_{NT}\big(\beta, \hat\phi_{NT}(\beta)\big), \qquad \hat\phi_{NT} = \hat\phi_{NT}(\hat\beta_{NT}). \tag{2.8}$$

Estimators of APEs can be formed by plugging in the estimators of the model parameters in the sample version of (2.2), i.e.

$$\hat\delta_{NT} = \Delta_{NT}(\hat\beta_{NT}, \hat\phi_{NT}). \tag{2.9}$$

Again, $\hat\delta_{NT}$ is invariant to the normalization chosen for $\phi_{NT}$ if $\alpha_i$ and $\gamma_t$ enter $\Delta(X_{it}, \beta, \alpha_i, \gamma_t)$ as $\alpha_i + \gamma_t$.

⁵ We refer to this companion work for computational details.
⁶ Additional comparisons of computational times are available from the authors upon request.
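As a rough illustration of concentrating out $\phi_{NT}$ and then maximizing over $\beta$ as in (2.8), here is a pure-Python sketch for the Poisson model of Example 2 with a single regressor. It is not the authors' probitfe/logitfe commands or Speedglm. The alternating closed-form updates for $\alpha_i$ and $\gamma_t$ (iterative proportional fitting) and the bisection on the profile score are illustrative choices for this sketch, the simulated design is hypothetical, and the code assumes every row and column of $Y$ has a positive sum (otherwise the corresponding effect diverges to $-\infty$). The penalty is omitted because $\hat\beta_{NT}$ does not depend on the normalization.

```python
import math
import random


def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def concentrate_effects(beta, y, x, alpha, gamma, tol=1e-12, max_sweeps=1000):
    """phi_hat(beta): solve sum_t (y_it - lam_it) = 0 for each i and
    sum_i (y_it - lam_it) = 0 for each t, lam_it = exp(beta x_it + alpha_i + gamma_t),
    by alternating their closed-form solutions. Updates alpha, gamma in place."""
    n, t_len = len(y), len(y[0])
    for _ in range(max_sweeps):
        previous = alpha[:]
        for i in range(n):
            denom = sum(math.exp(beta * x[i][t] + gamma[t]) for t in range(t_len))
            alpha[i] = math.log(sum(y[i]) / denom)
        for t in range(t_len):
            denom = sum(math.exp(beta * x[i][t] + alpha[i]) for i in range(n))
            gamma[t] = math.log(sum(y[i][t] for i in range(n)) / denom)
        if max(abs(a - b) for a, b in zip(alpha, previous)) < tol:
            break
    return alpha, gamma


def profile_score(beta, y, x, alpha, gamma):
    """Derivative of the concentrated log-likelihood; by the envelope theorem it
    equals sum_it x_it (y_it - lam_it) evaluated at phi_hat(beta)."""
    concentrate_effects(beta, y, x, alpha, gamma)
    return sum(x[i][t] * (y[i][t] - math.exp(beta * x[i][t] + alpha[i] + gamma[t]))
               for i in range(len(y)) for t in range(len(y[0])))


def poisson_fe(y, x, lo=-2.0, hi=2.0, iters=50):
    """beta_hat: the concentrated log-likelihood is concave in beta, so its
    score is decreasing and we can bisect on its sign."""
    alpha, gamma = [0.0] * len(y), [0.0] * len(y[0])  # reused as warm starts
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if profile_score(mid, y, x, alpha, gamma) > 0:
            lo = mid
        else:
            hi = mid
    beta_hat = 0.5 * (lo + hi)
    concentrate_effects(beta_hat, y, x, alpha, gamma)
    return beta_hat, alpha, gamma


# Hypothetical simulated design.
rng = random.Random(12345)
N, T, beta0 = 40, 12, 0.5
alpha0 = [rng.gauss(0.5, 0.3) for _ in range(N)]
gamma0 = [rng.gauss(0.0, 0.3) for _ in range(T)]
x = [[rng.gauss(0.0, 1.0) for _ in range(T)] for _ in range(N)]
y = [[draw_poisson(math.exp(beta0 * x[i][t] + alpha0[i] + gamma0[t]), rng)
      for t in range(T)] for i in range(N)]
beta_hat, alpha_hat, gamma_hat = poisson_fe(y, x)
print(round(beta_hat, 3))
```

For large panels this loop-based approach is slow; sparse or vectorized solvers such as those mentioned above are the practical route.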

Incidental parameter problem and bias corrections
In this section we give a heuristic discussion of the main results, leaving the technical details to Section 4. We illustrate the analysis with numerical calculations based on a variation of the classical Neyman and Scott (1948) variance example.

Incidental parameter problem
Fixed effects estimators in nonlinear models suffer from the incidental parameter problem (Neyman and Scott, 1948). The source of the problem is that the dimension of the nuisance parameter $\phi_{NT}$ increases with the sample size under asymptotic approximations where either $N$ or $T$ passes to infinity. To describe the problem, let

$$\bar\beta_{NT} := \arg\max_{\beta \in \mathbb{R}^{\dim\beta}} \mathbb{E}_\phi\big[\mathcal{L}_{NT}\big(\beta, \hat\phi_{NT}(\beta)\big)\big]. \tag{3.1}$$

The fixed effects estimator is inconsistent under the traditional Neyman and Scott asymptotic sequences where $N \to \infty$ and $T$ is fixed, i.e., $\operatorname{plim}_{N \to \infty} \hat\beta_{NT} \neq \beta^0$. Similarly, the fixed effects estimator is inconsistent under asymptotic sequences where $T \to \infty$ and $N$ is fixed, i.e., $\operatorname{plim}_{T \to \infty} \hat\beta_{NT} \neq \beta^0$. Under asymptotic approximations where either $N$ or $T$ is fixed, there is only a fixed number of observations to estimate some of the components of $\phi_{NT}$, $T$ for each individual effect or $N$ for each time effect, rendering the estimator $\hat\phi_{NT}(\beta)$ inconsistent for $\phi_{NT}(\beta)$.
The nonlinearity of the model propagates the inconsistency to the estimator of β.
A key insight of the large-T panel data literature is that the incidental parameter problem becomes an asymptotic bias problem under an asymptotic approximation where $N \to \infty$ and $T \to \infty$ (e.g., Arellano and Hahn, 2007). For models with only individual effects, this literature derived the expansion $\bar\beta_{NT} = \beta^0 + B/T + o_P(T^{-1})$ as $N, T \to \infty$, for some constant $B$. The fixed effects estimator is consistent because $\operatorname{plim}_{N,T \to \infty} \hat\beta_{NT} = \beta^0$, but has bias in the asymptotic distribution if $B/T$ is not negligible relative to $1/\sqrt{NT}$, the order of the standard deviation of the estimator. This asymptotic bias problem, however, is easier to tackle than the inconsistency problem that arises under the traditional Neyman and Scott asymptotic approximation. We show that the same insight still applies to models with individual and time effects, but with a different expansion for $\bar\beta_{NT}$. We characterize the expansion and develop bias corrections.

Bias expansions and bias corrections
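Some expansions can be used to explain our corrections. For smooth likelihoods and under appropriate regularity conditions, as $N, T \to \infty$,

$$\bar\beta_{NT} = \beta^0 + \overline{B}^{\beta}_{\infty}/T + \overline{D}^{\beta}_{\infty}/N + o_P(T^{-1} \vee N^{-1}), \tag{3.2}$$

for some $\overline{B}^{\beta}_{\infty}$ and $\overline{D}^{\beta}_{\infty}$ that we characterize in Theorem 4.1, and

$$\sqrt{NT}\,(\hat\beta_{NT} - \bar\beta_{NT}) \to_d \mathcal{N}(0, \overline{V}_{\infty}),$$

for some $\overline{V}_{\infty}$ that we also characterize in Theorem 4.1. Under asymptotic sequences where $N/T \to \kappa^2$ as $N, T \to \infty$, the fixed effects estimator is asymptotically biased because

$$\sqrt{NT}\,(\hat\beta_{NT} - \beta^0) = \sqrt{NT}\,(\hat\beta_{NT} - \bar\beta_{NT}) + \sqrt{NT}\,\big(\overline{B}^{\beta}_{\infty}/T + \overline{D}^{\beta}_{\infty}/N + o_P(T^{-1} \vee N^{-1})\big) \to_d \mathcal{N}\big(\kappa\,\overline{B}^{\beta}_{\infty} + \kappa^{-1}\,\overline{D}^{\beta}_{\infty},\ \overline{V}_{\infty}\big).$$

Relative to fixed effects estimators with only individual effects, the presence of time effects introduces additional asymptotic bias through $\overline{D}^{\beta}_{\infty}$. This asymptotic result predicts that the fixed effects estimator can have significant bias relative to its dispersion. Moreover, confidence intervals constructed around the fixed effects estimator can severely undercover the true value of the parameter even in large samples. We show that these predictions provide a good approximation to the finite-sample behavior of the fixed effects estimator through analytical and simulation examples in Sections 3.3 and 5.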
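The consequence of this limiting distribution for inference can be quantified with a short Python sketch. Assuming $\sqrt{NT}(\hat\beta_{NT} - \beta^0)$ is exactly normal with mean $\kappa B + D/\kappa$ and variance $V$, it computes the coverage of the naive interval $\hat\beta_{NT} \pm 1.96\sqrt{V/(NT)}$. The values $B = D = -1$, $V = 2$ and $\kappa = 1$ in the usage lines are hypothetical inputs chosen for illustration.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal


def limiting_mean(kappa, b_inf, d_inf):
    """Mean of the limit of sqrt(NT)(beta_hat - beta0) when N/T -> kappa**2."""
    return kappa * b_inf + d_inf / kappa


def naive_coverage(mean, variance, level=0.95):
    """Coverage of beta_hat +/- z sqrt(variance / NT) when
    sqrt(NT)(beta_hat - beta0) ~ N(mean, variance)."""
    z = Z.inv_cdf(0.5 + level / 2.0)
    shift = mean / math.sqrt(variance)
    # P(|W + shift| <= z) for W standard normal
    return Z.cdf(z - shift) - Z.cdf(-z - shift)


# Hypothetical inputs: B = D = -1, V = 2, N = T (kappa = 1).
print(naive_coverage(0.0, 2.0))                             # unbiased benchmark
print(naive_coverage(limiting_mean(1.0, -1.0, -1.0), 2.0))  # with asymptotic bias
```

With these inputs the nominal 95% interval covers $\beta^0$ only about 71% of the time, even though the bias shrinks at the same rate as the standard deviation.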
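The analytical bias correction consists of subtracting estimators of the leading terms of the bias from the fixed effects estimator of $\beta^0$. Let $\hat B^{\beta}_{NT}$ and $\hat D^{\beta}_{NT}$ be estimators of $\overline{B}^{\beta}_{\infty}$ and $\overline{D}^{\beta}_{\infty}$. The bias corrected estimator can be formed as

$$\tilde\beta^{A}_{NT} = \hat\beta_{NT} - \hat B^{\beta}_{NT}/T - \hat D^{\beta}_{NT}/N.$$

If $N/T \to \kappa^2$, $\hat B^{\beta}_{NT} \to_P \overline{B}^{\beta}_{\infty}$ and $\hat D^{\beta}_{NT} \to_P \overline{D}^{\beta}_{\infty}$, then $\sqrt{NT}\,(\tilde\beta^{A}_{NT} - \beta^0) \to_d \mathcal{N}(0, \overline{V}_{\infty})$. The analytical correction therefore centers the asymptotic distribution at the true value of the parameter, without increasing asymptotic variance. This asymptotic result predicts that in large samples the corrected estimator has small bias relative to dispersion, the correction does not increase dispersion, and the confidence intervals constructed around the corrected estimator have coverage probabilities close to the nominal levels. We show that these predictions provide a good approximation to the behavior of the corrections in Sections 3.3 and 5, even in small panels with N < 60 and T < 15.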
We also consider a jackknife bias correction method that does not require explicit estimation of the bias. This method is based on the split panel jackknife (SPJ) of Dhaene and Jochmans (2015) applied to the time and cross-section dimensions of the panel. Alternative jackknife corrections based on the leave-one-observation-out panel jackknife (PJ) of Hahn and Newey (2004) and combinations of PJ and SPJ are also possible. We do not consider corrections based on PJ because they are theoretically justified by second-order expansions of $\bar\beta_{NT}$ that are beyond the scope of this paper.
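To describe our generalization of the SPJ, define the fixed effects estimator of $\beta$ in the subpanel with cross-sectional indices $A \subseteq \{1, \ldots, N\}$ and time-series indices $B \subseteq \{1, \ldots, T\}$ as

$$\hat\beta_{A,B} := \arg\max_{\beta}\ \max_{\{\alpha_i\}_{i \in A},\ \{\gamma_t\}_{t \in B}}\ \sum_{i \in A,\ t \in B} \log f_Y(Y_{it} \mid X_{it}, \alpha_i, \gamma_t, \beta).$$

Let $\bar\beta_{N,T/2}$ be the average of the two split panel estimators that use all the individuals and leave out the first and the second half of the time periods, respectively, and let $\bar\beta_{N/2,T}$ be the average of the two split panel estimators that use all the time periods and leave out half of the individuals of the panel. In choosing the cross-sectional indexing of the panel, one might want to take into account individual clustering structures and other dependences to preserve them in the SPJ. For example, all the individuals belonging to the same cluster should be indexed such that they remain in the same subpanel after the cross-sectional split. If there are no cross-sectional dependences, the indexing of the individuals is unrestricted. We recommend constructing $\bar\beta_{N/2,T}$ as the average of the estimators obtained from all possible partitions of $N/2$ individuals, to avoid ambiguity and arbitrariness in the choice of the division.⁸ The bias corrected estimator is

$$\tilde\beta^{J}_{NT} = 3\hat\beta_{NT} - \bar\beta_{N,T/2} - \bar\beta_{N/2,T}.$$

To give some intuition about how the corrections work, note that

$$\tilde\beta^{J}_{NT} - \beta^0 = (\hat\beta_{NT} - \beta^0) - (\bar\beta_{N,T/2} - \hat\beta_{NT}) - (\bar\beta_{N/2,T} - \hat\beta_{NT}),$$

where, applying the expansion (3.2) to each subpanel, $\bar\beta_{N,T/2} - \hat\beta_{NT} = \overline{B}^{\beta}_{\infty}/T + o_P(T^{-1} \vee N^{-1}) + O_P(1/\sqrt{NT})$ and $\bar\beta_{N/2,T} - \hat\beta_{NT} = \overline{D}^{\beta}_{\infty}/N + o_P(T^{-1} \vee N^{-1}) + O_P(1/\sqrt{NT})$. The time-series split therefore removes the bias term $\overline{B}^{\beta}_{\infty}$, and the cross-sectional split removes the bias term $\overline{D}^{\beta}_{\infty}$.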
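The split panel jackknife can be sketched in Python for any estimator that maps an $N \times T$ panel to a scalar. This sketch assumes even $N$ and $T$, forms $3\hat\beta_{NT} - \bar\beta_{N,T/2} - \bar\beta_{N/2,T}$, and, for simplicity, uses the two contiguous halves of the individuals rather than averaging over all possible partitions as recommended above. As an example estimator it uses the two-way fixed effects variance estimator of the Neyman and Scott design $Y_{it} \sim \mathcal{N}(\alpha_i + \gamma_t, \beta)$, with a hypothetical simulated design ($\beta^0 = 1$, $N = T = 10$).

```python
import random


def two_way_variance(y):
    """Fixed effects estimator of beta in Y_it ~ N(alpha_i + gamma_t, beta):
    mean squared residual after removing row and column means."""
    n, t_len = len(y), len(y[0])
    row = [sum(r) / t_len for r in y]
    col = [sum(y[i][t] for i in range(n)) / n for t in range(t_len)]
    grand = sum(row) / n
    return sum((y[i][t] - row[i] - col[t] + grand) ** 2
               for i in range(n) for t in range(t_len)) / (n * t_len)


def split_panel_jackknife(estimator, y):
    """3 * beta_hat - beta_bar_{N,T/2} - beta_bar_{N/2,T} for an N x T panel y.
    Assumes even N and T and splits into contiguous halves."""
    n, t_len = len(y), len(y[0])
    if n % 2 or t_len % 2:
        raise ValueError("this sketch assumes even N and T")
    full = estimator(y)
    time_halves = ([row[: t_len // 2] for row in y], [row[t_len // 2:] for row in y])
    cross_halves = (y[: n // 2], y[n // 2:])
    beta_time = sum(estimator(p) for p in time_halves) / 2.0
    beta_cross = sum(estimator(p) for p in cross_halves) / 2.0
    return 3.0 * full - beta_time - beta_cross


rng = random.Random(2015)
N, T, reps = 10, 10, 500
fe_draws, jk_draws = [], []
for _ in range(reps):
    alpha = [rng.gauss(0.0, 1.0) for _ in range(N)]
    gamma = [rng.gauss(0.0, 1.0) for _ in range(T)]
    y = [[alpha[i] + gamma[t] + rng.gauss(0.0, 1.0) for t in range(T)] for i in range(N)]
    fe_draws.append(two_way_variance(y))
    jk_draws.append(split_panel_jackknife(two_way_variance, y))
mean_fe = sum(fe_draws) / reps
mean_jk = sum(jk_draws) / reps
print(round(mean_fe, 3), round(mean_jk, 3))
```

With $N = T = 10$ the Monte Carlo averages should be close to $(1 - 1/T)(1 - 1/N) = 0.81$ for the fixed effects estimator and close to $1 - 1/(NT) = 0.99$ for the jackknife.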

Illustrative example
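To illustrate how the bias corrections work in finite samples, we consider a simple model where the solution to the population program (3.1) has closed form. This model corresponds to a variation of the classical Neyman and Scott (1948) variance example that includes both individual and time effects, $Y_{it} \mid \alpha, \gamma, \beta \sim \mathcal{N}(\alpha_i + \gamma_t, \beta)$. It is well known that in this case

$$\hat\beta_{NT} = \frac{1}{NT}\sum_{i,t}\big(Y_{it} - \bar Y_{i.} - \bar Y_{.t} + \bar Y_{..}\big)^2, \qquad \bar\beta_{NT} = \mathbb{E}_\phi[\hat\beta_{NT}] = \beta^0\Big(1 - \frac{1}{T} - \frac{1}{N} + \frac{1}{NT}\Big),$$

where $\bar Y_{i.} = T^{-1}\sum_t Y_{it}$, $\bar Y_{.t} = N^{-1}\sum_i Y_{it}$ and $\bar Y_{..} = (NT)^{-1}\sum_{i,t} Y_{it}$, so that $\overline{B}^{\beta}_{\infty} = \overline{D}^{\beta}_{\infty} = -\beta^0$ in (3.2). Moreover, from the well-known results on the degrees of freedom adjustment of the estimated variance, $NT\hat\beta_{NT}/\beta^0 \sim \chi^2_{(N-1)(T-1)}$, so that $\sqrt{NT}\,(\hat\beta_{NT} - \bar\beta_{NT}) \to_d \mathcal{N}(0, 2(\beta^0)^2)$. Using $\hat B^{\beta}_{NT} = \hat D^{\beta}_{NT} = -\hat\beta_{NT}$, the analytical bias correction is

$$\tilde\beta^{A}_{NT} = \hat\beta_{NT} + \hat\beta_{NT}/T + \hat\beta_{NT}/N = \hat\beta_{NT}\,(1 + 1/T + 1/N).$$

This correction reduces the order of the bias from $(T^{-1} \vee N^{-1})$ to $(T^{-2} \vee N^{-2})$, and introduces additional higher-order terms.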
⁸ When N is large, we can approximate the average over all P possible partitions by the average over S ≪ P randomly chosen partitions to speed up computation.
⁹ Okui (2013) derived the bias of fixed effects estimators of autocovariances and autocorrelations in this model.

Table 1
Biases and standard deviations for $Y_{it} \mid \alpha, \gamma, \beta \sim \mathcal{N}(\alpha_i + \gamma_t, \beta)$.

The analytical correction increases finite-sample variance because the factor $(1 + 1/T + 1/N) > 1$. We compare the biases and standard deviations of the fixed effects estimator and the corrected estimator in a numerical example below. For the jackknife correction with even $N$ and $T$, straightforward calculations give

$$\mathbb{E}_\phi[\tilde\beta^{J}_{NT}] = \beta^0\Big(1 - \frac{1}{NT}\Big).$$

The correction therefore reduces the order of the bias from $(T^{-1} \vee N^{-1})$ to $(TN)^{-1}$.¹⁰ Table 1 presents numerical results for the biases and standard deviations of the fixed effects and bias corrected estimators in finite samples. We consider panels with $N, T \in \{10, 25, 50\}$, and only report the results for $T \le N$ since all the expressions are symmetric in $N$ and $T$. All the numbers in the table are in percentage of the true parameter value, so we do not need to specify the value of $\beta^0$.
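Because $\mathbb{E}_\phi[\hat\beta_{NT}] = \beta^0(N-1)(T-1)/(NT)$ in the variance example, the expected values of the corrections can be checked with exact rational arithmetic. The Python sketch below uses `fractions.Fraction` with $\beta^0 = 1$; it assumes the analytical correction $\hat\beta_{NT}(1 + 1/T + 1/N)$ and the split panel combination $3\hat\beta_{NT} - \bar\beta_{N,T/2} - \bar\beta_{N/2,T}$ with even $N$ and $T$, and applies linearity of expectation to the subpanel estimators.

```python
from fractions import Fraction


def expected_fe(n, t):
    """E_phi[beta_hat] / beta0 = (N-1)(T-1)/(NT) in the two-way variance example."""
    return Fraction((n - 1) * (t - 1), n * t)


def expected_analytical(n, t):
    """E_phi of beta_hat * (1 + 1/T + 1/N), divided by beta0."""
    return expected_fe(n, t) * (1 + Fraction(1, t) + Fraction(1, n))


def expected_jackknife(n, t):
    """E_phi of 3 beta_hat - beta_bar_{N,T/2} - beta_bar_{N/2,T}, divided by beta0
    (even N and T; each subpanel estimator has an expectation of the same form)."""
    return 3 * expected_fe(n, t) - expected_fe(n, t // 2) - expected_fe(n // 2, t)


for n, t in [(10, 10), (50, 10), (50, 50)]:
    biases = [1 - f(n, t) for f in (expected_fe, expected_analytical, expected_jackknife)]
    print(n, t, [str(b) for b in biases])
```

For $N = T = 10$ the relative biases are $19/100$, $7/250$ and $1/100$ for the fixed effects, analytical and jackknife estimators, respectively, in line with the orders $(T^{-1} \vee N^{-1})$, $(T^{-2} \vee N^{-2})$ and $(TN)^{-1}$.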
We find that the analytical and jackknife corrections offer substantial improvements over the fixed effects estimator in terms of bias. The first and fourth rows of the table show that the bias of the fixed effects estimator is of the same order of magnitude as the standard deviation, where $\mathrm{Var}[\hat\beta_{NT}] = 2(N-1)(T-1)(\beta^0)^2/(NT)^2$ under independence of $Y_{it}$ over $i$ and $t$ conditional on the unobserved effects. The fifth row shows that the increase in standard deviation due to the analytical bias correction is small compared to the bias reduction, where $\mathrm{Var}[\tilde\beta^{A}_{NT}] = (1 + 1/T + 1/N)^2\,\mathrm{Var}[\hat\beta_{NT}]$. The last row shows that the jackknife yields less precise estimates than the analytical correction when $T = 10$.

Table 2 illustrates the effect of the bias on inference based on the asymptotic distribution. It shows the coverage probabilities of 95% asymptotic confidence intervals for $\beta^0$ constructed in the usual way as

$$\mathrm{CI}_{.95}(\hat\beta) = \hat\beta \pm 1.96\sqrt{2\hat\beta^2/(NT)} = \hat\beta\Big(1 \pm 1.96\sqrt{2/(NT)}\Big),$$

where $\hat\beta \in \{\hat\beta_{NT}, \tilde\beta^{A}_{NT}, \tilde\beta^{J}_{NT}\}$ and $2\hat\beta^2$ estimates the asymptotic variance $2(\beta^0)^2$. These probabilities do not depend on the value of $\beta^0$ because the limits of the intervals are proportional to $\hat\beta$. For the jackknife we compute the probabilities numerically by simulation with $\beta^0 = 1$. As a benchmark of comparison, we also consider confidence intervals constructed from the unbiased estimator $\tilde\beta_{NT} = NT\hat\beta_{NT}/[(N-1)(T-1)]$. Here we find that the confidence intervals based on the fixed effects estimator display severe undercoverage for all the sample sizes. The confidence intervals based on the corrected estimators have high coverage probabilities, which approach the nominal level as the sample size grows. Moreover, the bias corrected estimators produce confidence intervals with coverage probabilities very similar to those from the unbiased estimator.

Table 2
Coverage probabilities for $Y_{it} \mid \alpha, \gamma, \beta \sim \mathcal{N}(\alpha_i + \gamma_t, \beta)$.

¹⁰ In this example it is possible to develop higher-order jackknife corrections that completely eliminate the bias because we know the entire expansion of $\bar\beta_{NT}$. For … (2015) for a discussion on higher-order bias corrections of panel fixed effects estimators.
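The coverage of these intervals for the fixed effects, analytically corrected and unbiased estimators can be approximated by simulation in a few lines of Python, drawing $NT\hat\beta_{NT}/\beta^0$ directly from a $\chi^2_{(N-1)(T-1)}$ distribution via `random.gammavariate` (shape $(N-1)(T-1)/2$, scale 2). This is a sketch with $\beta^0 = 1$ and an arbitrary seed and number of replications; it omits the jackknife, which would require simulating whole panels.

```python
import math
import random


def simulate_coverage(n, t, reps=2000, seed=0, z=1.959963984540054):
    """Monte Carlo coverage of b * (1 +/- z sqrt(2/(NT))) for beta0 = 1, where
    NT * beta_hat ~ chi-square with (N-1)(T-1) degrees of freedom."""
    rng = random.Random(seed)
    df = (n - 1) * (t - 1)
    half_width = z * math.sqrt(2.0 / (n * t))
    hits = {"fixed effects": 0, "analytical": 0, "unbiased": 0}
    for _ in range(reps):
        # chi-square(df) is a gamma distribution with shape df/2 and scale 2
        beta_hat = rng.gammavariate(df / 2.0, 2.0) / (n * t)
        estimates = {
            "fixed effects": beta_hat,
            "analytical": beta_hat * (1 + 1 / t + 1 / n),
            "unbiased": beta_hat * n * t / df,
        }
        for name, b in estimates.items():
            if b * (1 - half_width) <= 1.0 <= b * (1 + half_width):
                hits[name] += 1
    return {name: count / reps for name, count in hits.items()}


coverage = simulate_coverage(10, 10)
print(coverage)
```

For $N = T = 10$ the fixed effects interval should cover well below the nominal level, while the corrected and unbiased intervals should come much closer to 95%.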

Asymptotic theory for bias corrections
In nonlinear panel data models the population problem (3.1) generally does not have a closed-form solution, so we need to rely on asymptotic arguments to characterize the terms in the expansion of the bias (3.2) and to justify the validity of the corrections.

Asymptotic distribution of model parameters
We consider panel models with scalar individual and time effects that enter the likelihood function additively through π it = α i + γ t . In these models the dimension of the incidental parameters is dim φ NT = N + T . The leading cases are single index models, where the dependence of the likelihood function on the parameters is through an index X ′ it β + α i + γ t . These models cover the probit and Poisson specifications of Examples 1 and 2. The additive structure only applies to the unobserved effects, so we can allow for scale parameters to cover the Tobit and negative binomial models. We focus on these additive models for computational and analytical tractability, because we can establish the consistency of the fixed effects estimators under a concavity assumption in the log-likelihood function with respect to all the parameters.
We make the following assumptions.

Assumption 4.1 (Panel Models). Let $\nu > 0$ and $\mu > 4(8 + \nu)/\nu$. Let $\varepsilon > 0$ and let $\mathcal{B}^0_\varepsilon$ be a subset of $\mathbb{R}^{\dim\beta + 1}$ that contains an $\varepsilon$-neighborhood of $(\beta^0, \pi^0_{it})$ for all $i, t, N, T$.¹¹

(i) Asymptotics: we consider limits of sequences where $N/T \to \kappa^2$, $0 < \kappa < \infty$, as $N, T \to \infty$.

… The realizations of the parameters and unobserved effects that generate the observed data are denoted by $\beta^0$ and $\phi^0$.

(iv) Smoothness and moments: We assume that $(\beta, \pi) \mapsto \ell_{it}(\beta, \pi)$ is four times continuously differentiable over $\mathcal{B}^0_\varepsilon$ a.s. The partial derivatives of $\ell_{it}(\beta, \pi)$ with respect to the elements of $(\beta, \pi)$ up to fourth order are bounded in absolute value uniformly over $(\beta, \pi)$ … is a.s. uniformly bounded over … a.s.

… Furthermore, there exist constants $b_{\min}$ and $b_{\max}$ such that …

Remark 1 (Assumption 4.1). Assumption 4.1(i) defines the large-T asymptotic framework and is the same as in Hahn and Kuersteiner (2011). The relative rate of N and T exactly balances the order of the bias and variance, producing a non-degenerate asymptotic distribution. Unlike most of the large-T panel literature, e.g., Hahn and Newey (2004) and Hahn and Kuersteiner (2011), Assumption 4.1(ii) imposes neither identical distribution nor stationarity over the time-series dimension, conditional on the unobserved effects. Those assumptions are violated by the presence of the time effects, because the time effects are treated as parameters. The mixing condition is used to bound covariances and moments in the application of laws of large numbers and central limit theorems; it could be replaced by other conditions that guarantee the applicability of these results.
Assumption 4.1(iii) is the parametric part of the panel model.
We rely on this assumption to guarantee that $\partial_\beta \ell_{it}$ and $\partial_\pi \ell_{it}$ have martingale difference properties. Moreover, we use certain Bartlett identities implied by this assumption to simplify some expressions, but those simplifications are not crucial for our results. We provide expressions for the asymptotic bias and variance that do not apply these simplifications in Remark 3. Assumption 4.1(iv) imposes smoothness and moment conditions on the log-likelihood function and its derivatives. These conditions guarantee that the higher-order stochastic expansions of the fixed effects estimator that we use to characterize the asymptotic bias are well defined, and that the remainder terms of these expansions are bounded.

¹¹ For example, $\mathcal{B}^0_\varepsilon$ can be chosen to be the Cartesian product of the $\varepsilon$-ball around $\beta^0$ and the interval $[\pi_{\min}, \pi_{\max}]$, with $\pi_{\min} \le \pi^0_{it} - \varepsilon$ and $\pi_{\max} \ge \pi^0_{it} + \varepsilon$ for all $i, t, N, T$. We can have $\pi_{\min} = -\infty$ and $\pi_{\max} = \infty$, as long as this is compatible with Assumption 4.1(iv) and (v).
The most commonly used nonlinear models in applied economics, such as logit, probit, ordered probit, Poisson, and Tobit models, have smooth log-likelihood functions that satisfy the concavity condition of Assumption 4.1(v), provided that all the elements of $X_{it}$ have cross-sectional and time-series variation. Assumption 4.1(v) guarantees that $\beta^0$ and $\phi^0$ are the unique solution to the population problem (2.1), that is, all the parameters are point identified.
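To describe the asymptotic distribution of the fixed effects estimator $\hat\beta$, it is convenient to introduce some additional notation. Let $\overline{\mathcal{H}}$ be the $(N + T) \times (N + T)$ expected Hessian matrix of the log-likelihood with respect to the nuisance parameters evaluated at the true parameters, i.e.

$$\overline{\mathcal{H}} = \mathbb{E}_\phi\big[-\partial_{\phi\phi'}\mathcal{L}_{NT}\big] = \begin{pmatrix} \overline{\mathcal{H}}^*_{(\alpha\alpha)} & \overline{\mathcal{H}}^*_{(\alpha\gamma)} \\ [\overline{\mathcal{H}}^*_{(\alpha\gamma)}]' & \overline{\mathcal{H}}^*_{(\gamma\gamma)} \end{pmatrix} + \frac{b}{\sqrt{NT}}\,v_{NT}v_{NT}',$$

where $\overline{\mathcal{H}}^*_{(\alpha\alpha)} = \mathrm{diag}\big(\sum_t \mathbb{E}_\phi[-\partial_{\pi^2}\ell_{it}]\big)/\sqrt{NT}$, $\overline{\mathcal{H}}^*_{(\alpha\gamma)it} = \mathbb{E}_\phi[-\partial_{\pi^2}\ell_{it}]/\sqrt{NT}$, and $\overline{\mathcal{H}}^*_{(\gamma\gamma)} = \mathrm{diag}\big(\sum_i \mathbb{E}_\phi[-\partial_{\pi^2}\ell_{it}]\big)/\sqrt{NT}$. Furthermore, let $\overline{\mathcal{H}}^{-1}_{(\alpha\alpha)}$, $\overline{\mathcal{H}}^{-1}_{(\alpha\gamma)}$, $\overline{\mathcal{H}}^{-1}_{(\gamma\alpha)}$ and $\overline{\mathcal{H}}^{-1}_{(\gamma\gamma)}$ denote the $N \times N$, $N \times T$, $T \times N$ and $T \times T$ blocks of the inverse $\overline{\mathcal{H}}^{-1}$, and define

$$\Xi_{it} := -\frac{1}{\sqrt{NT}}\sum_{j=1}^{N}\sum_{\tau=1}^{T}\Big(\overline{\mathcal{H}}^{-1}_{(\alpha\alpha)ij} + \overline{\mathcal{H}}^{-1}_{(\gamma\alpha)tj} + \overline{\mathcal{H}}^{-1}_{(\alpha\gamma)i\tau} + \overline{\mathcal{H}}^{-1}_{(\gamma\gamma)t\tau}\Big)\,\mathbb{E}_\phi(\partial_{\beta\pi}\ell_{j\tau}),$$

$$\mathcal{D}_{\beta\pi^q}\ell_{it} := \partial_{\beta\pi^q}\ell_{it} - \partial_{\pi^{q+1}}\ell_{it}\,\Xi_{it},$$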
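with $q = 0, 1, 2$. The $k$th component of $\Xi_{it}$ corresponds to the population least squares projection of $\mathbb{E}_\phi(\partial_{\beta_k\pi}\ell_{it})/\mathbb{E}_\phi(\partial_{\pi^2}\ell_{it})$ on the space spanned by the incidental parameters under a metric given by $\mathbb{E}_\phi(-\partial_{\pi^2}\ell_{it})$. The operator $\mathcal{D}_{\beta\pi^q}$ partials out individual and time effects in nonlinear models. It corresponds to individual and time differencing when the model is linear. To see this, consider the normal linear model $Y_{it} \mid X_i^t, \alpha, \gamma, \beta \sim \mathcal{N}(X_{it}'\beta + \alpha_i + \gamma_t, 1)$. Then $\ell_{it}(\beta, \pi) = -(Y_{it} - X_{it}'\beta - \pi)^2/2$ up to a constant, so that $\mathbb{E}_\phi(\partial_{\beta\pi}\ell_{it})/\mathbb{E}_\phi(\partial_{\pi^2}\ell_{it}) = \mathbb{E}_\phi(X_{it})$ and the metric is constant. Hence $\Xi_{it} = T^{-1}\sum_{\tau}\mathbb{E}_\phi(X_{i\tau}) + N^{-1}\sum_{j}\mathbb{E}_\phi(X_{jt}) - (NT)^{-1}\sum_{j,\tau}\mathbb{E}_\phi(X_{j\tau})$ and $\mathcal{D}_\beta\ell_{it} = (X_{it} - \Xi_{it})\,\varepsilon_{it}$, with $\varepsilon_{it} = Y_{it} - X_{it}'\beta - \alpha_i - \gamma_t$.

Remark 2. The complete proof of Theorem 4.1 is provided in the Appendix. Here we point out why the argument for the consistency proof in models with only individual effects does not apply to our setting, give a heuristic derivation of the asymptotic distribution, and highlight where some of the assumptions are used in the proof.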
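The projection that defines $\Xi_{it}$ can be computed without inverting $\overline{\mathcal{H}}$ by alternating weighted means (backfitting). In this Python sketch, `y` plays the role of $\mathbb{E}_\phi(\partial_{\beta_k\pi}\ell_{it})/\mathbb{E}_\phi(\partial_{\pi^2}\ell_{it})$ and `w` the role of the weights $\mathbb{E}_\phi(-\partial_{\pi^2}\ell_{it})$; both are assumed to be given, and the numbers are hypothetical. With constant weights, as in the normal linear model, the fitted values reduce to two-way demeaning $\bar y_{i.} + \bar y_{.t} - \bar y_{..}$.

```python
def weighted_two_way_projection(y, w, max_sweeps=1000, tol=1e-13):
    """Fitted values alpha_i + gamma_t of
    min_{alpha, gamma} sum_it w_it (y_it - alpha_i - gamma_t)^2,
    computed by alternating weighted means (backfitting). Assumes all w_it > 0."""
    n, t_len = len(y), len(y[0])
    alpha, gamma = [0.0] * n, [0.0] * t_len
    for _ in range(max_sweeps):
        previous = alpha[:]
        for i in range(n):
            alpha[i] = (sum(w[i][t] * (y[i][t] - gamma[t]) for t in range(t_len))
                        / sum(w[i]))
        for t in range(t_len):
            gamma[t] = (sum(w[i][t] * (y[i][t] - alpha[i]) for i in range(n))
                        / sum(w[i][t] for i in range(n)))
        if max(abs(a - b) for a, b in zip(alpha, previous)) < tol:
            break
    return [[alpha[i] + gamma[t] for t in range(t_len)] for i in range(n)]


# Hypothetical 2 x 3 example with constant weights: two-way demeaning.
y = [[1.0, 2.0, 4.0], [0.5, -1.0, 3.0]]
constant_w = [[1.0] * 3 for _ in range(2)]
fit = weighted_two_way_projection(y, constant_w)
row = [sum(r) / 3 for r in y]
col = [(y[0][t] + y[1][t]) / 2 for t in range(3)]
grand = sum(row) / 2
two_way_demeaned = [[row[i] + col[t] - grand for t in range(3)] for i in range(2)]

# With heterogeneous weights the fit satisfies the weighted normal equations instead.
w = [[1.0, 2.0, 3.0], [4.0, 1.0, 0.5]]
fit_w = weighted_two_way_projection(y, w)
row_eqs = [sum(w[i][t] * (y[i][t] - fit_w[i][t]) for t in range(3)) for i in range(2)]
col_eqs = [sum(w[i][t] * (y[i][t] - fit_w[i][t]) for i in range(2)) for t in range(3)]
print(fit, two_way_demeaned, row_eqs, col_eqs)
```

Subtracting the fitted values from `y` gives the partialled-out quantity, which is the sense in which $\mathcal{D}_{\beta\pi^q}$ generalizes individual and time differencing.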
(i) The consistency proof for models with only individual effects relies on partitioning the log-likelihood into the sum of individual log-likelihoods that depend on a fixed number of parameters, the model parameter β and the corresponding individual effect $\alpha_i$. The maximizers of the individual log-likelihoods are then consistent estimators of all the parameters as T becomes large, by standard arguments. This approach does not work in models with individual and time effects, because there is no partition of the data that is only affected by a fixed number of parameters and whose size grows with the sample size.

(ii) In the following we give a heuristic discussion of the asymptotic distribution result for $\hat\beta$. A first-order Taylor series expansion of the first order conditions of (2.8) gives the representation (4.4), where the first term has zero mean and determines the asymptotic variance, and the second and third terms determine the asymptotic bias. By the central limit theorem and the information equality, the first term is asymptotically normal with mean zero and variance $\overline{W}_\infty$. With $\sqrt{N/T} \to \kappa$, the second and third terms converge in probability to $\kappa\overline{B}_\infty$ and $\kappa^{-1}\overline{D}_\infty$, respectively, where $\overline{B}_\infty$ and $\overline{D}_\infty$ are characterized from a second-order Taylor series expansion of $\hat\phi(\beta^0)$ around $\phi^0$. We refer to the Appendix for the details of this derivation. There we show that $\overline{B}_\infty$ and $\overline{D}_\infty$ originate from the elements of $\hat\phi(\beta^0)$ corresponding to the individual effects and time effects, respectively. Plugging those results into (4.4) and solving for $\sqrt{NT}(\hat\beta - \beta^0)$ yields the asymptotic distribution of Theorem 4.1. This derivation shows that the source of the bias is that the score of the profile objective function is not centered at zero. This problem arises from the substitution of the incidental parameter φ by the sample analog $\hat\phi(\beta^0)$, which has a rate of convergence slower than $\sqrt{NT}$.

Example 1 (Binary Response Model). In the binary response model, the components of the asymptotic bias depend on $\omega_{it} = H_{it}\,\partial F_{it}$ and on $\tilde{X}_{it}$, the residual of the population projection of $X_{it}$ on the space spanned by the incidental parameters under a metric weighted by $\mathbb{E}_\phi(\omega_{it})$.
For the probit model where all the components of $X_{it}$ are strictly exogenous, the asymptotic bias is a positive definite matrix-weighted average of the true parameter value, as in the case of the probit model with only individual effects (Fernández-Val, 2009).
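The partialling-out step behind $\Xi_{it}$ and $\tilde{X}_{it}$ can be illustrated with a small Python sketch. With equal weights (the linear case) the projection on the space spanned by additive individual and time effects is the two-way within transformation; with positive weights, as in the metric weighted by $\mathbb{E}_\phi(\omega_{it})$, it is a weighted least squares fit on two sets of dummies, computed here by alternating closed-form updates of the two sets of effects. This is a sketch under stated assumptions: the panel is balanced and stored as a list of rows, the helper names are hypothetical, and in practice the population weights would have to be replaced by estimates.

```python
def two_way_demean(x):
    """Linear case: x_it - xbar_i. - xbar_.t + xbar_.. for a balanced N x T panel."""
    n, t = len(x), len(x[0])
    row_means = [sum(row) / t for row in x]
    col_means = [sum(x[i][s] for i in range(n)) / n for s in range(t)]
    grand_mean = sum(row_means) / n
    return [[x[i][s] - row_means[i] - col_means[s] + grand_mean for s in range(t)]
            for i in range(n)]


def weighted_two_way_residual(x, w, tol=1e-12, max_iter=10_000):
    """Residual of the weighted LS fit x_it ~ a_i + g_t with weights w_it > 0.

    Alternates the closed-form updates of the individual effects a_i and the
    time effects g_t (block coordinate descent on a convex quadratic).
    """
    n, t = len(x), len(x[0])
    a = [0.0] * n
    g = [0.0] * t
    for _ in range(max_iter):
        # individual effects: weighted mean over t of x_it - g_t
        a_new = [sum(w[i][s] * (x[i][s] - g[s]) for s in range(t)) / sum(w[i])
                 for i in range(n)]
        # time effects: weighted mean over i of x_it - a_i
        g_new = [sum(w[i][s] * (x[i][s] - a_new[i]) for i in range(n))
                 / sum(w[i][s] for i in range(n)) for s in range(t)]
        change = max(max(abs(u - v) for u, v in zip(a_new, a)),
                     max(abs(u - v) for u, v in zip(g_new, g)))
        a, g = a_new, g_new
        if change < tol:
            break
    return [[x[i][s] - a[i] - g[s] for s in range(t)] for i in range(n)]
```

With unit weights the two functions return the same residuals; alternating updates are only one convenient solver, and any weighted least squares routine with individual and time dummies would give the same residuals.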
Example 2 (Poisson Model). Substituting the derivatives of the Poisson log-likelihood into the expressions of the bias in Theorem 4.1 yields a component $\overline{B}_\infty$ that depends on $\tilde{X}_{it}$, the residual of the population projection of $X_{it}$ on the space spanned by the incidental parameters under a metric weighted by $\mathbb{E}_\phi(\omega_{it})$, and $\overline{D}_\infty = 0$. If in addition all the components of $X_{it}$ are strictly exogenous, then $\overline{B}_\infty = 0$ as well, so that there is no asymptotic bias.

Remark 3 (Bias and Variance Expressions for Conditional Moment Models).
In the derivation of the asymptotic distribution, we apply Bartlett identities implied by Assumption 4.1(iii) to simplify the expressions. Expressions of the asymptotic bias and variance that do not make use of these identities remain valid in conditional moment models that do not specify the entire conditional distribution of $Y_{it}$; in these expressions, $\overline{W}_\infty$ is the same as in Theorem 4.1.
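For example, consider the Poisson fixed effects estimator in a conditional mean model that does not impose the Poisson distribution. Its bias expressions involve $\tilde{X}_{it}$, defined as in Example 2. If all the components of $X_{it}$ are strictly exogenous, then we again obtain the no asymptotic bias result.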

Asymptotic distribution of APEs
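In nonlinear models we are often interested in APEs, in addition to model parameters. These effects are averages of functions of the data, parameters and unobserved effects; see expression (2.2). For the panel models of Assumption 4.1 we specify the partial effects as $\Delta(X_{it}, \beta, \alpha_i, \gamma_t) = \Delta_{it}(\beta, \pi_{it})$, where $\pi_{it} = \alpha_i + \gamma_t$. The restriction that the partial effects depend on $\alpha_i$ and $\gamma_t$ through $\pi_{it}$ is natural in our panel models, because the conditional expectation of $Y_{it}$ given the regressors and the unobserved effects depends on $\alpha_i$ and $\gamma_t$ only through $\pi_{it}$, and the partial effects are usually defined as differences or derivatives of this conditional expectation with respect to the components of $X_{it}$. For example, the partial effects for the binary response and Poisson models described in Section 2 satisfy this restriction.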
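The distribution of the unobserved individual and time effects is not ancillary for the APEs, unlike for model parameters. We therefore need to make assumptions on this distribution to define and interpret the APEs, and to derive the asymptotic distribution of their estimators. We control the heterogeneity of the partial effects by assuming that the individual effects and explanatory variables are identically distributed cross sectionally and/or stationary over time. If $(X_{it}, \alpha_i, \gamma_t)$ is identically distributed over $i$ but can be heterogeneously distributed over $t$, then $\mathbb{E}[\Delta_{it}] = \delta^0_t$ and $\delta^0_{NT} = T^{-1}\sum_{t=1}^T \delta^0_t$ changes only with $T$. If $(X_{it}, \alpha_i, \gamma_t)$ is stationary over $t$ but can be heterogeneously distributed over $i$, then $\mathbb{E}[\Delta_{it}] = \delta^0_i$ and $\delta^0_{NT} = N^{-1}\sum_{i=1}^N \delta^0_i$ changes only with $N$. Finally, if $(X_{it}, \alpha_i, \gamma_t)$ is identically distributed over $i$ and stationary over $t$, then $\mathbb{E}[\Delta_{it}] = \delta^0_{NT}$ and $\delta^0_{NT} = \delta^0$ does not change with $N$ and $T$. We also impose smoothness and moment conditions on the function Δ that defines the partial effects. We use these conditions to derive higher-order stochastic expansions for the fixed effects estimator of the APEs and to bound the remainder terms in these expansions.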
12 In the working paper version, Fernández-Val and Weidner (2015a), we also consider inference conditional on the unobserved effects by assuming that $\{\alpha_i\}_{i=1}^N$ and $\{\gamma_t\}_{t=1}^T$ are deterministic sequences.
(iii) Smoothness and moments: The function $(\beta, \pi) \mapsto \Delta_{it}(\beta, \pi)$ is four times continuously differentiable over $\mathcal{B}^0_\varepsilon$ a.s. The partial derivatives of $\Delta_{it}(\beta, \pi)$ with respect to the elements of $(\beta, \pi)$ up to fourth order are bounded in absolute value uniformly over $(\beta, \pi) \in \mathcal{B}^0_\varepsilon$ by a function $M(Z_{it}) > 0$ a.s., and a moment of $M(Z_{it})$ of sufficiently high order is a.s. uniformly bounded over $N, T$. (iv) Non-degeneracy and moments: a non-degeneracy condition on $\delta - \delta^0_{NT}$ at the rate $r_{NT}$, together with moment conditions.

Let $\Psi_{it}$ denote the population projection of $\partial_\pi\Delta_{it}/\mathbb{E}_\phi[\partial_{\pi^2}\ell_{it}]$ on the space spanned by the incidental parameters under the metric given by $\mathbb{E}_\phi[-\partial_{\pi^2}\ell_{it}]$. We use analogous notation to the previous section for the derivatives with respect to β and higher order derivatives with respect to π. Let $\delta^0_{NT}$ and $\hat\delta$ be the APE and its fixed effects estimator, defined as in Eqs. (2.2) and (2.9) with $\Delta(X_{it}, \beta, \alpha_i, \gamma_t) = \Delta_{it}(\beta, \alpha_i + \gamma_t)$. The following theorem establishes the asymptotic distribution of $\hat\delta$.
for some deterministic sequence $r_{NT} \to \infty$. In this decomposition the first term captures variation due to parameter estimation, whereas the second term captures variation due to the estimation of a population mean by a sample mean. Under Assumption 4.2(iv) the convergence rate $r_{NT}$ is determined by the convergence rate of $\delta - \delta^0_{NT}$, which depends on the sampling properties of the unobserved effects. For example, if $\{\alpha_i\}_{i=1}^N$ and $\{\gamma_t\}_{t=1}^T$ are independent sequences, and $\alpha_i$ and $\gamma_t$ are independent for all $i, t$, then $r_{NT} = \sqrt{NT/(N+T-1)}$.
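In the expression of $V^\delta$, the first term, which accounts for parameter estimation, is asymptotically negligible whenever $r_{NT}/\sqrt{NT} \to 0$, as in the independent case above. In numerical examples, however, we find that correcting the mean and variance for parameter estimation improves the finite-sample estimation and inference properties of the APE estimators.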
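A quick numerical look at the rate in the independent case, as a Python sketch: it compares $r_{NT} = \sqrt{NT/(N+T-1)}$ with the $\sqrt{NT}$ rate of the model parameters for the panel sizes used in the Monte Carlo experiments. When $N = T$, $r_{NT}$ grows like $\sqrt{N/2}$, so the APE estimator converges much more slowly than $\hat\beta$. The function names are hypothetical.

```python
import math


def ape_rate(n, t):
    """r_NT = sqrt(NT / (N + T - 1)) in the independent case of Remark 4."""
    return math.sqrt(n * t / (n + t - 1))


def beta_rate(n, t):
    """sqrt(NT), the rate of the fixed effects estimator of beta."""
    return math.sqrt(n * t)


for t in (14, 28, 56):
    print(f"N=56, T={t}: r_NT={ape_rate(56, t):.3f}, sqrt(NT)={beta_rate(56, t):.3f}")
```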

Remark 5 (Average Effects from Bias Corrected Estimators).
The asymptotic variance of  δ is the same as in Theorem 4.2.
In the following examples we assume that the APEs are constructed from asymptotically unbiased estimators of the model parameters.

Example 1 (Binary Response Model). Consider the partial effects defined in (2.3) and (2.4). Using the notation previously introduced for this example, the components of the asymptotic bias of $\hat\delta$ are expressed in terms of a population regression residual $\tilde\Psi_{it}$.

Example 2 (Poisson Model). Consider partial effects of the form $\Delta_{it}(\beta, \pi) = g_{it}(\beta)\exp(X_{it}'\beta + \pi)$, where $g_{it}$ does not depend on π. For example, $g_{it}(\beta) = \beta_k + \beta_j h(Z_{it})$ in (2.5). Using the notation previously introduced for this example, the components of the asymptotic bias satisfy $\overline{D}^\delta_\infty = 0$, and $\overline{B}^\delta_\infty$ depends on $\tilde{g}_{it}$, the residual of the population projection of $g_{it}$ on the space spanned by the incidental parameters under a metric weighted by $\mathbb{E}_\phi[\omega_{it}]$. The asymptotic bias is zero if all the components of $X_{it}$ are strictly exogenous or $g_{it}(\beta)$ is constant. The latter arises in the leading case of the partial effect of the $k$th component of $X_{it}$, since $g_{it}(\beta) = \beta_k$. This no asymptotic bias result applies to any type of regressor, strictly exogenous or predetermined.

Bias corrected estimators
The results of the previous sections show that the asymptotic distributions of the fixed effects estimators of the model parameters and APEs can have biases of the same order as the variances under sequences where T grows at the same rate as N. This is the large-T version of the incidental parameter problem that invalidates any inference based on the fixed effect estimators even in large samples. In this section we describe how to construct analytical and jackknife bias corrections for the fixed effect estimators and give conditions for the asymptotic validity of these corrections.
The jackknife correction for the model parameter β in Eq. (3.4) is generic and applies to the panel model. For the APEs, the jackknife correction is formed similarly as $\tilde\delta_J = 3\hat\delta - \bar\delta_{N,T/2} - \bar\delta_{N/2,T}$, where $\bar\delta_{N,T/2}$ is the average of the 2 split jackknife estimators of the APE that use all the individuals and either the first or the second half of the time periods, and $\bar\delta_{N/2,T}$ is the average of the 2 split jackknife estimators of the APE that use all the time periods and half of the individuals.

The analytical corrections are constructed using sample analogs of the expressions in Theorems 4.1 and 4.2, replacing the true values of β and φ by the fixed effects estimators. To describe these corrections, we introduce some additional notation. For any function of the data, unobserved effects and parameters $g_{itj}(\beta, \alpha_i + \gamma_t, \alpha_i + \gamma_{t-j})$ with $0 \le j < t$, let $\hat g_{itj} = g_{itj}(\hat\beta, \hat\alpha_i + \hat\gamma_t, \hat\alpha_i + \hat\gamma_{t-j})$ denote the fixed effects estimator. The $k$th component of $\hat\Xi_{it}$ corresponds to a weighted least squares regression, the sample analog of the population projection that defines $\Xi_{it}$. The analytical bias corrected estimator of $\beta^0$ is $\tilde\beta_A = \hat\beta - \hat{B}/T - \hat{D}/N$, where $\hat{B}$ and $\hat{D}$ are the sample analogs of $\overline{B}_\infty$ and $\overline{D}_\infty$, and $L$ is a trimming parameter for estimation of spectral expectations such that $L \to \infty$ and $L/T \to 0$ (Hahn and Kuersteiner, 2011). Here we use truncation instead of kernel smoothing in the estimation of spectral expectations, following Hahn and Kuersteiner (2007). Unlike a variance estimator, the bias estimator does not need to be positive, so a kernel is not needed to guarantee positivity. Instead of choosing a single value of $L$, our recommendation for practice is to conduct a sensitivity analysis by reporting estimates for multiple values of $L$ starting from $L = 0$. Based on our experience from extensive Monte Carlo simulations, we do not recommend values of $L$ greater than 4, because the finite-sample dispersion of the estimator quickly increases with $L$. We refer to Section 5 for an example of sensitivity analysis with respect to $L$. The factor $T/(T-j)$ is a degrees of freedom adjustment that rescales the time series averages $T^{-1}\sum_{t=j+1}^{T}$ by the number of observations instead of by $T$. Similar corrections for conditional mean models can be formed using the sample analogs of the expressions of $\overline{B}_\infty$ and $\overline{D}_\infty$ in Remark 3.
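The split-panel jackknife for the APE can be sketched in a few lines of Python. The sketch assumes a balanced panel with even $N$ and $T$, stored as a list of $N$ rows, and a user-supplied `estimator` callable that returns the fixed effects estimate (of β or of an APE) on any subpanel; both the function name and that interface are hypothetical. As in the Monte Carlo experiments, the cross-sectional split follows the order of the observations.

```python
def split_panel_jackknife(estimator, panel):
    """3 * full - (avg over two time halves) - (avg over two individual halves)."""
    n, t = len(panel), len(panel[0])
    if n % 2 or t % 2:
        raise ValueError("this sketch assumes even N and T")
    full = estimator(panel)
    # two subpanels with all individuals and one half of the time periods
    time_halves = [[row[: t // 2] for row in panel], [row[t // 2:] for row in panel]]
    # two subpanels with all time periods and one half of the individuals
    indiv_halves = [panel[: n // 2], panel[n // 2:]]
    avg_time = sum(estimator(p) for p in time_halves) / 2
    avg_indiv = sum(estimator(p) for p in indiv_halves) / 2
    return 3 * full - avg_time - avg_indiv


def toy_estimator(sub):
    # hypothetical estimator: sample mean plus a bias of exactly 1/T + 0.5/N
    n_sub, t_sub = len(sub), len(sub[0])
    return sum(map(sum, sub)) / (n_sub * t_sub) + 1 / t_sub + 0.5 / n_sub


panel = [[1.0] * 6 for _ in range(4)]
print(split_panel_jackknife(toy_estimator, panel))
```

The toy estimator has a bias of exactly $1/T + 0.5/N$, which the combination removes up to rounding, so the printed value recovers the sample mean of 1; with a real fixed effects estimator the correction removes only the leading bias terms.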
We do not spell out these estimators for the sake of brevity.
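The truncated estimator of spectral expectations with the degrees of freedom factor $T/(T-j)$ can be sketched as follows in Python. The series `a` and `b` are placeholders for the fixed effects estimates of the two quantities whose lagged cross-products enter $\hat{B}$ (their exact form is given by the bias expressions and is not repeated here), and the function names are hypothetical. The loop at the end mirrors the recommended sensitivity analysis over $L = 0$ to 4.

```python
def truncated_spectral_sum(a, b, trim):
    """Truncated estimate of sum_{j=0}^{L} E[a_{t-j} b_t] from one time series.

    For each lag j, T^{-1} sum_{t=j+1}^{T} a_{t-j} b_t is multiplied by the
    degrees of freedom factor T/(T-j), i.e. averaged over the T-j available terms.
    """
    t_len = len(a)
    if len(b) != t_len or not 0 <= trim < t_len:
        raise ValueError("need equal lengths and 0 <= L < T")
    total = 0.0
    for j in range(trim + 1):
        # 0-based s = j, ..., T-1 corresponds to t = j+1, ..., T
        total += sum(a[s - j] * b[s] for s in range(j, t_len)) / (t_len - j)
    return total


def panel_spectral_average(a_panel, b_panel, trim):
    """Cross-sectional average of the individual truncated sums."""
    return sum(truncated_spectral_sum(a, b, trim)
               for a, b in zip(a_panel, b_panel)) / len(a_panel)


toy_a = [[0.5, -0.2, 0.1, 0.4, -0.3, 0.2], [0.1, 0.3, -0.4, 0.2, 0.0, -0.1]]
toy_b = [[1.0, 0.8, 1.2, 0.9, 1.1, 1.0], [0.7, 1.3, 1.0, 0.9, 1.2, 0.8]]
for trim in range(5):
    print(trim, panel_spectral_average(toy_a, toy_b, trim))
```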
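Asymptotic $(1-p)$-confidence intervals for the components of β can be formed as $\tilde\beta_k \pm z_{1-p}\sqrt{\widehat{W}^{-1}_{kk}/(NT)}$, $k = 1, \ldots, \dim\beta$, where $\tilde\beta$ is a bias corrected estimator, $z_{1-p}$ is the $(1-p/2)$-quantile of the standard normal distribution, and $\widehat{W}^{-1}_{kk}$ is the $(k,k)$ element of the inverse of the fixed effects estimator of $\overline{W}_\infty$.

We have implemented the analytical correction at the level of the estimator. Alternatively, we can implement the correction at the level of the score or first order conditions, by solving the bias corrected first order conditions for β.

For the APEs, the fixed effects estimators of the components of the asymptotic bias are the sample analogs of the expressions in Theorem 4.2. The estimator of the asymptotic variance depends on the sampling properties of the unobserved effects; under the independence assumption of Remark 4, it is formed from sample analogs of the corresponding variance expression. Note that we do not need to specify the convergence rate $r_{NT}$ to make inference, because $\widehat{V}^\delta$ is proportional to $r_{NT}^2$, so the standard errors $\sqrt{\widehat{V}^\delta}/r_{NT}$ do not depend on $r_{NT}$. Bias corrected estimators and confidence intervals can be constructed in the same fashion as for the model parameter.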
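The analytical correction and the confidence intervals for β combine as in the following Python sketch, assuming the bias estimates $\hat{B}$, $\hat{D}$ and the diagonal of $\widehat{W}^{-1}$ have already been computed; the function names are hypothetical. The critical value is the $(1-p/2)$ standard normal quantile, obtained from `statistics.NormalDist.inv_cdf`.

```python
import math
from statistics import NormalDist


def analytical_correction(beta_hat, b_hat, d_hat, n, t):
    """Componentwise beta_hat - B_hat / T - D_hat / N."""
    return [b - bb / t - dd / n for b, bb, dd in zip(beta_hat, b_hat, d_hat)]


def confidence_intervals(beta_tilde, w_inv_diag, n, t, p=0.05):
    """(1 - p) intervals beta_k +/- z * sqrt(W^{-1}_kk / (NT)), z = (1 - p/2) normal quantile."""
    z = NormalDist().inv_cdf(1 - p / 2)
    intervals = []
    for b, v in zip(beta_tilde, w_inv_diag):
        half_width = z * math.sqrt(v / (n * t))
        intervals.append((b - half_width, b + half_width))
    return intervals


# hypothetical inputs for a single coefficient with N = 56, T = 14
beta_tilde = analytical_correction([1.2], [4.0], [2.0], n=56, t=14)
print(beta_tilde, confidence_intervals(beta_tilde, [3.0], n=56, t=14))
```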
We use the following homogeneity assumption to show the validity of the jackknife corrections for the model parameters and APEs. This assumption might seem restrictive for dynamic models where $X_{it}$ includes lags of the dependent variable, because in this case it restricts the unconditional distribution of the initial conditions of $Y_{it}$. Note, however, that Assumption 4.3 allows the initial conditions to depend on the unobserved effects. In other words, it does not impose that the initial conditions are generated from the stationary distribution of $Y_{it}$ conditional on $X_i^t$ and φ. Assumption 4.3 has testable implications; for example, the probability limits of the fixed effects estimators of β in the subpanels that include all the individuals and the first and second halves of the time periods, respectively, should coincide. These implications can be tested using variations of the Chow-type test proposed in Dhaene and Jochmans (2015). We provide an example of the application of these tests to our setting in Section S.1.1 of the supplemental material.

Assumption 4.3 (Unconditional Homogeneity).
The following theorems are the main results of this section. They show that the analytical and jackknife bias corrections eliminate the bias from the asymptotic distribution of the fixed effects estimators of the model parameters and APEs without increasing variance, and that the estimators of the asymptotic variances are consistent.
The first theorem applies under the conditions of Theorem 4.1 and Assumption 4.3, and the second under the conditions of Theorems 4.1 and 4.2, and Assumption 4.3.

Remark 7 (Rate of Convergence). The rate of convergence $r_{NT}$ depends on the properties of the sampling process for the explanatory variables and unobserved effects (see Remark 4).

Monte Carlo experiments
This section reports evidence on the finite sample behavior of fixed effects estimators of model parameters and APEs in static models with strictly exogenous regressors and dynamic models with predetermined regressors such as lags of the dependent variable. We analyze the performance of uncorrected and bias-corrected fixed effects estimators in terms of bias and inference accuracy of their asymptotic distribution. In particular, we compute the biases, standard deviations, and root mean squared errors of the estimators, the ratio of average standard errors to the simulation standard deviations (SE/SD), and the empirical coverages of confidence intervals with 95% nominal value ($p = .05$). Overall, we find that the analytically corrected estimators dominate the uncorrected and jackknife corrected estimators. A possible explanation for the better finite-sample performance of the analytical over the jackknife corrections is that the jackknife increases dispersion, because the components of the bias are estimated from subsamples that include half of the observations of the panel. We observe this variance increase in all our numerical examples, especially in short panels. The jackknife corrections are also more sensitive than the analytical corrections to Assumption 4.3. All the results are based on 500 replications. The designs correspond to static and dynamic probit models. As in the analytical example of Section 3.3, we find that our large-T asymptotic approximations capture well the behavior of the fixed effects estimator and the bias corrections in moderately long panels with N = 56 and T = 14.
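The summary statistics reported in the simulation tables can be computed from the replications as in this Python sketch. It reports bias, SD and RMSE in percentage of the true value, the ratio of average standard errors to the simulation standard deviation, and the empirical coverage of normal-based 95% intervals. Using the population standard deviation across replications is an assumption of the sketch; it makes RMSE² = bias² + SD² hold exactly. The function name is hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def mc_summary(estimates, std_errors, true_value, level=0.95):
    """Monte Carlo summary for one estimator; bias, SD, RMSE in % of the (nonzero) true value."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    scale = 100.0 / abs(true_value)
    bias = fmean(estimates) - true_value
    sd = pstdev(estimates)  # population SD across replications
    rmse = math.sqrt(fmean([(e - true_value) ** 2 for e in estimates]))
    covered = [abs(e - true_value) <= z * s for e, s in zip(estimates, std_errors)]
    return {
        "bias": bias * scale,
        "sd": sd * scale,
        "rmse": rmse * scale,
        "se/sd": fmean(std_errors) / sd,
        "coverage": sum(covered) / len(covered),
    }
```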

Static probit model
The data generating process is $Y_{it} = 1\{X_{it}\beta + \alpha_i + \gamma_t > \varepsilon_{it}\}$, where $\alpha_i \sim N(0, 1/16)$, $\gamma_t \sim N(0, 1/16)$, $\varepsilon_{it} \sim N(0, 1)$, and $\beta = 1$. We consider two alternative designs for $X_{it}$: autoregressive process and linear trend process, both with individual and time effects. In the first design, $X_{it} = X_{i,t-1}/2 + \alpha_i + \gamma_t + \upsilon_{it}$, $\upsilon_{it} \sim N(0, 1/2)$, and $X_{i0} \sim N(0, 1)$. In the second design, $X_{it} = 2t/T + \alpha_i + \gamma_t + \upsilon_{it}$, $\upsilon_{it} \sim N(0, 3/4)$, which violates Assumption 4.3. In both designs $X_{it}$ is strictly exogenous with respect to $\varepsilon_{it}$ conditional on the individual and time effects. The variables $\alpha_i$, $\gamma_t$, $\varepsilon_{it}$, $\upsilon_{it}$, and $X_{i0}$ are independent and i.i.d. across individuals and time periods.
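A Python sketch of the static probit designs follows. It reads $N(0, v)$ as a normal distribution with variance $v$ (an assumption about the notation; `random.gauss` takes a standard deviation, hence the square roots) and uses $X_{i0}$ only in the autoregressive design. The function name and the seed handling are hypothetical, and the sketch only generates data; it does not estimate the model.

```python
import math
import random


def simulate_static_probit(n, t, design=1, beta=1.0, seed=0):
    """Returns (y, x) as N x T lists for periods 1..T of the static probit designs."""
    if design not in (1, 2):
        raise ValueError("design must be 1 (autoregressive) or 2 (linear trend)")
    rng = random.Random(seed)
    sd_effect = math.sqrt(1 / 16)  # alpha_i, gamma_t ~ N(0, 1/16), read as variance 1/16
    alpha = [rng.gauss(0.0, sd_effect) for _ in range(n)]
    gamma = [rng.gauss(0.0, sd_effect) for _ in range(t)]
    y = [[0] * t for _ in range(n)]
    x = [[0.0] * t for _ in range(n)]
    for i in range(n):
        x_prev = rng.gauss(0.0, 1.0)  # X_i0 ~ N(0, 1); only used by design 1
        for s in range(t):
            period = s + 1  # the text indexes periods as t = 1, ..., T
            if design == 1:
                x_it = x_prev / 2 + alpha[i] + gamma[s] + rng.gauss(0.0, math.sqrt(1 / 2))
            else:
                x_it = 2 * period / t + alpha[i] + gamma[s] + rng.gauss(0.0, math.sqrt(3 / 4))
            eps = rng.gauss(0.0, 1.0)  # eps_it ~ N(0, 1)
            y[i][s] = int(x_it * beta + alpha[i] + gamma[s] > eps)
            x[i][s] = x_it
            x_prev = x_it
    return y, x
```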
We generate panel data sets with N = 56 individuals and three different numbers of time periods T: 14, 28 and 56. Table 3 reports the results for the probit coefficient β and the APE of $X_{it}$. We compute the APE using (2.4). Throughout the table, MLE-FETE corresponds to the probit maximum likelihood estimator with individual and time fixed effects, Analytical is the bias corrected estimator that uses the analytical correction, and Jackknife is the bias corrected estimator that uses SPJ in both the individual and time dimensions. The cross-sectional division in the jackknife follows the order of the observations. All the results are reported in percentage of the true parameter value. We find that the bias is of the same order of magnitude as the standard deviation for the uncorrected estimator of the probit coefficient, causing severe undercoverage of the confidence intervals. This result holds for both designs and all the sample sizes considered. The bias corrections, especially Analytical, remove the bias without increasing dispersion, and produce substantial improvements in rmse and coverage probabilities. For example, Analytical reduces rmse by 50% and increases coverage by 26% in the first design with T = 14. As in Hahn and Newey (2004) and Fernández-Val (2009), we find very little bias in the uncorrected estimates of the APE, despite the large bias in the probit coefficients. Jackknife performs relatively worse in the second design, which does not satisfy Assumption 4.3.

14 The standard errors are computed using the expressions (4.7) and (4.9).

Dynamic probit model
The data generating process is $Y_{it} = 1\{\beta_Y Y_{i,t-1} + \beta_Z Z_{it} + \alpha_i + \gamma_t > \varepsilon_{it}\}$, where $\alpha_i \sim N(0, 1/16)$, $\gamma_t \sim N(0, 1/16)$, $\varepsilon_{it} \sim N(0, 1)$, $\beta_Y = 0.5$, and $\beta_Z = 1$. We consider two alternative designs for $Z_{it}$: autoregressive process and linear trend process, both with individual and time effects. In the first design, $Z_{it} = Z_{i,t-1}/2 + \alpha_i + \gamma_t + \upsilon_{it}$, $\upsilon_{it} \sim N(0, 1/2)$, and $Z_{i0} \sim N(0, 1)$. In the second design, $Z_{it} = 1.5t/T + \alpha_i + \gamma_t + \upsilon_{it}$, $\upsilon_{it} \sim N(0, 3/4)$, which violates Assumption 4.3. The variables $\alpha_i$, $\gamma_t$, $\varepsilon_{it}$, $\upsilon_{it}$, and $Z_{i0}$ are independent and i.i.d. across individuals and time periods. We generate panel data sets with N = 56 individuals and three different numbers of time periods T: 14, 28 and 56. Table 4 reports the simulation results for the probit coefficient $\beta_Y$ and the APE of $Y_{i,t-1}$. We compute the partial effect of $Y_{i,t-1}$ using the expression in Eq. (2.3) with $X_{it,k} = Y_{i,t-1}$. This effect is commonly reported as a measure of state dependence for dynamic binary processes. Table 5 reports the simulation results for the estimators of the probit coefficient $\beta_Z$ and the APE of $Z_{it}$. We compute the partial effect using (2.4) with $X_{it,k} = Z_{it}$. Throughout the tables, we compare the same estimators as for the static model. For the analytical correction we consider two versions: Analytical (L = 1) sets the trimming parameter $L$ used to estimate spectral expectations to one, whereas Analytical (L = 2) sets $L$ to two. Again, all the results in the tables are reported in percentage of the true parameter value.
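The dynamic design can be simulated in the same way. This Python sketch covers the first (autoregressive) design for $Z_{it}$ and again reads $N(0, v)$ as variance $v$. The initial condition $Y_{i0}$ is not stated in this section, so the sketch assumes, purely for illustration, $Y_{i0} = 1\{\beta_Z Z_{i0} + \alpha_i + \gamma_0 > \varepsilon_{i0}\}$ with an extra time effect $\gamma_0$ drawn like the others; the function name is hypothetical.

```python
import math
import random


def simulate_dynamic_probit(n, t, beta_y=0.5, beta_z=1.0, seed=0):
    """Design 1 of the dynamic probit: returns (y, z) as N x T lists for periods 1..T."""
    rng = random.Random(seed)
    sd_effect = math.sqrt(1 / 16)  # N(0, 1/16) read as variance 1/16
    alpha = [rng.gauss(0.0, sd_effect) for _ in range(n)]
    gamma = [rng.gauss(0.0, sd_effect) for _ in range(t + 1)]  # gamma[0]: assumed period-0 effect
    y = [[0] * t for _ in range(n)]
    z = [[0.0] * t for _ in range(n)]
    for i in range(n):
        z_prev = rng.gauss(0.0, 1.0)  # Z_i0 ~ N(0, 1)
        # ASSUMPTION (not stated in the text): Y_i0 = 1{beta_z Z_i0 + alpha_i + gamma_0 > eps_i0}
        y_prev = int(beta_z * z_prev + alpha[i] + gamma[0] > rng.gauss(0.0, 1.0))
        for s in range(1, t + 1):
            z_it = z_prev / 2 + alpha[i] + gamma[s] + rng.gauss(0.0, math.sqrt(1 / 2))
            eps = rng.gauss(0.0, 1.0)  # eps_it ~ N(0, 1)
            y_it = int(beta_y * y_prev + beta_z * z_it + alpha[i] + gamma[s] > eps)
            y[i][s - 1], z[i][s - 1] = y_it, z_it
            y_prev, z_prev = y_it, z_it
    return y, z
```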
The results in Table 4 show important biases toward zero for both the probit coefficient and the APE of $Y_{i,t-1}$ in the two designs. This bias can indeed be substantially larger than the corresponding standard deviation for short panels, yielding coverage probabilities below 70% for T = 14. The analytical corrections significantly reduce biases and rmse, bring coverage probabilities close to their nominal level, and have little sensitivity to the trimming parameter $L$. The jackknife corrections reduce bias but increase dispersion, producing less drastic improvements in rmse and coverage than the analytical corrections. The results for the APE of $Z_{it}$ in Table 5 are similar to the static probit model. There is significant bias and undercoverage of confidence intervals for the coefficient $\beta_Z$, which are removed by the corrections, whereas there is little bias and undercoverage in the APE. As in the static model, Jackknife performs relatively worse in the second design.

17 In results not reported for brevity, we find little difference in performance when increasing the trimming parameter to L = 3 and L = 4. These results are available from the authors upon request.

Notes to Table 3: All the entries are in percentage of the true parameter value. 500 repetitions. Data generated from the probit model with $\alpha_i, \gamma_t \sim N(0, 1/16)$ and $\beta = 1$.

Concluding remarks
In this paper we develop analytical and jackknife corrections for fixed effects estimators of model parameters and APEs in semiparametric nonlinear panel models with additive individual and time effects. Our analysis applies to conditional maximum likelihood estimators with concave log-likelihood functions, and therefore covers logit, probit, ordered probit, ordered logit, Poisson, negative binomial, and Tobit estimators, which are the most popular nonlinear estimators in empirical economics.
We are currently developing similar corrections for nonlinear models with interactive individual and time effects (Chen et al., 2014). Another interesting avenue of future research is to derive higher-order expansions for fixed effects estimators with individual and time effects. These expansions are needed to justify theoretically the validity of alternative corrections based on the leave-one-observation-out panel jackknife method of Hahn and Newey (2004).

Appendix A. Notation and choice of norms
We write $A'$ for the transpose of a matrix or vector $A$. We use $\mathbb{1}_n$ for the $n \times n$ identity matrix, and $1_n$ for the column vector of length $n$ whose entries are all unity. For square $n \times n$ matrices $B, C$, we use $B > C$ (or $B \ge C$) to indicate that $B - C$ is positive (semi-)definite. We write wpa1 for "with probability approaching one" and wrt for "with respect to". All the limits are taken as $N, T \to \infty$ jointly.
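As in the main text, we usually suppress the dependence on $NT$ of all the sequences of functions and parameters to lighten the notation, e.g. we write $\mathcal{L}$ for $\mathcal{L}_{NT}$ and φ for $\phi_{NT}$. Let $\mathcal{S}(\beta, \phi) = \partial_\phi\mathcal{L}(\beta, \phi)$ and $\mathcal{H}(\beta, \phi) = -\partial_{\phi\phi'}\mathcal{L}(\beta, \phi)$, where $\partial_x f$ denotes the partial derivative of $f$ with respect to $x$, and additional subscripts denote higher-order partial derivatives.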
We refer to the dim φ-vector $\mathcal{S}(\beta, \phi)$ as the incidental parameter score, and to the dim φ × dim φ matrix $\mathcal{H}(\beta, \phi)$ as the incidental parameter Hessian. We omit the arguments of the functions when they are evaluated at the true parameter values $(\beta^0, \phi^0)$, e.g. $\mathcal{H} = \mathcal{H}(\beta^0, \phi^0)$. We use a bar to indicate expectations conditional on φ, e.g. $\partial_\beta\overline{\mathcal{L}} = \mathbb{E}_\phi[\partial_\beta\mathcal{L}]$, and a tilde to denote variables in deviations with respect to expectations, e.g. $\partial_\beta\widetilde{\mathcal{L}} = \partial_\beta\mathcal{L} - \partial_\beta\overline{\mathcal{L}}$. We use the Euclidean norm $\|\cdot\|$ for vectors of dimension dim β, and we use the norm induced by the Euclidean norm for the corresponding matrices and tensors, which we also denote by $\|\cdot\|$. For matrices of dimension dim β × dim β this induced norm is the spectral norm. The generalization of the spectral norm to higher order tensors is straightforward, e.g. the induced norm of the dim β × dim β × dim β tensor of third partial derivatives of $\mathcal{L}(\beta, \phi)$ wrt β is given by
$$\|\partial_{\beta\beta\beta}\mathcal{L}(\beta, \phi)\| = \max_{\{u, v \in \mathbb{R}^{\dim\beta}:\ \|u\| = 1,\ \|v\| = 1\}} \Big\| \sum_{k,l=1}^{\dim\beta} u_k v_l\, \partial_{\beta\beta_k\beta_l}\mathcal{L}(\beta, \phi) \Big\|.$$
This choice of norm is immaterial for the asymptotic analysis because dim β is fixed with the sample size.

In contrast, it is important what norms we choose for vectors of dimension dim φ, and their corresponding matrices and tensors, because dim φ is increasing with the sample size. For vectors of dimension dim φ, we use the $\ell_q$-norm
$$\|\phi\|_q = \Big( \sum_{g=1}^{\dim\phi} |\phi_g|^q \Big)^{1/q}.$$
The particular value $q = 8$ will be chosen later. We use the norms that are induced by the $\ell_q$-norm for the corresponding matrices and tensors, e.g. the induced $q$-norm of the dim φ × dim φ × dim φ tensor of third partial derivatives of $\mathcal{L}(\beta, \phi)$ wrt φ is
$$\|\partial_{\phi\phi\phi}\mathcal{L}(\beta, \phi)\|_q = \max_{\{u, v \in \mathbb{R}^{\dim\phi}:\ \|u\|_q = 1,\ \|v\|_q = 1\}} \Big\| \sum_{g,h=1}^{\dim\phi} u_g v_h\, \partial_{\phi\phi_g\phi_h}\mathcal{L}(\beta, \phi) \Big\|_q.$$
Note that in general the ordering of the indices of the tensor would matter in the definition of this norm, with the first index having a special role. However, since partial derivatives like $\partial_{\phi_g\phi_h\phi_l}\mathcal{L}(\beta, \phi)$ are fully symmetric in the indices $g, h, l$, the ordering is not important in their case.

18 We use the letter $q$ instead of $p$ to avoid confusion with the use of $p$ for probability.

19 The main reason not to choose $q = \infty$ is the assumption $\|\widetilde{\mathcal{H}}\|_q = o_P(1)$ below, which is used to guarantee that $\|\mathcal{H}^{-1}\|_q$ is of the same order as $\|\overline{\mathcal{H}}^{-1}\|_q$.

Notes to Tables 4 and 5: All the entries are in percentage of the true parameter value. 500 repetitions. Data generated from the probit model with $\alpha_i, \gamma_t \sim N(0, 1/16)$, $\beta_Y = 0.5$, and $\beta_Z = 1$. In design 1, $Z_{it} = Z_{i,t-1}/2 + \alpha_i + \gamma_t + \nu_{it}$, $\nu_{it} \sim$ i.i.d. $N(0, 1/2)$, and $Z_{i0} \sim N(0, 1)$. In design 2, $Z_{it} = 1.5t/T + \nu_{it}$.
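For mixed partial derivatives of $\mathcal{L}(\beta, \phi)$ wrt β and φ, we use the norm that is induced by the Euclidean norm on dim β-vectors and the $q$-norm on dim φ-indices, e.g.
$$\|\partial_{\beta\beta\phi}\mathcal{L}(\beta, \phi)\|_q = \max_{\{u, v \in \mathbb{R}^{\dim\beta}:\ \|u\| = 1,\ \|v\| = 1\}} \Big\| \sum_{k,l=1}^{\dim\beta} u_k v_l\, \partial_{\beta_k\beta_l\phi}\mathcal{L}(\beta, \phi) \Big\|_q,$$
where we continue to use the notation $\|\cdot\|_q$, even though this is a mixed norm.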
Note that for $w, x \in \mathbb{R}^{\dim\phi}$ and $q \ge 2$,
$$|w'x| \le \|w\|_q\, \|x\|_q\, (\dim\phi)^{(q-2)/q},$$
which follows from the Cauchy-Schwarz inequality and $\|w\|_2 \le (\dim\phi)^{1/2 - 1/q}\|w\|_q$. Thus, whenever we bound a scalar product of vectors, matrices and tensors in terms of the above norms, we have to account for this additional factor $(\dim\phi)^{(q-2)/q}$. For higher-order tensors, we use the notation $\partial_{\phi\phi\phi}\mathcal{L}(\beta, \phi)$ inside the $q$-norm $\|\cdot\|_q$ defined above, while we rely on standard index and matrix notation for all other expressions involving those partial derivatives, e.g. $\partial_{\phi\phi'\phi_g}\mathcal{L}(\beta, \phi)$ is a dim φ × dim φ matrix for every $g = 1, \ldots, \dim\phi$. Occasionally, e.g. in Assumption B.1(vi), we use the Euclidean norm for dim φ-vectors and the spectral norm for dim φ × dim φ matrices, denoted by $\|\cdot\|$ and defined as $\|\cdot\|_q$ with $q = 2$. Moreover, we employ the matrix infinity norm $\|A\|_\infty = \max_i \sum_j |A_{ij}|$ and the matrix maximum norm $\|A\|_{\max} = \max_{ij} |A_{ij}|$ to characterize the properties of the inverse of the expected Hessian of the incidental parameters in Appendix D.
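The norms above, and the scalar-product bound with the factor $(\dim\phi)^{(q-2)/q}$, can be checked numerically with a short Python sketch (helper names hypothetical). For a constant vector the bound holds with equality, which shows that the factor cannot be dropped.

```python
def lq_norm(v, q):
    """(sum_g |v_g|^q)^(1/q)."""
    return sum(abs(x) ** q for x in v) ** (1.0 / q)


def matrix_infinity_norm(a):
    """max_i sum_j |a_ij| (maximum absolute row sum)."""
    return max(sum(abs(x) for x in row) for row in a)


def matrix_max_norm(a):
    """max_ij |a_ij|."""
    return max(abs(x) for row in a for x in row)


def scalar_product_bound(w, x, q):
    """||w||_q ||x||_q dim^((q - 2)/q), an upper bound on |w'x| for q >= 2."""
    dim = len(w)
    return lq_norm(w, q) * lq_norm(x, q) * dim ** ((q - 2) / q)


w = [1.0, -2.0, 3.0, 0.5]
x = [2.0, 1.0, -1.0, 4.0]
dot = sum(a * b for a, b in zip(w, x))
print(abs(dot), scalar_product_bound(w, x, 8))
```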
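For $r \ge 0$, we define the sets $\mathcal{B}(r, \beta^0) = \{\beta : \|\beta - \beta^0\| \le r\}$ and $\mathcal{B}_q(r, \phi^0) = \{\phi : \|\phi - \phi^0\|_q \le r\}$, which are closed balls of radius $r$ around the true parameter values $\beta^0$ and $\phi^0$, respectively.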

Appendix B. Asymptotic expansions
In this section, we derive asymptotic expansions for the score of the profile objective function, $\mathcal{L}(\beta, \hat\phi(\beta))$, and for the fixed effects estimators of the parameters and APEs, $\hat\beta$ and $\hat\delta$. We do not employ the panel structure of the model, nor the particular form of the objective function given in Section 4. Instead, we consider the estimation of an unspecified model based on a sample of size $NT$ and a generic objective function $\mathcal{L}(\beta, \phi)$, which depends on the parameter of interest β and the incidental parameter φ. The estimators $\hat\phi(\beta)$ and $\hat\beta$ are defined in (2.7) and (2.8). The proofs of all the results in this section are given in the supplementary material (see Appendix E). We make the following high-level assumptions. These assumptions might appear somewhat abstract, but they will be justified by more primitive conditions in the context of panel models.
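Let Assumption B.1 hold. Then $\hat\phi(\beta)$ and the profile score $\partial_\beta\mathcal{L}(\beta, \hat\phi(\beta))$ admit expansions up to quadratic order in the incidental parameter score, where $U = U^{(0)} + U^{(1)}$ collects the leading terms of the profile score. The remainder terms $R^\phi(\beta)$ and $R(\beta)$ of the expansions satisfy uniform bounds over $\beta \in \mathcal{B}(r_\beta, \beta^0)$, with $R_1(\beta)$ satisfying the same bound as $R(\beta)$. In the derivation of these bounds, the spectral norm bounds in Assumption B.1(vi) for dim φ-vectors, matrices and tensors are only used after separating expectations from deviations of expectations for certain partial derivatives. Otherwise, the derivation of the bounds is purely based on the $q$-norm for dim φ-vectors, matrices and tensors.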
The proofs are given in Section S.3 of the supplementary material. Theorem B.1 characterizes asymptotic expansions for the incidental parameter estimator and the score of the profile objective function in the incidental parameter score $\mathcal{S}$ up to quadratic order. The theorem provides bounds on the remainder terms $R^\phi(\beta)$ and $R(\beta)$, which make the expansions applicable to estimators of β that take values within a shrinking $r_\beta$-neighborhood of $\beta^0$ wpa1. Given such an $r_\beta$-consistent estimator $\hat\beta$ that solves the first order condition $\partial_\beta\mathcal{L}(\beta, \hat\phi(\beta)) = 0$, we can use the expansion of the profile objective score to obtain an asymptotic expansion for $\hat\beta$. This gives rise to the following corollary of Theorem B.1. Let $\overline{W}_\infty := \lim_{N,T \to \infty} \overline{W}$.

Expansion for average effects
We invoke the following high-level assumption, which is verified under more primitive conditions for panel data models in the next section.

Let Assumptions B.1 and B.2 hold.
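Remark 9. The expansion of the profile score $\partial_{\beta_k}\mathcal{L}(\beta, \hat\phi(\beta))$ in Theorem B.1 is a special case of the expansion in Theorem B.4, obtained by taking $\Delta(\beta, \phi)$ to be the $k$th component of the score $\partial_\beta\mathcal{L}(\beta, \phi)$, suitably normalized. Assumption B.2 also matches exactly the corresponding subset of Assumption B.1.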

C.1. Application of general expansion to panel estimators
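We now apply the general expansion of Appendix B to the panel fixed effects estimators considered in the main text. For the objective function specified in (2.1) and (4.1), the incidental parameter score evaluated at the true parameter value stacks the individual sums $\sum_{t=1}^T \partial_\pi\ell_{it}$, $i = 1, \ldots, N$, and the time sums $\sum_{i=1}^N \partial_\pi\ell_{it}$, $t = 1, \ldots, T$, scaled by the normalization of the objective function. The penalty term in the objective function does not contribute to $\mathcal{S}$, because at the true parameter value $v'\phi^0 = 0$. The corresponding expected incidental parameter Hessian $\overline{\mathcal{H}}$ is given in (4.2). Appendix D discusses the structure of $\overline{\mathcal{H}}$ and $\overline{\mathcal{H}}^{-1}$ in more detail. Define the operator $D_\beta\Delta_{it} := \partial_\beta\Delta_{it} - \partial_\pi\Delta_{it}\,\Xi_{it}$, which is similar to $D_\beta\ell_{it}$ in Eq. (4.3).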
The following theorem shows that Assumptions 4.1 and 4.2 for the panel model are sufficient for Assumptions B.1 and B.2 for the general expansion, and particularizes the terms of the expansion to the panel estimators. The proof is given in the supplementary material (see Appendix E). (ii) The approximate Hessian and the terms of the score defined in Theorem B.1 can be written explicitly in terms of the panel model quantities.
(iii) In addition, let Assumption 4.2 hold. Then Assumption B.2 is satisfied for the partial effects defined in (2.2), so that the expansion of Theorem B.4 applies to $\hat\delta$.
Analogously to the proof of $U^{(1a,2)} = o_P(1)$, one can show that the $O_P(1/\sqrt{NT})$ part of $\overline{\mathcal{H}}^{-1}_{(\alpha\alpha)}$ has an asymptotically negligible contribution to $U^{(1a,1)}$, and the resulting approximation holds uniformly over $i$. Note that both the denominator and the numerator of $U^{(1a,1)}_i$ are of order $T$. For the denominator this is obvious because of the sum over $T$. For the numerator there are two sums over $T$, but both $\partial_\pi\ell_{i\tau}$ and $D_{\beta\pi}\ell_{it} - \mathbb{E}_\phi(D_{\beta\pi}\ell_{it})$ are mean zero weakly correlated processes, so that each of their sums is of order $\sqrt{T}$ and their product is of order $T$. By the WLLN over $i$ (remember that we have cross-sectional independence, conditional on φ, and we assume finite moments), the cross-sectional average of the terms $U^{(1a,1)}_i$ then converges to its expectation up to an $o_P(1)$ term.