Inference in Second-Order Identified Models

First-order asymptotic analyses of the Generalized Method of Moments (GMM) estimator and its associated statistics rest on the assumption that the population moment condition identifies the parameter vector both globally and locally at first order. In linear models, global and first-order local identification are equivalent, but in nonlinear models they are not. In certain econometric models of interest, parameters are globally identified but only identified locally at second order. In these scenarios the standard GMM inference techniques based on first-order asymptotics are invalid; see Dovonon and Renault (2013) and Dovonon and Hall (2016). In this paper, we explore how to perform inference in moment condition models that identify the parameters only locally at second order. For inference about the parameters, we consider the conventional Wald and LM statistics, as well as the generalized Anderson-Rubin (GAR) statistic (Anderson and Rubin, 1949; Dufour, 1997; Staiger and Stock, 1997; Stock and Wright, 2000) and the KLM statistic (Kleibergen, 2002, 2005). Both the GAR and KLM statistics have been proposed as methods of inference in the presence of weak identification and are known to be "identification robust" in the sense that their limiting distribution is the same under first-order and weak identification. For inference about the model specification, we consider the identification-robust J statistic (Kleibergen, 2005) and the GAR statistic. In each case, we derive the limiting distribution of the statistics under both null and local alternative hypotheses. We show that under their respective null hypotheses the GAR, KLM and J statistics have the same limiting distribution as under first-order or weak identification, so that their identification robustness extends to second-order identification. We explore the power properties in detail in two empirically relevant models with second-order identification.
In the panel autoregressive (AR) model of order one, our analysis indicates that the Wald test of whether the AR parameter is one has superior power to the corresponding GAR test which, in turn, dominates the KLM and LM tests. For the conditionally heteroskedastic factor model, we compare Kleibergen’s (2005) J and the GAR statistics to Hansen’s (1982) overidentifying restrictions test (previously analyzed in this context by Dovonon and Renault, 2013) and find the power ranking depends on the sample size. Collectively, our results suggest that tests with meaningful power can be conducted in second-order identified models.


Introduction
Generalized Method of Moments (GMM) is a popular method for estimating the parameters of econometric models based on the information in population moment conditions. In his seminal article introducing GMM, Hansen (1982) proves the consistency of the estimator and provides a framework for inference based on first-order asymptotic statistical arguments. This original framework includes confidence intervals for the parameters and the overidentifying restrictions statistic that can be used to test the model specification, and it has been subsequently extended to a wide variety of inference procedures, similarly based on first-order asymptotic arguments. However, the statistical arguments that justify these inference techniques are predicated on certain regularity conditions among which are the assumptions that the population moment condition is valid and identifies the parameters both globally and also locally at first order.
Over the last 25 years, there has been a growing awareness that this first-order asymptotic theory may provide a poor approximation to the finite sample behaviour of GMM-based statistics. Attention has focused primarily on cases where the assumed identification conditions fail or are close to failure. To derive alternative approximations to the behaviour of GMM-based statistics under this scenario, Staiger and Stock (1997) introduced the concept of weak identification. Within this framework, parameters are globally and first-order locally identified in finite samples, but the information provided by the population moment condition declines (at a prescribed rate) as the sample size increases, resulting in the parameters being globally unidentified in the limit. Under weak identification, the large sample properties of the conventional GMM-based statistics are different from those derived in Hansen's (1982) analysis; see Staiger and Stock (1997) and Stock and Wright (2000). Furthermore, once the possibility of weak identification is admitted, the conventional approach to constructing confidence intervals based on GMM estimators ("estimator plus or minus a multiple of the standard error") is invalid; see Dufour (1997). This has led to a focus on inferences based on so-called "identification robust" statistics whose distribution is invariant to the quality of the identification. Leading examples of such statistics are the generalized Anderson-Rubin (GAR) statistic (Anderson and Rubin, 1949; Dufour, 1997; Staiger and Stock, 1997; Stock and Wright, 2000), the KLM statistic (Kleibergen, 2002, 2005), the J statistic (Kleibergen, 2005), and the conditional likelihood ratio statistic (Moreira, 2003; Kleibergen, 2005). In each case, inferences are performed by inverting the statistic in question to calculate the parameter values consistent with the null hypothesis at the chosen level of confidence/significance.
However, weak identification and its variants are not the only way in which first-order local identification can fail. 1 In linear models, first-order local and global identification are the same, but in nonlinear models they are not: identification can fail locally at first order but hold at a higher order. In this paper, we focus on the case where parameters are globally identified and identification fails locally at first order but holds at second order. This pattern of identification has been shown to arise in a number of situations in statistics and econometrics, such as: ML estimation of skew-normal distributions, Azzalini (2005); ML estimation of binary response models based on skew-normal distributions, Stingo, Stanghellini, and Capobianco (2011); ML estimation of missing not at random (MNAR) models, Jansen et al. (2006); GMM estimation of conditionally heteroskedastic factor models, Dovonon and Renault (2009, 2013); GMM estimation of panel data models using second moments, Madsen (2009), Bun and Kleibergen (2016); and ML estimation of panel data models, Kruiniger (2014).
Within this second-order identification framework, GMM estimators are consistent, but the limiting distribution of statistics based on the estimator is both different from its first-order asymptotic counterpart and sensitive to the nature of the first-order identification failure. Local identification relates to the behaviour of the population moment condition as the parameter moves away from the true value. First-order identification can fail in some or all directions, and the large sample behaviour of GMM-based statistics is sensitive to the number of directions in which local identification holds at second order but not at first order. For the case where first-order identification fails in only one direction, the limiting distribution of the GMM estimator has been characterized by Dovonon and Hall (2016), extending earlier results by Sargan (1983), for IV estimators in models that are nonlinear in the parameters, and by Rotnitzky, Cox, Bottai, and Robins (2000), for Maximum Likelihood estimators. Dovonon and Renault (2009, 2013) derive the limiting distribution of the overidentifying restrictions statistic for an arbitrary number of directions in which local identification holds at second but not first order.
In this paper, we study the power of commonly used test procedures when the parameter of interest is only locally identified at second order. We analyze tests on the value of the parameter itself and on the specification of the moment function. To conduct tests on the parameter of interest, we employ the traditional Wald and Lagrange multiplier (LM) statistics as well as the identification-robust GAR and KLM statistics. For tests on the specification of the moment function, we use the GAR statistic and Kleibergen's (2005) J statistic (hereafter denoted the K-J statistic). For each type of test, we define the appropriate local alternatives and derive the limiting distributions of all tests under both the null and local alternatives. We also illustrate the power properties of the tests in two empirically relevant models: the panel autoregressive model of order one and the conditionally heteroskedastic factor model. For the panel data model, it is well known that identification of the autoregressive parameter is problematic when its true value is one. Bun and Kleibergen (2016) construct a specific moment equation which identifies the autoregressive parameter at second order at this value. For the conditionally heteroskedastic factor model, Dovonon and Renault (2013) establish that the parameters are second-order identified by a moment condition used as a basis for testing for a common factor structure. Because of the second-order identification, GMM estimators converge at the quartic-root (N^{1/4}) rate, and so the finite sample distributions of the tests converge very slowly towards their limiting distributions under local alternatives. We therefore focus on the finite sample distributions of the tests for varying numbers of observations. For the panel autoregressive model, the Wald statistic has a surprising amount of discriminatory power and dominates the other tests, although the GAR statistic exhibits comparable power in large samples.
The powers of the KLM and LM statistics are much lower than that of the GAR statistic, which is explained by the second-order identification: the parameter of interest is not well identified, and the GAR statistic is known to compare favorably to the KLM statistic in terms of power in such settings. For the conditionally heteroskedastic factor model, we compare the power properties of the K-J and GAR tests with those of Hansen's (1982) overidentifying restrictions test, previously analyzed in this context by Dovonon and Renault (2013). Our results indicate that the power ranking is sensitive to the sample size: in small to moderate samples the K-J test dominates the other two, which have comparable power; in large samples this ranking is reversed.
The paper is organized as follows. The second section sets up notation and introduces the concept of second-order identification, along with the two running examples: the panel autoregressive model and the conditionally heteroskedastic factor model. The third section introduces the different test statistics and their limiting distributions under the null hypothesis. The fourth section derives these distributions under appropriate local alternatives. The fifth section explores the finite sample power properties of the tests. Finally, the sixth section concludes. All proofs are relegated to a mathematical appendix.

Second-order identification: definition and examples
Suppose it is desired to estimate a parameter vector θ0 ∈ Θ ⊂ R^p that indexes an econometric model. This model may explain the behaviour of individual economic agents in a population, and so be estimated from a random sample from that population, or it may explain the behaviour of economic variables over time, and so be estimated from time series data. Second-order identification can arise in either case, as demonstrated by our two examples below, and our results apply equally in both scenarios. However, certain definitions differ between the two cases. For ease of presentation, we first describe GMM estimation for the case where the data are obtained from a random sample, and then briefly note in footnote 3 below how those definitions need to be adapted for time series.
To this end, let X denote a random vector with probability distribution P and sample space X, modeling the variables in the econometric model. We consider the case where this model implies the following population moment condition:

E[f(X, θ0)] = 0,   (1)

where f : X × Θ → R^k is twice continuously differentiable in θ almost everywhere and k ≥ p. Associated with this population moment condition is a matrix G(θ0), known as the Jacobian, defined via

G(θ) = E[∂f(X, θ)/∂θ′].

Let {X_i, i = 1, . . . , N} be a random sample of observations for X, and define the sample moment function to be

f̄_N(θ) = N^{-1} ∑_{i=1}^{N} f(X_i, θ).

Following Hansen (1982), we define a GMM estimator of θ0 based on (1) as

θ̂(W_N) = argmin_{θ∈Θ} f̄_N(θ)′ W_N f̄_N(θ),

where W_N is a k × k weighting matrix that converges in probability to W, a symmetric positive definite matrix. As emphasized by the notation, the GMM estimator depends on the choice of weighting matrix. Hansen (1982) shows that the optimal choice of weighting matrix satisfies W = {V ar[f(X, θ0)]}^{-1}, assumed nonsingular throughout. This optimal choice is implemented via a two-step procedure in which a first-step GMM estimation is used to obtain a preliminary ("first-step GMM") estimator, θ̂_{1,s} = θ̂(W_N), based on a sub-optimal choice of W_N. This first-step GMM estimator is used to construct a consistent estimator V̂_ff(θ̂_{1,s}) of V ar[f(X, θ0)], the inverse of which is used as the weighting matrix in a second-step estimation. Defining

Q(θ, θ̃) = N f̄_N(θ)′ {V̂_ff(θ̃)}^{-1} f̄_N(θ),

the second-step estimator is θ̂_N = argmin_{θ∈Θ} Q(θ, θ̂_{1,s}). Within this framework, two statistics are naturally of interest: θ̂_N and the overidentifying restrictions test statistic Q(θ̂_N, θ̂_{1,s}). The former is the basis for inference about θ0, and the latter can be used to assess whether the data are consistent with (1) being true in the population, which is often thought of as a test of the model specification. 2 Hansen (1982) establishes the limiting properties of both statistics under a set of regularity conditions. 3 Specifically, he shows that θ̂_N is consistent for θ0 and that

√N (θ̂_N − θ0) →_d N( 0, {G(θ0)′ V_ff(θ0)^{-1} G(θ0)}^{-1} ),   (4)

Q(θ̂_N, θ̂_{1,s}) →_d χ²_{k−p}.   (5)

For our purposes here, it suffices to highlight three of these regularity conditions. To this end, it is useful to condense our notation and write m(θ) = E[f(X, θ)].
The aforementioned three conditions are then: (i) m(θ0) = 0, so that the estimation is based on valid information; (ii) m(θ̄) ≠ 0 for all θ̄ ≠ θ0, so that θ0 is globally identified; (iii) rank{G(θ0)} = p, so that θ0 is first-order locally identified. 4 Of these three, the consistency of the GMM estimator requires only (i) and (ii) to hold, but the distributional results in (4) and (5) require all three conditions to hold.
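To make the two-step procedure described above concrete, here is a minimal sketch. The moment function (a mean and a second-moment restriction for a scalar θ, so k = 2 > p = 1), the data-generating process, and the grid minimizer are illustrative assumptions of ours, not anything taken from the paper.

```python
import numpy as np

# Illustrative two-step GMM: k = 2 moments for a scalar theta (p = 1).
# The moment function, DGP, and grid minimizer are assumptions of this sketch.
rng = np.random.default_rng(0)
theta_true = 2.0
x = rng.normal(theta_true, 1.0, size=5000)
N = x.shape[0]

def f(theta):
    # f(X, theta) = (X - theta, X^2 - theta^2 - 1); both have mean zero at theta_true
    return np.column_stack([x - theta, x**2 - theta**2 - 1.0])

def objective(theta, W):
    fb = f(theta).mean(axis=0)
    return fb @ W @ fb

grid = np.linspace(0.0, 4.0, 4001)

def argmin_on_grid(W):
    return grid[int(np.argmin([objective(t, W) for t in grid]))]

theta_1s = argmin_on_grid(np.eye(2))            # step 1: sub-optimal W_N = I
V_hat = np.cov(f(theta_1s), rowvar=False)       # estimate of Var[f(X, theta0)]
theta_N = argmin_on_grid(np.linalg.inv(V_hat))  # step 2: optimal weighting

fb = f(theta_N).mean(axis=0)
overid_stat = N * fb @ np.linalg.inv(V_hat) @ fb  # Q(theta_N, theta_1s)
```

With k > p, the rescaled objective at the second-step minimizer is exactly the overidentifying restrictions statistic computed in the last line of the block.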
As noted in the introduction, first-order local identification is not a necessary condition for global identification in nonlinear models. In this paper we focus on the case where first-order local identification fails but the parameters are identified at second order. To formally introduce this scenario, we let

H_s(θ) = E[∂²f_s(X, θ)/∂θ∂θ′],  s = 1, . . . , k,

where f_s(X, θ) is the s-th element of f(X, θ). The following assumption defines the identification configuration maintained throughout our analysis.
Assumption 1. (a) ∀θ ∈ Θ, m(θ) = 0 ⇔ θ = θ0; (b) for all u in the range of G(θ0) and all v in the null space of G(θ0), u + (v′H_s(θ0)v)_{1≤s≤k} = 0 ⇒ u = 0 and v = 0.

Assumption 1(a) combines conditions (i) and (ii) above, and provides the necessary and sufficient identification condition for consistent estimation of θ0. Assumption 1(b) is the second-order local identification condition introduced by Dovonon and Renault (2009). This is a sufficient condition for local identification that extends the standard first-order local identification condition (property (iii) above). If rank{G(θ0)} = p, then the null space of G(θ0) contains only the null vector and Assumption 1(b) holds trivially. If G(θ0) is rank deficient, this assumption ensures that the directions of the parameter associated with the range of G(θ0) are identified by the first-order approximation of the moment function, whereas the directions in the null space of the Jacobian are identified by the second-order approximation. In the extreme case where G(θ0) = 0, the whole parameter vector is identified by the second-order terms in the expansion of the moment function. Dovonon and Renault (2009) establish that the components of the GMM estimator in the direction of the range of G(θ0) have the standard (√N) rate of convergence, while the components in the direction of the null space of G(θ0) have a non-standard (N^{1/4}) rate of convergence, and that these rates are sharp. It is thus evident that the distributional result in (4) does not apply if local identification holds at second but not first order, and Dovonon and Renault (2013) show that (5) is similarly invalid. We return to this issue in the next section. To conclude this section, we consider two examples where first-order identification fails but Assumption 1 holds.

2 Some caution needs to be exercised in interpreting the outcome of this test; see Newey (1985) and Hall (2005).
3 If the model involves (stationary ergodic) time series, then X is replaced by X_t in (1), with t denoting the time index and replacing i in the definitions above. In this case, the optimal choice of weighting matrix replaces V̂_ff(θ) by a member of the class of heteroskedasticity and autocorrelation consistent (HAC) covariance estimators; see, for example, Andrews (1991).
4 Sometimes referred to as the rank condition for identification.
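The slower N^{1/4} rate can be illustrated with a toy just-identified moment of our own construction (not one of the paper's examples): f(X, θ) = X − (θ − θ0)², with E[X] = 0 and Θ = [θ0, ∞), for which m(θ0) = 0, G(θ0) = 0 and the second derivative is non-zero. Multiplying the sample size by 16 should then roughly halve the estimation error:

```python
import numpy as np

# Toy moment with second-order but not first-order local identification
# (our own construction): f(X, theta) = X - (theta - theta0)^2 with E[X] = 0,
# so m(theta) = -(theta - theta0)^2, G(theta0) = 0 and m''(theta0) = -2 != 0.
# Restricting Theta = [theta0, inf) keeps the toy model globally identified.
rng = np.random.default_rng(1)
theta0 = 0.0

def gmm_estimate(x):
    # Just identified: theta solves mean(x) = (theta - theta0)^2 when mean(x) > 0;
    # otherwise the squared objective is minimized at theta0 itself.
    return theta0 + np.sqrt(max(x.mean(), 0.0))

def mean_abs_error(N, reps=2000):
    return float(np.mean([abs(gmm_estimate(rng.normal(0.0, 1.0, N)) - theta0)
                          for _ in range(reps)]))

e_small, e_large = mean_abs_error(100), mean_abs_error(1600)
ratio = e_small / e_large   # roughly 16**0.25 = 2 under an N^(-1/4) rate
```

Under first-order identification the same experiment would produce a ratio of about 4 (= 16^{1/2}) rather than about 2.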

Panel data example
Consider the first-order linear dynamic panel data model

y_{i,t} = θ y_{i,t−1} + c_i + u_{i,t},  i = 1, . . . , N,  t = 2, . . . , T,

where c_i denotes the (unobserved) fixed effect, T equals the number of time periods and N equals the number of cross-section observations. The assumptions commonly used to identify the parameters of this model are that the error terms are independently distributed from each other and from the fixed effect, so that E[u_{i,t}u_{i,s}] = 0 for t ≠ s and E[u_{i,t}c_i] = 0. Based on these assumptions, different moment functions have been proposed to identify the autoregressive parameter, of which the most commonly used are, perhaps, those proposed by Anderson and Hsiao (1981), Arellano and Bond (1991), Ahn and Schmidt (1995) and Blundell and Bond (1998). All of these moment conditions have difficulty identifying the autoregressive parameter when its true value is close to one and the variance of the initial observations and/or fixed effects becomes large; see Bun and Kleibergen (2016). Bun and Kleibergen (2016) show that a non-linear combination of these moment conditions does, however, identify the autoregressive parameter in such settings. This non-linear combination leads to so-called robust moments that do not depend on the initial observations and fixed effects. Bun and Kleibergen (2016) show that for T = 4 the specification of the sample moment function associated with these robust moments is given by (8). Under the assumptions above, the expectation of these terms depends on σ²_t = E[u²_{i,t}]. If we assume mean-stationarity 5 (so that E[(c_i − (1 − θ)y_{i,1})²] = 0) and homoskedastic errors (σ²_t = σ²), then these expected values simplify to (10). From (8) and (10), it follows that if θ0 = 1 then the Jacobian is the null vector while the second-order term is not, as displayed in (11), where we have emphasized the dimensions of the null vectors for clarity. It can be seen from (11) that if θ0 = 1, then this model is not first-order locally identified but satisfies Assumption 1, and so is second-order locally identified. In our subsequent analysis of this model, we focus on inference about whether or not θ0 = 1.

Conditionally heteroskedastic factor models
Conditionally heteroskedastic factor (CHF) models are widely used to study the volatility of financial asset returns. 6 Within this approach, the volatility of a vector of assets is assumed to derive from two sources: a latent common factor that exhibits conditional variation, and an idiosyncratic component that is conditionally homoskedastic. In practice, the number of latent factors is assumed to be smaller than the number of assets, and thus the CHF model provides a relatively parsimonious way of capturing the conditional variances and covariances of the assets. Before basing inferences on the model, it is important to assess whether the sample covariance structure is consistent with this type of specification. Engle and Kozicki (1993) propose a general methodology for testing for common features in economic time series based on the GMM overidentifying restrictions test, and suggest using it to test the validity of the CHF model. However, they base their decision rule on the standard first-order asymptotic behaviour of the overidentifying restrictions test. Dovonon and Renault (2013) show that this theory is invalid in this case because the moment condition in question only identifies the parameters locally at second order.
To elaborate, consider the following CHF model for the p × 1 vector of asset returns Y_{t+1}:

V ar[Y_{t+1} | F_t] = Λ D_t Λ′ + Ω,

where D_t is an L × L diagonal matrix with ℓ-th diagonal element equal to σ²_{ℓ,t} for ℓ = 1, 2, . . . , L, Λ is a p × L matrix, and Ω is a p × p symmetric positive semi-definite matrix. The stochastic processes {Y_t}_{t≥0} and {σ²_{ℓ,t}}_{1≤ℓ≤L, t≥0} are adapted with respect to the increasing filtration {F_t}_{t≥0}. It is assumed that rank(Λ) = L and V ar[σ²_{ℓ,t}] > 0 for all ℓ = 1, 2, . . . , L. If L < p, then the factors can be viewed as "common features" in the sense that there are fewer sources of conditional variation than the number of assets. Engle and Kozicki's (1993) test for common features can be motivated as follows. If L < p, then there exists θ0 ≠ 0 such that E[(θ0′Y_{t+1})² | F_t] = µ for some constant µ, and so, for any k × 1 vector z_t ∈ F_t with k > p, θ0 satisfies m(θ0) = 0, where

m(θ) = E[f(X_t, θ)]  and  f(X_t, θ) = z_t{(θ′Y_{t+1})² − µ}.   (14)

Clearly (14) only identifies θ0 up to some normalizing constant, and so in practice some normalization needs to be adopted. However, for our purposes here, we can sidestep this issue. 7 The population moment condition in (14) can be used as a basis for estimation of θ0, and the existence of the common feature can be tested by testing whether (14) holds using the overidentifying restrictions statistic. However, the population moment condition in (14) does not locally identify θ0 at first order. Dovonon and Renault (2013) show that G(θ) = 2E[z_t(θ′Y_{t+1})Y′_{t+1}] and that, under the assumptions above, E[z_t(θ0′Y_{t+1})Y′_{t+1}] = 0. Therefore, G(θ0) is the null matrix by construction under the null hypothesis of the test. However, θ0 is second-order locally identified under plausible conditions, because

v′H_s(θ0)v = 2(Λ′v)′ C_s (Λ′v),

where C_s is the L × L diagonal matrix with ℓ-th main diagonal element equal to Cov[z_{s,t}, σ²_{ℓ,t}]. Dovonon and Renault (2013) argue that this rank condition can be ensured by choosing a sufficiently broad set of instruments z_t such that at least one instrument is correlated with every possible linear combination of the volatilities σ²_{ℓ,t}. 8
Finally, we emphasize that in this model the value of θ0 is not of primary interest: the key issue is whether m(θ0) = 0.
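A small simulation makes the common-feature moment condition concrete. Every data-generating choice below (one GARCH-type factor, two assets, lagged squared returns as instruments) is an assumption of this sketch, not a specification taken from the paper:

```python
import numpy as np

# Simulated 1-factor CHF model for p = 2 assets; all DGP choices are
# illustrative assumptions of this sketch.
rng = np.random.default_rng(2)
T = 20000
lam = np.array([1.0, 1.0])                    # factor loadings, L = 1 < p = 2

sig2 = np.empty(T + 1); sig2[0] = 1.0         # conditional factor variance
eps = rng.standard_normal(T)
fac = np.empty(T)
for t in range(T):
    fac[t] = np.sqrt(sig2[t]) * eps[t]
    sig2[t + 1] = 0.2 + 0.6 * sig2[t] + 0.18 * fac[t]**2

U = 0.5 * rng.standard_normal((T, 2))         # conditionally homoskedastic noise
Y = np.outer(fac, lam) + U                    # asset returns Y_{t+1}

# Instruments z_t in F_t: a constant and the lagged squared returns (k = 3).
Z = np.column_stack([np.ones(T - 1), Y[:-1, 0]**2, Y[:-1, 1]**2])
Y_next = Y[1:]

def moment_norm(theta):
    # Sample analogue of ||E[z_t((theta'Y_{t+1})^2 - mu)]||, with mu replaced
    # by the sample mean of (theta'Y_{t+1})^2.
    port2 = (Y_next @ theta)**2
    return float(np.linalg.norm((Z * (port2 - port2.mean())[:, None]).mean(axis=0)))

norm_cf = moment_norm(np.array([1.0, -1.0]))   # theta'lam = 0: kills the factor
norm_alt = moment_norm(np.array([1.0, 1.0]))   # loads on the factor
```

At the common-feature value the sample moments are close to zero, while a portfolio that loads on the factor leaves them far from zero; this contrast is what the overidentification-based tests exploit.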

Test statistics and their limiting distributions under their null hypotheses
In this section, we consider methods for testing two types of hypotheses in models that satisfy Assumption 1. In the first type, the null hypothesis takes the form H0 : θ0 = θ*. Notice that under this H0 the value of θ0 is completely specified. In the second type of hypothesis, the null takes the form H0 : m(θ0) = 0; tests of this hypothesis are often interpreted as tests of whether the model specification is correct. We first present all the test statistics and then provide their limiting distributions under their respective null hypotheses.

7 See Dovonon and Renault (2013) for further discussion, and also Section 5.2 for an example.
8 Specifically, they assume rank{Cov[z_t, d_t]} = L, where d_t = (σ²_{1,t}, . . . , σ²_{L,t})′.

Test statistics and their null hypotheses
To present the statistics, we introduce the following notation: Ĝ_N(θ) = ∂f̄_N(θ)/∂θ′; V̂_ff(θ) denotes a consistent estimator of V ar[f(X, θ)]; and, for any full column rank matrix A, P_A = A(A′A)^{-1}A′ and M_A = I_k − P_A. Test statistics for H0 : θ0 = θ*: Newey and West (1987) propose a number of statistics for testing whether θ0 satisfies a set of nonlinear restrictions based on GMM estimators. Here we consider two: the Wald and Lagrange Multiplier (LM) statistics. Specializing to our null hypothesis, the Wald statistic is

W ald_N(θ*) = N (θ̂_N − θ*)′ {Ĝ_N(θ̂_N)′ V̂_ff(θ̂_N)^{-1} Ĝ_N(θ̂_N)} (θ̂_N − θ*),   (19)

and the LM statistic is

LM_N(θ*) = N f̄_N(θ*)′ V̂_ff(θ*)^{-1/2} P_{V̂_ff(θ*)^{-1/2} Ĝ_N(θ*)} V̂_ff(θ*)^{-1/2} f̄_N(θ*).   (20)

Under certain regularity conditions, which include global identification and first-order local identification, Newey and West (1987) show that the Wald and LM statistics both converge to a χ²_ρ distribution, where ρ is the number of restrictions (here ρ = p). Kleibergen (2005) introduces a modified version of the LM statistic,

KLM(θ*) = N f̄_N(θ*)′ V̂_ff(θ*)^{-1/2} P_{V̂_ff(θ*)^{-1/2} D̂_N(θ*)} V̂_ff(θ*)^{-1/2} f̄_N(θ*),   (21)

where D̂_N(θ*) is an estimator of the Jacobian constructed to be asymptotically independent of √N f̄_N(θ*). Kleibergen (2005) shows that KLM(θ*) converges to a χ²_p distribution under H0 regardless of whether θ0 is first-order locally identified or weakly identified. Stock and Wright (2000) propose using the GAR statistic: 9

GAR(θ*) = N f̄_N(θ*)′ V̂_ff(θ*)^{-1} f̄_N(θ*).   (23)

Stock and Wright (2000) show that GAR(θ*) converges to a χ²_k distribution under H0 regardless of whether θ0 is first-order locally identified or weakly identified. However, the implicit null of the GAR statistic is larger than H0 : θ0 = θ*, as we discuss below.
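Of these statistics, the GAR statistic is the simplest to compute, since it requires neither a GMM estimation nor a Jacobian estimate. A minimal sketch, using an illustrative k = 2, p = 1 moment vector of our own rather than one of the paper's models:

```python
import numpy as np

# Minimal GAR computation for H0: theta0 = theta*, with an illustrative
# k = 2, p = 1 moment vector (an assumption of this sketch).
rng = np.random.default_rng(3)
theta_true = 2.0
x = rng.normal(theta_true, 1.0, size=2000)
N = x.shape[0]

def gar(theta_star):
    F = np.column_stack([x - theta_star, x**2 - theta_star**2 - 1.0])
    fb = F.mean(axis=0)
    V_hat = np.cov(F, rowvar=False)        # estimate of Var[f(X, theta*)]
    return N * fb @ np.linalg.inv(V_hat) @ fb

crit = 5.99                                # 5% critical value of chi-squared(2)
reject_true = gar(theta_true) > crit       # correct null: rejected ~5% of the time
reject_false = gar(3.0) > crit             # violated null: rejected with high probability
```

Inverting the statistic, i.e. collecting the values θ* for which gar(θ*) is below the critical value, produces an identification-robust confidence set for θ0.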
Test statistics for H0 : m(θ0) = 0: Kleibergen (2005) proposes testing this null using the statistic

J(θ0) = N f̄_N(θ0)′ V̂_ff(θ0)^{-1/2} M_{V̂_ff(θ0)^{-1/2} D̂_N(θ0)} V̂_ff(θ0)^{-1/2} f̄_N(θ0),

where D̂_N(·) is as defined for (21). Kleibergen (2005) shows that under H0 the limiting distribution of J(θ0) is χ²_{k−p}, irrespective of whether θ0 is first-order locally or weakly identified. The test is performed by searching for values of θ0 at which J(θ0) is less than the appropriate critical value.
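The mechanics of this search can be sketched in a hypothetical linear IV design, chosen purely to make the computation concrete (the DGP, instruments, and grid are assumptions of this illustration). The J statistic is computed as GAR minus KLM, using an orthogonalized Jacobian estimate in the spirit of Kleibergen (2005):

```python
import numpy as np

# Hypothetical linear IV design (an assumption of this sketch):
# y = x * theta + e with two instruments, so f_i(theta) = z_i (y_i - x_i * theta),
# k = 2 and p = 1.
rng = np.random.default_rng(5)
N, theta_true = 3000, 0.5
z = rng.standard_normal((N, 2))
x = z @ np.array([1.0, 1.0]) + rng.standard_normal(N)
y = x * theta_true + rng.standard_normal(N)

def j_stat(theta):
    F = z * (y - x * theta)[:, None]     # N x k moment contributions
    Q = -z * x[:, None]                  # N x k derivatives w.r.t. theta
    fb, qb = F.mean(axis=0), Q.mean(axis=0)
    Vinv = np.linalg.inv(np.cov(F, rowvar=False))
    C = (Q - qb).T @ (F - fb) / N        # sample cov of derivative and moment
    D = qb - C @ Vinv @ fb               # orthogonalized Jacobian estimate
    gar = N * fb @ Vinv @ fb
    klm = N * (fb @ Vinv @ D) ** 2 / (D @ Vinv @ D)
    return gar - klm                     # J(theta) = GAR(theta) - KLM(theta)

# Specification test: the model passes if SOME theta on the grid gives a
# J value below the chi-squared(k - p) critical value.
grid = np.linspace(-1.0, 2.0, 301)
crit = 3.84                              # chi-squared(1), 5% level
accepted = [t for t in grid if j_stat(t) < crit]
```

Because J(θ) = GAR(θ) − KLM(θ) ≥ 0 by construction, the model is judged consistent with the data as long as some θ on the grid yields a value below the χ²_{k−p} critical value.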
As noted by Kleibergen (2005), GAR(θ*) = KLM(θ*) + J(θ*), and so the GAR statistic can be viewed as a joint test of θ0 = θ* and m(θ0) = 0.

Limiting distributions under the null
For our analysis of both types of statistics, the structure of the Jacobian is important. We define r = rank{G(θ0)}. Since our focus is on cases where θ0 is globally identified but only locally identified at second order, we assume r < p and that the model satisfies Assumption 1. Note that there then exists a nonsingular p × p matrix

R = [R1, R2],   (25)

where R1 is p × r with rank{G(θ0)R1} = r, and R2 is p × (p − r) with G(θ0)R2 = 0, so that the columns of R2 span the null space of G(θ0). The matrices R1 and R2 are key to our analysis below because they give, respectively, the directions in which estimation converges at the standard rate and the directions in which it converges more slowly. If r = 0 (as in the CHF example), then we set R = R2 = I_p and R1 = 0. In the subsequent analysis, we set D = G(θ0)R1. We also impose the following conditions.
Let N_ε denote an ε-neighbourhood of θ0.
Assumption 4 is a high-level condition that can apply whether the model involves a random vector X or a time series process X_t; in the latter case, V is the long-run variance of the relevant random vector.
Under Assumption 1(a) and certain other regularity conditions, θ̂_{1,s} and θ̂_N are consistent. Since this is not the focus of our analysis, we do not document the required conditions here, and instead adopt the following high-level assumption. 10 We now present the limiting distributions of the test statistics introduced in Section 3.1.
Test statistics for H0 : θ0 = θ*: For the Wald statistic, we consider only the case where r = p − 1, because to our knowledge this is the only case for which the limiting distribution of the GMM estimator is tractable. For what follows, it is useful to introduce the following additional notation.

Theorem 1. If Assumptions 1-5 hold, r = p − 1 and θ0 = θ*, then W ald_N(θ*) converges in distribution to a non-standard limit in which S1 ∼ N(0, I_k), S ∼ N(0, 1), S1 and S are independent, and I(·) is the usual indicator function.
The limiting distribution is evidently non-standard, reflecting the non-standard behaviour of the GMM estimator in this case (Dovonon and Hall, 2016, Theorem 1). Although non-standard, this distribution can easily be simulated, along similar lines to the method proposed for simulating the distribution of the GMM estimator in Dovonon and Hall (2016). 11 In the special case where r = 0 and p = 1, the distribution simplifies. In this case, we set D = 0, P = 0 and B = (H_s(θ0))_{1≤s≤k}, and the distribution of the Wald test is as follows.
Corollary 1. If the conditions of Theorem 1 hold and, in addition, r = 0 and p = 1, then the limiting distribution of W ald_N(θ*) simplifies accordingly.

Corollary 1 provides the limiting distribution of the Wald test of H0 : θ0 = 1 in our panel data example of Section 2.1. Notice that this limiting distribution involves a point mass of 0.5 at the event W ald_N(θ*) = 0. We can use our panel data example to provide some intuition for why the distribution takes this form. In this setting, the Wald statistic is given by (27). Using a mean value expansion of q̂_N(θ) around q̂_N(1), it can be shown that 12, setting e = (V^{-1})_{1,1}, the first-order conditions of the GMM estimation imply (as shown in the mathematical appendix) that, under H0, ζ = N^{1/4}(θ̂_N − 1) satisfies a condition in which ζ multiplies a term in parentheses. If S > 0, then there is no real value of ζ that can set the term in parentheses to zero, and so the solution must be ζ = 0. However, if S < 0, then ζ² = |S|/(e^{1/2}σ²) sets the term in parentheses to zero. Thus, we obtain (29); using (29) in (27), it follows that the Wald statistic is exactly zero on the event S > 0. The Wald test principle is based on testing whether the unrestricted estimator satisfies the restrictions in question. In contrast, the test principles behind the LM, KLM and GAR statistics are based on the restricted model. In our case, the null hypothesis completely specifies the value of θ0, and so calculation of these statistics does not involve a GMM estimation per se. Therefore, while our analysis allows identification to fail locally at first order in an arbitrary number of directions, it does not require the parameters to be locally identified at second order, although the results still hold if that is the case.
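The half point-mass at zero is easy to reproduce by simulation. The toy moment below, f(X, θ) = X − (θ − θ0)² with E[X] = 0 and Θ = [θ0, ∞), is our own illustration rather than the paper's panel model, but it has the same structure: the estimator sits exactly at θ0 whenever the sample moment pushes in the "wrong" direction, an event of probability one half, and the Wald statistic is then exactly zero:

```python
import numpy as np

# Monte Carlo sketch of the half point-mass at zero; the toy moment
# f(X, theta) = X - (theta - theta0)^2 on Theta = [theta0, inf) is our own
# illustration, not the paper's panel model.
rng = np.random.default_rng(6)
theta0, N, reps = 0.0, 400, 4000
wald_zero = 0
for _ in range(reps):
    xbar = rng.normal(0.0, 1.0, N).mean()
    # theta_hat = theta0 exactly when xbar <= 0 (probability 1/2), in which
    # case the Wald statistic is exactly zero.
    theta_hat = theta0 + np.sqrt(max(xbar, 0.0))
    wald_zero += (theta_hat == theta0)
frac_zero = wald_zero / reps   # close to the limiting point mass of 0.5
```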
The following theorem gives the limiting distribution of the LM statistic in (20).
Theorem 2 gives the asymptotic distribution of the LM statistic under H0 when the first-order local identification condition is violated. Only in the special case where √N q̂_N(θ0)R2 and √N f̄_N(θ0) are asymptotically uncorrelated (and hence independent) is this distribution χ²_p, and so the same as would apply if θ0 were identified locally at first order. A comparison of Theorems 1 and 2 indicates that the limiting distributions of the Wald and LM statistics differ if identification fails locally at first order but holds at second order. In contrast, Newey and West (1987) show the two statistics are asymptotically equivalent under the null when θ0 is first-order locally identified.
The following theorem gives the limiting distributions of the KLM and GAR statistics in (21) and (23), respectively. We first introduce some notation: let ψ̂_q be the k × p matrix with (l, m)-entry given by ψ̂_{l,m}, as defined in Assumption 4. From Theorem 3, it follows that the limiting distributions of the KLM and GAR statistics under second-order local identification are the same as under first-order local identification and weak identification. Therefore, both statistics are robust to all three forms of identification.
From Theorem 4, it follows that the limiting distribution of the K-J statistic under second-order local identification is the same as under first-order local identification and weak identification, and so it too is robust to all three forms of identification. This contrasts with Hansen's (1982) overidentifying restrictions test statistic, which Dovonon and Renault (2013) show converges in distribution to a mixture of χ²_{k−q}, q = 0, 1, . . . , p, distributions if θ0 is only locally identified at second order. The limiting distribution of GAR(θ0) follows trivially from the asymptotic normality of √N f̄_N(θ0).

The large sample behaviour of the test statistics under local alternatives
In this section, we explore the local power properties of the tests. To this end, we index the data generating process by N and so now replace X by X_N. The distribution of X_N is denoted by P_N, and this distribution implies the population moment condition

E_N[f(X_N, θ_N)] = µ_N,   (31)

where E_N[·] denotes expectation under P_N, {θ_N} is a sequence of parameter values and {µ_N} is a sequence of k × 1 vectors. It is assumed that, as N → ∞, the following all hold: P_N → P, θ_N → θ0 and µ_N → 0_{k×1}. Recall that P is the probability distribution of X in Section 2, and so the limit process satisfies the population moment condition (1). As in Section 2, it is further assumed that, under P, θ0 is identified locally at second order.
To analyze the behaviour of the tests under local alternatives, we must also modify certain of our assumptions. To this end, we introduce the following definitions and replace Assumption 3 by the following condition.
converge uniformly (in probability P_N for the former) to H(θ).
We must also modify our assumptions about the behaviour of the Jacobian. It is worth mentioning that, even if the rank of the Jacobian at θ_0 under P (the data distribution under the null) is known, this does not necessarily pin down the rank at θ_N under P_N, because the rank function is not continuous.
is the nonsingular p × p matrix partitioned into r- and (p − r)-column matrices R_1 and R_2 as defined by (25). D is a k × r matrix of rank r, A is a k × (p − r) matrix and ξ > 0.
Under this assumption, the Jacobian is local to zero in the parameter directions that are identified locally only at second order. The appropriate choice of ξ depends on the model in question; we show below that ξ = 1/2 is the appropriate choice in both of our examples from Section 2. For our analysis of tests of H_0 : θ_0 = θ*, we restrict ξ > 1/4 to ensure that the drift in the Jacobian vanishes faster than the rate of convergence of the second-order identified parameters. This restriction is particularly useful for deriving the asymptotic distribution of the Wald test statistic. Finally, we replace Assumption 4 by the following condition.
under P_N, with V given in Assumption 4. For this null hypothesis, the natural sequence of local alternatives is given by (31) with µ_N = 0 for all N. In this case, the population moment condition is satisfied at a different parameter value for each N, that is, To define the sequence of parameters θ_N under the local alternative explicitly, we take into account the rate of convergence of the estimators under the null. Under the second-order identification condition, the parameter directions that are identified at first order are estimated at the standard √N-rate, whereas the directions that are identified only at second order are estimated at the slower N^{1/4}-rate. In particular, with R as defined by Equation (25), the first r components of R^{−1}θ are estimated at the √N-rate whereas the remaining components are estimated at the N^{1/4}-rate. In the light of this, we define θ_N such that: where the first r and the last (p − r) components of e_N ∈ R^p, denoted respectively e_{N,1} and e_{N,2}, are such that: with e_1 and e_2 nonzero vectors of size r and p − r, respectively. Before presenting the limiting distributions of our test statistics, it is instructive to use our panel data example to motivate the behaviour of the Jacobian specified in Assumption 7. Recall from Section 2.1 that θ is a scalar and is only locally identified at second order. Therefore, in view of the remarks in the preceding paragraph, we set θ_N = 1 − c/(2N^{1/4}). In this case, it can be shown that¹³ This setting is covered by Assumption 7 with ξ = 1/2 and A = −(σ²c²/4)[1, 0]′.
For the case in which r = 0 and p = 1, this result specializes as follows.
Corollary 2. If r = 0 and p = 1, then where S ∼ N(0, 1), X(s)² = −(2/a)S·I(S ≤ 0), a = G′G, and W_0(S) is given in Corollary 1. Notice that in this case the power against local alternatives is capped at 0.5 asymptotically; we return to this issue in Section 5.1.
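The source of this 0.5 cap can be seen directly from the form of X(s)² in Corollary 2: since X(s)² = −(2/a)S·I(S ≤ 0) is identically zero on the event {S > 0}, which has probability 1/2, rejection driven by X(s)² can occur at most half the time asymptotically. A minimal Monte Carlo sketch (a = G′G is set to an arbitrary positive placeholder, as its value does not affect the event {X(s)² > 0}):

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal(200_000)   # S ~ N(0, 1), as in Corollary 2
a = 1.0                            # placeholder for a = G'G > 0

# X(s)^2 = -(2/a) * S * I(S <= 0): identically zero whenever S > 0
X2 = -(2.0 / a) * S * (S <= 0)

# X(s)^2 is strictly positive only on {S < 0}, an event of probability 1/2,
# so asymptotic rejection frequencies under the local alternative cannot exceed 0.5.
print(np.mean(X2 > 0))  # close to 0.5
```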
To present the limiting behaviour of the LM, KLM and GAR tests, we define C(θ) to be the k × p² matrix: Theorem 6. If Assumptions 1(b), 2, 6-8 (with µ_N = 0 and ξ > 1/4) hold, and θ_0 = θ*, then: (a) If the k × p matrix Q(e_2) defined by: is of full column rank, then: This theorem shows that the LM and KLM statistics have the same limiting distribution under this sequence of local alternatives. Since λ_θ > 0, it follows from Theorems 3 and 6 and the properties of the chi-squared distribution that both the KLM and GAR statistics have non-trivial power against this alternative, and also that the KLM statistic is the more powerful. The relative performance of the LM statistic is less clear. Theorem 2 indicates that in general the LM statistic has a non-standard limiting distribution under the null, but has the (standard) limiting χ²_p distribution in the special case where √N q̄_N(θ_0)R_2 and √N f̄_N(θ_0) are asymptotically independent. In the former case, it is not possible to compare power with the KLM and GAR statistics analytically. It is worth noting that the differences between the distributions of the LM statistic under the null and under local alternatives can be rationalized as follows. Under the null, the large sample behaviour of LM(θ*) depends on √N q̄_N(θ_0)R_2, which is random in the limit and may or (most likely) may not be asymptotically independent of √N f̄_N(θ_0). Under the local alternative, the large sample behaviour of LM(θ*) depends on N^{1/4} q̄_N(θ_0)R_2, which converges in probability to a constant and so is trivially independent of √N f̄_N(θ_0).
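The power ranking of the KLM and GAR statistics asserted above follows from a general property of noncentral chi-squared tests: at a common noncentrality λ_θ, the test with fewer degrees of freedom (KLM, χ²_p) is more powerful than the one with more (GAR, χ²_k, k > p). A quick numerical check using SciPy, with illustrative values p = 1, k = 2 and λ_θ = 3 (these numbers are ours, not the paper's):

```python
from scipy.stats import chi2, ncx2

def asymptotic_local_power(lam, dof, alpha=0.05):
    """P(chi2_dof(lam) exceeds the central chi2_dof critical value):
    local power of a chi-squared test with noncentrality lam."""
    return ncx2.sf(chi2.ppf(1 - alpha, dof), dof, lam)

p, k, lam = 1, 2, 3.0                        # illustrative dimensions and noncentrality
klm_power = asymptotic_local_power(lam, p)   # KLM: chi2_p(lam_theta)
gar_power = asymptotic_local_power(lam, k)   # GAR: chi2_k(lam_theta)
assert klm_power > gar_power > 0.05          # KLM dominates; both beat the 5% size
```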

Local power of tests of H_0
For this null hypothesis, the natural sequence of local alternatives is given by (31). However, as noted above, the appropriate choice of ξ in (35) depends on the model in question. To illustrate, we consider the CHF model in Section 2.2 with two assets.
Under the alternative of no common conditionally heteroskedastic factor structure, each asset contributes a specific dimension of conditional heteroskedasticity, so that two factors are present. The volatility factor model in (13) can then be written as: A natural way to create a local alternative to a single common factor is to assume that the return process is generated, for a given sample size N, from a probability distribution P_N such that λ_{2,N} → 0 as N → ∞. The common conditionally heteroskedastic factor structure therefore holds in the limit but not in finite samples. Let θ_0 be the co-feature vector associated with the limit model. Then θ_0′λ_1 = 0 and, under P_N, we have:¹⁴ where Cov[·, ·] here denotes the covariance operator relative to P_N. Suppose now that λ_{2,N} = λ/N^δ, with λ ∈ R². The right-hand side of (36) may be of order O(N^{−2δ}) so long as θ_0′λ_{2,N} ≠ 0 and Cov[σ²_{2,t}, z_t] ≠ 0. However, the order of magnitude of this latter term depends on that of λ_{2,N} through the choice of the vector of instruments z_t. The most common choice of instruments is z_t = (vech(Y_{t−τ}Y′_{t−τ})′ : τ = 0, …, h)′, for some h ∈ N. To simplify, let us consider z_t = (Y²_{1t}, Y²_{2t})′. Under certain commonly invoked assumptions about the asset return process, it is shown in the mathematical appendix that:¹⁵
¹⁴ See the mathematical appendix.
¹⁵ Note that, due to the necessary normalization, only one element of θ has to be estimated; see the discussion in Section 2.2.
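For reference, the instrument construction just described can be coded directly. The sketch below builds z_t = (vech(Y_{t−τ}Y′_{t−τ})′ : τ = 0, …, h)′ from a T × n array of returns; the function names are ours:

```python
import numpy as np

def vech(M):
    # Stack the lower-triangular part of a symmetric matrix into a vector.
    i, j = np.tril_indices(M.shape[0])
    return M[i, j]

def instruments(Y, h):
    """Build z_t = (vech(Y_{t-tau} Y_{t-tau}')' : tau = 0,...,h)' for t = h,...,T-1
    from a (T x n) array of returns Y; the first h observations are lost to lags."""
    T, n = Y.shape
    width = (h + 1) * n * (n + 1) // 2
    Z = np.empty((T - h, width))
    for t in range(h, T):
        Z[t - h] = np.concatenate(
            [vech(np.outer(Y[t - tau], Y[t - tau])) for tau in range(h + 1)]
        )
    return Z
```

With n = 2 and τ = 0 only, vech(Y_tY_t′) = (Y²_{1t}, Y_{1t}Y_{2t}, Y²_{2t})′; the simplified choice z_t = (Y²_{1t}, Y²_{2t})′ used in the text additionally drops the cross-product term.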
While the √N-rate for the drifting sequence in (35) is convenient for obtaining non-trivial behaviour of the test statistics of interest under local alternatives, as we shall see, the following result allows the Jacobian of the moment function at θ_0 under P_N to converge to 0 in some directions at rate N^{−ξ} for any ξ > 0. To derive the asymptotic distribution of the specification test statistics J(θ_0) and GAR(θ_0) under local alternatives, we introduce some notation.
Let ψ̂ᵃ_q be the k × p matrix with its (l, m)-entry given by Letting span(M) denote the column span of a matrix M, we have: Theorem 7. (i) Assume that G_N(θ_0) → G(θ_0) as N → ∞ and rank(G(θ_0)) = r < p. If Assumptions 7 and 8 (with θ_N = θ_0, µ_N = c/N^{1/2}, c ∈ R^k) hold, V̂_{2f}(θ_0), V̂_{ff}(θ_0) and q̄_N(θ_0)R_1 converge in probability (under P_N) to V_{2f}(θ_0), V_{ff}(θ_0) and D, respectively, ψ̂ᵃ_q is of full column rank with probability one and P(c ∈ span(ψ̂ᵃ_q)) = 0, then: The first part of this theorem shows that the K-J statistic is asymptotically distributed as a noncentral chi-squared with k − p degrees of freedom and non-centrality parameter λ_m, which is random if ξ ≥ 0.5. The randomness of λ_m stems from the fact that the estimated Jacobian matrix of the estimating function, in the parameter directions that are not (locally) identified at first order, is asymptotically random. This non-centrality parameter is almost surely positive and therefore guarantees non-trivial power for the test under local alternatives, provided the drift parameter c does not fall into the column span of the limiting distribution of the Jacobian with positive probability. The second part of the theorem establishes that the GAR test also has non-trivial power against local alternatives, since ν > 0 so long as c ≠ 0. A power ranking of the two tests is possible: if V_{ff}(θ_0)^{−1/2}c is an element of the orthogonal complement of span(V_{ff}(θ_0)^{−1/2}ψ̂ᵃ_q) almost surely, then λ_m = ν and so the K-J statistic is unambiguously more powerful than the GAR statistic against the local alternative considered here.

Simulation evidence
In this section we explore the finite sample power properties of the tests analyzed in Sections 3 and 4. Section 5.1 explores the power properties of the Wald, LM, KLM and GAR statistics for testing H_0 : θ_0 = 1 in the panel data example of Section 2.1. Section 5.2 explores the power properties of the K-J and GAR statistics for testing the validity of the moment condition m(θ_0) in the CHF model of Section 2.2, and also compares their properties to those of Hansen's (1982) overidentifying restrictions statistic.

Testing for a unit root in the panel data model
We study inference on the autoregressive parameter of a panel autoregressive model of order one, identified by the moment conditions from Section 2.1, under local alternatives to θ_0 = 1, the point of second-order identification. We specify the local alternative as with c > 0. Recall from the discussion following Corollary 1 that under the null hypothesis θ_0 = 1, the first-order conditions imply a solution for N^{1/4}(θ̂_N − 1) if S ≥ 0 and a solution for N^{1/2}(θ̂_N − 1)² if S < 0. Under P_N, the situation becomes more complicated. In this case, if S ≥ 0 then the first-order conditions imply a solution for N^{1/4}(θ̂_N − 1), but if S < 0 then N^{1/4}(θ̂_N − 1) satisfies a quadratic equation whose roots do not imply a unique value of N^{1/2}(θ̂_N − 1)². Here we consider the local power curve implied by choosing the smallest root of this quadratic equation, as it maximizes N^{1/2}(θ̂_N − 1)² and hence the limiting value of the Wald statistic, making it the root with the largest asymptotic power. Let Wald*_N(1) denote the Wald statistic evaluated at the solution for θ just described. It is shown in the mathematical appendix that under the local alternative in (39) the distribution of Wald*_N(1) is given by: with S a standard normal random variable and As noted in Section 4.1, the maximal local asymptotic power is 50%. We therefore compute the rejection frequency of H_0 under local alternatives for different sample sizes to determine whether they are also at most 50%. Figure 1 shows the distribution of the Wald statistic for different sample sizes as a function of the localizing parameter c. It uses 10⁴ simulations and a value of σ² equal to one, with normal errors.
The local power curves of the Wald statistic in Figure 1 show that the finite sample discriminatory power can be much larger than 50% and can even equal one. Figure 1 also shows that the power curves slowly move to the right as the sample size increases, so they might eventually coincide with the asymptotic local power curve from Corollary 2. This slow convergence of the finite sample distributions of the Wald statistic results from the quartic-root convergence rate. Interestingly, convergence towards the limiting distribution when the null hypothesis holds is much faster, since we do not observe any size distortions. The power curves are all very similar and show that the Wald statistic has adequate power at small sample sizes. This can be further inferred from the values of θ when the drifting parameter c equals two: the power then exceeds 50%. A value of c equal to two corresponds to a value of θ of 0.6239 (N = 50), 0.6838 (100), 0.7885 (500), 0.8222 (1000), 0.8811 (5000), 0.9000 (10000) and 0.9159 (20000). This suggests that, as the name emphasizes, local power results are only a guide to behaviour in a small neighbourhood around the null hypothesis value.
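The θ values quoted above are generated by the drifting sequence θ_N = 1 − c/(2N^{1/4}); a one-line function reproduces every quoted value and makes the slow N^{1/4} drift towards the null visible:

```python
def theta_local(c, N):
    # Drifting AR parameter under the local alternative:
    # theta_N = 1 - c / (2 * N^(1/4)).
    return 1.0 - c / (2.0 * N**0.25)

# c = 2 reproduces the values quoted in the text:
for N in (50, 100, 500, 1000, 5000, 10000, 20000):
    print(N, round(theta_local(2, N), 4))
# e.g. theta_local(2, 50) is approximately 0.6239, theta_local(2, 10000) = 0.9
```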
Specializing Theorem 6 to the model here, it follows that the KLM and LM statistics both converge to the χ²₁(λ_θ) distribution, and the GAR statistic converges to the χ²₂(λ_θ) distribution. In the mathematical appendix, it is shown that the non-centrality parameter is given by: Figure 5 shows local power curves of the GAR and Wald tests. The power curves of the GAR, KLM and LM tests all move to the left as the number of observations increases, which again reflects the slow convergence of the statistics towards their limiting distributions under the local alternative. All statistics are size-correct under the null hypothesis, where their limiting distributions are standard χ²₁ distributions, or χ²₂ in the case of the GAR statistic. The power curve of the GAR statistic shows that it has decent power, while the power of the KLM and LM statistics only becomes reasonable when there are many observations. This is unlike the Wald statistic, which already has adequate power for small numbers of observations. It is interesting to relate the behaviour of the KLM and GAR statistics to previous analyses of these tests in other identification scenarios. If identification is weak, then it has been found that the KLM statistic is size-correct but has low power, whereas the GAR statistic is both size-correct and has good power compared to other weak-identification robust procedures; see, e.g., Andrews, Moreira, and Stock (2006) and Kleibergen (2005). However, if identification is strong, then the KLM test dominates. Therefore, the relative performance of the KLM and GAR tests under second-order identification is more in line with what has been observed under weak identification.
In our reading, the most striking feature of these results is the superior performance of the Wald test, as further reflected in Figure 5. It not only dominates the others but exhibits reasonable power as a test for a unit root in this model. These results also show an advantage to basing inference about a unit-root value of the AR parameter on the moment conditions in Bun and Kleibergen (2016), as opposed to more popular choices of moments such as those proposed by Arellano and Bond (1991) or Blundell and Bond (1998), with which identification either fails or is problematic at θ_0 = 1.

Testing for common conditionally heteroskedastic factors
In this section, we explore the finite sample performance of the K-J statistic under the null of correct model specification and under local alternatives. We also consider the Hansen-Sargan overidentification test (HS-J test, hereafter) and the GAR test. Example 2, on conditionally heteroskedastic factor models, offers a suitable framework for this investigation. We consider a bivariate vector Y_t of two asset return processes with the representation where Λ_N is the 2 × 2 matrix of factor loadings, F_{t+1} is the bivariate vector of conditionally heteroskedastic and mutually independent factors, and U_{t+1} the bivariate vector of idiosyncratic shocks. We let U_{t+1} ∼ i.i.d. N(0, 0.5·I₂), where I₂ denotes the identity matrix of size 2. The generic component f_{t+1} of F_{t+1} follows a Gaussian GARCH model with innovations distributed N(0, 1).
The processes ε_t and U_t are mutually independent, and independent of {F_τ, Y_τ : τ ≤ t}. We set (ω, α, β) = (0.2, 0.2, 0.6). The case c = 0 corresponds to the null hypothesis of the existence of a common conditionally heteroskedastic factor structure for the components of Y_t, which can be tested by any of the three tests under consideration when applied to the moment restriction (14). We use z_t = (Y²_{1,t}, Y²_{2,t})′ as the vector of instruments in the simulations. The local approximation to the null value is given by λ_N = c/N^{1/8}, c ≠ 0. The rate N^{1/8} is chosen such that the resulting moment function under local alternatives is proportional to N^{−1/2}, the local approximation of the moment function under which the local-alternative distribution of the K-J test statistic is derived in Theorem 7.
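A minimal sketch of this data-generating process. The GARCH(1,1) variance recursion and the loading vectors are our assumptions for illustration (the paper's Λ_N is not reproduced here); only the parameter values (ω, α, β) = (0.2, 0.2, 0.6), the idiosyncratic law U_{t+1} ∼ N(0, 0.5·I₂) and the drift λ_N = c/N^{1/8} come from the text:

```python
import numpy as np

def simulate_returns(N, c, omega=0.2, alpha=0.2, beta=0.6, seed=0):
    """Simulate a bivariate return series driven by two Gaussian GARCH(1,1)
    factors. The second factor's loading drifts to zero at rate N^(-1/8)
    (local alternative); c = 0 gives a single common factor (the null)."""
    rng = np.random.default_rng(seed)
    lam1 = np.array([1.0, 0.5])                   # loading of factor 1 (hypothetical)
    lam2 = (c / N**0.125) * np.array([0.5, 1.0])  # drifting loading (hypothetical direction)
    sig2 = np.full(2, omega / (1.0 - alpha - beta))  # start at unconditional variance
    f = np.zeros(2)
    Y = np.empty((N, 2))
    for t in range(N):
        sig2 = omega + alpha * f**2 + beta * sig2    # GARCH(1,1) variance recursion
        f = np.sqrt(sig2) * rng.standard_normal(2)   # factors f_{t+1}
        U = rng.normal(scale=np.sqrt(0.5), size=2)   # idiosyncratic shocks U_{t+1}
        Y[t] = lam1 * f[0] + lam2 * f[1] + U
    return Y
```

Under c = 0 only the first factor is present, so a co-feature vector θ with θ′λ₁ = 0 removes all conditional heteroskedasticity from θ′Y_t, which is what the moment restriction (14) tests.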
For global identification of the moment condition model, we follow Dovonon and Renault (2013) and re-parameterize the co-feature vector as (θ_0, 1 − θ_0)′, θ_0 ∈ R. Under H_0 in our simulations, θ_0 = −1. The test statistics considered are, specifically: J(θ_0) for the K-J test, the two-step GMM overidentification test statistic for the HS-J test, and min_θ GAR(θ) for the GAR test, which we denote min-GAR. From Dovonon and Renault (2013), the last two test statistics are asymptotically distributed as a 50-50 mixture of χ²₁ and χ²₂ under the null, whereas Theorem 4 states that the first is asymptotically distributed as χ²₁. Figure 6 shows the simulated rejection rates for the three tests under the null, while Figure 7 plots the power curves of these tests for sample sizes N = 100, 200, 500, 1000, 5000, 10000, 20000 and 50000. Rejection rates are obtained from 10000 Monte Carlo replications.
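Since the HS-J and min-GAR statistics are compared against a 50-50 mixture of χ²₁ and χ²₂ rather than a single chi-squared, their critical values must be computed from the mixture's tail; a short SciPy sketch (the function name is ours):

```python
from scipy.optimize import brentq
from scipy.stats import chi2

def mixture_critical_value(alpha=0.05):
    """Critical value of the 50-50 mixture of chi2_1 and chi2_2:
    solves 0.5*P(chi2_1 > x) + 0.5*P(chi2_2 > x) = alpha for x."""
    tail = lambda x: 0.5 * chi2.sf(x, 1) + 0.5 * chi2.sf(x, 2) - alpha
    return brentq(tail, 1e-8, 100.0)

cv = mixture_critical_value(0.05)
# The mixture critical value lies between the chi2_1 and chi2_2 values (3.84 and 5.99)
assert chi2.ppf(0.95, 1) < cv < chi2.ppf(0.95, 2)
```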
Figure 6 shows that, if the null hypothesis is true, all three tests have rejection rates closer to the nominal level (α = 0.05) as the sample size increases. The HS-J and min-GAR tests are significantly below the nominal rejection level for small sample sizes, but the HS-J test seems to converge to the nominal rejection rate faster than the min-GAR. For instance, for N = 1,000 and 5,000, the rejection rate of the HS-J test is 3.9% and 4.88%, respectively, whereas that of the min-GAR test is 0.064% and 1.79%, respectively. For N as large as 100,000, the rejection rate of the min-GAR is about 4.0%. The picture is different for the K-J test, which has rejection rates close to 5% across the sample sizes considered. For N = 50 and 100, this rate is 6.31% and 6.22%, respectively, and it falls below 6% from N = 500 onwards.
The power curves of these tests, displayed in Figure 7, show contrasting performance of the three tests depending on the sample size. For sample sizes equal to or below 200, the power curves of the HS-J and min-GAR tests are flat and even below the nominal level (recall that these two tests barely reject the null under H_0 for such sample sizes), whereas the K-J test shows some moderate power. For N = 500 and 1000, the K-J test seems to outperform the other tests, which now show some power for large values of c even though their rejection rates do not exceed 50%. From N = 5000, the performance ranking is reversed, with the HS-J test performing slightly better than the min-GAR test and both having higher rejection rates than the K-J test. For c = 10, with N = 5000 and 50000, the latter test has rejection rates of 84.0% and 90.84%, respectively, while the HS-J test has 98.93% and 99.95%, and the min-GAR test 93.6% and 97.43%.
These results suggest that in small samples these tests are not reliable, and more so the HS-J and min-GAR tests than the K-J test evaluated at the true value. This may be connected to the local identification pattern of the model under the null. As the sample size increases, all three tests show evidence of power against local alternatives, as expected from our asymptotic theory in Section 4.2 for the K-J test. It is worth mentioning that the power of the HS-J and min-GAR tests seems to converge to one faster than that of the K-J test.

Concluding remarks
In this paper, we explore how to perform inference in moment condition models that identify the parameters only locally at second order. For hypotheses about the parameters, we consider inference based on conventional Wald and LM statistics, and also on the identification-robust GAR and KLM statistics. For inference about the model specification, we consider the identification-robust K-J statistic and the GAR statistic. In each case, we derive the limiting distribution of the statistics under both null and local alternative hypotheses. The Wald statistic is shown to have a non-standard distribution under both the null and local alternatives, but the distribution under the null is easily simulated, making inference practicable. The LM statistic also has a non-standard distribution under the null in the general case, but has a non-central chi-squared distribution under local alternatives. Unlike in the case of strong (first-order) local identification, the Wald and LM statistics have different distributions in the limit. The GAR, KLM and K-J statistics have chi-squared distributions under the null and non-central chi-squared distributions under local alternatives. These distributions are exactly the same as those obtained under weak or strong identification, and thus the identification robustness of these tests extends to second-order identified models.
We also explore the finite sample behaviour of the tests in detail in two empirically relevant models with second-order identification: the panel autoregressive (AR) model of order one estimated from a set of non-linear moment conditions, and the conditionally heteroskedastic factor model. In the panel AR model with a unit root, the AR parameter is only identified at second order, and we consider the use of the Wald, LM, KLM and GAR statistics to test whether the AR coefficient is one. Our results indicate that the Wald test has the best power properties, being matched by the GAR statistic only in large samples, with both of these tests exhibiting greater power than the KLM and LM. In the conditionally heteroskedastic factor model, the moment condition in question only identifies the parameters at second order over the entire parameter space, and the key issue is testing whether the moment condition is valid. In this context, we examine the power properties of the K-J and GAR statistics and compare them to those of Hansen's (1982) overidentifying restrictions test (previously analyzed in this setting by Dovonon and Renault, 2013). Here the ranking of the tests is sensitive to the sample size: the K-J test dominates in moderately sized samples, but the overidentifying restrictions test dominates in large samples.
Comparing our theoretical results with the simulations, we find that the analytical local power curves are not always very indicative of the power in finite sample settings. For example, we find that, in our panel data model, the Wald statistic has much better finite sample power than is suggested by its limiting distributions under the local alternative. Similarly, we find that under the local alternative the finite sample distributions of the GAR, KLM and LM statistics only converge very slowly to their limiting distributions. We conjecture this results from the quartic root convergence rate that occurs in second-order locally identified models. Nevertheless, our results show that it is possible to conduct tests with meaningful power in second-order locally identified models.

A Mathematical appendix
Proof of Theorem 1. Consider model (1) with the re-parameterization θ = Rη, with parameter η: The true parameter value is clearly η_0 = R^{−1}θ_0. Also, so long as the same weighting matrix is used at the first step, the two-step GMM estimators satisfy the relation η̂ = R^{−1}θ̂, where for notational brevity we have set θ̂ = θ̂_N. Note that Partitioning η into η_1 and η_2, its first r and last p − r components, we have: Using Assumption 1(b), it is not hard to verify that (42) identifies η_0 at second order. If r = p − 1, we can apply Theorem 1(b) of Dovonon and Hall (2016) and claim that: We can write: By first-order mean-value expansions, we have: where θ̄ ∈ (θ̂, θ_0), which may differ from row to row, and C̄_N(θ) is the k × p² matrix defined by: Under Assumption 3, C̄_N(θ̄) converges in probability to C(θ_0), where C(θ) is defined like C̄_N(θ) but with sample means replaced by population means. Using (43), the expression of q̄_N(θ̂) in (45) can be written as: By the law of large numbers, and also noting that [C(θ_0)(I_p ⊗ R_2)]′R_2 = B, we have: Substituting the latter results into (44), after some simple calculations we obtain: Using (43), this converges in distribution to After some simple algebra, we have It is easily verified that Thus, we have it is also independent of S and we can claim that: Proof of Theorem 2. Notice that the value of LM_N(θ*) is unchanged by replacing q̄_N(θ*) with q̄_N(θ*)A for any nonsingular matrix A. In particular, the statistic stays the same when this quantity is replaced by q̄_N(θ*)[R_1 ⋮ √N R_2]. Note also that, by Assumption 4, we have: where D is constant and ψ_q is the Gaussian matrix defined in Assumption 4. The result then follows directly.

Proof of Theorem 3. (i) Similarly to the LM test statistic, KLM
From Assumption 4, we have: Since (ψ_q, ψ_f) is Gaussian, ε_q is independent of ψ_f. Under the non-singularity assumption for ψ̂_q′ψ̂_q, q̄_N(θ*)′V_ff(θ*)^{−1/2} is well-defined in large samples, and the continuous mapping theorem ensures that KLM(θ*) converges in distribution to Conditionally on ψ̂_q, this limit follows a χ²_p distribution, and the independence of ψ̂_q and ψ_f implies that the limit is unconditionally distributed as χ²_p. (ii) The result for the GAR statistic is immediate under the stated conditions.
Proof of Theorem 4. (i) Similarly to the proof of Theorem 3, we can claim that J(θ_0) converges in distribution to Conditionally on ψ̂_q, this limit follows a χ²_{k−p} distribution, and the independence of ψ̂_q and ψ_f implies that the limit is unconditionally distributed as χ²_{k−p}. (ii) See the proof of Theorem 3(ii).

Derivation of equation (34)

If θ_N = 1 − c/(2N^{1/4}), then it can be shown that where a, b and d are defined in Section 2.1, and so It is also instructive to explore the population moment, Jacobian and Hessian evaluated at θ_0 under P_N. Using similar arguments, it can be shown that so which shows that this rate is too fast, as it sits below the rate of the random component of the sample moment.
Proof of Theorem 5. Similarly to the proof of Theorem 1, we consider the re-parameterization Rη = θ. Let η_0 = R^{−1}θ_0 and η_N = R^{−1}θ_N. We have E_N[f(X_N, Rη_N)] = 0 and, from Assumption 7, assuming that we can interchange E_N and derivatives freely, we have The fact that the Jacobian in the direction of η_2 is O(N^{−ξ}), and not exactly 0, makes the current configuration slightly different from the assumptions of Theorem 1 of Dovonon and Hall (2016). However, the fact that ξ > 1/4 allows the conclusions of parts (a) and (b) of that theorem to stand with φ_0 replaced by η_N, as we now show. Let η̂ be the GMM estimator of η_N. First, we observe using Assumption 8 (with µ_N = 0) that, under P_N, f̄_N(Rη_N) = O_P(N^{−1/2}) and Via expansions similar to those in the proof of Theorem 1(a) of Dovonon and Hall (2016), leading to their equations (34) and (35), we have: where, by an abuse of notation, we set f̄_N(η) := f̄_N(Rη) and V_ff := V_ff(θ_0), and use the fact that f̄_N(η̂) = O_P(N^{−1/2}) under P_N. The latter follows from the fact that the GMM norm of f̄_N(η̂) is smaller than or equal to that of f̄_N(η_N) by definition. Hence, By the definition of the GMM estimator, this quantity is less than or equal to f̄_N(η_N)′V_ff(θ_{1,s})^{−1}f̄_N(η_N). Let z_N = N^{1/4}|η̂_2 − η_{N,2}| and γ = (1/4)B′M_dB. After multiplying each side of the previous equation by N, and since ξ > 1/4, we can claim that: Since γ > 0, this shows that N^{1/4}(η̂_2 − η_{N,2}) = O_P(1) under P_N, and using the analogue of Equation (35) of Dovonon and Hall (2016), we obtain √N(η̂_1 − η_{N,1}) = O_P(1). Using these rates of convergence, the steps of the proof of Theorem 1(b) of Dovonon and Hall (2016) follow readily (only taking Taylor expansions around η_N) and we obtain: under P_N, with asymptotic distribution as given by (43).
Note that, since (√N(η̂_1 − η_{N,1}), N^{1/4}(η̂_2 − η_{N,2})) = O_P(1) under P_N, by Prokhorov's theorem any subsequence has a further subsequence, indexed say by s(N), that converges in distribution under P_N to, say, (X_1(s), X(s)), which, by (52), are such that X_1(s) = X_1 and X(s)² = V for any converging subsequence s(N). Derivations similar to those in Theorem 1 yield: We have: Similar calculations show that: and W_{N,c} converges in distribution to (c) ≡ B′B X(s)²[X(s)² + 2e_2X(s) + e_2²]. The convergence of W_{N,a}, W_{N,b} and W_{N,c} holds jointly, and as a result W_N(θ_0) converges in distribution to (a) + (b) + (c) under P_N. Note that simple expansions yield: To obtain the form of the asymptotic distribution given in the theorem, write this limit as π_A + π_B + π_C + π_F with: Some simple calculations yield: + 2e_1′D′α(−2S·I(S ≤ 0) + aX(s)e_2), π_C = −2ae_2α′α(2X(s) + e_2)S·I(S ≤ 0).
independent of S which gives the stated result.
A first-order mean-value expansion of q̄_N(θ*) around θ_N, similar to (45), gives: where θ̄ ∈ (θ*, θ_N), which may differ by entry of q̄_N(θ*), and with C̄_N(θ) defined as in (45). Under Assumption 6, C̄_N(θ̄) converges in probability P_N to C(θ*) and, thanks to Assumptions 6 and 7, we have: where the stochastic orders are with respect to P_N. As a result, we also have, with respect to P_N, By a second-order mean-value expansion of f̄_N(θ*) around θ_N, we have: where θ̄ ∈ (θ*, θ_N) and may differ by equation. We use in this expansion the fact that H̄_N(θ̄) converges in probability P_N to H(θ*), and the fact that Thanks to the identity (e′ ⊗ I_k)H(θ*)e = C(θ*)(I_p ⊗ e)e, for all e ∈ R^p, we have µ_θ = −Q(e_2)[e_1′, (1/2)e_2′]′, which then belongs to the range of Q(e_2). Note also that, thanks to the second-order identification condition in Assumption 1(b), (e_1, e_2) ≠ 0 implies that µ_θ ≠ 0.

Derivation of equations
have a normal distribution with mean zero. Define ψ = ψ_a + ψ_b + ψ_d and let ψ_i denote the i-th element of ψ. For brevity, but with an abuse of notation, let V_ff(θ_0) = V.
Combining (58)-(59) with the first-order conditions and V̂ →p V, it can be seen that ζ is implicitly characterized by: which can be re-written as 2σ⁴eζ³ + 3σ⁴ceζ² + σ²(2V₁^{−1}ψ + (3/2)c²eσ²)ζ + cσ²[V₁^{−1}ψ + (1/4)eσ²c²] = 0, where e is the (1, 1) element of V^{−1} and V₁^{−1} is the first row of V^{−1}. Equation (60) implies: 2σ²(ζ + c/2)(σ²eζ² + σ²ceζ + V₁^{−1}ψ + (1/4)σ²ec²) = 0. (61) Using (61), and noticing that ζ is a real-valued root of the above third-order polynomial, we obtain a twofold solution for ζ + c/2. The first solution occurs when the quadratic polynomial contained in the second set of parentheses in (61) has only complex roots: in this case the solution is obtained from the term in the first set of parentheses in (61). The second solution occurs when the quadratic polynomial in the second set of parentheses in (61) has real roots. In the latter case, the two roots imply different values for ζ², unless c = 0, and so for c ≠ 0 we choose the root that maximizes ζ² and therefore leads to the largest asymptotic power. The solution to (61) just described is: Define h* via ζ* = N^{1/4}(h* − 1), and consider Using V̂ →p V (as h* →p 1), we obtain from (58) that It therefore follows that Because ψ has a zero-mean normal distribution and V₁^{−1}V V₁^{−1}′ = e, it follows that e^{−1/2}V₁^{−1}ψ ∼ N(0, 1), and so the limiting distribution of Wald*_N(1) can be written as in (40). Equation (30) follows by setting c = 0 in the above analysis.
To derive λ_θ in (41), it suffices to consider the GAR statistic (since, from Theorem 6, the non-centrality parameter is the same for all three tests). Using (57), it follows that with λ_θ given in (41).