A note on the local power of the LR, Wald, score and gradient tests

This paper examines the local power of the likelihood ratio, Wald, score and gradient tests in the presence of a scalar parameter, $\phi$ say, that is orthogonal to the remaining parameters. We show that some of the coefficients that define the local powers remain unchanged regardless of whether $\phi$ is known or needs to be estimated, whereas the others can be written as the sum of two terms: the first is the corresponding term obtained as if $\phi$ were known, and the second is an additional term arising from the fact that $\phi$ is unknown. The contribution of each set of parameters to the local powers of the tests can then be examined. Various implications of our main result are stated and discussed. Several examples are presented for illustrative purposes.


Introduction
The likelihood ratio (LR), Wald and Rao score tests are usually employed for testing hypotheses in parametric models. These tests have been widely used in economics, engineering and biology, among other fields. Recently, Terrell (2002) proposed a new criterion for testing hypotheses, referred to as the gradient test. As we will see below, the gradient statistic is very simple to compute when compared with the Wald and the score statistics. Noting this simplicity, Rao (2005) wrote: "The suggestion by Terrell is attractive as it is simple to compute. It would be of interest to investigate the performance of the [gradient] statistic." An interesting result about the gradient statistic is that it shares the same first-order asymptotic properties with the LR, Wald and score statistics. That is, to the first order of approximation, these statistics have the same asymptotic distribution both under the null hypothesis and under a sequence of Pitman alternatives, i.e. a sequence of local alternatives that shrink to the null hypothesis at rate $n^{-1/2}$, $n$ being the sample size. Additionally, it is known that, up to an error of order $n^{-1}$, the LR, Wald, score and gradient tests have the same size properties but their local powers differ in the $n^{-1/2}$ term. Therefore, a meaningful comparison among the criteria can be performed by comparing the nonnull asymptotic expansions to order $n^{-1/2}$ of the distribution functions of these statistics under a sequence of Pitman alternatives.
The nonnull asymptotic expansions up to order $n^{-1/2}$ for the distribution functions of the LR and Wald statistics for testing a composite hypothesis in the presence of nuisance parameters were derived by Hayakawa (1975). Harris and Peers (1980) obtained an analogous result for the score statistic. The asymptotic expansion up to order $n^{-1/2}$ for the distribution function of the gradient statistic was derived by Lemonte and Ferrari (2012). The null asymptotic expansion up to order $n^{-1}$ for the distribution function of the likelihood ratio statistic for testing a composite hypothesis in the presence of nuisance parameters is given in Hayakawa (1977, 1987), while an analogous result for the score statistic was obtained by Harris (1985); see also Hayakawa and Puri (1985). The derivation of the null asymptotic expansion up to order $n^{-1}$ for the distribution function of the gradient statistic is in progress and will be published in a future article.
Let $\pi(\theta)$ be a continuous parametric model and $\ell(\theta)$ be the corresponding total log-likelihood function, where $\theta = (\beta_1^\top, \beta_2^\top, \phi)^\top$ is a $(p+1)$-vector of unknown parameters. The dimensions of $\beta_1$ and $\beta_2$ are $q$ and $p-q$, respectively, and $\phi$ is a scalar parameter. We focus on testing the composite null hypothesis $H_0: \beta_2 = \beta_{20}$ against the two-sided alternative hypothesis $H_1: \beta_2 \neq \beta_{20}$, where $\beta_{20}$ is a specified vector and $\beta_1$ and $\phi$ are nuisance parameters. Let $U_\theta = \partial\ell(\theta)/\partial\theta$ and $K_\theta = E(U_\theta U_\theta^\top)$ be the score function and the Fisher information matrix for $\theta$, respectively. Also, let $\hat\theta = (\hat\beta_1^\top, \hat\beta_2^\top, \hat\phi)^\top$ and $\tilde\theta = (\tilde\beta_1^\top, \beta_{20}^\top, \tilde\phi)^\top$ denote the unrestricted (under $H_1$) and restricted (under $H_0$) maximum likelihood estimators of $\theta$. The likelihood ratio ($S_1$), Wald ($S_2$), score ($S_3$) and gradient ($S_4$) statistics for testing $H_0$ versus $H_1$ are given, respectively, by .
The limiting distribution of $S_1$, $S_2$, $S_3$ and $S_4$ is $\chi^2_{p-q}$ under $H_0$. Under $H_1$, these statistics have a $\chi^2_{p-q,\lambda}$ distribution, i.e. a noncentral chi-square distribution with $p-q$ degrees of freedom and an appropriate noncentrality parameter $\lambda$. The null hypothesis is rejected for a given nominal level, $\gamma$ say, if the test statistic exceeds the upper $100(1-\gamma)\%$ quantile of the $\chi^2_{p-q}$ distribution.
In this paper, we shall assume that $\beta = (\beta_1^\top, \beta_2^\top)^\top$ is globally orthogonal to $\phi$ in the sense of Cox and Reid (1987). In other words, the Fisher information matrix for $\theta$ and its inverse are block-diagonal, say $K_\theta = \mathrm{diag}\{K_\beta, K_\phi\}$ and $K_\theta^{-1} = \mathrm{diag}\{K_\beta^{-1}, K_\phi^{-1}\}$. Here, $K_\beta$ is the Fisher information matrix for $\beta$ and $K_\phi$ is the information relative to $\phi$. There are numerous statistical models for which global orthogonality holds; see Section 3. We exploit the global orthogonality of the parameters to obtain a decomposition of the $n^{-1/2}$ term of the expansion of the nonnull distribution function of the four statistics.
From the partition of $\theta$ and the global orthogonality between $\beta$ and $\phi$, we have the corresponding partitions:
Hence, the statistics $S_2$, $S_3$ and $S_4$ can be rewritten as
Notice that $S_4$, the gradient statistic, has a very simple form and, unlike $S_2$ and $S_3$, does not involve the information matrix, either expected or observed. Terrell (2002) points out that the gradient statistic "is not transparently non-negative, even though it must be so asymptotically." His Theorem 2 implies that if the log-likelihood function is concave and is differentiable at $\tilde\theta$, then $S_4 \ge 0$.
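To make the four criteria concrete, the following is a minimal sketch, not taken from the paper, for an i.i.d. normal sample with mean $\mu$ and precision $\phi$ (these two parameters are globally orthogonal), testing $H_0: \mu = \mu_0$ with $\phi$ as a nuisance parameter. It assumes the standard textbook forms $S_1 = 2\{\ell(\hat\theta) - \ell(\tilde\theta)\}$, $S_2 = (\hat\mu - \mu_0)^2 K_\mu(\hat\theta)$, $S_3 = U_\mu(\tilde\theta)^2 / K_\mu(\tilde\theta)$ and $S_4 = U_\theta(\tilde\theta)^\top(\hat\theta - \tilde\theta)$; the function name `four_statistics` and the data are purely illustrative.

```python
import math

def four_statistics(y, mu0):
    """LR, Wald, score and gradient statistics for H0: mu = mu0 in a
    N(mu, 1/phi) sample, with the precision phi as a nuisance parameter.
    Textbook forms of the statistics are assumed (illustrative sketch)."""
    n = len(y)
    ybar = sum(y) / n
    ss_hat = sum((v - ybar) ** 2 for v in y)   # sum of squares at the unrestricted MLE
    ss_til = sum((v - mu0) ** 2 for v in y)    # sum of squares under H0
    phi_hat = n / ss_hat                       # unrestricted MLE of phi
    phi_til = n / ss_til                       # restricted MLE of phi

    def loglik(mu, phi):
        sq = sum((v - mu) ** 2 for v in y)
        return 0.5 * n * math.log(phi / (2 * math.pi)) - 0.5 * phi * sq

    # Score components evaluated at the restricted estimate (mu0, phi_til)
    u_mu = phi_til * n * (ybar - mu0)
    u_phi = 0.5 * n / phi_til - 0.5 * ss_til   # zero by construction of phi_til

    s1 = 2.0 * (loglik(ybar, phi_hat) - loglik(mu0, phi_til))
    s2 = (ybar - mu0) ** 2 * n * phi_hat       # Fisher information for mu is n * phi
    s3 = u_mu ** 2 / (n * phi_til)
    s4 = u_mu * (ybar - mu0) + u_phi * (phi_hat - phi_til)
    return s1, s2, s3, s4

y = [4.1, 5.3, 6.0, 4.8, 5.9, 5.2, 6.4, 4.7]
s1, s2, s3, s4 = four_statistics(y, mu0=4.5)
print(s1, s2, s3, s4)
```

In this particular model the gradient statistic coincides with the score statistic, because the $\phi$-component of the score vanishes at the restricted estimate; note that computing it requires no information matrix.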
The subject matter of this note is the local power of the LR, Wald, score and gradient tests for testing the null hypothesis $H_0: \beta_2 = \beta_{20}$ under a sequence of Pitman alternatives, when global orthogonality between $\beta$ and $\phi$ holds. The nonnull distribution function of the statistics $S_1$, $S_2$, $S_3$ and $S_4$ under Pitman alternatives for testing $H_0: \beta_2 = \beta_{20}$ takes the form
$$\Pr(S_i \le x) = G_{f,\lambda}(x) + \sum_{k=0}^{3} b_{ik}\, G_{f+2k,\lambda}(x) + O(n^{-1}), \qquad (1)$$
for $i = 1, 2, 3, 4$, where $G_{f,\lambda}(x)$ is the cumulative distribution function of a noncentral chi-square variate with $f$ degrees of freedom and an appropriate noncentrality parameter $\lambda$. Here, $f = p - q$. Clearly, the local powers (up to order $n^{-1/2}$) of the four corresponding tests are given by $1 - \Pr(S_i \le x)$, where $x$ is replaced by the appropriate quantile of the $\chi^2_{p-q}$ distribution according to the chosen nominal level. The coefficients $b_{ik}$ ($i = 1, 2, 3, 4$ and $k = 0, 1, 2, 3$) and $\lambda$ are given in Hayakawa (1975), Harris and Peers (1980) and Lemonte and Ferrari (2012), and are reproduced in Section 2.
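As a rough illustration of how a local power would be evaluated from such an expansion, the sketch below assumes that (1) has the usual form $\Pr(S_i \le x) = G_{f,\lambda}(x) + \sum_{k=0}^{3} b_{ik} G_{f+2k,\lambda}(x)$ up to terms of order $n^{-1}$. The noncentral chi-square distribution function is computed as a Poisson mixture of central ones using only the standard library. The function names and the coefficient values in `b_demo` are hypothetical; they are not coefficients of any model in the paper.

```python
import math

def chi2_cdf(x, f):
    """Central chi-square cdf with f degrees of freedom, via the power series
    of the regularized lower incomplete gamma function P(f/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = f / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def ncx2_cdf(x, f, lam, tol=1e-14):
    """Noncentral chi-square cdf G_{f,lam}(x) as a Poisson(lam/2) mixture of
    central chi-square cdfs with f + 2j degrees of freedom."""
    if lam == 0:
        return chi2_cdf(x, f)
    weight = math.exp(-lam / 2.0)          # Poisson weight for j = 0
    total, cum, j = 0.0, 0.0, 0
    while 1.0 - cum > tol and j < 1000:
        total += weight * chi2_cdf(x, f + 2 * j)
        cum += weight
        j += 1
        weight *= (lam / 2.0) / j
    return total

def local_power(x, f, lam, b):
    """1 - Pr(S <= x), where Pr(S <= x) = G_{f,lam}(x) + sum_k b[k] G_{f+2k,lam}(x)
    and b = (b0, b1, b2, b3) are the order n^{-1/2} coefficients of one test."""
    cdf = ncx2_cdf(x, f, lam)
    cdf += sum(bk * ncx2_cdf(x, f + 2 * k, lam) for k, bk in enumerate(b))
    return 1.0 - cdf

x_crit = 3.841458820694124               # upper 5% point of chi-square with 1 d.f.
b_demo = (-0.03, 0.05, -0.04, 0.02)      # hypothetical coefficients summing to zero
print(local_power(x_crit, 1, 2.0, (0, 0, 0, 0)))   # first-order power
print(local_power(x_crit, 1, 2.0, b_demo))         # with the n^{-1/2} correction
```

With all coefficients set to zero the function returns the first-order power, which is common to the four tests; differences among the tests enter only through the $b_{ik}$.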
We will show that the coefficients $b_{i2}$ and $b_{i3}$, for $i = 1, 2, 3, 4$, in (1) remain unchanged regardless of whether $\phi$ is known or needs to be estimated, whereas the coefficients $b_{i1}$, for $i = 1, 2, 3, 4$, can be written as the sum of two terms: the first is the corresponding term obtained as if $\phi$ were known, and the second is an additional term arising from the fact that $\phi$ is unknown. A sufficient condition under which this additional term is zero will be given. The general result derived in this paper allows one to explicitly verify the contribution of each parameter to the local power of the LR, Wald, score and gradient tests for testing the null hypothesis $H_0$. We also discuss the local power of the tests for testing the null hypothesis $H_0: \phi = \phi_0$, where $\phi_0$ is a specified scalar value, when $\beta$ acts as a vector of nuisance parameters. Some examples, which include probability density functions and regression models, are considered to illustrate our result.
The general expressions for the coefficients $b_{ik}$ ($i = 1, 2, 3, 4$ and $k = 0, 1, 2, 3$) that define the nonnull expansions of the distribution function of the statistics $S_1$, $S_2$, $S_3$ and $S_4$ under Pitman alternatives for testing $H_0: \beta_2 = \beta_{20}$, which are given in (1), can be written as
and $\epsilon^*_r$ is the $r$th element of the vector $\epsilon^*$. Here, the non-centrality parameter is given by $\lambda = \epsilon^{*\top} K_\theta \epsilon^*$. The coefficients $b_{ik}$ are of order $n^{-1/2}$ and all quantities except $\epsilon$ are evaluated under the null hypothesis.
Based on the general expressions of the coefficients presented above and exploiting the orthogonality between β and φ, we arrive, after long and tedious algebraic manipulations, at the following general result.
Theorem 1 Let $\theta = (\beta_1^\top, \beta_2^\top, \phi)^\top$ be the parameter vector of dimension $p + 1$, where the dimensions of $\beta_1$ and $\beta_2$ are $q$ and $p - q$, respectively, and $\phi$ is a scalar parameter. Assume that $\beta = (\beta_1^\top, \beta_2^\top)^\top$ and $\phi$ are globally orthogonal. The nonnull asymptotic expansions of the distribution functions of the LR, Wald, score and gradient statistics for testing the null hypothesis $H_0: \beta_2 = \beta_{20}$ under a sequence of Pitman alternatives are given by (1)
Notice that $b^0_{i1}$ ($i = 1, 2, 3, 4$) and $b_{ik}$ ($i = 1, 2, 3, 4$ and $k = 2, 3$) represent the contribution of the parameter vector $\beta$ to the local power of the LR, Wald, score and gradient tests for testing the null hypothesis $H_0: \beta_2 = \beta_{20}$, since these expressions involve only the components of $\beta$, i.e. they are obtained as if $\phi$ were known. On the other hand, the quantity $\xi$, which depends on third-order mixed cumulants involving $\phi$ and $\beta$, can be regarded as the contribution of the parameter $\phi$ to the local power of the LR, Wald, score and gradient tests when it is unknown, that is, when it needs to be estimated. It is interesting to note that the contribution arising from the fact that $\phi$ is unknown is the same for the four tests. Additionally, the contribution of the parameter $\phi$ to the local power of the tests only appears in the coefficient $b_{i1}$ ($i = 1, 2, 3, 4$) and, of course, in $b_{i0}$ ($i = 1, 2, 3, 4$).
Theorem 1 implies that the limiting distribution of the four statistics, namely a non-central chi-square distribution with non-centrality parameter $\lambda$, is the same regardless of whether $\phi$ is known or estimated from the data. Notice that $\xi$ is the only term that involves cumulants of log-likelihood derivatives with respect to $\phi$, and it decreases with the Fisher information for $\phi$ and vanishes if $\phi$ is known. By using the Bartlett identity $\kappa_{\phi,\phi t} = \kappa^{(\phi)}_{\phi t} - \kappa_{\phi\phi t}$, for $t = 1, \ldots, p$, where $\kappa^{(\phi)}_{\phi t} = \partial\kappa_{\phi t}/\partial\phi$, we have $\kappa_{\phi,\phi t} = -\kappa_{\phi\phi t}$ since the orthogonality between $\beta$ and $\phi$ implies that $\kappa_{\phi t} = 0$. Therefore, we can write
Theorem 1 has a practical application when the goal is to obtain explicit formulas for the nonnull distribution function of any of the four tests for special models in which orthogonality holds. It suggests that the coefficients $b_{ik}$ should be obtained as if the scalar orthogonal parameter $\phi$ were known, and the extra contribution due to the estimation of $\phi$ should be obtained from (2).
Now, let $\Pi_i^0$ and $\Pi_i$, for $i = 1, 2, 3, 4$, be the local powers (ignoring terms of order smaller than $n^{-1/2}$) of the test that uses the statistic $S_i$ when $\phi$ is known and when $\phi$ is unknown, respectively. It is well known that $G_{m,\lambda}(x) - G_{m+2,\lambda}(x) = 2g_{m+2,\lambda}(x)$, where $g_{f,\lambda}(x)$ is the probability density function of a non-central chi-square variate with $f$ degrees of freedom and non-centrality parameter $\lambda$. We can then write $\Pi_i - \Pi_i^0 = c\,\xi$ for $i = 1, 2, 3, 4$, where $c = 2g_{p-q+2,\lambda}(x) > 0$ and $x$ represents the appropriate quantile of the reference distribution for the chosen nominal level. Since $c > 0$, the local power can remain unchanged, increase or decrease when $\phi$ needs to be estimated, according to whether $\xi$ is zero, positive or negative, which depends on the signs of the components of $\epsilon$. If $\kappa_{\phi\phi t} = 0$, for $t = 1, \ldots, p$, we have $\xi = 0$ and hence the local powers of the four tests do not change when a scalar parameter, which is globally orthogonal to the remaining parameters, is included in the model specification. In the next section we will present various examples for which this happens.
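The relation $\Pi_i - \Pi_i^0 = c\,\xi$ can be traced in a few lines. The sketch below assumes that (1) has the form $\Pr(S_i \le x) = G_{f,\lambda}(x) + \sum_{k=0}^{3} b_{ik} G_{f+2k,\lambda}(x) + O(n^{-1})$ and that $b_{i1} = b^0_{i1} + \xi$, with $b_{i2}$ and $b_{i3}$ unaffected, as described after Theorem 1. Writing $b^0_{ik}$ for every coefficient obtained with $\phi$ known is a notational assumption made here.

```latex
% Letting x -> infinity in (1), every G tends to 1, so to order n^{-1/2}
\sum_{k=0}^{3} b_{ik} = 0, \qquad \sum_{k=0}^{3} b^{0}_{ik} = 0 .
% With b_{i1} = b^{0}_{i1} + \xi and b_{i2}, b_{i3} unchanged, this forces
b_{i0} = b^{0}_{i0} - \xi .
% Difference of local powers, with f = p - q and x the chosen quantile:
\Pi_i - \Pi_i^{0}
  = -\sum_{k=0}^{3} \bigl(b_{ik} - b^{0}_{ik}\bigr)\, G_{f+2k,\lambda}(x)
  = \xi \,\bigl\{ G_{f,\lambda}(x) - G_{f+2,\lambda}(x) \bigr\}
  = 2\, g_{f+2,\lambda}(x)\, \xi
  = c\,\xi .
```

This also makes explicit why $b_{i0}$ changes together with $b_{i1}$: the two changes cancel in the sum of the coefficients.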
If $q = 0$ and $p = 1$, the null hypothesis is $H_0: \beta = \beta_0$, where $\beta_0$ is a specified scalar, and hence we have the following corollary.
Corollary 1 Let $\theta = (\beta, \phi)^\top$ be the parameter vector with $\beta$ and $\phi$ being globally orthogonal parameters. The nonnull asymptotic expansions of the distribution functions of the LR, Wald, score and gradient statistics for testing the null hypothesis $H_0: \beta = \beta_0$ under a sequence of Pitman alternatives are given by (1) with $f = p$ and $\lambda = \epsilon^\top K_\beta \epsilon$, and the coefficients are b
If we wish to test $H_0: \phi = \phi_0$, where $\phi_0$ is a specified scalar and $\beta$ acts as a vector of nuisance parameters, then the expressions for the coefficients $b_{ik}$ ($i = 1, 2, 3, 4$ and $k = 0, 1, 2, 3$) that define the nonnull asymptotic expansions of the distribution functions of the LR, Wald, score and gradient statistics for testing the null hypothesis $H_0: \phi = \phi_0$ under a sequence of Pitman alternatives $H_{1n}: \phi = \phi_0 + \epsilon$, where $\epsilon = \phi - \phi_0$ is assumed to be of order $n^{-1/2}$, are given by
Here, the additional contribution to the local power of the tests when $\beta$ needs to be estimated takes the form $\xi = \frac{1}{2}\sum_{r,s=1}^{p} (\kappa_{rs\phi} + 2\kappa_{r,s\phi})\,\kappa^{-1}_{r,s}\,\epsilon$. Also, if $p = 1$, then $\xi = \frac{1}{2} K_\beta^{-1}(\kappa_{\beta\beta\phi} + 2\kappa_{\beta,\beta\phi})\,\epsilon$.
As a final remark, we point out that our results are also valid when the statistics are modified via a Bartlett or a Bartlett-type correction (see, for example, Cribari-Neto and Cordeiro, 1996), since the corrections have no effect on the $n^{-1/2}$ term of the local power of the corresponding tests. The advantage of the corrected tests is that they are less size distorted than the tests in their original form in small and moderate-sized samples.

Examples
In this section, we discuss some examples to illustrate our results. We focus on ξ, which determines whether the local power changes if a parameter that is globally orthogonal to the remaining parameters is introduced in the model. It is evident that several other special cases could be considered.
Birnbaum-Saunders distribution. The Birnbaum-Saunders (BS) distribution introduced by Birnbaum and Saunders (1969a,b) is also known as the fatigue life distribution. It describes the total time until the damage caused by the development and growth of a dominant crack reaches a threshold level and causes a failure. The BS distribution has been used quite effectively to model times to failure for materials subject to fatigue and for modeling lifetime data. Let $y_1, \ldots, y_n$ be i.i.d. random variables with a BS distribution with p.d.f. given by $\pi(y; \alpha, \eta) = \kappa(\alpha, \eta)\, y^{-3/2} (y + \eta) \exp\{-\tau(y/\eta)/(2\alpha^2)\}$, where $y > 0$, $\alpha > 0$ (shape parameter), $\eta > 0$ (scale parameter), $\kappa(\alpha, \eta) = \exp(\alpha^{-2})/(2\alpha\sqrt{2\pi\eta})$ and $\tau(z) = z + z^{-1}$. We first consider the null hypothesis $H_0: \alpha = \alpha_0$, where $\alpha_0$ is a specified positive scalar. It is possible to show that $\xi = -\epsilon(2$ being the cumulative distribution function of the standard normal distribution. It is interesting to note that $\xi$ does not involve the parameter $\eta$. For testing the null hypothesis $H_0: \eta = \eta_0$ against $H_1: \eta \neq \eta_0$, where $\eta_0$ is a specified positive scalar, we obtain $\xi = 0$ and hence the coefficients that define the nonnull asymptotic expansions of the distribution functions of the LR, Wald, score and gradient statistics do not change when the parameter $\alpha$ needs to be estimated.
von Mises distribution. Let $y_1, \ldots, y_n$ be i.i.d. random variables with a von Mises distribution with mean direction $\mu$ and concentration parameter $\phi$ and p.d.f. $\pi(y; \mu, \phi) = \{2\pi I_0(\phi)\}^{-1} \exp\{\phi \cos(y - \mu)\}$, where $0 \le y < 2\pi$, $0 \le \mu < 2\pi$, $\phi > 0$ and $I_0(\cdot)$ is the modified Bessel function of the first kind and order 0. The positive parameter $\phi$ measures the concentration of the distribution: as $\phi \to 0$ the von Mises distribution converges to the uniform distribution around the circumference, whereas as $\phi \to \infty$ the distribution tends to the point distribution concentrated in the mean direction. This distribution is particularly useful for the analysis of circular data. For testing $H_0: \mu = \mu_0$, where $\mu_0$ is a specified scalar, we have $\xi = 0$ and hence the coefficients that define the nonnull asymptotic expansions of the distribution functions of the LR, Wald, score and gradient statistics do not change when the concentration is unknown. The additional contribution to the local powers of the LR, Wald, score and gradient tests (up to order $n^{-1/2}$) for testing $H_0: \phi = \phi_0$, where $\phi_0$ is a specified positive scalar, when the parameter $\mu$ is unknown, reduces to $\xi = \epsilon/(2\phi_0)$, where $\epsilon = \phi - \phi_0 = O(n^{-1/2})$.
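The value $\xi = \epsilon/(2\phi_0)$ for the von Mises model can be checked numerically against the $p = 1$ expression $\xi = \frac{1}{2} K_\beta^{-1}(\kappa_{\beta\beta\phi} + 2\kappa_{\beta,\beta\phi})\,\epsilon$ with $\mu$ in the role of $\beta$. The following is a minimal sketch, assuming the usual cumulant notation ($\kappa_{\mu\mu\phi}$ is the expected third derivative and $\kappa_{\mu,\mu\phi}$ the expected product of the first and mixed second derivatives of the log-likelihood); the function name and the quadrature settings are illustrative. Per-observation cumulants suffice because the sample size cancels in $\xi/\epsilon$.

```python
import math

def von_mises_xi_over_eps(phi, mu=1.0, m=4000):
    """Per-observation cumulants of a von Mises(mu, phi) model by midpoint
    quadrature on [0, 2*pi), then xi/eps for testing H0: phi = phi0 with mu
    unknown, using xi = (1/2) K^{-1} (k_{mu mu phi} + 2 k_{mu, mu phi}) eps."""
    h = 2.0 * math.pi / m
    ys = [(j + 0.5) * h for j in range(m)]
    w = [math.exp(phi * math.cos(y - mu)) for y in ys]
    norm = sum(w) * h                      # numerically equals 2*pi*I_0(phi)
    dens = [v / norm for v in w]

    def expect(g):
        return sum(g(y) * d for y, d in zip(ys, dens)) * h

    e_cos = expect(lambda y: math.cos(y - mu))
    e_sin = expect(lambda y: math.sin(y - mu))
    e_sin2 = expect(lambda y: math.sin(y - mu) ** 2)

    k_mu_phi = e_sin             # E[d2 l / d mu d phi]; zero under orthogonality
    k_mu = phi * e_cos           # Fisher information for mu
    k_mumuphi = -e_cos           # E[d3 l / d mu^2 d phi]
    k_mu_muphi = phi * e_sin2    # E[(d l / d mu) * (d2 l / d mu d phi)]
    return k_mu_phi, 0.5 * (k_mumuphi + 2.0 * k_mu_muphi) / k_mu

cross, ratio = von_mises_xi_over_eps(phi=2.5)
print(cross, ratio, 1.0 / (2.0 * 2.5))
```

The first returned value checks the global orthogonality of $\mu$ and $\phi$; the second should agree with $1/(2\phi_0)$ up to quadrature error.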
Generalized linear models. We now consider the problem of testing hypotheses in the class of generalized linear models (McCullagh and Nelder, 1989). Suppose that the random variables $y_1, \ldots, y_n$ are independent and each $y_l$ has a probability density function of the form $\pi(y; \zeta_l, \phi) = \exp[\phi\{y\zeta_l - b(\zeta_l)\} + c(y, \phi)]$, where $b(\cdot)$ and $c(\cdot, \cdot)$ are known appropriate functions. The function $c(\cdot, \cdot)$ admits a factorization of the form $c(y, \phi) = c_1(\phi) + c_2(y)$. The mean and the variance of $y_l$ are $E(y_l) = \mu_l = db(\zeta_l)/d\zeta_l$ and $\mathrm{var}(y_l) = \phi^{-1} V_l$, where $V_l = d\mu_l/d\zeta_l$ is called the variance function and $\zeta_l = q(\mu_l) = \int V_l^{-1}\, d\mu_l$ is a known one-to-one function of $\mu_l$. The choice of the variance function $V_l$ as a function of $\mu_l$ determines $q(\mu_l)$. We have $V_l = 1$ [$q(\mu_l) = \mu_l$], $V_l = \mu_l^2$ [$q(\mu_l) = -1/\mu_l$] and $V_l = \mu_l^3$ [$q(\mu_l) = -1/(2\mu_l^2)$] for the normal, gamma and inverse Gaussian models, respectively. The parameters $\zeta_l$ and $\phi > 0$ are called the canonical and precision parameters, respectively. The systematic part of the model is defined by $d(\mu_l) = \eta_l = x_l^\top \beta$ ($l = 1, \ldots, n$), where $d(\cdot)$ is a known one-to-one differentiable link function, $x_l^\top = (x_{l1}, \ldots, x_{lp})$ is a vector of known variables associated with the $l$th observable response and $\beta = (\beta_1, \ldots, \beta_p)^\top$ is a set of unknown parameters to be estimated ($p < n$). Let $X = (x_1, \ldots, x_n)^\top$ be the model matrix with full column rank, i.e. $\mathrm{rank}(X) = p$. The hypothesis of interest is $H_0: \beta_2 = \beta_{20}$, which will be tested against the alternative hypothesis $H_1: \beta_2 \neq \beta_{20}$, where $\beta$ is partitioned as $\beta = (\beta_1^\top, \beta_2^\top)^\top$, with $\beta_1 = (\beta_1, \ldots, \beta_q)^\top$ and $\beta_2 = (\beta_{q+1}, \ldots, \beta_p)^\top$. We have $\xi = 0$ and hence the coefficients that define the nonnull asymptotic expansions of the distribution functions of the LR, Wald, score and gradient statistics do not change when the parameter $\phi$ needs to be estimated.
On the other hand, for testing $H_0: \phi = \phi_0$, where $\phi_0$ is a specified positive scalar, we have $\xi = p\epsilon/(2\phi_0)$ with $\epsilon = \phi - \phi_0 = O(n^{-1/2})$. Notice that the additional contribution to the local powers of the LR, Wald, score and gradient tests (up to order $n^{-1/2}$) for testing $H_0$ when the parameter vector $\beta$ is unknown depends on the rank of the matrix $X$. Also, $\xi$ does not involve the unknown $\beta$. For the class of exponential family nonlinear models (Cordeiro and Paula, 1989), which is a natural extension of the generalized linear models that allows for a nonlinear systematic component, we arrive at exactly the same conclusions.
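The value $\xi = p\epsilon/(2\phi_0)$ can be recovered from the general expression for testing $H_0: \phi = \phi_0$ given in Section 2. The derivation below is a sketch under two assumptions that are not spelled out in the text: the cumulant notation $\kappa_{rs} = E(\partial^2\ell/\partial\beta_r\partial\beta_s)$, $\kappa_{rs\phi} = E(\partial^3\ell/\partial\beta_r\partial\beta_s\partial\phi)$, $\kappa_{r,s\phi} = E\{(\partial\ell/\partial\beta_r)(\partial^2\ell/\partial\beta_s\partial\phi)\}$, and the reading of $\kappa^{-1}_{r,s}$ as the $(r,s)$ element of $K_\beta^{-1}$. Here $U_r$ denotes the $r$th component of the score for $\beta$.

```latex
% The GLM log-likelihood, \ell = \sum_l [\phi\{y_l\zeta_l - b(\zeta_l)\} + c(y_l,\phi)],
% depends on \beta only through the term multiplied by \phi, so
\frac{\partial^2 \ell}{\partial\beta_s\,\partial\phi} = \frac{1}{\phi}\, U_s ,
\qquad
\frac{\partial^3 \ell}{\partial\beta_r\,\partial\beta_s\,\partial\phi}
  = \frac{1}{\phi}\,\frac{\partial^2 \ell}{\partial\beta_r\,\partial\beta_s} .
% Taking expectations, with (K_\beta)_{rs} = -\kappa_{rs} = E(U_r U_s):
\kappa_{rs\phi} = \frac{\kappa_{rs}}{\phi} = -\frac{(K_\beta)_{rs}}{\phi},
\qquad
\kappa_{r,s\phi} = \frac{E(U_r U_s)}{\phi} = \frac{(K_\beta)_{rs}}{\phi} .
% Hence \kappa_{rs\phi} + 2\kappa_{r,s\phi} = (K_\beta)_{rs}/\phi and, at \phi = \phi_0,
\xi = \frac{\epsilon}{2\phi_0} \sum_{r,s=1}^{p} (K_\beta)_{rs}\,(K_\beta^{-1})_{rs}
    = \frac{\epsilon}{2\phi_0}\,\mathrm{tr}\bigl(K_\beta K_\beta^{-1}\bigr)
    = \frac{p\,\epsilon}{2\phi_0} .
```

The dimension $p = \mathrm{rank}(X)$ enters only through the trace, and $\beta$ drops out because $K_\beta$ cancels against its inverse.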