Admissibility of the usual confidence interval in linear regression

Consider a linear regression model with independent and identically distributed normal random errors. Suppose that the parameter of interest is a specified linear combination of the regression parameters. We prove that the usual confidence interval for this parameter is admissible within a broad class of confidence intervals.


Introduction
Consider the linear regression model Y = Xβ + ε, where Y is a random n-vector of responses, X is a known n × p matrix with linearly independent columns, β is an unknown parameter p-vector and ε ∼ N(0, σ²Iₙ), where σ² is an unknown positive parameter. Let β̂ denote the least squares estimator of β. Also define σ̂² = (Y − Xβ̂)ᵀ(Y − Xβ̂)/(n − p).

Suppose that the parameter of interest is θ = aᵀβ, where a is a given p-vector (a ≠ 0). We seek a 1 − α confidence interval for θ. Define the quantile t(m) by the requirement that P(−t(m) ≤ T ≤ t(m)) = 1 − α for T ∼ tₘ. Let Θ̂ denote aᵀβ̂, i.e. the least squares estimator of θ. Also let v₁₁ denote the variance of Θ̂ divided by σ². The usual 1 − α confidence interval for θ is

I = [Θ̂ − t(m) σ̂ √v₁₁, Θ̂ + t(m) σ̂ √v₁₁],

where m = n − p. Is this confidence interval admissible? The admissibility of a confidence interval is a much more difficult concept than the admissibility of a point estimator, since a confidence interval must satisfy a coverage probability constraint.
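To fix the notation above in a concrete computation, here is a minimal numerical sketch (ours, not part of the paper) of the interval I for simulated data; all variable names and the simulated design are illustrative assumptions.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Simulated data, for illustration only.
    n, p = 50, 3
    X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
    beta = np.array([1.0, 2.0, -0.5])
    y = X @ beta + rng.standard_normal(n)

    a = np.array([0.0, 1.0, 0.0])        # parameter of interest: theta = a'beta
    alpha = 0.05
    m = n - p

    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y         # least squares estimator of beta
    resid = y - X @ beta_hat
    sigma_hat = np.sqrt(resid @ resid / m)   # estimator of sigma, with m = n - p
    theta_hat = a @ beta_hat                 # least squares estimator of theta
    v11 = a @ XtX_inv @ a                    # Var(theta_hat) / sigma^2

    t_m = stats.t.ppf(1 - alpha / 2, df=m)   # the quantile t(m)
    half_width = t_m * sigma_hat * np.sqrt(v11)
    print("usual 1 - alpha CI:", (theta_hat - half_width, theta_hat + half_width))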
Also, admissibility of confidence intervals can be defined in either weak or strong forms (Joshi, 1969, 1982). Kabaila & Giri (2009, Section 3) describe a broad class D of confidence intervals that includes I. The main result of the present paper, presented in Section 3, is that I is strongly admissible within the class D. An attractive feature of the proof of this result is that, although lengthy, it is quite straightforward and elementary.
Section 2 provides a brief description of this class D. For completeness, in Section 4 we describe a strong admissibility result, which follows from the results of Joshi (1969), for the usual 1 − α confidence interval for θ in the somewhat artificial situation that the error variance σ² is assumed to be known.

Description of the class D
Define the parameter τ = cᵀβ − t, where the vector c and the number t are given and a and c are linearly independent. Let τ̂ denote cᵀβ̂ − t, i.e. the least squares estimator of τ. Define the matrix V to be the covariance matrix of (Θ̂, τ̂) divided by σ². Let vᵢⱼ denote the (i, j)th element of V, and define γ = τ/(σ √v₂₂), with estimator γ̂ = τ̂/(σ̂ √v₂₂). We use the notation [a ± b] for the interval [a − b, a + b] (b > 0). Each member of the class D is a confidence interval for θ of the form

J(b, s) = [Θ̂ − σ̂ √v₁₁ b(γ̂) ± σ̂ √v₁₁ s(γ̂)],   (1)

where the functions b : ℝ → ℝ and s : ℝ → (0, ∞) are required to satisfy the following restrictions: b is an odd function, s is an even function, and b(x) = 0 and s(x) = t(m) for all |x| ≥ d, where d is a given positive number. Let F(d) denote the set of all pairs (b, s) satisfying these restrictions, so that each member of D is specified by (c, t, d, b, s). Note that the constant choices b(x) = 0 and s(x) = t(m) for all x belong to F(d) and yield J(b, s) = I, so that I itself belongs to D. The class D also includes the following confidence intervals.

(a) Suppose that we carry out a preliminary hypothesis test of the null hypothesis τ = 0 against the alternative hypothesis τ ≠ 0. Also suppose that we construct a confidence interval for θ with nominal coverage 1 − α based on the assumption that the selected model had been given to us a priori (as the true model). The resulting confidence interval, called the naive 1 − α confidence interval, belongs to the class D (Kabaila & Giri, 2009, Section 2).
(b) Confidence intervals for θ that are constructed to utilize (in the particular manner described by Kabaila & Giri, 2009) uncertain prior information that τ = 0.
Let K denote the usual 1 − α confidence interval for θ based on the assumption that τ = 0. The naive 1 − α confidence interval, described in (a), may be expressed in the following form:

(1 − h(|γ̂|)) K + h(|γ̂|) I,   (2)

interpreted endpoint by endpoint, where h : [0, ∞) → [0, 1] is the unit step function defined by h(x) = 0 for all x ∈ [0, q] and h(x) = 1 for all x > q, q being the critical value of the preliminary test. Now suppose that we replace h by a continuous increasing function satisfying h(0) = 0 and h(x) → 1 as x → ∞ (a similar construction is used extensively in the context of point estimation by Saleh, 2006). The confidence interval (2) is then also a member of the class D.
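As a rough illustration (ours, not from the paper) of the step function h and one possible continuous replacement, consider the following sketch; the smooth form used here is hypothetical, and any continuous increasing h with h(0) = 0 and h(x) → 1 as x → ∞ would serve.

    import numpy as np

    q = 1.96  # critical value of the preliminary test (illustrative value)

    def h_step(x):
        # The unit step function: 0 on [0, q], 1 beyond q.
        return np.where(x > q, 1.0, 0.0)

    def h_smooth(x):
        # One hypothetical continuous increasing choice with
        # h(0) = 0 and h(x) -> 1 as x -> infinity.
        return x**2 / (q**2 + x**2)

    x = np.linspace(0.0, 5.0, 6)
    print(h_step(x))
    print(h_smooth(x))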

Main result
As noted in Section 2, each member of the class D is specified by (c, t, d, b, s).
The following result states that the usual 1 − α confidence interval for θ is strongly admissible within the class D.
Theorem 1. There does not exist (c, t, d, b, s) ∈ D such that the following three conditions hold:

(a) P(θ ∈ J(b, s)) ≥ 1 − α for all (β, σ²);   (3)

(b) E(length of J(b, s)) ≤ E(length of I) for all (β, σ²);   (4)

(c) Strict inequality holds in either (3) or (4) for at least one (β, σ²).
The proof of this result is presented in Appendix A.
An illustration of this result is provided by Figure 3 of Kabaila & Giri (2009).
Define e(γ; s) to be E(length of J(b, s)) divided by E(length of I). We call this the scaled expected length of J(b, s); it depends on the parameters (β, σ²) only through γ, and on (b, s) only through s. Theorem 1 tells us that, for any confidence interval J(b, s) with minimum coverage probability 1 − α, it cannot be the case that e(γ; s) ≤ 1 for all γ, with strict inequality for at least one γ. This fact is illustrated by the bottom panel of Figure 3 of Kabaila & Giri (2009).
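Under the form (1), this ratio can be written out explicitly; the following short derivation is ours, not quoted from the paper. The length of J(b, s) is 2 σ̂ √v₁₁ s(γ̂) and the length of I is 2 t(m) σ̂ √v₁₁, so

e(γ; s) = E(σ̂ s(γ̂)) / (t(m) E(σ̂)) = E(W s(γ̂)) / (t(m) E(W)),

where W = σ̂/σ. The joint distribution of (W, γ̂) depends on (β, σ²) only through γ, which is why e is a function of γ (and s) alone.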
Define the class D̃ to be the subset of D consisting of those (c, t, d, b, s) for which both b and s are continuous functions. Strong admissibility of the confidence interval I within the class D implies weak admissibility of this confidence interval within the class D̃, as the following result shows. Since (β̂, σ̂²) is a sufficient statistic for (β, σ²), we reduce the data to (β̂, σ̂²).
Corollary 1. There does not exist (c, t, d, b, s) ∈ D̃ such that the following three conditions hold:

(a) P(θ ∈ J(b, s)) ≥ 1 − α for all (β, σ²);

(b) the length of J(b, s) is less than or equal to the length of I with probability 1, for all (β, σ²);

(c′) Strict inequality holds in either (a) or (b) for at least one (β, σ²).
This corollary is proved in Appendix B.
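To see what condition (b) of Corollary 1 amounts to, note the following gloss (ours): under the form (1), the lengths compare through s alone, since

ℓ(J(b, s)) / ℓ(I) = (2 σ̂ √v₁₁ s(γ̂)) / (2 t(m) σ̂ √v₁₁) = s(γ̂) / t(m).

Thus condition (b) holds if and only if s(x) ≤ t(m) for almost every x, since γ̂ has a positive probability density on ℝ for every (β, σ²).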

Admissibility result for known error variance
In this section, we suppose that σ² is known. Without loss of generality, we assume that σ² = 1. As before, let β̂ denote the least squares estimator of β.
Since β̂ is a sufficient statistic for β, we reduce the data to β̂. Assume that the parameter of interest is θ = β₁/(Var(β̂₁))^{1/2}, so that the least squares estimator of θ is Θ̂ = β̂₁/(Var(β̂₁))^{1/2}. Let ∆̂ denote a vector obtained from β̂ by a linear transformation chosen so that (Θ̂, ∆̂) is obtained by a one-to-one transformation from β̂ and Θ̂ and ∆̂ are independent. So, we reduce the data to (Θ̂, ∆̂). Here Θ̂ ∼ N(θ, 1) and ∆̂ has a multivariate normal distribution with mean δ (say) and known covariance matrix.
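In this reduced problem, the usual 1 − α confidence interval for θ takes the familiar known-variance form; spelling it out (our gloss): define z by the requirement that Φ(z) − Φ(−z) = 1 − α, i.e. z = Φ⁻¹(1 − α/2). Since Θ̂ ∼ N(θ, 1), the usual interval is

[Θ̂ − z, Θ̂ + z],

and its coverage probability is exactly 1 − α for every (θ, δ).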

Appendix A: Proof of Theorem 1
Suppose that c is a given vector (such that c and a are linearly independent), t is a given number and d is a given positive number. The proof of Theorem 1 now proceeds as follows. We present a few definitions and a lemma. We then apply this lemma to prove this theorem.
Define W = σ̂/σ. Since mσ̂²/σ² ∼ χ²ₘ, W has the same distribution as (Q/m)^{1/2}, where Q ∼ χ²ₘ. Let f_W denote the probability density function of W. Also let φ denote the N(0, 1) probability density function, and let ρ = v₁₂/(v₁₁ v₂₂)^{1/2}. Now define the risk functions R₁(b, s; γ) = (1 − α) − P(θ ∈ J(b, s)) and R₂(b, s; γ) = e(γ; s) − 1. It follows from (7) of Kabaila & Giri (2009) that R₁(b, s; γ) may be written as a double integral, with respect to w and x, of an integrand formed from f_W(w), φ(x − γ) and the probabilities k(wx, w, γ, ρ) and k†(wx, w, γ, ρ). Thus, for each (b, s) ∈ F(d), R₁(b, s; γ) is a continuous function of γ.
Since k(wx, w, γ, ρ) and k†(wx, w, γ, ρ) are probabilities, it follows from (7) that R₁(b, s; γ) is bounded and that ∫_{−∞}^{∞} R₁(b, s; γ) dγ exists for all (b, s) ∈ F(d); the integral ∫_{−∞}^{∞} R₂(b, s; γ) dγ exists similarly. Thus we may define, for each (b, s) ∈ F(d),

g(b, s; λ) = λ ∫_{−∞}^{∞} R₁(b, s; γ) dγ + (1 − λ) ∫_{−∞}^{∞} R₂(b, s; γ) dγ,

where 0 < λ < 1. Kempthorne (1983, 1987, 1988) presents results on what he calls compromise decision theory. Initially, these results were applied only to the solution of some problems of point estimation. Kabaila & Tuck (2008) develop new results in compromise decision theory and apply these to a problem of interval estimation. The following lemma, which will be used in the proof of Theorem 1, is in the style of these compromise decision theory results.
Lemma 1. Suppose that c is a given vector (such that c and a are linearly independent), t is a given number and d is a given positive number. Also suppose that λ (0 < λ < 1) is given and that (b*, s*) minimizes g(b, s; λ) with respect to (b, s) ∈ F(d). Then there does not exist (b, s) ∈ F(d) such that the following three conditions hold:

(a) R₁(b, s; γ) ≤ R₁(b*, s*; γ) for all γ;

(b) R₂(b, s; γ) ≤ R₂(b*, s*; γ) for all γ;

(c) Strict inequality holds in either (a) or (b) for at least one γ.
Proof. Suppose that c is a given vector (such that c and a are linearly independent), t is a given number and d is a given positive number. The proof is by contradiction. Suppose that there exists (b, s) ∈ F(d) such that (a), (b) and (c) hold. Now

g(b*, s*; λ) − g(b, s; λ) = λ ∫_{−∞}^{∞} (R₁(b*, s*; γ) − R₁(b, s; γ)) dγ + (1 − λ) ∫_{−∞}^{∞} (R₂(b*, s*; γ) − R₂(b, s; γ)) dγ.

By hypothesis, one of the following two cases holds. Case 1: strict inequality holds in (a) for some γ. Since R₁(b, s; γ) and R₁(b*, s*; γ) are continuous functions of γ, (a) then implies that ∫ R₁(b, s; γ) dγ < ∫ R₁(b*, s*; γ) dγ. Case 2: strict inequality holds in (b) for some γ; the same argument, applied to R₂, gives ∫ R₂(b, s; γ) dγ < ∫ R₂(b*, s*; γ) dγ. In either case, g(b, s; λ) < g(b*, s*; λ), which contradicts the assumption that (b*, s*) minimizes g(b, s; λ) with respect to (b, s) ∈ F(d).
Lemma 1 follows from the fact that this argument holds for every given vector c (such that c and a are linearly independent), every given number t and every given positive number d.
We will first find the (b*, s*) that minimizes g(b, s; λ) with respect to (b, s) ∈ F(d), for given λ. We will then choose λ such that J(b*, s*) = I, the usual 1 − α confidence interval for θ. Theorem 1 is then a consequence of Lemma 1.

Using (8) of Kabaila & Giri (2009), it can be shown that R₂(b, s; γ) is equal to a double integral, with respect to w and x, whose integrand depends on (b, s) only through s(x). By changing the variable of integration in the inner integral of this expression, and using the corresponding expression for R₁ together with the restriction that b is an odd function, we find that, to within an additive constant that does not depend on (b, s),

g(b, s; λ) = ∫ q(b, s; x) dx.

Note that x enters into the expression for q(b, s; x) only through b(x) and s(x). To minimize g(b, s; λ) with respect to (b, s) ∈ F(d), it is therefore sufficient to minimize q(b, s; x) with respect to (b(x), s(x)) for each fixed x. The situation here is similar to the computation of Bayes rules; see e.g. Casella & Berger (2002, pp. 352-353).
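The reduction used in the last step is the standard pointwise-minimization argument; in our notation, with (b°, s°) a temporary label for the pointwise minimizer: if (b°(x), s°(x)) minimizes q( · , · ; x) for each fixed x, then

q(b°(x), s°(x); x) ≤ q(b(x), s(x); x) for all x and all (b, s) ∈ F(d),

and integrating over x gives g(b°, s°; λ) ≤ g(b, s; λ), to within the same additive constant.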
Therefore, to minimize g(b, s; λ) with respect to (b, s) ∈ F(d), we simply minimize q̃(b, s) with respect to (b, s) ∈ ℝ × (0, ∞), to obtain (b′, s′), and then set b(x) = b′ and s(x) = s′. Let the random variables A and B have a bivariate normal distribution with zero means, unit variances and correlation ρ. Note that the distribution of A, conditional on B = y, is N(ρy, 1 − ρ²). Thus the first of the two probabilities appearing in q̃(b, s) may be evaluated by conditioning on B. Let Φ denote the N(0, 1) cumulative distribution function. For every fixed w > 0 and s > 0, the resulting difference of values of Φ is maximized by setting b = 0. Thus, for each fixed s > 0, (10) is maximized with respect to b ∈ ℝ by setting b = 0. Now let the random variables Ã and B̃ have a bivariate normal distribution with zero means, unit variances and correlation −ρ. Note that the distribution of Ã, conditional on B̃ = y, is N(−ρy, 1 − ρ²). Thus, by the same conditioning argument, for every fixed w > 0 and s > 0, the corresponding difference of values of Φ is maximized by setting b = 0. Thus, for each fixed s > 0, (11) is maximized with respect to b ∈ ℝ by setting b = 0.
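The normal-distribution fact invoked twice here can be checked directly; in its simplest form (our notation, with a generic centre b and half-width s > 0):

d/db [Φ(b + s) − Φ(b − s)] = φ(b + s) − φ(b − s),

which is zero at b = 0, positive for b < 0 and negative for b > 0, because φ is symmetric about 0 and decreasing on [0, ∞). Hence b ↦ Φ(b + s) − Φ(b − s) attains its maximum at b = 0.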
Therefore q̃(b, s) is, for each fixed s > 0, minimized with respect to b by setting b = 0. Thus b′ = 0, and so b*(x) = 0 for all x ∈ ℝ. Hence, to find s′ we need to minimize q̃(0, s) with respect to s > 0; equivalently, discarding factors and additive terms that do not depend on s, we may minimize a simpler function of s alone with respect to s > 0. Finally, we choose λ such that s′ = t(m), so that b*(x) = 0 and s*(x) = t(m) for all x and hence J(b*, s*) = I.
Theorem 1 follows from the fact that this argument holds for every given vector c (such that c and a are linearly independent), every given number t and every given positive number d.