Confidence intervals in a regression with both linear and non-linear terms

Abstract: We present a simple way of calculating confidence intervals for a class of scalar functions of the parameters in least squares estimation when the model contains linear terms together with a small number of non-linear terms. We do not assume normality.


Introduction
Many inferential methods for non-linear models have been developed. For excellent accounts, we refer the reader to Bates and Watts [1], Carroll and Ruppert [3], Seber and Wild [15], Lindsey [13], Huet et al. [11], Carroll et al. [4] and references therein. Often one wishes to fit models that contain both linear and non-linear terms. Such models arise in many situations; we mention stiffness and friction in biological rhythmic movements (Beek et al. [2]), data precision in optimal regression models (Shacham and Brauner [16]), local chlorophyll concentration from satellite images (Trentin et al. [19]), and models of total and presumed wildlife sources (Siewicki et al. [17]).
Suppose we are fitting a model involving both linear parameters and one or two non-linear parameters by least squares. A convenient approach is to use a linear least squares program to fit the model for a variety of values of the non-linear parameters, and then to find confidence intervals for the non-linear parameters from the graph of the residual sum of squares plotted against the trial values of the non-linear parameters. In this note, we derive an easy way of finding confidence intervals for the other parameters, considered one at a time, or for certain functions of them, from the same graph of the residual sum of squares.
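For concreteness, the scan can be coded in a few lines. The following Python sketch (our own illustration; numpy is assumed, and the function and variable names are ours) fits the two phase model used later in Section 3 for a grid of trial values of the non-linear parameter θ and records the residual sum of squares at each grid point.

```python
import numpy as np

def profile_rss(y, x, theta_grid):
    # For each trial theta, fit the linear parameters (a, b, c) of
    # y_i = a + b*x_i + c*(x_i - theta)_+ + e_i by linear least squares
    # and record the profiled residual sum of squares R_theta.
    rss = np.empty(len(theta_grid))
    for k, theta in enumerate(theta_grid):
        X = np.column_stack([np.ones_like(x), x, np.maximum(x - theta, 0.0)])
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        rss[k] = np.sum((y - X @ beta) ** 2)
    return rss

# Running example (parameter values are ours): n = 21 design points,
# and the theta grid used in Section 3.
x = np.arange(-10.0, 11.0)
rng = np.random.default_rng(0)
y = 1.0 + 0.5 * x + 2.0 * np.maximum(x - 3.0, 0.0) + rng.normal(size=x.size)
theta_grid = np.arange(-9.0, 9.0 + 1e-9, 0.01)
R_theta = profile_rss(y, x, theta_grid)
```

Plotting R_theta against theta_grid gives the graph of the residual sum of squares from which the confidence intervals are read.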
The purpose here is not to achieve the highest possible accuracy. There are, of course, many methods for finding highly accurate confidence intervals for non-linear models; these are based on complicated and computationally expensive techniques like bootstrapping and Markov chain Monte Carlo (see, for example, Hu and Kalbfleisch [10] and Tian et al. [18]). Besides, such methods are relatively new, and software for their implementation may not be widely available in the public domain; for example, the authors could not find an implementation of the methods of Tian et al. [18] in the widely used R statistical package.
Here, we derive a simple method for finding confidence intervals that can be implemented easily on almost any platform, possibly even using a hand calculator. We show that our method is reasonably accurate. We compare our method with the delta method (Casella and Berger [5]), another simple and commonly used method for constructing confidence intervals for non-linear functions, and establish that ours is far superior.
Suppose our data are represented by the n-dimensional vector

Y = f(X_θ, β) + ε,

where f(·, ·) is a real valued n × 1 function; X_θ is an n × m matrix function of θ; β is an m-dimensional vector of unknown linear parameters, linear in the sense that f(X_θ, β) depends on β only through X_θ β; and ε is an n-dimensional vector of uncorrelated random variables with zero expectation and variance σ². We do not assume normality. Throughout, A′ denotes the transpose of a matrix or vector A. The vector θ consists of s unknown non-linear parameters, non-linear in the sense that Y is a non-linear function of θ. We wish to find a confidence interval (Λ−, Λ+) for a function

Λ = g(λ_θ, β),    (1)

where g(·, ·) is a real valued scalar function and λ_θ is an m-dimensional function of θ.
Three possible examples of (1) are now given.
Example 1.1. Take g(λ_θ, β) = λ′_θ β and λ_θ to be an m × 1 vector of zeros except for a one at the ith position. Then Λ = β_i.
Example 1.2. Predicted value, a + c(x − θ)_+, of the model

y_i = a + c(x_i − θ)_+ + ε_i,   i = 1, 2, . . . , n,

for a given x, where ε_i are independent errors. This is a particular case of the two phase regression model due to Hawkins [8] and Davies [6]; see Section 3 for details. Here, (x)_+ equals x for x ≥ 0 and 0 otherwise, and a, c, θ are unknown scalar parameters, so m = 2 and s = 1. Take g(λ_θ, β) = λ′_θ β, β = (a, c)′, and λ_θ = (1, (x − θ)_+)′. Then Λ = a + c(x − θ)_+.

Example 1.3. Predicted value, a + bx + c(x − θ)_+, of the model

y_i = a + b x_i + c(x_i − θ)_+ + ε_i,   i = 1, 2, . . . , n,

for a given x, where ε_i are independent errors. This is the two phase regression model due to Hawkins [8] and Davies [6]; see Section 3 for details. Here, a, b, c, θ are unknown scalar parameters, so m = 3 and s = 1. Take g(λ_θ, β) = λ′_θ β, β = (a, b, c)′, and λ_θ = (1, x, (x − θ)_+)′. Then Λ = a + bx + c(x − θ)_+.

Often f(·, ·), g(·, ·), X_θ and λ_θ are highly non-linear and non-differentiable (Kaniovskaya [12]; Manski [14], Chapter 9; [9]). In these cases, we might wish to base our confidence interval directly on the residual sum of squares function R_{θ,β} = ‖Y − f(X_θ, β)‖², where ‖·‖ denotes the l₂ norm of a vector; see Donaldson and Schnabel [7]. Define

∆ = R̂ {1 + F_{1, n−m−s}(α)/(n − m − s)},

where R̂ = min over (θ, β) of R_{θ,β}, α denotes the confidence level and F_{p,q}(α) denotes the α probability level of an F_{p,q} distribution. Then an approximate confidence set for Λ may be defined using the least squares analogue of the profile likelihood:

{Λ : min over (θ, β) with g(λ_θ, β) = Λ of R_{θ,β} ≤ ∆}.    (2)

Typically, this set will be an interval (Λ−, Λ+), where Λ− and Λ+ can be found from (2) if the inequality is replaced by an equality.
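Continuing the sketch above, ∆ and the set {θ : R_θ < ∆} can be computed directly. The snippet below assumes the F-based form of ∆ stated above and uses scipy for the F quantile; it is a sketch under those assumptions, not a definitive implementation.

```python
from scipy.stats import f as f_dist

def delta_threshold(R_theta, n, m, s, alpha=0.95):
    # Delta = R_hat * (1 + F_{1, n-m-s}(alpha) / (n - m - s)), where R_hat
    # is the overall minimum of the residual sum of squares (an assumed form).
    q = n - m - s
    return R_theta.min() * (1.0 + f_dist.ppf(alpha, 1, q) / q)

delta = delta_threshold(R_theta, n=x.size, m=3, s=1)
inside = theta_grid[R_theta < delta]   # the set {theta : R_theta < Delta}
print(inside.min(), inside.max())      # an interval when the set is connected
```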
In Section 2, we describe the method for finding (Λ−, Λ+) and illustrate it for Examples 1.1-1.3. In Section 3, some numerical studies are discussed to assess the performance of the confidence interval.
The method

Suppose now that f(X_θ, β) = X_θ β and g(λ_θ, β) = λ′_θ β. For a given θ, let β_θ = (X′_θ X_θ)^{-1} X′_θ Y denote the least squares estimate of β and let R_θ = min over β of R_{θ,β} = ‖Y − X_θ β_θ‖² denote the profiled residual sum of squares.

Theorem 2.1. The confidence set (2) equals ∪_θ I_θ, where

I_θ = {λ′_θ β : R_{θ,β} ≤ ∆}

and the union is taken over all θ with R_θ < ∆.
Theorem 2.2. The set I_θ is the interval with end points λ′_θ β_θ ± {(∆ − R_θ) λ′_θ (X′_θ X_θ)^{-1} λ_θ}^{1/2}. If, in addition, λ_θ and X_θ are continuous functions of θ and {θ : R_θ < ∆} is connected, then the confidence set will be an interval (Λ−, Λ+), where

Λ− = min over θ with R_θ < ∆ of [λ′_θ β_θ − {(∆ − R_θ) λ′_θ (X′_θ X_θ)^{-1} λ_θ}^{1/2}],    (3)

Λ+ = max over θ with R_θ < ∆ of [λ′_θ β_θ + {(∆ − R_θ) λ′_θ (X′_θ X_θ)^{-1} λ_θ}^{1/2}].    (4)

Proof. We want to find the set

∪_θ {Λ : min over β with λ′_θ β = Λ of ‖Y − X_θ β‖² ≤ ∆}.    (5)

Using Lagrange multipliers to evaluate the inner minimum, we need to solve

X′_θ X_θ β = X′_θ Y + κ λ_θ,   λ′_θ β = Λ,

for β and the multiplier κ.

That is,

β = β_θ + κ (X′_θ X_θ)^{-1} λ_θ,   κ = (Λ − λ′_θ β_θ)/{λ′_θ (X′_θ X_θ)^{-1} λ_θ}.

So, the inner minimum of (5) becomes

R_θ + (Λ − λ′_θ β_θ)²/{λ′_θ (X′_θ X_θ)^{-1} λ_θ},

and the set (5) becomes the union of the intervals with end points λ′_θ β_θ ± {(∆ − R_θ) λ′_θ (X′_θ X_θ)^{-1} λ_θ}^{1/2}, where the union is over values of θ with R_θ < ∆. This equals ∪_θ I_θ as required.
If λ_θ and X_θ are continuous functions of θ and {θ : R_θ < ∆} is connected, then so is {λ′_θ β_θ : R_θ < ∆} and hence so is our confidence set. In this case, it is an interval with end points (3) and (4). This completes the proof.
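The closed form for the inner minimum derived in the proof is easy to check numerically. The following self-contained Python sketch (our own check; scipy's general purpose optimiser is assumed) compares the closed form with a direct constrained minimisation at an arbitrary trial value of Λ.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, m = 21, 3
Xm = rng.normal(size=(n, m))   # plays the role of X_theta at a fixed theta
yv = rng.normal(size=n)
lam = rng.normal(size=m)       # plays the role of lambda_theta
Lam = 0.7                      # an arbitrary trial value of Lambda

# Closed form from the proof:
# R_theta + (Lambda - lam' beta_theta)^2 / (lam' (X'X)^{-1} lam).
beta_theta = np.linalg.lstsq(Xm, yv, rcond=None)[0]
R_theta = np.sum((yv - Xm @ beta_theta) ** 2)
v = lam @ np.linalg.solve(Xm.T @ Xm, lam)
closed_form = R_theta + (Lam - lam @ beta_theta) ** 2 / v

# Direct minimisation of ||y - X beta||^2 subject to lam' beta = Lambda.
res = minimize(lambda b: np.sum((yv - Xm @ b) ** 2), beta_theta,
               constraints=[{"type": "eq", "fun": lambda b: lam @ b - Lam}])
print(closed_form, res.fun)    # the two values agree to optimiser tolerance
```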
We now return to the three examples discussed in Section 1 and show how the results of Theorem 2.2 apply to each of them. Throughout, we denote by A^{ij} the (i, j)th element of A^{-1}.
All of the terms required for (3) and (4) can be found from linear least squares programs. The minimum and maximum can be found by scanning over values of θ at suitably small intervals; in Section 3, for example, they are found by taking θ from −9 to 9 at intervals of 0.01. Linear least squares programs are available in every statistical package and even on many hand calculators. Hence, the method of Theorem 2.2 is simple and could have widespread applicability.
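In code, the scan of Theorem 2.2 amounts to one linear least squares fit per grid point. The sketch below continues the running example from Section 1 (the function name and the value x0 are ours) and returns the end points (3) and (4) for the predicted value a + bx + c(x − θ)_+ at a given x = x0.

```python
import numpy as np

def theorem_endpoints(y, x, x0, theta_grid, delta):
    # Scan theta; every theta with R_theta < Delta contributes an interval
    # I_theta centred at lambda' beta_theta with half-width
    # sqrt((Delta - R_theta) * lambda' (X'X)^{-1} lambda).
    lo, hi = np.inf, -np.inf
    for theta in theta_grid:
        X = np.column_stack([np.ones_like(x), x, np.maximum(x - theta, 0.0)])
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        R = np.sum((y - X @ beta) ** 2)
        if R >= delta:
            continue                  # only theta with R_theta < Delta contribute
        lam = np.array([1.0, x0, max(x0 - theta, 0.0)])
        centre = lam @ beta
        half = np.sqrt((delta - R) * (lam @ np.linalg.solve(X.T @ X, lam)))
        lo, hi = min(lo, centre - half), max(hi, centre + half)
    return lo, hi

Lam_lo, Lam_hi = theorem_endpoints(y, x, x0=5.0, theta_grid=theta_grid, delta=delta)
```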
In principle, the confidence sets ∪_θ I_θ in Theorems 2.1 and 2.2 could consist of several disjoint pieces. However, if f(·, ·), g(·, ·), X_θ and λ_θ are suitably regular functions, as they will be in most applications, then ∪_θ I_θ will be a whole interval. In particular, if λ_θ and X_θ are continuous functions of θ and {θ : R_θ < ∆} is connected, then ∪_θ I_θ will be a whole interval, as shown by Theorem 2.2.

Numerical studies
Here, we pursue the model discussed by Examples 1.2-1.3 and 2.2-2.3, i.e. the continuous two phase regression model due to Hawkins [8] and Davies [6]:

y_i = a + b x_i + c(x_i − θ)_+ + ε_i    (6)

for i = 1, 2, . . . , n, where ε_i are assumed to be independent standard normal. Take β = (a, b, c)′, an unknown parameter vector, and let λ_θ = (1, x, (x − θ)_+)′ for some given x. We wish to find a confidence interval for the predicted value

y = a + bx + c(x − θ)_+.    (7)

The results in Example 2.3 can be used to find the confidence interval. Computer simulation shows that actual confidence levels tend to be slightly lower than nominal values. There is no evidence to suggest that the confidence levels depend strongly on the values of c, θ and x considered. In Figures 1-5:
• the top left plot gives the fraction of trials in which the calculated confidence interval covered y in (7);
• the top right plot gives the fraction of trials in which the confidence interval for θ, found as those values of θ for which R_θ < ∆, covered θ;
• the bottom left plot gives the average length of the confidence interval for θ, found as those values of θ for which R_θ < ∆;
• the bottom right plot gives the power of the 5% level test of the hypothesis that c is zero, as given by Davies [6], with the variance of the series being estimated.
Also shown in Figures 1-5 are the confidence levels for y, θ and the average length of the confidence interval for θ computed using the delta method.
In the simulations, n = 21, x_i = −10, −9, . . . , 10, the residual standard deviation was 1.0, and the formulas were calculated for values of θ from −9 to 9 at intervals of 0.01. In each case, there were 10,000 simulations and the nominal confidence level was 0.95. The choice n = 21 corresponds to a small sample size; other small sample sizes yielded similar results. Figures 1-5 suggest that the confidence interval for y is reasonably accurate whether the confidence interval for θ is short or not, whether c is well distinguished from zero or not, and whether x and θ are well separated or not.
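A stripped-down version of such a simulation, reusing the sketches above (with fewer replications than the 10,000 used here, and with parameter values that are ours), might look as follows.

```python
import numpy as np

def estimate_coverage(a, b, c, theta0, x0, n_sims=1000, alpha=0.95):
    # Monte Carlo estimate of the coverage of (Lambda_-, Lambda_+) for the
    # predicted value y = a + b*x0 + c*(x0 - theta0)_+ under model (6).
    x = np.arange(-10.0, 11.0)                  # n = 21 design points
    grid = np.arange(-9.0, 9.0 + 1e-9, 0.01)
    truth = a + b * x0 + c * max(x0 - theta0, 0.0)
    rng = np.random.default_rng(2)
    hits = 0
    for _ in range(n_sims):
        y = a + b * x + c * np.maximum(x - theta0, 0.0) + rng.normal(size=x.size)
        R = profile_rss(y, x, grid)             # from the earlier sketch
        d = delta_threshold(R, n=x.size, m=3, s=1, alpha=alpha)
        lo, hi = theorem_endpoints(y, x, x0, grid, d)
        hits += int(lo <= truth <= hi)
    return hits / n_sims

# e.g. estimate_coverage(1.0, 0.5, 2.0, theta0=3.0, x0=5.0)
```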
The bottom right plots in Figures 1-5 show, as expected, that the power increases to 1 as c becomes more clearly distinguished from zero. The bottom left plots show that the average length of the confidence interval for θ becomes correspondingly shorter. The top left, top right and bottom left plots in Figures 1-5 suggest that the proposed method has higher confidence levels and shorter lengths than the delta method for all values of c, θ and x considered. The gain in average length does not appear substantial, but the gain in confidence level does: the average confidence level for the proposed method is approximately 0.945, whereas that for the delta method is approximately 0.905.
We now perform a simulation study to verify that the confidence level of the interval given by Theorem 2.2 converges to the nominal value with increasing sample size. We computed the confidence levels for y, as before, for a range of sample sizes and selected values of (c, θ, x); the results are shown in Figure 6. It is evident from Figure 6 that the confidence levels for y approach the nominal level with increasing sample size for each of the selected values of (c, θ, x). For practical use, samples of size 40 or more could be considered sufficient.
Next, we wish to verify that calculations using (2) and Theorem 2.2 lead to the same results. For this, we computed the confidence levels of y using (2) for the same ranges of parameters considered in Figure 6 and plotted them on the same graphs. The values computed using the two formulas appear indistinguishable, which supports the correctness of the mathematical derivations for Theorem 2.2.
The above results assume that the residuals in (6) are normal. To demonstrate the robustness of the methods of Section 2, these results were recalculated for a variety of non-normal and asymmetric errors, including Student's t with 1 degree of freedom, Student's t with 10 degrees of freedom, Student's t with 20 degrees