Adaptive Robust Regression by Using a Nonlinear Regression Program

Robust regression procedures have received considerable attention in the mathematical statistics literature. They have not, however, received nearly as much attention from practitioners performing data analysis. A contributing factor may be the lack of availability of these procedures in commonly used statistical software. In this paper we propose algorithms for obtaining parameter estimates and their asymptotic standard errors when fitting regression models to data assuming normal/independent errors. The proposed algorithms can be implemented in commonly available nonlinear regression programs. We review a number of previously proposed algorithms. As we discuss, these require special code and are difficult to implement in a nonlinear regression program. Methods of implementing the proposed algorithms in SAS NLIN are discussed. Specifically, the two applications of regression with the t and the slash family errors are discussed in detail. SAS NLIN and S-plus instructions are given for these two examples. Minor modification of these instructions can solve other problems at hand.


Introduction
It is well known that statistical inference based on the classical normal theory assumption is sensitive to outliers and to errors that come from a heavy-tailed distribution. Despite this fact and the considerable attention that robust estimation has received in the mathematical statistics literature, most practitioners continue to analyze their data based on the normal model. Lange, Little, and Taylor (1989) attribute the slow adoption of robust procedures to "the bewildering choice of alternative procedures and a lack of published applications to real data." Because data analysts often take the convenient route of using the available software to analyze their data and avoid the more costly alternative of writing special code, we believe that the lack of wide availability of robust estimation procedures in the more popular statistical software is also a contributing factor to this slow adoption. To this end, in this paper we attempt to show how widely available nonlinear regression programs may be used for adaptive robust regression.
Robust regression methods attempt to fit regression models to data so that the fit is less sensitive to the tail behavior of the errors and is reasonably stable when the errors come from a heavy-tailed distribution. M-estimation (Huber 1981), a generalized version of maximum likelihood (ML), is a popular robust procedure used in regression analysis. Lange and Sinsheimer (1993) argue for using adaptive robust regression methods that are based on ML estimation. Adaptive methods provide an attractive compromise between classical normal theory methods and modern robust methods (Lange and Sinsheimer 1993, Hogg 1974). In adaptive estimation, the distribution of the errors depends on tuning parameters that are estimated or set to achieve robust inference for statistical modeling of data sets involving errors with longer than normal tails.
The best studied adaptive methods are based on the distributional families of normal/independent (N/I) type (Lange and Sinsheimer 1993; Dempster, Laird, and Rubin 1977, 1980). Consider the regression model

y_i = f_i(β) + e_i,   i = 1, …, n,

where the y_i are the observed dependent variables, the f_i are theoretical response functions with a known form, generally depending on a set of explanatory variables x_i and indexed by a set of parameters β, and the e_i denote the errors.
In adaptive robust regression the classical normal theory assumption of e_i iid N(0, σ²) is commonly replaced by the assumption that the e_i are identically and independently distributed (iid) with a distribution from an N/I family. More specifically, let e_i = r_i/√u_i, where the r_i are iid N(0, σ²) and the u_i are iid U, with U being a positive random variable. Then the y_i are said to be N/I with location f_i(β) and scale σ². Often the distribution of U depends on a single tuning parameter ν, disjoint from β and σ². It is this class of the N/I family that we consider here.
A popular example from the N/I family is the Student's t distribution. For this, U ∼ χ²_ν/ν, a fraction of the χ² with ν degrees of freedom. Another example is the slash distribution, for which U has the density h(u) = ν u^(ν−1) on (0, 1] for ν > 0. In both of these examples ν serves as a tuning parameter. Other examples include the contaminated normal, the power exponential, and the stable distributions (Lange and Sinsheimer 1993). A referee has also pointed out that another common method for dealing with outliers is to assume the e_i are generated from a mixture of two normal distributions, both with mean 0. One of the normals has variance σ² while the other has variance kσ² for some large k. This method, discussed in Verdinelli and Wasserman (1991), is also a member of the N/I family, with the distribution of U being dichotomous.
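The representation e_i = r_i/√u_i translates directly into a simulation recipe. The following sketch (the function name and data are our own illustrations, not from the paper) draws t and slash errors this way: for the t, U is χ²_ν/ν, and for the slash, U = V^(1/ν) with V uniform on (0, 1].

```python
import math
import random

def ni_errors(n, sigma, nu, family, rng):
    """Draw n normal/independent errors e_i = r_i / sqrt(u_i), r_i ~ N(0, sigma^2).

    family="t":     u_i ~ chi-square(nu)/nu, so e_i is t with nu degrees of freedom.
    family="slash": u_i has density nu * u**(nu - 1) on (0, 1]; by inverse-CDF,
                    u_i = v**(1/nu) with v ~ Uniform(0, 1].
    """
    errors = []
    for _ in range(n):
        r = rng.gauss(0.0, sigma)
        if family == "t":
            # chi-square(nu) is Gamma(shape=nu/2, scale=2)
            u = rng.gammavariate(nu / 2.0, 2.0) / nu
        elif family == "slash":
            u = (1.0 - rng.random()) ** (1.0 / nu)  # 1 - random() lies in (0, 1]
        else:
            raise ValueError("family must be 't' or 'slash'")
        errors.append(r / math.sqrt(u))
    return errors

rng = random.Random(1)
t_errors = ni_errors(1000, 1.0, 4.0, "t", rng)
slash_errors = ni_errors(1000, 1.0, 2.0, "slash", rng)
```

Because u_i can be close to zero, both families produce occasional large errors, which is exactly the heavy-tailed behavior the adaptive models are designed to accommodate.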
We focus on computational algorithms to obtain θ̂ = (β̂, σ̂², ν̂), the ML estimate of θ = (β, σ², ν), for the t and the slash families of distributions. Moreover, we give a method for obtaining acov(θ̂), an estimate of the asymptotic covariance matrix of θ̂. Of course, the asymptotic standard errors of the parameters in θ̂ can be obtained by taking the square roots of the diagonal elements of acov(θ̂). As we will see in Section 2, a number of algorithms have been proposed for both parameter and standard error estimation. These algorithms either have restrictions or require special code to implement (Jamshidian 1997). The algorithms that we propose here differ from the previously proposed algorithms in that they can be implemented in most available nonlinear regression programs, they produce standard errors, and they have no restrictions.
In Section 2 we give a brief review of the previously proposed algorithms. In Section 3 we discuss parameter estimation. In Section 4 we discuss methods of obtaining standard errors. Finally, in Section 5 we discuss implementation of the proposed algorithms in a nonlinear regression program and give examples. For our examples we will use SAS NLIN. S-plus implementations are also given in the Appendix. The methods discussed, however, can be used in any other nonlinear regression package that has an iterative reweighting option, such as BMDP-3R or SPSS-NLR.
Review of Algorithms

Dempster et al. (1980) proposed a method of iteratively reweighted least squares to estimate θ = (β, σ²) when the f_i(β) are linear, the e_i are N/I, and ν is fixed. Among the N/I distributions, the t distribution has received special attention. Rubin (1983) gave details of implementing the Dempster et al. (1980) algorithm for the special case of the t distribution. Lange et al. (1989) described EM, Fisher-scoring, and quasi-Newton methods for ML estimation of the t model. In an attempt to accelerate EM, Liu and Rubin (1995) proposed a modified version of EM, called ECME, to estimate the regression model with t errors. Both the Lange et al. (1989) and the Liu and Rubin (1995) algorithms can handle nonlinear f_i(β) and estimate ν as a free parameter. Finally, Lange and Sinsheimer (1993) described an EM algorithm for estimating the regression model with general N/I errors.
Except for the algorithm of Dempster et al. (1980), which is restricted to t with a fixed degree of freedom and linear f_i(β), all of the other algorithms mentioned require special code. Moreover, Lange et al. (1989) and Lange and Sinsheimer (1993) proposed methods to obtain acov(θ̂). As we will see in Section 4, these methods also require special code. Jamshidian (1997) described a method for obtaining θ̂ and acov(θ̂) for the t model using BMDP-LE, a maximum likelihood program. BMDP-LE is not as widely available as nonlinear regression programs. Moreover, as explained by Jamshidian (1997), convergence of the maximization algorithm used in BMDP-LE is very sensitive to starting values. Jamshidian (1997) also described a method of using a nonlinear regression program for parameter estimation of the t model with a fixed degree of freedom ν. In addition to being restricted to t with a fixed ν, his method does not produce standard errors.

Parameter Estimation
Let y_i have an N/I distribution with location f_i(β), scale σ², and tuning parameter ν, and let ℓ_i(θ) denote the log-density of y_i. We seek θ̂ by maximizing the log-likelihood Σ_{i=1}^n ℓ_i(θ) with respect to θ. To do this we propose a generalized EM (GEM) algorithm (Dempster et al. 1977).
Since GEM is closely related to it, we begin by describing the EM algorithm of Dempster et al. (1977). Given a family of densities g(y|θ) and the observed values y = (y_1, …, y_n)^T, the EM algorithm begins by introducing a second family of densities g̃(z|θ), related to the first by the requirement that there is a function h such that y = h(z) has density g(y|θ) if z has density g̃(z|θ). The vector z is usually called the complete data and y is called the observed or incomplete data. The algorithm begins by defining the Q-function

Q(θ′, θ) = E{log g̃(z|θ′) | y, θ}.

This is called the E-step. Next the vector θ̃ that maximizes Q(θ′, θ) with respect to θ′ is found. This is called the M-step. Replacing θ by θ̃ and repeating the E- and M-steps produces a sequence of values of θ that under appropriate conditions (Wu 1983, Dempster et al. 1977) converges to θ̂. If maximization of Q in the M-step is replaced by the weaker condition of obtaining a θ̃ such that Q(θ̃, θ) > Q(θ, θ), then the resulting algorithm is called a GEM algorithm. Dempster et al. (1977) have shown that, like the EM algorithm, the GEM algorithm, under certain regularity conditions that are often satisfied for the problems discussed here, is globally convergent; that is, it will converge to a stationary point from almost any starting value. Lange and Sinsheimer (1993) developed an EM algorithm to obtain θ̂.
Their development emphasizes the comparison principle introduced by Dutter and Huber (Huber 1981). To derive their algorithm, we let z^T = (y^T, u^T) be the complete data, where u = (u_1, …, u_n)^T. Then it can be shown that

Q(θ′, θ) = Σ_{i=1}^n Q_{1i}(θ′, θ) + Σ_{i=1}^n Q_{2i}(θ′, θ),

with

Q_{1i}(θ′, θ) = −(1/2)[w_i(θ) δ_i²(θ′) + log σ′²]   and   Q_{2i}(θ′, θ) = E_θ{log h_{ν′}(u_i)},

where w_i(θ) = E_θ(u_i), E_θ(·) denotes the conditional expectation E(· | y, θ), θ′ = (β′, σ′², ν′), δ_i²(θ) = [y_i − f_i(β)]²/σ², and h_ν(u) denotes the density of U. Then, given a set of starting values θ, to obtain θ̃ the EM algorithm proceeds as follows:

Step 1. Solve the weighted regression problem of minimizing S(β′) = Σ_{i=1}^n w_i(θ)[y_i − f_i(β′)]² to obtain β̃.

Step 2. Update σ² using σ̃² = S(β̃)/n.

Step 3. Obtain ν̃ by maximizing Q_2(θ′, θ) = Σ_{i=1}^n Q_{2i}(θ′, θ) with respect to ν′.
Steps 1–3 make up the M-step of the EM algorithm. The E-step consists of computing w_i(θ) = E_θ(u_i) and E_θ{log[h_{ν′}(u_i)]}. The former is needed in Step 1 and the latter is needed in Step 3. When f_i(β) is nonlinear in β, Step 1 is iterative. Step 3 is also generally iterative. These iterative steps within the iterative EM algorithm make it difficult, and often prohibitively difficult, to implement EM in a nonlinear regression program.
To avoid iteration in Step 1, we propose obtaining a point β̃ such that S(β̃) < S(β). We then use this β̃ in Step 2 to obtain σ̃². The resulting algorithm is a GEM algorithm. To obtain such a β̃ we propose using one step of the Gauss-Newton algorithm with step-halving. That is, we let

β̃ = β + (1/2^m) [(df/dβ)^T W (df/dβ)]^{−1} (df/dβ)^T W [y − f(β)],   (2)

where f(β) = [f_1(β), …, f_n(β)]^T, df/dβ denotes the Jacobian matrix of f(β), and W is a diagonal matrix with diagonal elements w_i(θ). There is an m ≥ 0 for which S(β̃) < S(β). We choose the smallest such integer m. The choice of a Gauss-Newton step is made because of its frequent effectiveness in decreasing S(β) and its wide availability in statistical software packages such as SAS, BMDP, SPSS, and S-plus. There are problems in which Gauss-Newton steps are not effective. These are mainly the problems referred to as large residual problems (see, e.g., Seber and Wild, 1988, page 627). For such problems one may use other available options such as the Levenberg-Marquardt method.
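The step (2) can be sketched in a few lines. The helper below forms the weighted normal equations, takes the Gauss-Newton direction, and halves the step until the weighted sum of squares decreases, with the weights held fixed throughout; the model and data at the end are illustrative assumptions, not from the paper.

```python
import math

def solve(A, b):
    """Solve A d = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    d = [0.0] * n
    for r in range(n - 1, -1, -1):
        d[r] = (M[r][n] - sum(M[r][k] * d[k] for k in range(r + 1, n))) / M[r][r]
    return d

def weighted_sse(beta, x, y, w, f):
    """S(beta) = sum_i w_i [y_i - f_i(beta)]^2."""
    return sum(wi * (yi - f(beta, xi)) ** 2 for wi, xi, yi in zip(w, x, y))

def gauss_newton_step(beta, x, y, w, f, jac, max_halvings=30):
    """One Gauss-Newton step (2) with step-halving on S(beta), weights fixed."""
    p = len(beta)
    A = [[0.0] * p for _ in range(p)]   # (df/dbeta)' W (df/dbeta)
    g = [0.0] * p                       # (df/dbeta)' W [y - f(beta)]
    for wi, xi, yi in zip(w, x, y):
        Ji, ri = jac(beta, xi), yi - f(beta, xi)
        for j in range(p):
            g[j] += wi * Ji[j] * ri
            for k in range(p):
                A[j][k] += wi * Ji[j] * Ji[k]
    d = solve(A, g)
    s0 = weighted_sse(beta, x, y, w, f)
    for m in range(max_halvings):        # smallest m with S(new) < S(old)
        cand = [bj + dj / 2.0 ** m for bj, dj in zip(beta, d)]
        if weighted_sse(cand, x, y, w, f) < s0:
            return cand
    return beta

# illustrative nonlinear model f_i(beta) = beta0 * exp(beta1 * x_i)
f = lambda b, xi: b[0] * math.exp(b[1] * xi)
jac = lambda b, xi: [math.exp(b[1] * xi), b[0] * xi * math.exp(b[1] * xi)]
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [1.1, 1.6, 2.9, 4.4, 7.6]
w = [1.0, 1.0, 0.5, 1.0, 0.8]
beta_new = gauss_newton_step([1.0, 0.5], x, y, w, f, jac)
```

Since the direction d is a descent direction for S(β) whenever the normal-equations matrix is positive definite, a small enough step always decreases S, so the halving loop terminates.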
As noted, Step 3 is also generally iterative. This is because Q_2(θ′, θ) can be a general nonlinear function of ν′, and so its maximization may require iterative methods. Since Q_2(θ′, θ) is model-specific, depending on the choice of U, we cannot give a general method of avoiding iterations in Step 3, as we did for Step 1. We will give specifics for the two examples of the t and slash families. It turns out that for the slash family ν̃ can be obtained in one step (Lange and Sinsheimer, 1993). Obtaining ν̃ for the t, however, requires iteration. In Section 3.1 we give a one-step method of obtaining ν̃ to a good approximation and describe application of the GEM algorithm to the t model. In Section 3.2 details of the application of the GEM algorithm to the slash family are given.

A GEM algorithm for the t family
Assume the y_i are iid t with location f_i(β), scale σ², and ν degrees of freedom. Lange et al. (1989) show that

w_i(θ) = (ν + 1)/(ν + δ_i²(θ)).

These weights are used in Step 1 of the GEM algorithm to obtain β̃.
The quantity Q_{2i}(θ′, θ), needed in Step 3, is

Q_{2i}(θ′, θ) = (ν′/2) log(ν′/2) − log Γ(ν′/2) + (ν′/2)[v_i(θ) − w_i(θ)] − v_i(θ),

where v_i(θ) = E_θ(log u_i), and Γ(·) and ψ(·) denote the gamma and the digamma functions. Thus maximization of Q_2(θ′, θ), required in Step 3, for the t model requires solving the equation

log(ν̃/2) − ψ(ν̃/2) + C = 0,   with C = 1 + (1/n) Σ_{i=1}^n [v_i(θ) − w_i(θ)],   (3)

for which an iterative method would ordinarily be required. To avoid iterations we propose utilizing the approximation

log(ν̃/2) − ψ(ν̃/2) ≈ 1/ν̃ + 1/(3ν̃²).   (4)

This approximation has a maximum absolute error of 1/(120ν̃⁴), which makes it accurate to at least two significant digits when ν̃ > 1. This accuracy is sufficient for most practical problems. Substituting (4) in (3) leads to a quadratic equation for ν̃ with positive solution

ν̃ = [−3 − √(9 − 12C)] / (6C).   (5)

It can be shown that C < 0, and hence ν̃ in (5) is positive.
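As a quick check of (4) and (5), the update can be coded in one line; the digamma function is approximated here by a central difference of Python's log-gamma, purely for verification, and the value of C is an illustrative assumption.

```python
import math

def nu_update(C):
    """Positive root of 3*C*nu**2 + 3*nu + 1 = 0, i.e. equation (5); needs C < 0."""
    return (-3.0 - math.sqrt(9.0 - 12.0 * C)) / (6.0 * C)

def digamma(x, h=1e-5):
    # central-difference approximation to the digamma function, for checking only
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2.0 * h)

C = -0.2            # illustrative value; in practice C is computed as in (3)
nu = nu_update(C)
# residual of equation (3) at the approximate solution; should be near zero
residual = math.log(nu / 2.0) - digamma(nu / 2.0) + C
```

For C = −0.2 the update gives ν̃ ≈ 5.3, and the residual of (3) is of the order of the stated approximation error 1/(120ν̃⁴).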

A GEM algorithm for the slash family
For the slash family, the weights w_i(θ) required in Step 1 are not as simple as those for the t distribution. These are

w_i(θ) = [(2ν + 1)/δ_i²(θ)] × Γ(δ_i²(θ)/2; ν + 3/2) / Γ(δ_i²(θ)/2; ν + 1/2),

where Γ(x; ν) is the value of the gamma cumulative distribution function with parameter ν at x (Lange and Sinsheimer 1993). In Step 3, on the other hand, the update of ν is available in closed form: maximizing Q_2(θ′, θ) gives ν̃ = −n / Σ_{i=1}^n v_i(θ), where v_i(θ) = E_θ(log u_i).
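A small numerical check of this weight formula, under our reading of the notation (Γ(x; a) as the gamma CDF with shape a and unit scale, i.e. the regularized lower incomplete gamma): the closed form should match direct numerical integration of E(u_i | y_i) over (0, 1].

```python
import math

def reg_lower_gamma(a, x, terms=300):
    """Regularized lower incomplete gamma P(a, x) via the standard power series."""
    if x <= 0.0:
        return 0.0
    total, term = 0.0, 1.0 / a
    for k in range(1, terms):
        total += term
        term *= x / (a + k)
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def slash_weight(delta2, nu):
    """w = (2 nu + 1)/delta2 * P(nu + 3/2, delta2/2) / P(nu + 1/2, delta2/2)."""
    b = delta2 / 2.0
    return ((2.0 * nu + 1.0) / delta2
            * reg_lower_gamma(nu + 1.5, b) / reg_lower_gamma(nu + 0.5, b))

def slash_weight_numeric(delta2, nu, n=20000):
    """Midpoint-rule version of
    E(u) = int_0^1 u^(nu+1/2) e^(-u d/2) du / int_0^1 u^(nu-1/2) e^(-u d/2) du,
    the posterior mean of u_i under the slash density nu * u^(nu-1)."""
    num = den = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        e = math.exp(-u * delta2 / 2.0)
        den += u ** (nu - 0.5) * e
        num += u ** (nu + 0.5) * e
    return num / den

w_closed = slash_weight(4.0, 2.0)
w_direct = slash_weight_numeric(4.0, 2.0)
```

Since u_i lies in (0, 1], the weight is a posterior mean in (0, 1): observations with large δ_i² get weights near 0, which is how the slash fit discounts outliers.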

Standard Error Estimation
Both Lange et al. (1989) and Lange and Sinsheimer (1993) proposed using the inverse of the expected information matrix, I^{−1}(θ̂), to obtain acov(θ̂). Lange and Sinsheimer (1993) gave computational formulas for I(θ̂) for the N/I family, and Lange et al. (1989) gave those for the special case of the t family. Like their parameter estimation algorithms, these computational formulas require special code.
An alternative is the matrix J(θ̂) = [dℓ(θ̂)/dθ]^T [dℓ(θ̂)/dθ], where ℓ(θ) = [ℓ_1(θ), …, ℓ_n(θ)]^T and dℓ(θ)/dθ denotes the Jacobian of ℓ(θ). J(θ̂) is referred to as the empirical information. We propose using J^{−1}(θ̂) to obtain acov(θ̂). Our proposal to use J^{−1}(θ̂) is motivated by the following facts: (i) J(θ̂) is asymptotically equivalent to I(θ̂) in the sense that [J(θ̂) − I(θ̂)]/n → 0 as n → ∞ whenever the central limit theorem applies to the sequence [∂ℓ_i(θ)/∂θ][∂ℓ_i(θ)/∂θ]^T.
(ii) J(θ̂) arises naturally in the context of the Gauss-Newton method, used in Step 1 of our GEM algorithm. (iii) Computation of J(θ̂), as we will see shortly, is closely related to the EM Q-function.
To compute J(θ̂) requires formulas for the partial derivatives ∂ℓ_i(θ)/∂θ. These formulas are related to the EM Q-function by the Fisher (1925) identity ∂ℓ_i(θ)/∂θ = Q̇_i(θ, θ), where Q̇_i(θ′, θ) denotes the derivative of Q_i(θ′, θ) with respect to its first argument. Using this identity we obtain

∂ℓ_i(θ)/∂β = (1/σ²) w_i(θ) [y_i − f_i(β)] ∂f_i(β)/∂β,   (6)

∂ℓ_i(θ)/∂σ² = [1/(2σ²)] [w_i(θ) δ_i²(θ) − 1],   (7)

∂ℓ_i(θ)/∂ν = Q̇_{2i}(θ, θ).   (8)

Equations (6)–(8) depend on w_i(θ) = E_θ(u_i) and ∂f_i(β)/∂β, both of which are used in Step 1 of our GEM algorithm. Also, the components required to compute Q̇_{2i}(θ, θ) are often those used in Step 3. These make obtaining acov(θ̂) by using J(θ̂) attractive, since the required computational components are already available from the parameter estimation part.
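The empirical information is simply the sum of outer products of the per-observation score vectors, so once the scores (6)–(8) are available the covariance estimate takes only a few lines. A generic sketch (the score values below are illustrative placeholders, not from any fit in the paper):

```python
import math

def invert(A):
    """Invert a matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                m = M[r][c]
                M[r] = [vr - m * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def empirical_acov(scores):
    """acov estimate J^-1, where J = sum_i s_i s_i' is the empirical information
    built from the per-observation score vectors s_i = d l_i / d theta."""
    p = len(scores[0])
    J = [[sum(s[j] * s[k] for s in scores) for k in range(p)] for j in range(p)]
    return invert(J)

# illustrative 2-parameter scores for five observations
scores = [(0.3, -1.2), (-0.8, 0.4), (1.1, 0.9), (-0.5, -0.7), (0.2, 1.5)]
acov = empirical_acov(scores)
std_errors = [math.sqrt(acov[j][j]) for j in range(2)]
```

The asymptotic standard errors are the square roots of the diagonal of the inverted matrix, exactly as described above for acov(θ̂).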

Using a nonlinear regression program
In this section we discuss implementation of the GEM algorithm and computation of acov(θ̂), discussed in Sections 3 and 4, in a nonlinear regression program. We will use SAS NLIN. But, as mentioned earlier, the techniques that will be discussed can be used in most nonlinear regression programs that have an iterative reweighting option.

Parameter estimation
A set of starting values is required to start the GEM iteration process. We have used ordinary least squares estimates for β, the corresponding residual mean square for σ², and ν = 4. This choice of starting values has worked well in the examples reported here and in most other examples that we have run.
To obtain β̃ in Step 1, we proposed using the Gauss-Newton step (2). SAS NLIN uses (2) by default, starting with m = 0. It continues adding 1 to m as long as S(β̃) ≥ S(β). This process is called step-halving. For the GEM algorithm it is required that the weights W remain constant throughout the step-halving process. Unfortunately, when variable weights are used, as is the case here, SAS and most other nonlinear regression programs vary W in the step-halving process. To remedy this problem, as in many other iteratively reweighted least squares applications, we turn off the step-halving. When f_i(β) is linear in β, step-halving is not required, and in effect turning it off does not alter the GEM algorithm. However, if f_i(β) is nonlinear, turning off the step-halving means that the resulting algorithm is not in general globally convergent. So in these cases it is advisable to check the iteration process for convergence. If nonconvergence results, one may try different starting values. We have not had any problems with nonconvergence, however. In our experience the starting values described above have almost always resulted in convergence. In Step 2 the value of the weighted sum of squares S(β̃) is required. This value is generally available in nonlinear regression programs. In SAS, it is assigned to a variable named _SSE_.
As mentioned, the required computations in Step 3 depend on the choice of the N/I distribution. In Sections 3.1 and 3.2 we discussed methods of obtaining ν̃ for the t and the slash families. In both cases this requires computation of Σ_{i=1}^n v_i(θ), and in the case of t the quantity Σ_{i=1}^n w_i(θ) is additionally needed. The methods for this computation are discussed in more detail in Section 5.3.
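To make the flow of Steps 1–3 concrete outside of SAS, here is a compact sketch of the whole GEM iteration for a straight-line model with t errors (our own toy setup, not the stack-loss run). Step 1 is an exact weighted least squares solve, since the model is linear in β; Step 3 uses the quadratic update (5), with the digamma function approximated numerically.

```python
import math
import random

def digamma(x, h=1e-5):
    # central-difference approximation via the log-gamma function
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2.0 * h)

def gem_t_line(x, y, iters=100):
    """GEM fit of y_i = b0 + b1 x_i + e_i with t(nu) errors (a sketch)."""
    n = len(y)
    b0 = b1 = 0.0
    sigma2, nu = 1.0, 4.0
    for it in range(iters):
        if it == 0:
            w = [1.0] * n        # first pass: ordinary least squares start
        else:
            d2 = [(yi - b0 - b1 * xi) ** 2 / sigma2 for xi, yi in zip(x, y)]
            w = [(nu + 1.0) / (nu + d) for d in d2]
        # Step 1: weighted least squares (exact, since the model is linear)
        sw, swx = sum(w), sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        det = sw * swxx - swx * swx
        b0 = (swy * swxx - swx * swxy) / det
        b1 = (sw * swxy - swx * swy) / det
        # Step 2: sigma2 = S(beta)/n, the weighted residual sum of squares
        sigma2 = sum(wi * (yi - b0 - b1 * xi) ** 2
                     for wi, xi, yi in zip(w, x, y)) / n
        # Step 3: one-step nu update via (5);
        # v_i = E(log u_i) = psi((nu+1)/2) - log((nu + delta_i^2)/2) for the t
        d2 = [(yi - b0 - b1 * xi) ** 2 / sigma2 for xi, yi in zip(x, y)]
        w3 = [(nu + 1.0) / (nu + d) for d in d2]
        v3 = [digamma((nu + 1.0) / 2.0) - math.log((nu + d) / 2.0) for d in d2]
        C = 1.0 + sum(vi - wi for vi, wi in zip(v3, w3)) / n
        if C < 0.0:
            nu = (-3.0 - math.sqrt(9.0 - 12.0 * C)) / (6.0 * C)
    return b0, b1, sigma2, nu

# synthetic data: y = 1 + 2x + t(4) noise, via the N/I representation
rng = random.Random(2)
x = [i / 20.0 for i in range(200)]
y = [1.0 + 2.0 * xi
     + rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(2.0, 2.0) / 4.0)
     for xi in x]
b0, b1, sigma2, nu = gem_t_line(x, y)
```

The guard on C < 0 simply skips the ν update in the rare event that (5) is undefined, leaving ν at its current value.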

Standard error estimation
In nonlinear regression programs, commonly

acov(θ̂) = RMS {[dF(θ)/dθ]^T [dF(θ)/dθ]}^{−1},

where RMS is the residual mean square, F(θ) = [F_1(θ), …, F_n(θ)]^T is the response function, and dF(θ)/dθ is the Jacobian of F. This and the discussion of Section 4 suggest the following method to obtain J^{−1}(θ̂) in a nonlinear regression run: set (i) the starting values to θ̂, (ii) the response function F(θ) to ℓ(θ), (iii) the dependent variable equal to ℓ(θ̂), and (iv) RMS = 1. Since both the response variable and the response function are set to the same value, the Gauss-Newton step will be zero in the first iteration, and thus the starting parameter values θ̂ will not change. In fact, since there is no change in the parameter estimates, the program will stop after the first iteration. But this trick will result in the program producing the appropriate standard errors in the process.
Instructions (ii) and (iii) above require specification of ℓ_i(θ), which might not be available in the EM setting. Note, however, that only the derivatives of ℓ_i(θ) are needed in this run. Since, by Fisher's identity, ∂ℓ_i(θ)/∂θ = Q̇_i(θ, θ), we can use Q_i(θ, θ) = Q_{1i}(θ, θ) + Q_{2i}(θ, θ) in place of ℓ_i(θ) in (ii) and (iii). This avoids the need for specification of ℓ_i(θ). Finally, setting RMS = 1 is a standard practice in many iteratively reweighted least squares procedures. This is done in SAS using the PROC NLIN option "SIGSQ=1".
In the following subsection we give examples of regression with t and slash errors. In both cases our SAS instructions can be used to run any t or slash model with minimal modifications. The main modifications needed are to change the response function and its derivatives. The derivatives are not required if Version 6.12 of SAS is used. In this version SAS computes derivatives automatically if they are not specified. This feature is also available, for example, in BMDP-3R.

Examples

Example 1: A t model example
Here we give SAS NLIN instructions to fit a linear regression model to the stack-loss data presented by Brownlee (1965). This data set is often used for testing robust estimation procedures. Lange et al. (1989) used it to discuss linear regression with t errors. The stack-loss data consist of measurements on the oxidation of ammonia to nitric acid during the operation of a plant. The data were collected over 21 consecutive days, and the variables measured were air flow to the plant (AIR), cooling water inlet temperature (TEMP), acid concentration (ACID), and percent of ammonia lost (LOSS). The linear regression of LOSS on the variables AIR, TEMP, and ACID is considered.
Annotated SAS instructions are given in Figure 1. The instructions in [1] read in the data from a file named "stack.dat" and add a dummy case to the end of our data set. The dummy case is used in Step 3 of the GEM algorithm to signal the end of the data when passes are made through the data. Instead of using the last two lines of the instructions in [1], one can also manually add a dummy case to the end of the data file. The values for the dummy case can be chosen arbitrarily.
In [2], we start the procedure NLIN. The NOHALVE option turns off the step-halving. The CONVERGEPARM option sets the convergence criterion to be based on the parameter estimates. The convergence criterion value of .0001 used gives about three to four digits of accuracy in the parameter estimates. The MAXITER option determines the maximum number of iterations allowed. If the algorithm does not converge within the default value of 50, a higher value should be set for MAXITER. Line two of [2] specifies the names of the parameters and their starting values.
The first line in [3] sets the starting value for σ². The second line computes the update σ̃². This is Step 2 of the GEM algorithm. We have coded it before Step 1, since the initial value of σ² is needed in Step 1.

[4] implements Step 1 of the GEM algorithm, as the program passes through the n = 21 cases. The values of the response function f_i(β), the dependent variable y_i, and the weights w_i(θ) are assigned to F, Z, and W, respectively. The variable C, required to obtain ν̃, is accumulated at line 5 of [4]. Finally, the partial derivatives ∂f_i(β)/∂β are assigned to variables DFXX, where XX is replaced by a parameter name. ∂f_i(β)/∂ν = 0, and this is set in DFNU.

[5] implements Step 3 of the GEM algorithm. This step is executed when the program is operating on the added dummy case (case 22 here). Both the response function and the dependent variable are set to ν̃, given by (5), and the weight is set to 1. This simply causes the value ν̃ to be assigned to the variable NU before the start of the next iteration. Finally, in [6] the MODEL statement is used to specify the overall model, putting together the model for the first n = 21 cases and the dummy case 22. The weights W are assigned to the SAS variable _WEIGHT_, and the derivatives are assigned to the SAS variables DER.XX, where again XX is replaced by the name of a parameter. The OUTPUT statement instructs SAS to add the values of the parameter estimates to a SAS data set. Similarly, the ID statement instructs SAS to write the variables whose names are included in that statement to a SAS data set. These values are then used to compute standard errors at convergence.
Instructions [1]–[6] will produce parameter estimates and a set of standard errors. While the parameter estimates are correct, the standard errors that will be printed are not. Instructions [7]–[9], shown in Figure 2, use the method discussed in Section 4 to produce the correct asymptotic standard errors.
The SET statement in the data step [7] puts the SAS data set containing the variables in the OUTPUT and ID statements of [6] at our disposal. The second line of [7] deletes the dummy case from the SAS data set.
In [8] PROC NLIN is started and the option "SIGSQ=1" is set. The PARMS statement sets the starting values for the parameters to 0. It is really lines 3 and 4 of [8] that set the initial values to the θ̂ obtained from instructions [1]–[6] of Figure 1.
Finally, [9] defines Q1 and Q2 and sets the dependent and the response variables, as discussed in Section 4. The partial derivatives of the response for this run are given in [10]. These are based on formulas (6)–(8).
To use instructions [1]–[10] in solving other problems, one only needs to modify the statements underlined in Figures 1 and 2. These are mainly the variable names, the parameter names and their initial values, the response function, its derivatives if required, and the values 21 and 22, which are n and n + 1 for this example.
We ran instructions [1]–[10] in SAS. SAS required 57 iterations to meet the convergence criterion. The parameter estimates obtained, beginning with β̂₁ = −38.32, agree with those of Lange et al. (1989) to at least two significant digits. We will discuss the obtained standard error estimates in Section 5.4.

Example 2: A slash model example
Here we consider a data set that relates soil concentration of an agrochemical, x_i, to the growth response, measured as the log weight y_i of the plant lepidium. This data set was analyzed by Racine-Poon (1988), where she considered the response function

f_i(β) = β₁ / (1 + exp{β₂ + β₃ ln(x_i) + β₄ [ln(x_i)]²}) if x_i > 0, and f_i(β) = β₁ if x_i = 0.   (10)

Lange and Sinsheimer (1993) pointed out that this data set has two rather mild outliers among the 42 points observed. They used the slash model to fit the data. Figure 3 shows SAS NLIN instructions to obtain parameter estimates and their standard errors for this problem. These instructions are similar to those in Figures 1 and 2. The main differences are: (i) We have not included derivatives of the response function in this example. As noted, SAS NLIN version 6.12 produces these automatically if they are not specified. (ii) The model-specific formulas for w_i(θ), v_i(θ), ν̃, and Q_2(θ′, θ) are specified as defined in Section 3.2. Again, one can use these instructions to solve other problems by making appropriate changes to the underlined statements.

Standard error estimates
In Section 4 we proposed using the empirical information to obtain asymptotic standard error estimates. This was motivated by the fact that the empirical information is asymptotically equivalent to the often-used expected information. A simulation study would be of interest to examine the behavior of both of these estimates in small samples in the present context.

[Figure 3 near here: SAS NLIN instructions for the slash model standard error run, including the statements PROC NLIN SIGSQ=1; Q1=-.5*(W*DEL+LOG(SIG)); Q2=LOG(NU)+NU*V; MODEL Z=Q1+Q2.]

In this section we make a limited comparison of the information-based standard error estimates to those obtained by bootstrap. We use the two examples given in the previous section and two new examples with artificial data and moderate sample sizes of 210. We refer to the new examples as Examples 3 and 4.
The data for Example 3 were generated using the model of Example 1 with population parameters β = (−38, .9, .5, −.1), σ² = 1, and ν = 1. The observations on the independent variables were replicated 10 times to generate 210 cases, and t noise was added to f(β) to obtain y. The data for Example 4 were generated using the model of Example 2 with population parameters β = (2, −4, 3, −.4), ν = 1.5, and σ² = .01. Again the x values were replicated, 5 times, to obtain 210 cases, and then slash-distributed noise was added to f(β) to obtain y. Using SAS NLIN, the parameter estimates obtained for Example 3 were β̂ = (−40.2, 0.91, 0.47, −0.08), ν̂ = .93, and σ̂² = 1.01, and those for Example 4 were β̂ = (2.37, −4.28, 2.66, −.43), ν̂ = 1.52, and σ̂² = .002. These estimate the population parameters well. Table 1 shows a comparison of the standard errors obtained for the four examples. The expected information estimates for Examples 1 and 2 were obtained from Lange and Sinsheimer (1993) and Lange et al. (1989); these were not computed for Examples 3 and 4. The empirical information estimates were obtained using SAS runs as described in Section 5.3. The bootstrap estimates were obtained using S-plus. They were based on bootstrap sample sizes of 150. For Examples 1 and 2 the expected and empirical information estimates of the asymptotic standard errors of the parameters agree to approximately one significant digit, with the exception that for ν̂ in Example 2 the expected information estimate is .2 as compared to the empirical information estimate of 1.2. Since the sample sizes are small, one does not expect great agreement between the information matrix estimates and the bootstrap estimates, but overall the empirical information estimates are closer to the bootstrap estimates than the expected information estimates. In Examples 3 and 4, where the sample sizes are larger, the bootstrap estimates and the empirical information estimates agree fairly well.
These limited examples give some comfort in using empirical information estimates of standard errors when the sample sizes are moderate to large. For small sample sizes the empirical information seems to produce slightly better estimates than the expected information, but overall both seem to underestimate the standard errors. Thus, in the case of small sample sizes one may use the S-plus scripts given in the Appendix to obtain bootstrap estimates of standard errors.
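A case-resampling bootstrap of the kind used above is straightforward to sketch. For brevity the stand-in fitting routine below is ordinary least squares on a straight line rather than the GEM fit, and the data are synthetic; the resampling and standard error computation are the same once `fit` is replaced by the robust estimator.

```python
import math
import random

def ols_line(x, y):
    """Ordinary least squares for y = b0 + b1 x (a stand-in for the GEM fit)."""
    n = len(y)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((xi - xb) ** 2 for xi in x)
    b1 = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / sxx
    return (yb - b1 * xb, b1)

def bootstrap_se(x, y, fit, B=200, seed=0):
    """Case-resampling bootstrap standard errors for the estimates fit(x, y)."""
    rng = random.Random(seed)
    n = len(y)
    reps = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]   # resample cases with replacement
        reps.append(fit([x[i] for i in idx], [y[i] for i in idx]))
    p = len(reps[0])
    means = [sum(r[j] for r in reps) / B for j in range(p)]
    return [math.sqrt(sum((r[j] - means[j]) ** 2 for r in reps) / (B - 1))
            for j in range(p)]

rng = random.Random(3)
x = [i / 4.0 for i in range(40)]
y = [0.5 + 1.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
se = bootstrap_se(x, y, ols_line)
```

The returned values are the sample standard deviations of the parameter estimates across the B resampled fits, which is the bootstrap estimate of their standard errors.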

Appendix
In this section we give two S-plus scripts that will produce parameter and standard error estimates for Examples 1 and 2. By modifying the first 8 lines of the first script and the first 13 lines of the second script to specify information about the problem at hand, these scripts can be used to solve other regression problems involving t and slash errors. The scripts were tested on S-Plus Student Edition 4.5.