A note on least squares sensitivity in single-index model estimation and the benefits of response transformations

Abstract: Ordinary Least Squares (OLS) is recognised as being useful in the context of multiple linear regression but can also be effective under the more general framework of the single-index model. In cases where it is ineffective, transformations of the response can improve performance while still allowing for interpretation on the original scale. In this paper we introduce an influence diagnostic for OLS that can be used to assess its effectiveness in this general setting and that can also be used following response transformations. These findings are further emphasized and verified via simulation studies.


Introduction
It is well known that Ordinary Least Squares (OLS) is an efficient and effective estimator of the coefficients of a multiple linear regression (MLR) model. It is also applicable to a much wider framework of models. Consider a regressor vector x ∈ R^p, a random response variable Y ∈ R and a random error term ε assumed to be independent of x with mean 0. In extending results of Brillinger (1977, 1983), Li & Duan (1989) showed that, under mild conditions on x, the OLS slope vector is equal to cβ for some c ∈ R under the assumed model Y = f(β⊤x, ε), where the form of f is unknown. Given that the OLS slope vector, which we denote b, is a scalar multiple of β, a plot of Y versus b⊤x may be used to visually explore possibilities for f.
Depending on the form of f, OLS is not always a good estimator of the direction of β in practice. Nevertheless, since the error term is not necessarily additive, response transformations are possible that may result in significant improvements in estimation. The purpose of this paper is two-fold. Firstly, we introduce an influence diagnostic for OLS in dimension reduction based on recent results from Prendergast & Smith (2010). This influence diagnostic can be used to assess the effectiveness of OLS as a dimension reduction method under various assumed models. The second purpose is to discuss the suitability of response transformations when using OLS. To do so, we use our influence diagnostic to show that simple response transformations can lead to very notable improvements in estimation.
We discuss OLS in Section 2 and show by example how poorly it can perform. We also explain how response transformations are supported by the theory. In Section 3 we introduce the influence diagnostic for OLS before establishing the associated diagnostic following a response transformation in Section 4. This section affords general insights as to how and when simple transformation functions can improve estimation. Finally, the main findings of the paper are summarized in Section 5 and further research is discussed.

Sufficient summary plots using ordinary least squares
Consider a random response variable Y ∈ R and a vector of predictors x ∈ R^p. The MLR model assumes Y = β_0 + β⊤x + ε, where β_0 is the unknown intercept and β is the unknown slope vector of regression coefficients for the predictors. Here ε is the random error, assumed independent of x with E(ε) = 0. Let {Y_i, x_i}_{i=1}^n denote a random sample of n observations drawn from the MLR model. Then, based on this random sample, OLS is an efficient estimator of both β_0 and β. The population OLS slope vector is b = Σ^{-1}Σ_{xy}, where Σ = var(x) and Σ_{xy} = cov(x, Y). Under the MLR conditions b = β. The OLS estimator of the slope vector is b̂ = Σ̂^{-1}Σ̂_{xy}, where Σ̂ and Σ̂_{xy} are the usual sample-based estimators of Σ and Σ_{xy} respectively.
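A minimal sketch (not from the paper) of these sample-based estimators is given below; the MLR settings, random seed and parameter values are illustrative assumptions only.

```python
import numpy as np

def ols_slope(X, y):
    """OLS slope vector: Sigma_hat^{-1} Sigma_xy_hat computed from sample covariances."""
    Xc = X - X.mean(axis=0)                               # centre the predictors
    Sigma_hat = Xc.T @ Xc / (len(y) - 1)                  # sample covariance of x
    Sigma_xy_hat = Xc.T @ (y - y.mean()) / (len(y) - 1)   # sample cov(x, Y)
    return np.linalg.solve(Sigma_hat, Sigma_xy_hat)

# Under an MLR model the slope estimate should be close to beta itself.
rng = np.random.default_rng(0)
n, p = 500, 4
beta = np.array([1.0, -2.0, 0.0, 0.5])
X = rng.normal(size=(n, p))
y = 2.0 + X @ beta + rng.normal(size=n)
print(ols_slope(X, y))    # approximately [1, -2, 0, 0.5]
```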

The single-index model
An important development for OLS was provided by Brillinger (1977, 1983), which allowed OLS to be applied to a wider variety of models. Li & Duan (1989) provided further generalizations and extensions, the result of which is a very diverse range of suitable applications for OLS. Continuing the previous definitions for Y, β, x and ε, the single-index model is of the form

Y = f(β⊤x, ε),    (2)

where f is an unknown link function on which there are no conditions (see, for example, Theorem 2.1 of Li & Duan, 1989). In seeking information regarding f, x ∈ R^p can be replaced by β⊤x ∈ R without loss of information. Li (1991) and Carroll & Li (1992) refer to this as dimension reduction in a visualisation sense, since Y is plotted against the lower-dimensional β⊤x, enabling inference regarding the form of f. Li & Duan (1989) gave the following condition, generally known as the Linear Design Condition (LDC):

Condition 1. E(x | β⊤x) is a linear function of β⊤x.

This condition holds when x is elliptically symmetrically distributed (e.g. see Cook & Weisberg, 1991). Moreover, Hall & Li (1993) show that Condition 1 often holds approximately when p is large. Under Condition 1 and when the model in (2) holds, Li & Duan (1989) show that b = cβ for some c ∈ R. Consequently, when c ≠ 0, OLS is expected to find the direction of β. Since f in (2) is unknown, it is only the direction of β that is important. For instance, for any γ that is a non-zero scalar multiple of β, the model can be re-specified as Y = g(γ⊤x, ε). A plot of Y versus γ⊤x is called a Sufficient Summary Plot (SSP; e.g. see Cook, 1998). The sample-based equivalent is referred to as an Estimated SSP (ESSP), for which the Y_i's are plotted against the γ⊤x_i's, where γ is an estimate of the direction of β. As a consequence, b̂ can be used to obtain an ESSP in an effort to seek an appropriate link function when Condition 1 holds.
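The following is a minimal sketch of an ESSP constructed with the OLS slope. It is not an example from the paper: the link f(u) = sin(u), the noise level and the dimensions are hypothetical choices, with x ∼ N(0, I_p) so that Condition 1 holds.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n, p = 1000, 6
beta = np.zeros(p); beta[:2] = [1.0, 0.5]
X = rng.normal(size=(n, p))                       # x ~ N(0, I_p), elliptically symmetric
y = np.sin(X @ beta) + 0.1 * rng.normal(size=n)   # single-index model with a hypothetical link

Xc = X - X.mean(axis=0)
b_hat = np.linalg.solve(Xc.T @ Xc, Xc.T @ (y - y.mean()))   # OLS slope: estimates c*beta

plt.scatter(X @ b_hat, y, s=5)                    # ESSP: Y against b_hat' x
plt.xlabel("b_hat' x"); plt.ylabel("Y")
plt.show()
```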

When OLS is ineffective as a dimension reduction estimator
OLS can often work very well under the framework of (2), yet in some cases it can fail spectacularly even when Condition 1 is satisfied. As noted in Section 2.1, the OLS slope vector under the required conditions is equal to b = cβ for some c ∈ R. To successfully find the direction of β it is naturally required that c ≠ 0. One cause of c = 0 is a link function f that is symmetric around β⊤E(x) for an elliptically symmetric x. For example, suppose x ∼ N(0, I_p) and Y = (β⊤x)² + ε, so that cov(x, Y) = 0. OLS is expected to fail since b = 0. However, there are other situations in which OLS will fail that are not so obvious. To show this we consider the following example.
Here β = [1, m, 0, . . . , 0]⊤ with m ∈ R and the model is that given in (3). Initially, we expected OLS to provide a reasonable estimate of the direction of β due to the lack of symmetry of the link function about β⊤E(x) = 0. For example, plot (a) of Figure 1 depicts Y versus β⊤x when m = 3.25, where it is clear that symmetric dependency is not evident. However, simulation results show that OLS is typically a poor estimator for choices of m close to 3.25, even for very large sample sizes. To assess OLS as an estimator we report cor(Xβ, Xb̂)², where X is the matrix whose ith row contains the predictors for the ith observation. This is the squared correlation between the true and estimated dimension-reduced predictors. Plot (b) of Figure 1 provides boxplots of cor(Xβ, Xb̂)² for varying m over 1000 simulations with n = 1000 observations. It is clear that as m approaches 3.25, the ability of OLS to successfully find the direction of β deteriorates, since cor(Xβ, Xb̂)² is typically small for m in this vicinity.
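Below is a minimal sketch of how the assessment measure cor(Xβ, Xb̂)² can be computed. Since the model in (3) is not reproduced above, a hypothetical single-index link is used purely to illustrate the calculation.

```python
import numpy as np

def ols_slope(X, y):
    Xc = X - X.mean(axis=0)
    return np.linalg.solve(Xc.T @ Xc, Xc.T @ (y - y.mean()))

def sq_cor(u, v):
    """Squared sample correlation between two vectors."""
    return np.corrcoef(u, v)[0, 1] ** 2

rng = np.random.default_rng(3)
n, p = 1000, 6
beta = np.zeros(p); beta[:2] = [1.0, 3.25]
X = rng.normal(size=(n, p))
y = np.exp(X @ beta / 4.0) + rng.normal(size=n)   # hypothetical link, for illustration only

b_hat = ols_slope(X, y)
print(sq_cor(X @ beta, X @ b_hat))                # near 1 when the direction of beta is recovered
```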
To further explore the cause of OLS failing, we consider the population slope vector b = Σ^{-1}cov(x, Y). For any m it can be shown that the OLS slope vector, given in (4), is a scalar multiple of β, as Li & Duan (1989) suggest. However, b = 0 when m = ±3.258, and b is close to 0 for m in this vicinity. Hence OLS estimates the zero vector for such m, resulting in poor performance.

Response transformations
The original results of Brillinger (1977, 1983) assumed that the error term was additive. However, the generalization by Li & Duan (1989) allowing for a non-additive error, as in (2), is important when contemplating response transformations. Consider a function t : R → R. Under the model in (2), we may write t(Y) = g(β⊤x, ε) for a new and still unknown link function g. Now let b_t = Σ^{-1}cov[x, t(Y)] denote the OLS slope vector with respect to the transformed Y. Since Condition 1 is unaffected by the transformation, b_t is a scalar multiple of β provided Condition 1 holds. Two obvious options for SSPs are then to plot t(Y) versus b_t⊤x to seek g, or to plot Y versus b_t⊤x to seek f, with similarly defined ESSPs in practice. The second option is generally the more appealing as it allows for examination of the predictors and response on the original scale. Consequently, it is important to realize that the transformation need only occur in the step involving the estimation of the direction of β. Prendergast (2008) introduced an influence diagnostic that can be used in practice to detect influential observations when using OLS for ESSPs. Here we introduce an influence diagnostic that can be used to provide general insights with respect to various models. The diagnostic shares similarities with Hampel's influence function (Hampel, 1974) and extends work by Prendergast & Smith (2010), who introduced diagnostics for certain dimension reduction methods.
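A minimal sketch of the second SSP option follows. The model Y = exp(β⊤x + ε) is a hypothetical choice used only for illustration: the log transformation is applied solely when estimating the direction of β, while the plot keeps Y on its original scale.

```python
import numpy as np
import matplotlib.pyplot as plt

def ols_slope(X, y):
    Xc = X - X.mean(axis=0)
    return np.linalg.solve(Xc.T @ Xc, Xc.T @ (y - y.mean()))

rng = np.random.default_rng(4)
n, p = 1000, 5
beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
X = rng.normal(size=(n, p))
y = np.exp(X @ beta + 0.3 * rng.normal(size=n))   # hypothetical model with non-additive error

b_t = ols_slope(X, np.log(y))   # transformation used only to estimate the direction of beta
plt.scatter(X @ b_t, y, s=5)    # second option: Y kept on the original scale
plt.xlabel("b_t' x"); plt.ylabel("Y (original scale)")
plt.show()
```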

Influence analysis of OLS for ESSP's
Consider (Y, x) ∼ F and let the contamination distribution be defined as F_ε = (1 − ε)F + εΔ_{(y_0, x_0)}, where Δ_{(y_0, x_0)} is the Dirac distribution that places all of its mass at the contamination point (y_0, x_0) and ε denotes the proportion of contamination. Now let T be a functional for an estimator defined at both F and F_ε. The influence function, IF (Hampel, 1974), is defined by IF(T, F; y_0, x_0) = lim_{ε↓0} [T(F_ε) − T(F)]/ε, which can be used to quantify the relative influence of contamination on an estimator. For example, a power series expansion of T(F_ε) gives T(F_ε) = T(F) + ε IF(T, F; y_0, x_0) + O(ε²). When ε is small, T(F_ε) ≈ T(F) + ε IF(T, F; y_0, x_0), and hence IF(T, F; y_0, x_0) can be used to quantify the approximate influence of the contamination. This type of sensitivity analysis has important connections with the empirical setting. For example, IF(y_i, x_i) is approximately proportional to the difference in estimation with and without the ith observation.
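The leave-one-out connection can be illustrated with the following sketch. It is an assumption-laden illustration in the spirit of a sample-based influence analysis, not the paper's exact diagnostic: each observation's influence is proxied by one minus the squared correlation between the reduced predictors computed with and without that observation.

```python
import numpy as np

def ols_slope(X, y):
    Xc = X - X.mean(axis=0)
    return np.linalg.solve(Xc.T @ Xc, Xc.T @ (y - y.mean()))

rng = np.random.default_rng(5)
n, p = 200, 4
beta = np.array([1.0, 1.0, 0.0, 0.0])
X = rng.normal(size=(n, p))
y = np.exp(0.5 * X @ beta) + rng.normal(size=n)   # hypothetical single-index model

b_full = ols_slope(X, y)
influence = np.empty(n)
for i in range(n):
    keep = np.delete(np.arange(n), i)
    b_i = ols_slope(X[keep], y[keep])
    # one minus the squared correlation of the reduced predictors with and without obs i
    influence[i] = 1.0 - np.corrcoef(X @ b_full, X @ b_i)[0, 1] ** 2

print("five most influential observations:", np.argsort(influence)[-5:])
```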
A generalization of the model in (2), although not applicable to OLS, allows for numerous vectors of regressor coefficients. For such a model, an influence diagnostic for some dimension reduction estimators based on average canonical correlations was introduced by Prendergast & Smith (2010). Whilst their diagnostic is applicable to the model in (2), it is not directly relevant to OLS due to scaling differences. For OLS, the influence diagnostic given by Prendergast & Smith (2010) is based on the squared correlation between b⊤x and b_ε⊤x, where b is the OLS slope vector at F and b_ε is the OLS slope vector at F_ε. Recall that the target for OLS is cβ⊤x for any non-zero c ∈ R. Consequently, if contamination results in b_ε = c_ε β where c_ε is non-zero, then an influence diagnostic based on the squared correlation between b⊤x and b_ε⊤x will appropriately indicate zero influence. We now consider the following theorem, the proof of which can be found in Appendix A.
Theorem 1 provides an expression for the influence diagnostic, which we denote ρ_b(y_0, x_0). Remark 3.1. The sensitivity analysis and resulting diagnostic that we consider within this paper are based on Hampel's influence function (Hampel, 1974). However, further research could also explore other, more general sensitivity analysis approaches that have been considered for least squares. One alternative could be to consider small perturbations of the design matrix X and the vector of responses; Golub & Van Loan (1996, p. 242), for example, detail this approach, allowing for the computation of error bounds. Other sensitivity approaches have also been discussed by Chatterjee & Hadi (1988), who, for example, consider an asymptotic approach that examines the effect of measurement errors in X on estimates.
The diagnostic ρ_b(y_0, x_0) shares similarities with the sample-based leave-one-out diagnostic introduced by Prendergast (2008) and, from Theorem 1, we can use it to assess the sensitivity of OLS as an estimator in the dimension reduction setting. The influence diagnostic can provide interesting general insights into OLS estimation in this setting. For example, even extreme predictor outliers can have very small influence regardless of the choice of accompanying response. We highlight this in the example below.
Example 3.1. For simplicity suppose that µ = 0 and Σ = I_p. Also suppose that x_0 is in the same direction as β and therefore in the same direction as b. Then it can be shown that ρ_b(y_0, x_0) = 0, regardless of the choice of y_0. This also holds for x_0 in the opposite direction to β.
Zero influence resulted in Example 3.1 because the contaminant regressor x_0 was in the same, or opposite, direction to β. The corollary below is useful for assessing the role that the direction of x_0 plays in exerting influence on OLS.
Corollary 1. Let z_0 = Σ^{-1/2}(x_0 − µ) and let θ_0 denote the angle between z_0 and Σ^{1/2}b. Then the influence diagnostic can be represented as a function of ‖z_0‖ and cos²(θ_0). Proof. The proof is straightforward on noting that b⊤(x_0 − µ) = (Σ^{1/2}b)⊤z_0 = ‖Σ^{1/2}b‖ ‖z_0‖ cos(θ_0). From Corollary 1, the influence diagnostic depends on the contaminant regressor only through Σ^{-1/2}(x_0 − µ) and cos²(θ_0). We now consider another example to highlight the role of the angle between the predictor contaminant and b.
Example 3.2. For simplicity suppose µ = 0 and Σ = I_p, and let ‖b‖ = 1 and y_0 = E(Y), so that y_0 is a typical response. Then, from Corollary 1, ρ_b(y_0, x_0) is a function of ‖x_0‖ and cos²(θ_0) alone. Plots of this function show that even extreme predictor outliers (i.e. large ‖x_0‖) may not have a large influence. For example, when cos(θ_0) = 0 or ±1 (the latter occurring when x_0 = cb for some non-zero c ∈ R), ρ_b(y_0, x_0) = 0.
Prendergast & Smith (2010) also reported the expected value of the influence diagnostic when the contaminant is a random (Y, x) drawn from F. Noting that the influence function can be used to derive the asymptotic variance of an estimator (for an estimator with functional T, the asymptotic variance at F is E[IF(T, F; Y, x)IF(T, F; Y, x)⊤]), this expected influence, E[ρ_b(Y, x)], provides a useful means by which to compare estimators, and we report it in the following corollary to Theorem 1.
Proof. The result follows using arguments similar to those of Prendergast & Smith (2010) in the proof of their Corollary 3. Although we do not discuss Corollary 2 further here, the result will be important in the considerations of the next section.

Influence following response transformations
As discussed in Section 2.3, response transformations may be employed without disrupting the theory. Here we consider such response transformations and use the influence diagnostic introduced previously to compare the estimator with and without transformation. In the following corollary we provide the influence diagnostic from Theorem 1 following a response transformation. The result also holds for transformation functions that are affected by contamination (see Appendix A), provided Condition 1 holds. Thus we have two cases: (i) t is unaffected by contamination; (ii) t is affected by contamination but Condition 1 holds.
Corollary 3. For both cases, consider a response transformation function t defined at F which may differ at F_ε. The influence diagnostic presented in Theorem 1 then carries over to the transformed response. Note that when the model in (2) and Condition 1 hold, b and b_t are scalar multiples of β. Hence, provided they are not the zero vector, P_b and P_t are identical. Subsequently, a comparison of OLS with and without a response transformation can be carried out by comparing r²_{t,0}/‖Σ^{1/2}b_t‖² and r²_0/‖Σ^{1/2}b‖². Next we consider some example transformations.

Log-transformation for exponential growth models
Consider an exponential growth model, given in (10), for a ∈ R^+, b_0 ∈ R, b_1 ∈ R and σ ∈ R^+. This model satisfies the model in (2), so that the OLS slope vector is equal to cβ for some c ∈ R. Ignoring trivial cases such as a = 1 or b_1 = 0, OLS is expected to find the direction of β. Throughout, suppose that x ∼ N(µ, Σ), ε ∼ N(0, 1) and define β such that β⊤Σβ = 1. This is not a restriction since b_1 in (10) can be chosen to be any value.
To compare influence with and without the log transformation, we will derive E[ρ b (Y, x)] and E[ρ log (Y, x)], the expected influence at (Y, x) ∼ F for the original and log transformed OLS estimators respectively. These are provided below and the associated derivations can be found in Appendix B.
For ln denoting the natural log, the expected influence without transformation, E[ρ_b(Y, x)], is given in (11). Following a log transformation of Y, with any base, the expected influence E[ρ_log(Y, x)] is given in (12) and does not depend on the base chosen or on a. Interestingly, for a fixed error variance σ², the expected influence following a log transformation decreases with increasing b_1. This is not necessarily the case with (11). For small |b_1|, both approaches are expected to perform poorly due to the large expected influence; this is not surprising given the small contribution of b_1β⊤x to Y relative to the error. As |b_1| increases, we see a marked difference in the expected performance of the two approaches. For the usual OLS approach, performance quickly deteriorates even for moderately sized b_1: for this model, extremely large responses are possible when b_1 is not small, and these should highly influence OLS. Conversely, OLS following a log transformation is expected to perform well, with small expected influence that, as shown in (12), improves further with increasing |b_1|. Using the log transformation, the sensitivity of OLS to very large response values is reduced; this can be seen in Corollary 3, where the diagnostic depends only on the log of the response. We saw similar results for other choices of a.
For verification, we now give our simulated results in Tables 1 and 2 for varying a and b 1 respectively. Each simulation was run 1000 times with n = 50.

Table 2: Values of E[ρ_b(Y, x)] (OLS with no transformation) and E[ρ_t(Y, x)] (OLS following log transformation) for Model 1 with varying b_1 and fixed a = 2, including average cor(Xβ, Xb̂)² (standard deviations, SD, in parentheses) for 1000 trials of n = 50 randomly sampled observations.

Here x ∼ N(0, I_p), ε ∼ N(0, 1), β = [1, −2, 0, 0.5, 0, . . . , 0]⊤ and p = 10. From Table 1, a significant improvement due to the log transformation is evident. This is consistent with the expected influence values, which show that, without transformation, increasing a is expected to diminish the performance of OLS. This is emphasized by the mean cor(Xβ, Xb̂)² decreasing, and the associated standard deviation increasing, with a. Conversely, the transformation gives approximately constant mean cor(Xβ, Xb̂)² with small standard deviation across all values of a. From Table 2, increasing b_1 results in standard OLS deteriorating in performance, both in mean influence and in the correlations. In contrast, a vast improvement is achieved using the log transformation which, as mentioned previously, has improved performance with increasing b_1.
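A hedged simulation sketch along the lines of those reported in Tables 1 and 2 is given below. Since the exact form of the model in (10) is not reproduced above, the sketch assumes Y = a^(b_0 + b_1 β⊤x + σε) purely for illustration; the settings n = 50, 1000 trials, p = 10 and β proportional to [1, −2, 0, 0.5, 0, . . . , 0]⊤ follow the text, and the particular values of a, b_0, b_1 and σ are illustrative assumptions.

```python
import numpy as np

def ols_slope(X, y):
    Xc = X - X.mean(axis=0)
    return np.linalg.solve(Xc.T @ Xc, Xc.T @ (y - y.mean()))

def sq_cor(u, v):
    return np.corrcoef(u, v)[0, 1] ** 2

rng = np.random.default_rng(6)
n, p, trials = 50, 10, 1000
beta = np.zeros(p); beta[[0, 1, 3]] = [1.0, -2.0, 0.5]
beta = beta / np.linalg.norm(beta)        # scaled so that beta' Sigma beta = 1 (Sigma = I_p)
a, b0, b1, sigma = 2.0, 0.0, 1.0, 1.0     # assumed parameter values, for illustration

res_raw, res_log = [], []
for _ in range(trials):
    X = rng.normal(size=(n, p))
    y = a ** (b0 + b1 * (X @ beta) + sigma * rng.normal(size=n))   # assumed form of (10)
    res_raw.append(sq_cor(X @ beta, X @ ols_slope(X, y)))
    res_log.append(sq_cor(X @ beta, X @ ols_slope(X, np.log(y))))

print("no transformation :", np.mean(res_raw), np.std(res_raw))
print("log transformation:", np.mean(res_log), np.std(res_log))
```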

Response-discretization transformations
Let the discretization function be

t(Y) = D_h when Y ∈ S_h, h = 1, . . . , H,    (13)

where the D_h ∈ R with D_1 < D_2 < · · · < D_H, and S_1, . . . , S_H are non-overlapping subranges of the range of Y such that ∪_{h=1}^H S_h = range(Y). We denote P(Y ∈ S_h) by p_h. The attractive feature of this transformation is that it retains some co-variability between the response and predictors whilst limiting the minimum and maximum transformed response values. This transformation shares some similarities with SIR (Li, 1991), where the role of the response is to determine the positioning of observations within slices. Consider the model in (14), where f is a strictly increasing link function, x ∼ N(µ, Σ), ε ∼ N(0, 1) and σ ≥ 0. Before we move on, we provide the form of b_t for the model in (14) under the assumption of a normal x; the proof can be found in Appendix C.
Corollary 4. For the model in (14), where f is a monotonic increasing link function, x ∼ N(µ, Σ), ε ∼ N(0, 1) and σ ≥ 0, the OLS slope vector following the discretization described in (13) is a scalar multiple of β, with the scalar given by a simple expression in the quantities φ(Z_j). Here Z_j is the (j/H) × 100th percentile of the N(0, 1) distribution, so that P(Z ≤ Z_j) = j/H for Z ∼ N(0, 1), and φ denotes the standard normal probability density function.
A simple choice for the D_h's in (13) is to let D_h = h and for the S_h's to be chosen such that P(Y ∈ S_h) = 1/H. For now we proceed with this approach and refer to the transformation as H-fold equally weighted response discretization. It should be noted that for this transformation function, t depends on F since the S_h's depend on F. As such, when applied to F_ε, the contamination can effect a change in t through changes to the S_h's. However, Corollary 3 still holds when Condition 1 holds.
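A minimal sketch of H-fold equally weighted response discretization follows; it is an illustrative implementation (not the paper's code) in which the slices S_h are sample quantile bins and each response is replaced by the index of its slice.

```python
import numpy as np

def discretize_equal_weight(y, H):
    """Replace each y_i with D_h = h, where S_h are equal-probability sample quantile bins."""
    edges = np.quantile(y, np.linspace(0.0, 1.0, H + 1)[1:-1])   # H - 1 interior cut points
    return np.searchsorted(edges, y, side="right") + 1           # values in 1, ..., H

y = np.random.default_rng(7).exponential(size=20)
print(discretize_equal_weight(y, 4))   # roughly five observations per slice value
```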
Corollary 5. Suppose that (Y, x) ∼ F with the model in (14) holding, where x ∼ N_p(µ, Σ). Then, for t the discretization function described in (13) with D_h = h and p_h = 1/H for all h = 1, . . . , H (H-fold equally weighted response discretization), the expected influence E{ρ_F[t(Y), x]} has a simple closed form. Proof. The proof is straightforward from Corollary 3 on noting that t(Y) is a discrete uniform random variable, so that Var[t(Y)] = (H² − 1)/12.
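The variance used in the proof can be checked numerically with a short sketch (H = 8 is an arbitrary illustrative choice):

```python
import numpy as np

H = 8
vals = np.arange(1, H + 1)        # t(Y) takes the values 1, ..., H with equal probability
print(vals.var())                 # population variance of the discrete uniform: 5.25
print((H ** 2 - 1) / 12)          # (H^2 - 1)/12 = 5.25, matching the proof of Corollary 5
```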
We now consider the effects of H-fold equally weighted response discretization on a general model with monotone link function. From Corollary 4, b t is a simple expression in terms of the standard normal density and specific standard normal percentiles that are dependent on H. When applying this result to Corollary 5, it is possible to calculate the expected sensitivity E{ρ F [t(Y ), x]}.
In Table 3 we present values of E{ρ_F[t(Y), x]}/(p − 1) for the model in (14) when x ∼ N(0, I), ε ∼ N(0, 1) and ‖β‖ = 1, for various choices of H and σ.

Table 3: Values of E{ρ_F[t(Y), x]}/(p − 1) following H-fold equally weighted response discretization for the model in (14) with various H and σ and with x ∼ N(0, I) and ‖β‖ = 1.

For the three choices of σ, E{ρ_F[t(Y), x]}/(p − 1) decreases with increasing H. Many other choices of H were also considered, each showing similar decreases, although the differences became negligible for larger H. The limit of E{ρ_F[t(Y), x]}/(p − 1) is unknown; however, simulations suggest it should be very similar to that for H = 1000. These results imply a reduction in estimator variability for this model when H is chosen to be as large as possible. In practice, when there is a finite number of observed responses, n, this suggests an n-fold discretization such that each observed response is replaced with its respective rank. At F, the rank transformation OLS slope is that given in (15). A benefit of the rank transformation least squares estimator in (15) is that H does not need to be chosen. It should be noted that the rank transformation has been widely considered and discussed with respect to OLS in the multiple linear regression context when response normality is violated. However, the rank transformation in OLS here is simpler, as it does not involve additional complications to ensure that the unique β is estimated. For example, Cuzick (1988) considers models of the form given in (14), where f^{-1} exists, with an emphasis on a single predictor variable (p = 1), though extensions to p > 1 are provided. That approach involves replacing the response with a score based on rank and also requires estimation of f^{-1} to target the specific predictor coefficients. Other approaches, such as R-estimates, again under the multiple linear regression model (see, e.g., Chapter 3 of Hettmansperger & McKean, 1998), involve minimization of an objective function based on residuals and scores of the ranks of residuals. Our approach in (15) is simpler since (i) it requires no knowledge or estimation of f, and (ii) given the ranks of the observed responses, the estimate exists in a simple closed form and does not require computational minimization of an objective function.
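A minimal sketch of the rank-transformation least squares estimator described above is given below: the observed responses are replaced by their ranks and the closed-form OLS slope is computed on the ranked responses, with no choice of H and no iterative minimization. The monotone link used in the usage example is a hypothetical choice.

```python
import numpy as np

def rank_ols_slope(X, y):
    """OLS slope computed after replacing the responses with their ranks."""
    ranks = np.empty(len(y))
    ranks[np.argsort(y)] = np.arange(1, len(y) + 1)   # rank of each observed response
    Xc = X - X.mean(axis=0)
    return np.linalg.solve(Xc.T @ Xc, Xc.T @ (ranks - ranks.mean()))

# Usage: with a hypothetical monotone link, the direction of beta is still recovered.
rng = np.random.default_rng(8)
X = rng.normal(size=(500, 5))
beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0])
y = np.exp(X @ beta + rng.normal(size=500))
print(rank_ols_slope(X, y))   # approximately proportional to beta
```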
Example 4.2. Recall the model in (3). Earlier we discussed how poorly OLS estimated the direction of β when m was close to 3.25. We showed that OLS without transformation was estimating the zero vector rather than the direction of β for this m, causing cor(Xβ, Xb̂)² to decrease significantly. We now demonstrate how discretization can result in a marked improvement in the results. Here we consider the rank transformation with a large sample size of n = 1000, thereby replacing all response values with their corresponding ranks. The boxplots for standard OLS, as in Figure 1, show the estimate of the direction of β deteriorating as m approaches 3.258, whereas the grey boxplots show that good results can be obtained in this vicinity of m using the rank transformation. This does not necessarily mean that the rank transformation works well for all m, since small correlation values now occur at approximately m = 3.75, where OLS without transformation typically performs well. Therefore, if both methods were used in practice, it is likely that at least one of the two approaches would provide a good ESSP.
In Figure 5 we show plots of the expected influence for these models, with and without the H-fold equally weighted response transformation. It is clear that the expected influence decreases with the transformation, particularly for larger values of H. The exponential model is notably affected, with its expected influence increasing rapidly with increasing b_1. The large-H transformation significantly decreases the influence for both models, indicating that the transformation is expected to improve the performance of OLS.
Our simulated results, shown in Figures 6 and 7, complement these findings. For each simulation, x ∼ N(0, I_p) and β = [b_1, 0, . . . , 0]⊤, where b_1 ∈ R and p = 5. For simplicity we consider the rank transformation. The boxplots show cor(Xβ, Xb̂)² for 1000 simulated trials of 100 observations, given by OLS with and without transformation for increasing values of b_1. We consider three choices of σ to compare how these models are affected by small and large amounts of error. The results demonstrate how estimation has been improved, with the rank transformation consistently giving better results than standard OLS. It is also clear how these boxplots correspond with the expected influence, with our correlation results resembling the trends shown in Figure 5.

Discussion and further research
In this paper we have demonstrated how an influence diagnostic, similar to the influence function (Hampel, 1974), can be used to assess the performance of OLS for single-index models. Importantly, we have shown how simple response transformations can greatly improve OLS. This was done with respect to average influence under some proposed models and further highlighted with simulation studies. Although the log and response discretization transformations were the focus of this paper in an application sense, the results themselves are quite general and will be useful for further explorations. For example, the two extensions below are currently being investigated as part of the lead author's doctoral studies:
• Recently, Prendergast & Sheather (2013) carried out a sensitivity study of inverse response plot estimation, and their simulation studies indicated that robust M-estimators could provide improved estimates compared to OLS even for normal data. The results of this paper can be used to theoretically compare inverse response plot estimators of the coefficient vector for various response transformations. For example, the rank transformation should be well suited to this type of estimation, and other linearization transformations under assumptions such as normality are also being explored. This includes the performance of quantile transformations under normality assumptions for x, as well as a sensitivity analysis when x is non-normal.
• Li (1992) introduced a method called Principal Hessian Directions (pHd) that can be used in the multiple-index setting, allowing for more than one direction. As with OLS, very large or small observed response values can have a large influence on estimation. In developing the work seen here, we are extending the theory to allow pHd to be used. Response transformations, such as those studied in this paper, are currently being considered as part of the first author's PhD dissertation.
To summarize, since response transformations still allow OLS to be used for single-index models with exploration still on the original response scale, we recommend routinely considering response transformations in this setting. The response transformations we considered can be easily implemented in any statistical software package and can yield large improvements in estimation.

Appendix A
Although F is a joint distribution function for (Y, x), for simplicity throughout we take Y ∼ F to mean that Y is distributed according to its marginal distribution from F. The same simplification is used for x, and similarly for F_ε.
Let b_t be the OLS slope functional with respect to a transformed response (t(Y) at F or t_ε(Y) at F_ε). Also, let C denote the functional for the usual estimator of the covariance of x. Then b_t(F_ε) = [C(F_ε)]^{-1} cov_{F_ε}[t_ε(Y), x], so that, by using the product rule,