Semiparametric modeling and estimation of heteroscedasticity in regression analysis of cross-sectional data

We consider the problem of modeling heteroscedasticity in semiparametric regression analysis of cross-sectional data. Existing work in this setting is rather limited and mostly adopts a fully nonparametric variance structure. This approach is hampered by the curse of dimensionality in practical applications. Moreover, the corresponding asymptotic theory is largely restricted to estimators that minimize certain smooth objective functions. The asymptotic derivation thus excludes semiparametric quantile regression models. To overcome these drawbacks, we study a general class of location-dispersion regression models, in which both the location function and the dispersion function are semiparametrically modeled. We establish a unified asymptotic theory which is valid for many commonly used semiparametric structures such as the partially linear structure and the single-index structure. We provide easy-to-check sufficient conditions and illustrate them through examples. Our theory permits non-smooth location or dispersion functions, and thus allows for semiparametric quantile heteroscedastic regression and robust estimation in semiparametric mean regression. Simulation studies indicate a significant efficiency gain in estimating the parametric component of the location function. The results are applied to analyzing a data set on gasoline consumption.


Introduction
The problem of heteroscedasticity, which traditionally means a nonconstant variance function, frequently arises in regression analysis of economic data. In this paper, we broaden the scope of heteroscedasticity by considering a general class of location-dispersion regression models, where the relation between a response variable Y and a covariate vector X is given by
Y = m(X) + σ(X)ε. (1.1)
In the above model, ε denotes the random error, m(·) is called the regression function and the nonnegative function σ(·) is called the dispersion function. With different specifications on ε, this formulation includes both the conditional mean and conditional median (or more general quantile) regression models.
In the literature, model (1.1) has been thoroughly investigated for parametric mean regression, where m(·) is characterized by a finite-dimensional parameter and E(ε|X) = 0 (Davidian, Carroll and Smith, 1988; Zhao, 2001; Greene, 2002, Chapter 11, among others). This paper focuses on semiparametric regression models, which are less studied but are extremely useful due to their flexibility to accommodate nonlinearity and to circumvent the curse of dimensionality (Härdle, Liang and Gao, 2000; Ruppert, Wand and Carroll, 2003; Yatchew, 2003). In particular, we consider the general setup with m(x) = m(x, α_0, r_0), where α_0 is a finite dimensional parameter and r_0 is an infinite dimensional parameter. The main interest is often in making inference about α_0 while treating r_0 as a nuisance parameter, which can only be estimated at a nonparametric rate slower than √n.
In semiparametric regression models, the commonly used estimation procedures in general still yield consistent estimators for α 0 even if heteroscedasticity is not accounted for. However, efficiency loss due to ignoring heteroscedasticity may be substantial. Moreover, the correctness of the standard error formula for α 0 and the validity of the associated confidence intervals or hypothesis testing procedures depend critically on the dispersion function (Akritas and Van Keilegom, 2001;Carroll, 2003). In addition, it is sometimes important to model the dispersion function in order to obtain a satisfactory bandwidth for estimating the nonparametric part of the regression function. Ruppert et al. (1997) provided such an example, where the heteroscedasticity is severe and the variance function has to be estimated in order to obtain a good bandwidth for estimating the derivative of the mean function.
In the semiparametric regression setting, Schick (1996), Liang, Härdle and Carroll (1999), Härdle, Liang and Gao (2000, §2), and Ma, Chiou and Wang (2006) have studied heteroscedastic partially linear mean regression models, where the variance function σ²(x) is assumed to be smooth but unknown. They estimate the variance function nonparametrically, and then use the estimator to construct weights to achieve more efficient estimation of the parametric component of the mean regression function. Härdle, Hall and Ichimura (1993) investigated heteroscedastic single-index models, as did Xia, Tong and Li (2002). Chiou and Müller (2004) proposed a flexible semiparametric quasi-likelihood, which assumes that the mean function has a multiple-index structure and the variance function has an unknown nonparametric form.
The aforementioned work, however, suffers from several drawbacks. First, these papers all adopt a fully nonparametric model for the variance function. This approach does not work well in high dimensions due to the curse of dimensionality. Second, their asymptotic theory can only be applied to a specific semiparametric structure. Third, these methods require a smooth objective function and thus do not apply to semiparametric quantile regression models; see for instance He and Liang (2000), Lee (2003), and Horowitz and Lee (2005). In fact, the existing study of heteroscedastic quantile regression is restricted to parametrically specified quantile functions (Koenker and Zhao, 1994; Zhao, 2001). The last point is also relevant when one is interested in robust estimation for the mean regression model in the presence of outlier contamination.
We also mention Müller and Zhao (1995), who consider a general semiparametric variance function model in a fixed design regression setting. In their model, the regression function is assumed to be smooth and is modeled nonparametrically, whereas the relation between the variance and the mean regression function is assumed to follow a generalized linear model. Although their variance function has both a parametric and a nonparametric component, and so can be considered semiparametric, their model differs considerably from the semiparametric model we use in this paper.
The above concerns motivate us to propose a flexible semiparametric framework for modeling heteroscedasticity and to develop a unified theory that applies to general semiparametric structures and non-smooth objective functions. In particular, we advocate adopting a semiparametric structure for modeling the dispersion function. This approach avoids the rigid assumption imposed by a parametric dispersion function; at the same time it circumvents the curse of dimensionality introduced by a nonparametric dispersion function. In this general framework, we establish an asymptotic normality theory for estimating the form of heteroscedasticity by building on the work of Chen, Linton and Van Keilegom (2003), who developed a general theory for semiparametric estimation with a non-smooth criterion function. We provide a set of easy-to-check sufficient conditions, such that the asymptotic normality theory is valid for many commonly used semiparametric structures, for instance, the partially linear structure and the single-index structure. We discuss, but do not explore in depth, how the knowledge of heteroscedasticity can be used to construct a more efficient weighted estimator for the parametric component of m(·).
We discuss two different constraints for the random error ε in (1.1): the mean zero constraint and the median zero constraint, which correspond to mean regression and median regression, respectively. Although the current theory is restricted to cross-sectional data, the ideas and techniques can be extended to time-series models for heteroscedastic economic and financial data, such as the autoregressive conditional heteroscedastic (ARCH) model of Engle (1982).
The paper is organized as follows. In Section 2 we formally introduce the semiparametric location-dispersion model and discuss how to estimate the dispersion function. Section 3 provides generic assumptions that are applicable to general semiparametric models, and presents the asymptotic normality theory for estimating the dispersion function. In Section 4 we verify these generic conditions for two particular semiparametric models. The finite sample behavior of the proposed methods is examined in Section 5, while Section 6 is devoted to the analysis of data on gasoline consumption. In Section 7 some ideas for future research are discussed. Finally, all proofs are collected in the Appendix.

Semiparametric location-dispersion model
We consider a general semiparametric location-dispersion model:
Y = m(X, α_0, r_0) + σ(X, β_0, g_0)ε, (2.1)
where X = (X_1, . . . , X_d)^T is a d-dimensional covariate vector with compact support R_X, α_0 and β_0 are finite dimensional parameters, and r_0 and g_0 are infinite dimensional parameters. Let (X_i, Y_i), i = 1, . . . , n, be an i.i.d. sample from model (2.1). The conditions that need to be imposed on ε to make the model identifiable are given in Section 2.3 (mean regression) and Section 2.4 (median regression).
The dispersion function is assumed to have a general semiparametric structure. This paper discusses two examples in detail (Section 4), corresponding to the exponentially transformed partially linear structure σ(X, β_0, g_0) = exp(β_0^T X_(1) + g_0(X_(2))) with X = (X_(1)^T, X_(2)^T)^T and the single-index structure σ(X, β_0, g_0) = g_0(β_0^T X), respectively. We assume that the unknown function g_0 belongs to some space G of uniformly bounded functions that depend on X and β through a variable U = U(X, β), where β belongs to a compact set B in R^ℓ, with ℓ ≥ 1 depending on the model (e.g. U(X, β) = X_(2) and U(X, β) = β^T X for the above partially linear and single-index structures, respectively). For any function g, the notation g_β will be used to indicate the (possible) dependence on β. The estimator of the true g_0 will in many situations be a profile estimator, depending on β (see the examples in Section 4). For notational convenience we use the abbreviated notation (β, g) = (β, g_β(·)), (β, g_0) = (β, g_0β(·)) and (β_0, g_0) = (β_0, g_0β_0(·)), whenever no confusion is possible. Whenever needed, we will replace σ(X, β, g) by σ(X, β, g_β) or σ(X, β, g_β(U)) to highlight the dependence of the function g on the parameter β or on the variable U (note that this implies that the third argument of the function σ can be a function in G or an element of R, depending on which notation we use).
To keep the notation and presentation simple, we assume that both g_0 and U are one-dimensional. However, all the results in this paper can be extended in a straightforward way to the multi-dimensional case. For example, we may have g_0 = (g_01, . . . , g_0k) for some k ≥ 1, which allows a multiplicative model for σ of the form σ(x) = ∏_{j=1}^{d} g_0j(x_j). Although we will discuss later how to use the estimated dispersion function to construct a more efficient estimator for α_0, our main interest in this paper is to establish a general theory for estimating β_0 and σ(X, β_0, g_0). Therefore, we will simply write m_0 or m_0(X) to denote the regression function in the sequel.
Taking the derivative of (2.5) with respect to β, then replacing s(X_(2)i) and m_0(X_i) by their respective estimators, leads to the system of estimating equations (2.6) in β. Finally, the variance function can be estimated by σ̂²(x) = exp(2β̂^T x_1) ŝ_β̂(x_2). The above procedure can be iterated until convergence, where at each step the estimator β̂* is updated, and the estimated variance function is used to improve the estimator of m_0. The estimating equation (2.6) is obtained by the backfitting method. An alternative approach is to first replace s(X_(2)i) with the estimator ŝ_β(X_(2)i) in (2.5), and then take the derivative with respect to β. One then also needs to take into account the dependence of ŝ_β(X_(2)i) on β. This latter approach leads to the so-called profile estimator. We focus our attention on backfitting-type estimators in this paper; see also Remark 2.1 in Section 2.3.
Let m̂ be an estimator of m_0, which can be taken (in this first step) as the estimated regression function under the homoscedasticity assumption. Let ĝ(u) be an appropriate estimator of g_0(u) that is differentiable with respect to u. In many situations, the estimator ĝ depends on β (see for instance the motivating example in Section 2.2); we will therefore denote it by ĝ_β whenever the dependence on β is relevant. We estimate the weight w_0(x) by ŵ(x) = σ^{-4}(x, β̂*, ĝ_β̂*), where β̂* is (in this first step) the unweighted least squares estimator, i.e. β̂* satisfies H_n(β̂*, ĝ_β̂*, m̂, 1) = 0, where H_n(β, g, m, w) = n^{-1} Σ_{i=1}^{n} h(X_i, Y_i, β, g, m, w), with h(x, y, β, g, m, w) defined in (2.7). Now, define β̂ as the solution in β of the equations
H_n(β, ĝ_β, m̂, ŵ) = 0. (2.8)
We estimate the variance function σ²(x, β_0, g_0) by σ̂²(x) = σ²(x, β̂, ĝ_β̂). This procedure can be iterated until convergence, where at each step we update the estimator β̂* and we re-estimate m_0 by using a weighted estimation procedure that takes the heteroscedasticity into account via the estimated variance function.
Remark 2.1. Note that in the formula of H_n(β, ĝ_β, m̂, ŵ) the derivative (∂/∂β) σ²(x, β, ĝ_β) is obtained without taking into account that ĝ_β depends on β (i.e. we first calculate the derivative (∂/∂β) σ²(x, β, g) and then plug in g = ĝ_β, thus (∂/∂β) σ²(x, β, ĝ_β) = (∂/∂β) σ²(x, β, g)|_{g=ĝ_β}). As a consequence, our general estimation procedure does not cover profile estimation methods (where the derivative of σ²(x, β, ĝ_β) takes the dependence of ĝ_β on β into account). However, it is easy to extend our method to profile estimators. See Section 7 for more details. For a comparison of the backfitting estimator and the profile estimator, we refer to the recent paper of Van Keilegom and Carroll (2007) and the references therein.

Estimation of the dispersion function with zero median errors
Now we consider the estimation of the dispersion function when it is assumed that med(ε|X) = 0 in model (2.1), which implies that m 0 (X) = med(Y |X). This can be straightforwardly extended to general quantile regression.
For identifiability of σ(x), we need some additional assumption on the distribution of the random error. The assumption med(|ε||X) = 1 leads to σ(X, β 0 , g 0 ) = med(|Y − m 0 (X)| |X) (median absolute deviation). An alternative common assumption is E(|ε| |X) = 1, which leads to σ(X, β 0 , g 0 ) = E(|Y − m 0 (X)| |X) (least absolute deviation). The second case is technically easier to deal with than the first. We therefore concentrate on the first case, see also Remark 3.6 in Section 3.3.
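The two identifiability constraints differ only in how the error is scaled. As a small numerical illustration, assume (purely for the sake of the example; the paper does not impose normality) that the unscaled error Z is standard normal. The constants below show how Z would be rescaled to satisfy med(|ε| |X) = 1 or E(|ε| |X) = 1, and how the two resulting dispersion functions relate. All names are hypothetical.

```python
import math
from statistics import NormalDist

# Illustrative assumption: the raw error Z is standard normal.
Z = NormalDist(0.0, 1.0)

# med(|Z|) solves P(|Z| <= q) = 2*Phi(q) - 1 = 0.5, i.e. q is the 0.75 quantile.
median_abs_z = Z.inv_cdf(0.75)
# E|Z| = sqrt(2/pi) for a standard normal variable.
mean_abs_z = math.sqrt(2.0 / math.pi)

# Multiplying Z by these constants yields errors satisfying each constraint.
scale_for_median_constraint = 1.0 / median_abs_z  # gives med(|eps| | X) = 1
scale_for_mean_constraint = 1.0 / mean_abs_z      # gives E(|eps| | X) = 1


def dispersion_ratio():
    """E|Y - m_0(X)| divided by med|Y - m_0(X)| under normal errors."""
    return mean_abs_z / median_abs_z
```

Under this normality assumption the two dispersion functions are proportional, so the choice of constraint changes the scale of σ but not its shape.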
Keeping the same notation as in Section 2.3, and writing model (2.1) as Note that the choice of the weight function is motivated by efficiency considerations. In fact, in the weighted least squares procedure that we used for the setting of zero mean errors, the weights were equal to the variance of the 'errors' in the 'model' (Y − m_0(X))² = σ²(X, β_0, g_0) + σ²(X, β_0, g_0)(ε² − 1). In the current setting of zero median errors, a similar argument is used, now based on the median absolute deviation of the errors instead of the mean squared deviation.
Let m̂ and ĝ be appropriate estimators of m_0 and g_0, depending on the imposed model on the regression and dispersion function. Suppose that ĝ(u) is differentiable with respect to u. We estimate the weight function w_0(x) by ŵ(x) = σ^{-1}(x, β̂*, ĝ_β̂*), where we define the preliminary estimator β̂* as the solution of the non-weighted minimization problem
β̂* = argmin_β ‖H_n(β, ĝ_β, m̂, 1)‖,
where H_n(β, g, m, w) = n^{-1} Σ_{i=1}^{n} h(X_i, Y_i, β, g, m, w), and where ‖·‖ denotes the Euclidean norm. Finally, let
β̂ = argmin_β ‖H_n(β, ĝ_β, m̂, ŵ)‖.
As before, this procedure can be iterated to improve the estimation of β_0. Note that the function h is not smooth in β and hence β̂ does not necessarily satisfy H_n(β̂, ĝ_β̂, m̂, ŵ) = 0.

Notations and assumptions
The following notation is needed. Let f(y|x) = F′(y|x) be the density of Y given X = x, and let g′(u) = ∂g(u)/∂u for any g ∈ G. For any function g ∈ G, k ∈ K and m ∈ M (where K and M are the spaces to which the true functions g′_0 and m_0 belong, respectively), we denote ‖g‖_∞ = sup_{β∈B} sup_{x∈R_X} |g_β(u(x, β))|, ‖k‖_∞ = sup_{β∈B} sup_{x∈R_X} |k_β(u(x, β))| and ‖m‖_∞ = sup_{x∈R_X} |m(x)|. Also, N(λ, G, ‖·‖_∞) is the covering number with respect to the norm ‖·‖_∞ of the class G, i.e. the minimal number of balls of ‖·‖_∞-radius λ needed to cover G (see e.g. Van der Vaart and Wellner (1996)).
Below we list the assumptions that are needed for the asymptotic results in Subsections 3.2 and 3.3. The purpose is to provide easy-to-check sufficient conditions such that the asymptotic results are valid for general semiparametric structures, and for both mean and median semiparametric regression models. The A- and B-conditions are on the estimators ĝ and m̂, respectively, whereas all other conditions are collected under the C-list. In Section 4 we check these generic conditions for particular models and estimators of m_0(X) and σ(X, β_0, g_0).
n → 0, and K 1 is a symmetric and continuous density of order q ≥ 2 with compact support.
Assumptions on the estimator m̂
where s = 1 for mean regression and s = 2 for median regression. Moreover, P(m̂ ∈ M) → 1 as n tends to infinity.
nb 4 n → 0, and K 2 is a symmetric and continuous density with compact support.

Other assumptions
(C1) For all δ > 0, there exists ϵ > 0 such that inf_{‖β−β_0‖>δ} ‖H(β, g_0, m_0, w_0)‖ ≥ ϵ > 0.
(C2) Uniformly for all β ∈ B, H(β, g, m, w) is continuous with respect to the norm ‖·‖_∞ in (g, m, w) at (g, m, w) = (g_0, m_0, w_0), and the matrix Λ defined in Theorems 3.1 and 3.3 is of full rank.
(C3) The function (x, β, z) → σ(x, β, z) is three times continuously differentiable with respect to z and the components of x and β, and all derivatives are uniformly bounded on R_X × B. Moreover, the function (u, β) → g_0β(u) is continuously differentiable with respect to u and the components of β, and the derivatives are uniformly bounded on R_U × B.
where α̲ is the largest integer strictly smaller than α. Then, by Theorem 2.7.1 in Van der Vaart and Wellner (1996), the condition on the covering number in (A2) is satisfied if G belongs to C^α_M(R_U) with α > 1/2 for s = 1 and α > 1 for s = 2.
For specific examples of g_0β(u), we refer to Section 4, particularly (4.2) and (4.4). Note that if u(x, β) does not depend on β (as for the partially linear model), then the conditions related to the derivative ĝ′ and the space K (see (A1) and (A2)) can be omitted. On the other hand, if u(x, β) does depend on β, but (∂/∂β) σ(x, β, g) is linear in g′(u(x, β)), then it can be easily seen that the condition sup_{‖β−β_0‖≤δ_n} sup_{x∈R_X} |(ĝ′_β − g′_0β)(u(x, β))| = o_P(n^{-1/4}) is not necessary.
Remark 3.3. Note that assumption (B3) requires that the regression function estimator involves at most univariate smoothing, which is the case for e.g. the partially linear, single-index or additive model for the regression function, but not for the completely nonparametric model. It is possible to adapt this condition to allow for the completely nonparametric case as well, but we believe that whenever a semiparametric model is assumed for the variance function, it makes more sense to consider a semiparametric model for the regression function as well.
Remark 3.4. When the data are not i.i.d., the assumptions under which the main results are valid change. In fact, these assumptions are obtained by applying the results in Chen, Linton and Van Keilegom (2003). Part of the latter paper is valid for general, not necessarily i.i.d., data (namely their Theorems 1 and 2), whereas their Theorem 3 is restricted to i.i.d. data. For that reason, the extension of the present paper to clustered or longitudinal data consists in replacing the assumptions that rely on Theorem 3 by corresponding assumptions valid for dependent data. The assumptions that are affected are the assumption on the bracketing number in (A2) and (B2), and assumptions (A3) and (B3), to which it should be added that the sum in the representation of ĝ_0(u) and m̂(x) converges to a normal limit.

Asymptotic results with zero mean errors
In the following theorem, we give the Bahadur representation and the asymptotic normality of the estimator β̂ under the generic conditions given in Section 3.1, and under the assumption that E(ε|X) = 0 and Var(ε|X) = 1. Since β_0 is often associated with important factors such as treatment effects, the estimation of β_0 is sometimes of independent interest, as it tells us how the treatment affects the dispersion of the response variable in addition to its effect on the location.
The proof is given in the Appendix. We use the notation (d/dβ) σ²(x, β, g_β) to denote the complete derivative of σ²(x, β, g_β) with respect to β, i.e., the vector whose jth component is
lim_{τ→0} τ^{-1} [σ²(x, β + τe_j, g_{β+τe_j}(u(x, β + τe_j))) − σ²(x, β, g_β(u(x, β)))],
where e_j has the jth entry equal to one and all the other entries equal to zero, j = 1, . . . , ℓ.
Based on the asymptotic results for β̂, we can establish the asymptotic normality of σ̂²(x) = σ²(x, β̂, ĝ_β̂). The theorem is given below and its proof can be found in the Appendix.
Theorem 3.2. Assume that the conditions of Theorem 3.1 hold true. Then, for any fixed x ∈ R_X,
Note that the estimator β̂ does not contribute to the asymptotic variance of σ̂²(x), since its rate of convergence is faster than the nonparametric rate (n a_n)^{1/2}.
Remark 3.5. Note that the estimation of the regression function m_0 can now be updated by using a weighted least squares procedure, where the weights are given by the inverse of the estimated variance function σ̂²(x) = σ²(x, β̂, ĝ_β̂). This leads to more efficient estimation of the regression function. As a special case, consider the partially linear mean regression model. Härdle, Liang and Gao (2000, Theorem 2.1.2, page 22) showed that whenever the estimated weights are uniformly at most o_P(n^{-1/4}) away from the true (unknown) weights, the variance of the estimators of the regression coefficients is asymptotically equal to the variance of the estimator obtained by using the true weights. In our case the weights are at a distance O_P((n a_n)^{-1/2}) = o_P(n^{-1/4}) away from the true weights, and so their result applies, provided we can show that this rate holds uniformly in x ∈ R_X. We claim that this can be shown, but the proof is long and technical and beyond the scope of this paper. Their result could be generalized to other semiparametric regression models, but we do not go deeper into this issue here (see also Zhao (2001) for a similar result in the context of linear median regression). It would also be of interest to consider the efficiency of the weighted least squares estimator relative to the unweighted one. We illustrate this issue in the simulation section, where we calculate the variance of the unweighted and the weighted estimator for some specific models.
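The efficiency argument in Remark 3.5 can be illustrated with a toy Monte Carlo experiment. The sketch below is not the paper's semiparametric estimator: it assumes a simple linear mean function, a hypothetical dispersion function σ(x) = exp(−1 + 2x), and weights built from the true variance rather than an estimated one. It only shows how inverse-variance weights enter a least squares fit and why they can reduce the MSE of the slope.

```python
import math
import random


def wls_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b


def slope_mse(n=100, n_rep=300, seed=0):
    """Monte Carlo MSE of the slope: unweighted vs. inverse-variance weights."""
    rng = random.Random(seed)
    a0, b0 = 1.0, 2.0  # hypothetical true coefficients
    se_unweighted = se_weighted = 0.0
    for _ in range(n_rep):
        x = [rng.random() for _ in range(n)]
        # Hypothetical dispersion function, assumed known in this sketch.
        sigma = [math.exp(-1.0 + 2.0 * xi) for xi in x]
        y = [a0 + b0 * xi + si * rng.gauss(0.0, 1.0) for xi, si in zip(x, sigma)]
        _, b_u = wls_line(x, y, [1.0] * n)
        _, b_w = wls_line(x, y, [1.0 / si ** 2 for si in sigma])
        se_unweighted += (b_u - b0) ** 2
        se_weighted += (b_w - b0) ** 2
    return se_unweighted / n_rep, se_weighted / n_rep
```

Calling `slope_mse()` returns the pair (unweighted MSE, weighted MSE). In practice the weights would come from the estimated variance function, which is exactly where the uniform rate condition discussed in the remark matters.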

Asymptotic results with zero median errors
In Theorem 3.3 below, we give the Bahadur representation and the asymptotic normality of the estimator for β_0 under the assumption that med(ε|X) = 0 and med(|ε| |X) = 1. The conditional density of ε given X is denoted by f_ε(·|X).
Theorem 3.4. Assume that the conditions of Theorem 3.3 hold true. Then, for any fixed x ∈ R_X,
Remark 3.6. The above two theorems can be easily adapted to the case where the dispersion function is defined by σ(x, β, g) = E(|Y − m(X)| |X = x) (i.e. E(|ε| |X) = 1). In fact, the formulas of the matrix Λ and of the function ξ can be similarly obtained by combining the calculations done in the proofs of Theorems 3.1 and 3.3. These calculations show that the parameter s in condition (B2) equals 2, whereas for condition (A2) s equals 1. We omit the details.

Examples
In this section we consider two particular semiparametric regression models, propose estimators under these models, and verify the conditions that are required for the asymptotic results of Section 3. The first example is representative of mean regression, the second of median regression.

Single index mean regression model
In this first example we consider a mean regression model with a single-index regression and variance function:
Y = r_0(α_0^T X) + g_0^{1/2}(β_0^T X) ε,
where E(ε|X) = 0, E(ε²|X) = 1 and where g_0 is a positive function. In order to correctly identify the model, we assume that α_01 = β_01 = 1. This model has also been studied by Xia, Tong and Li (2002), using a different estimation method.
Let m̂(x) be an estimator of the unknown regression function m_0(x) = r_0(α_0^T x), e.g. the estimator proposed in Härdle, Hall and Ichimura (1993). See also Delecroix, Hristache and Patilea (2006) for a more general class of semiparametric M-estimators of m_0(x). Since the verification of conditions (B1) and (B2) is easier than that of conditions (A1) and (A2), we concentrate in what follows on the verification of the A-conditions. First, define for any β ∈ R^d, and let ĝ where K_1a(v) = K_1(v/a_n)/a_n, K_1 is a kernel function and a_n a bandwidth sequence.
For (A1), note that uniformly in u and β, provided n a_n² (log n)^{-2} → ∞ and inf_{β∈B} inf_{x∈R_X} f_{β^T X}(β^T x) > 0 (where f_{β^T X} is the density of β^T X). For ĝ′_β, note that (∂/∂β) σ²(x, β, g) = g′(β^T x)x is linear in g′(β^T x), and hence, by Remark 3.2, we only need to show that ‖ĝ′ − g′_0‖_∞ = o_P(1). This can be shown using standard calculations.
Next, let G = K = C^{1/2+δ}_M(R_U) for some δ > 0. It follows from Remark 3.1 that the condition on the covering number of G and K in (A2) is satisfied. Moreover, sup_{u,β} |ĝ_β(u)| = sup_{u,β} |g_0β(u)| + o_P(1) = O_P(1) (and similarly for ĝ′_β(u)), and sup_{β,u_1,u_2} |ĝ′_β(u_1) − ĝ′_β(u_2)|/|u_1 − u_2|^{1/2+δ} ≤ M provided n a_n^{4+2δ} (log n)^{-1} → ∞. Hence, P(ĝ_β ∈ G) → 1 and P(ĝ′_β ∈ K) → 1.
For (A3), note that Let K_1 be a kernel of order q ≥ 3. Then, the first term above can be written as provided n a_n^6 → 0. The second term is a degenerate V-process (with kernel depending on n), and can be written as a degenerate U-process, plus a term of order O_P((n b_n)^{-1}) = o_P(n^{-1/2}) provided n b_n² → ∞. The U-process can be written out using Hájek-projection techniques, similar to the ones for regular degenerate U-statistics, which shows (after long but straightforward calculations) that this term is o_P(n^{-1/2}) provided n a_n b_n → ∞. Hence, (A3) holds true for η(x, y) uniformly in β and x, where β̃ is between β_0 and β.
It now follows that β̂ − β_0 is asymptotically normal, with mean zero and variance given in Theorem 3.1.
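The definition of ĝ_β did not survive in the text above, so the following is only a sketch of one natural candidate: a Nadaraya-Watson average of squared residuals along the index β^T x, using the scaled kernel K_1a(v) = K_1(v/a_n)/a_n that the text does define. The Nadaraya-Watson form, the Epanechnikov kernel and all function names are assumptions made for illustration, not the paper's stated estimator.

```python
def epanechnikov(v):
    """A symmetric, compactly supported density (illustrative kernel choice)."""
    return 0.75 * (1.0 - v * v) if abs(v) <= 1.0 else 0.0


def variance_on_index(u, X, Y, m_hat, beta, a_n, kernel=epanechnikov):
    """Kernel-weighted average of squared residuals at index value u.

    X is a list of covariate vectors, Y the responses, m_hat the fitted
    regression values, beta the index direction and a_n the bandwidth.
    Uses K_1a(v) = K_1(v / a_n) / a_n; the Nadaraya-Watson form is assumed.
    """
    num = den = 0.0
    for x_i, y_i, m_i in zip(X, Y, m_hat):
        index_i = sum(b * xj for b, xj in zip(beta, x_i))
        k = kernel((u - index_i) / a_n) / a_n
        num += k * (y_i - m_i) ** 2
        den += k
    if den == 0.0:
        raise ValueError("no observations within one bandwidth of u")
    return num / den
```

The guard on the denominator mirrors the role of the condition inf f_{β^T X}(β^T x) > 0 in the verification of (A1): the estimator is only well behaved where the index has enough data nearby.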

Partially linear median regression model
The second model we consider is a median regression model with a partially linear regression function and an exponentially transformed partially linear dispersion function:
Y = α_0^T X_(1) + r_0(X_(2)) + exp(β_0^T X_(1) + g_0(X_(2)))ε,
where med(ε|X) = 0, E(|ε| |X) = 1, and X = (X_(1)^T, X_(2))^T, with X_(1) = (X_1, . . . , X_{d−1})^T and X_(2) = X_d. For any β ∈ R^{d−1} and for m 0 ( where m̂(x) = α̂^T x_1 + r̂(x_2) is an estimator of the unknown regression function m_0(x), see e.g. Härdle, Liang and Gao (2000, Chapter 2). Define As in the previous example, we restrict attention to verifying the A-conditions. Since u(X, β) = X_(2) does not depend on β, we do not need to check the conditions related to ĝ′ and K. Note that uniformly in x and β if n a_n² (log n)^{-2} → ∞. Hence, (A1) is satisfied, provided inf_{x_2,β} s_0β(x_2) > 0. For (A2), arguments similar to those in the first example show that G = C^1_M(R_{X_(2)}) can be used. Next, consider the verification of condition (A3). Using the property that for any x, y,

First consider
provided n a_n^4 → 0 and K_1 is a kernel of order 2. Next, note that the term B(x_2) is a degenerate V-process, because in the i.i.d. representation of m̂(X_i) − m_0(X_i), each term has mean zero, and because E(ψ(Y_i − m_0(X_i))|X_i) = 0. Hence, as for the first example, we have that B(x_2) = o_P(n^{-1/2}). Finally, using the notation since sup_x |d(x)| = o_P(n^{-1/4}) by (B1). Using e.g. Van der Vaart and Wellner (1996, Section 2.11), it can be shown that the process since P(m̂ ∈ M) → 1 by condition (B2). It now follows that This finishes the proof for condition (A3). It remains to check (A4), which can be done in much the same way as in the first example.
We report results from 500 independent simulation runs. First, we compare the unweighted method with the weighted method for estimating α_0. The unweighted method assumes that the dispersion function is constant, while the weighted method updates the estimator α̂ via weighted L_1 regression, where the weights are taken to be the reciprocal of the estimated dispersion function. Table 1 displays the bias and the MSE for estimating α_00 and α_01, respectively. It also reports the simulated relative efficiency (SRE) for comparing the weighted and unweighted methods. The SRE is defined as
SRE = (MSE for estimating α_0 using the unweighted method) / (MSE for estimating α_0 using the weighted method),
where the MSE for estimating α_0 is defined as the sum of the mean squared errors for estimating each coordinate of α_0. The simulation results suggest that the weighted method significantly improves the efficiency of estimating α_0 compared with the unweighted method. In all six cases, we observe an efficiency gain of around 20-30% when using the weighted method.
Next, we consider estimating the dispersion parameter β_0 when the weighted method is used. Table 2 gives the bias and the mean squared error, which suggest that β_0 is estimated satisfactorily in all cases.
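The SRE defined earlier in this section is a ratio of summed coordinate-wise mean squared errors. A minimal sketch of that computation follows; the function names and the list-of-lists layout for the simulated estimates (one inner list per simulation run) are assumptions made for illustration.

```python
def mse_per_coordinate(estimates, truth):
    """Mean squared error of each coordinate over the simulation runs."""
    n_rep = len(estimates)
    return [
        sum((est[j] - truth[j]) ** 2 for est in estimates) / n_rep
        for j in range(len(truth))
    ]


def simulated_relative_efficiency(unweighted, weighted, truth):
    """SRE = total MSE of the unweighted method / total MSE of the weighted one."""
    mse_unweighted = sum(mse_per_coordinate(unweighted, truth))
    mse_weighted = sum(mse_per_coordinate(weighted, truth))
    return mse_unweighted / mse_weighted
```

With this definition, an SRE above 1 means that the weighted method has the smaller total MSE.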
Finally, we give some idea on how well we estimate the nonparametric parts of the semiparametric model. More specifically, we consider case 6 and compare in Figure 1 the true curves of r 0 (x 2 ) and g 0 (x 2 ) with their respective estimates (averaged over the 500 simulation runs). The estimated curves are very close to the true curves. Results from all other cases are similar and not reported due to space limitation.

Analysis of gasoline consumption data
We illustrate the proposed method by means of a data set on gasoline consumption. The data were collected by the National Private Vehicle Use Survey in Canada between October 1994 and September 1996 and contain household-based information (Yatchew, 2003). In this analysis, we use the subset of September data, which consists of 485 observations. We are interested in estimating the median of the log of the distance traveled per month by the household (denoted by Y = dist) based on six covariates: X_1 = income (log of the previous year's combined annual household income before taxes, which is reported in 9 ranges), X_2 = driver (log of the number of licensed drivers in the household), X_3 = age (log of the age of the driver), X_4 = retire (a dummy variable for those households whose head is over the age of 65), X_5 = urban (a dummy variable for urban dwellers), and X_6 = price (log of the price of a liter of gasoline). The scatter plots of the response variable versus each covariate are given in Figure 2.
We fit a heteroscedastic partially linear median regression model, motivated by the homoscedastic partially linear mean regression model of Yatchew (2003). More specifically, we assume
dist = α_01 income + α_02 driver + α_03 age + α_04 retire + α_05 urban + r_0(price) + exp[β_01 income + β_02 driver + β_03 age + β_04 retire + β_05 urban + g_0(price)]ε,
i.e. Y = α_0^T X_(1) + r_0(X_(2)) + exp(β_0^T X_(1) + g_0(X_(2)))ε, where X = (X_(1)^T, X_(2))^T with X_(1) = (X_1, . . . , X_5)^T and X_(2) = X_6 = price, and where r_0(·) and g_0(·) are two unknown smooth functions. To identify the model we assume that med(ε|X) = 0 and E(|ε| |X) = 1. The smoothing parameters are selected using the approach described in Section 5. The smoothing parameter for estimating the conditional median function is 0.02 and that for estimating the dispersion function is 0.05.
Table 3 summarizes the estimated coefficients in the parametric parts of the conditional median function and the dispersion function. It is not surprising that households with larger income and more drivers tend to have a higher median value of dist, and that retired people and urban dwellers tend to drive less. Table 3 also contains the standard errors of the α̂_j's and β̂_j's. These are obtained using a model-based resampling procedure. More specifically, we estimate the parametric and nonparametric components in the above model and obtain α̂, β̂, r̂ and ĝ. We then generate a bootstrap sample (i = 1, . . . , n):
Y*_i = Σ_{j=1}^{5} α̂_j X_ji + r̂(X_6i) + exp(Σ_{j=1}^{5} β̂_j X_ji + ĝ(X_6i)) ε*_i,
where the ε*_i satisfy the constraints med(ε*_i|X_i) = 0 and E(|ε*_i| |X_i) = 1 (we use a normal distribution in the simulations). For each bootstrap sample, we re-estimate the α_j's and the β_j's. The standard errors are then calculated from these estimators based on 200 bootstrap samples. The results in Table 3 suggest that income, driver, retire and urban have significant effects on the conditional median function. Moreover, driver exhibits a significant effect on the dispersion function. Figure 3 displays the estimated nonparametric components. The plots indicate that for the majority of values of price, increased price is associated with a reduced conditional median of dist. However, for the lowest and highest values of price, the effect of price on dist seems to be reversed. A similar pattern is observed for the effect of price on the dispersion function. Note however that for small and large values of price, the data are rather sparse.
Figure 3. Estimates of the functions r_0(price) and g_0(price) for the gasoline consumption data.
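The model-based resampling scheme described above can be sketched as follows. The normal bootstrap errors are rescaled by sqrt(π/2) so that med(ε*) = 0 and E|ε*| = 1, as the constraints require. The callable `refit` is a hypothetical stand-in for the paper's estimation procedure, and the fitted location and log-dispersion values are assumed to be supplied as plain lists.

```python
import math
import random
import statistics

# For Z ~ N(0, 1), E|Z| = sqrt(2/pi); this rescaling gives E|eps*| = 1
# while keeping med(eps*) = 0.
NORMAL_SCALE = math.sqrt(math.pi / 2.0)


def bootstrap_responses(location_fit, log_dispersion_fit, rng):
    """Draw Y*_i = location_i + exp(log_dispersion_i) * eps*_i."""
    return [
        loc + math.exp(log_disp) * NORMAL_SCALE * rng.gauss(0.0, 1.0)
        for loc, log_disp in zip(location_fit, log_dispersion_fit)
    ]


def bootstrap_standard_errors(refit, X, location_fit, log_dispersion_fit,
                              n_boot=200, seed=0):
    """Standard deviation of each refitted coefficient over n_boot resamples.

    `refit(X, y_star)` is a hypothetical user-supplied callable returning a
    list of coefficient estimates; it stands in for the actual estimator.
    """
    rng = random.Random(seed)
    draws = [
        refit(X, bootstrap_responses(location_fit, log_dispersion_fit, rng))
        for _ in range(n_boot)
    ]
    # zip(*draws) regroups the draws coefficient by coefficient.
    return [statistics.stdev(coef) for coef in zip(*draws)]
```

Here `location_fit` would hold the fitted conditional medians and `log_dispersion_fit` the fitted exponents of the dispersion function, both evaluated at the observed covariates.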

Discussion
This paper considers a general class of semiparametric location-dispersion models. The theory we have developed focuses on how to estimate the dispersion function and on the theoretical properties of the proposed estimators. The estimators we use are of the backfitting type. Alternatively, one may consider profile estimators, which are obtained by replacing in the definition of h(x, y, β, g_β, m, w) the partial derivative (∂/∂β) σ²(x, β, g_β) by the complete derivative (d/dβ) σ²(x, β, g_β), i.e. profile estimators take into account that g_β also depends on β. See Van Keilegom and Carroll (2007) for a detailed analysis of the pros and cons of profiling versus backfitting.
In the future, we would like to study in more detail the estimation of the mean or median using weighted least squares with weights equal to the inverse of the estimated dispersion function. The simulations in Section 5 suggest that the efficiency gain is quite substantial. A theoretical analysis of the relative efficiency would be very interesting. We would also like to extend this class of models to the time-series setting.