Kernel Function in Local Linear Peters-Belson Regression

Determining the extent of a disparity, if any, between groups of people, for example by race or gender, is of interest in many fields, including public health (medical treatment and disease prevention) and discrimination cases concerning equal pay, where the pay disparity between minority and majority employees must be estimated. Peters-Belson (PB) regression, a form of statistical matching akin in spirit to Bhattacharya's bandwidth matching, has been proposed for this purpose. In this paper, we review the use of PB regression in legal cases following Bura, Gastwirth & Hikawa (2012). We describe parametric and nonparametric approaches to PB regression and show that, in nonparametric PB regression, a suitable kernel function can improve results: selecting an appropriate kernel can reduce the bias and variance of the estimators and increase the power of the tests.


Introduction
In disparate treatment and equal pay discrimination cases, the salaries of minority/disadvantaged group (DG) members should be compared to those of similarly qualified majority/advantaged group (AG) employees. PB regression, as well as ordinary regression with an indicator representing group status, has been accepted by courts (Gray 1993). As Bura et al. (2012) point out, the PB method offers some advantages over standard regression with a dummy or indicator variable. First, the method is intuitive and comparatively easy to understand for a general audience (e.g., judges and juries). For example, in the context of sex discrimination cases, PB regression estimates the salary equation for the male group (AG) incorporating the relevant covariates and then takes the difference between each female (DG) employee's actual salary and the estimated salary she would have received had she been paid according to the equation for AG employees. Moreover, the estimated pay differential obtained from the PB approach is individualized for each member of the protected group. In contrast, ordinary least squares linear regression with an indicator variable estimates a common overall effect of being DG, after adjusting for the relevant covariates. This approach assumes that any differential is the same across the entire range of covariate values. Another advantage of the PB method is that the females whose salaries are higher than predicted, and who were therefore not discriminated against, are readily identifiable.
The PB regression method was first introduced by Peters (1941) and Belson (1956) for conducting treatment-control comparisons that accounted for relevant covariates by creating statistical matches for the treatment group observations. Blinder (1973) and Oaxaca (1973) used this idea to decompose the difference between the means of the two groups into components. Gastwirth & Greenhouse (1995) applied the PB method to salary data as well as to logistic regression for binary responses in order to analyze the data arising from a case involving promotion decisions (Capaci v. Katz and Besthoff). Furthermore, Nayak & Gastwirth (1997) extended the method to generalized linear models. Hikawa, Bura & Gastwirth (2010a) introduced nonparametric PB in regressions with a binary response as an alternative to logistic regression. Hikawa, Bura & Gastwirth (2010b) introduced local linear regression in PB. They modeled the mean response in each of the two groups by an unknown function and used the Epanechnikov kernel to estimate these functions (for details about linear and nonlinear regression see Achcar & Lopes, 2016).
In this paper, we review the use of PB regression in legal cases following Bura et al. (2012). We suggest using other kernel functions and show that choosing an appropriate kernel function can improve the results. The layout of the paper is as follows. In Section 2, we review parametric PB regression, which is based on ordinary linear regression. Section 3 introduces a recent nonparametric version from Hikawa et al. (2010b) that increases the applicability of the PB approach. In Section 4, we apply all of the methods outlined to data from a sex discrimination case, and Section 5 contains the simulation study. We present the conclusion in Section 6.

Parametric Peters-Belson Regression
In this section, we review parametric PB regression. As in Bura et al. (2012), we assume that the salaries ($Y$) are determined by a set of covariates (e.g., seniority, education) plus normally distributed random errors ($\varepsilon$).
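Suppose the salaries for minority and majority employees are given, respectively, by

$$\text{Minority (DG)}: \quad Y_{1i} = X_{1i}^T \beta_1 + \varepsilon_{1i}, \quad i = 1, \ldots, n_1,$$
$$\text{Majority (AG)}: \quad Y_{2j} = X_{2j}^T \beta_2 + \varepsilon_{2j}, \quad j = 1, \ldots, n_2,$$

where $X$ denotes the covariate vector and $\beta$ the corresponding coefficient vector. The errors in each equation are assumed to be normally distributed with mean zero and variance $\sigma_1^2$ and $\sigma_2^2$, respectively. If $\beta_1 = \beta_2$, there is no unjustifiable pay differential and the difference in salaries is due to random variability. A meaningful measure of the pay differential against the minority employee with a given value of the covariate is

$$\delta_i = X_{1i}^T \beta_1 - X_{1i}^T \beta_2.$$

If $\delta_i$ is negative, the $i$-th minority employee is underpaid compared to a majority employee with the same covariate values. In parametric PB, a linear regression model is fitted to the data for the majority employees. Then each minority member's salary is predicted by $X_{1i}^T \hat{\beta}_2$, where $X_{1i}$ is the covariate vector for the $i$-th minority member and $\hat{\beta}_2$ is the least squares estimate of $\beta_2$. The difference, $D_i = Y_{1i} - X_{1i}^T \hat{\beta}_2$, between the actual and predicted salaries is the estimate of the pay differential of the $i$-th minority employee relative to a similarly qualified majority employee. Thus, $D_i$ is the parametric PB estimate of $\delta_i$. When the model is correct, $D_i$ is unbiased for $\delta_i$ and the corresponding unbiased estimator for the average disparity over all minority employees, $\delta = \frac{1}{n_1} \sum_{i=1}^{n_1} \delta_i$, is

$$\bar{D} = \frac{1}{n_1} \sum_{i=1}^{n_1} D_i = \bar{Y}_1 - \bar{X}_1^T \hat{\beta}_2,$$

where $\bar{X}_1$ is the mean vector of the minority covariate values. The variance of $\bar{D}$ is

$$\mathrm{Var}(\bar{D}) = \frac{\sigma_1^2}{n_1} + \sigma_2^2 \, \bar{X}_1^T (X_2^T X_2)^{-1} \bar{X}_1,$$

where $X_2 = (X_{21}, X_{22}, \ldots, X_{2n_2})^T$ is the usual design matrix of the majority group. When we assume $\sigma_1^2 = \sigma_2^2 = \sigma^2$, the test statistic

$$t = \frac{\bar{D}}{\sqrt{\hat{\sigma}^2 \left[ \frac{1}{n_1} + \bar{X}_1^T (X_2^T X_2)^{-1} \bar{X}_1 \right]}} \tag{2}$$

can be used to test the null hypothesis $\delta = 0$.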
Gastwirth (1989) suggested that majority observations be used to estimate the common variance $\sigma^2$ because, under the hypothesis of no discrimination, both majority and minority employees are supposed to be paid under the same system and hence the variances are supposed to be the same as well. If we use $\hat{\sigma}^2 = \hat{\sigma}_2^2$, then under the null hypothesis $\delta = 0$ the test statistic in (2) is t-distributed with $n_2 - p_2$ degrees of freedom, where $p_2$ is the number of parameters (coefficients) in the majority model. Gastwirth (1989) discusses the form of the variance of $\bar{D}$ in simple regression and hypothesis testing for $\delta$. Nayak & Gastwirth (1997) focus on a slightly different version of $\delta$ and its estimator and derive its distributional properties.
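As an illustration only, the parametric PB computation described above can be sketched in Python with NumPy. The function name `parametric_pb` and its arguments are hypothetical, covariates are passed without an intercept column, and the common variance is estimated from the majority residuals as Gastwirth (1989) suggests. The variance expression used in the code, $\sigma^2 [1/n_1 + \bar{X}_1^T (X_2^T X_2)^{-1} \bar{X}_1]$, follows from the independence of the two groups and is an assumption about the exact form of (2).

```python
import numpy as np


def parametric_pb(X1, y1, X2, y2):
    """Parametric Peters-Belson statistic under a common error variance.

    X1, y1: minority (DG) covariates and salaries.
    X2, y2: majority (AG) covariates and salaries.
    Covariate arrays have one row per employee and no intercept column.
    """
    X1 = np.column_stack([np.ones(len(y1)), X1])
    X2 = np.column_stack([np.ones(len(y2)), X2])
    n1 = len(y1)
    n2, p2 = X2.shape

    # Least squares fit on the majority group only.
    beta2, *_ = np.linalg.lstsq(X2, y2, rcond=None)

    # Individual differentials D_i = Y_1i - X_1i' beta2_hat and their mean.
    D = y1 - X1 @ beta2
    D_bar = D.mean()

    # Common variance estimated from the majority residuals.
    resid2 = y2 - X2 @ beta2
    sigma2_hat = resid2 @ resid2 / (n2 - p2)

    # Assumed form: Var(D_bar) = sigma^2 * (1/n1 + xbar1' (X2'X2)^{-1} xbar1).
    xbar1 = X1.mean(axis=0)
    quad = xbar1 @ np.linalg.solve(X2.T @ X2, xbar1)
    se = np.sqrt(sigma2_hat * (1.0 / n1 + quad))

    # t statistic and its degrees of freedom n2 - p2.
    return D, D_bar, se, D_bar / se, n2 - p2
```

The returned statistic would then be compared with a t distribution with $n_2 - p_2$ degrees of freedom, and the vector `D` gives the individualized differentials.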
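When the error variances are assumed to be different, we can approximate the distribution of the test statistic by using Welch's approximation approach (Welch 1949, Scheffe 1970, Nayak & Gastwirth 1997). Under this approximation, the distribution of the test statistic under the null hypothesis is approximated by a t distribution with degrees of freedom

$$\nu = \frac{\left( \frac{\sigma_1^2}{n_1} + \sigma_2^2 \, \bar{X}_1^T (X_2^T X_2)^{-1} \bar{X}_1 \right)^2}{\frac{(\sigma_1^2 / n_1)^2}{n_1 - p_1} + \frac{\left( \sigma_2^2 \, \bar{X}_1^T (X_2^T X_2)^{-1} \bar{X}_1 \right)^2}{n_2 - p_2}}, \tag{3}$$

where $p_1$ is the number of parameters in the minority model. Since $\sigma_1^2$ and $\sigma_2^2$ are unknown, the degrees of freedom are estimated by substituting $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$ in (3). For more details see Hikawa (2009).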
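A minimal sketch of Welch's approximation, assuming the variance of the estimator can be written as $c_1 \sigma_1^2 + c_2 \sigma_2^2$ with known constants $c_1$ and $c_2$, and that each variance estimate carries its own degrees of freedom. The function name `welch_df` and its argument names are hypothetical, and the formula is the standard Welch-Satterthwaite expression rather than a transcription of the paper's equation.

```python
def welch_df(var1, var2, c1, c2, df1, df2):
    """Welch-Satterthwaite degrees of freedom for c1*var1 + c2*var2.

    var1, var2: variance estimates for the minority and majority groups.
    c1, c2:     known multipliers of the two variances.
    df1, df2:   degrees of freedom attached to each variance estimate.
    """
    a = c1 * var1
    b = c2 * var2
    return (a + b) ** 2 / (a ** 2 / df1 + b ** 2 / df2)
```

When one component dominates, the result approaches that component's own degrees of freedom; when the two components are equal and have equal degrees of freedom, the result is their sum.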
Note 1. The measure of average pay differential $\delta$ was used by Gastwirth (1989) with the particular intention of analyzing data arising from pay discrimination cases where the two underlying mean salary lines do not cross each other in the range of covariate values of interest. When the two mean lines cross, some values of $\delta_i$ become negative and others positive; averaging then cancels these values and $\delta$ is no longer a meaningful measure of pay differential. Therefore, $\delta$ should be used only when the two mean lines do not cross. Although crossing is a theoretical possibility, it seldom happens in practice. We therefore recommend plotting the regression lines and, if they cross, trying to find out why; perhaps a variable has been omitted. Once all the necessary variables have been considered, the PB method can probably be run.
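The crossing check recommended in Note 1 can also be done numerically for a single covariate. The sketch below is a hypothetical helper (the name `lines_cross` and the intercept-slope representation are assumptions, not part of the original analysis) that reports whether two fitted simple regression lines intersect strictly inside the covariate range of interest.

```python
def lines_cross(beta_dg, beta_ag, x_min, x_max):
    """Return True if two simple regression lines cross inside (x_min, x_max).

    beta_dg, beta_ag: (intercept, slope) pairs for the DG and AG lines.
    """
    d_intercept = beta_dg[0] - beta_ag[0]
    d_slope = beta_dg[1] - beta_ag[1]
    if d_slope == 0:
        # Parallel or identical lines have no single crossing point.
        return False
    x_star = -d_intercept / d_slope  # covariate value where the lines meet
    return x_min < x_star < x_max
```

A `True` result would suggest inspecting the model, for example for an omitted variable, before averaging the individual differentials.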

Local Linear Peters-Belson Regression
In this section, we review the local linear regression technique for estimating the expected minority responses from the majority data in the PB approach (Bura et al. 2012). We then discuss the role of the kernel function in fitting the local linear regression model. Hikawa et al. (2010b) mentioned two problems that an analyst of pay discrimination data often encounters. The first is the difficulty of estimating the salary equation when it does not appear to follow any usual parametric form (e.g., linear, quadratic). The second pertains to determining which male/majority employees should be compared against the female/minority employees of interest. Including too many irrelevant majority employees in the comparisons (e.g., male employees who are too senior compared to the target female employees) may introduce serious bias in the estimated disparity (Greiner 2008).
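To address these problems, Hikawa et al. (2010b) introduced local linear regression in PB, modeling the responses of the two groups as

$$Y_{1i} = m_1(X_{1i}) + \varepsilon_{1i}, \quad i = 1, \ldots, n_1, \qquad Y_{2j} = m_2(X_{2j}) + \varepsilon_{2j}, \quad j = 1, \ldots, n_2,$$

where the errors $\varepsilon_{1i}$ and $\varepsilon_{2j}$ have mean zero and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. The only assumption we make on the unknown functions modeling the mean response in the two groups, $m_1(X)$ and $m_2(X)$, is that they are twice differentiable.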
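As in the parametric PB definition of disparity, the pay disparity for the $i$-th minority member is

$$\delta_i = m_1(X_{1i}) - m_2(X_{1i}),$$

and the average disparity of all minority members is

$$\delta = \frac{1}{n_1} \sum_{i=1}^{n_1} \delta_i.$$

Revista Colombiana de Estadística 41 (2018) 235–249

Let

$$Z_i = \begin{pmatrix} 1 & (X_{21} - X_{1i})^T \\ \vdots & \vdots \\ 1 & (X_{2n_2} - X_{1i})^T \end{pmatrix}, \qquad W_i = \mathrm{diag}\left\{ W\left( \frac{\|X_{2j} - X_{1i}\|}{h_i} \right) \right\}_{j=1}^{n_2},$$

where $W$ is a kernel weight function, $\|\cdot\|$ is a norm, and $h_i$ is the bandwidth at $X_{1i}$. Denoting the elements of the first row of $(Z_i^T W_i Z_i)^{-1} Z_i^T W_i$ by $S_{i1}, S_{i2}, \ldots, S_{in_2}$, the fitted value for the design point $X_{1i}$ is given by

$$\hat{m}_2(X_{1i}) = \sum_{j=1}^{n_2} S_{ij} Y_{2j}.$$

The estimated pay differential for the $i$-th minority member is

$$D_i = Y_{1i} - \hat{m}_2(X_{1i}).$$

The estimated average pay differential against all minority members and its variance are

$$\bar{D}_{LOC} = \frac{1}{n_1} \sum_{i=1}^{n_1} D_i = \bar{Y}_1 - \frac{1}{n_1} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} S_{ij} Y_{2j}, \qquad \mathrm{Var}(\bar{D}_{LOC}) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_1^2} \sum_{j=1}^{n_2} \left( \sum_{i=1}^{n_1} S_{ij} \right)^2.$$

Since $\sigma_1^2$ and $\sigma_2^2$ are usually unknown, the estimated variance of $\bar{D}_{LOC}$ can be obtained by using the estimates $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$, which can be obtained from the residuals of the separate local linear regression models within each group. Details of the estimation approach are given in Hikawa (2009) and Hikawa et al. (2010b). The estimated variance of $\bar{D}_{LOC}$ is given by

$$\widehat{\mathrm{var}}(\bar{D}_{LOC}) = \frac{\hat{\sigma}_1^2}{n_1} + \frac{\hat{\sigma}_2^2}{n_1^2} \sum_{j=1}^{n_2} \left( \sum_{i=1}^{n_1} S_{ij} \right)^2.$$

Cleveland & Devlin (1988) showed that, under the assumption of normal errors and negligible bias of $\hat{m}_j(X)$, the distribution of $(\eta_j^2 \hat{\sigma}_j^2)/(\kappa_j \sigma_j^2)$ can be approximated by a $\chi^2$ distribution with $\eta_j^2 / \kappa_j$ degrees of freedom, where

$$\eta_j = \mathrm{tr}\left[ (I - S_j)^T (I - S_j) \right], \qquad \kappa_j = \mathrm{tr}\left\{ \left[ (I - S_j)^T (I - S_j) \right]^2 \right\},$$

for $j = 1, 2$. Here $S_j$ is the $n_j \times n_j$ matrix whose $(i, k)$-th element is $S_{ik}$, obtained from fitting a separate smooth curve for the minority and majority groups (i.e., separately estimating $m_1(X_i)$ for $i = 1, \ldots, n_1$ and $m_2(X_j)$ for $j = 1, \ldots, n_2$). The test statistic for testing for lack of disparity ($H_0: \delta = 0$ vs. $H_1: \delta \neq 0$) is defined to be

$$t = \frac{\bar{D}_{LOC}}{\sqrt{\widehat{\mathrm{var}}(\bar{D}_{LOC})}}. \tag{4}$$

When the two variances are equal (i.e., $\sigma_1^2 = \sigma_2^2 = \sigma^2$), we estimate the common variance by the majority variance estimate $\hat{\sigma}^2 = \hat{\sigma}_2^2$; see Gastwirth (1989). The estimated variance of $\bar{D}_{LOC}$ becomes

$$\widehat{\mathrm{var}}(\bar{D}_{LOC}) = \hat{\sigma}^2 \left[ \frac{1}{n_1} + \frac{1}{n_1^2} \sum_{j=1}^{n_2} \left( \sum_{i=1}^{n_1} S_{ij} \right)^2 \right].$$

Therefore, under the assumption of equal variances, the test statistic for group disparity is t distributed with $\eta^2 / \kappa$ degrees of freedom, where $\eta = \eta_2$ and $\kappa = \kappa_2$.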
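When the two variances are assumed to be different, as in parametric PB, we can apply Welch's approximation approach to find the approximate distribution of the test statistic. The variance of $\bar{D}_{LOC}$ can be expressed as

$$\mathrm{Var}(\bar{D}_{LOC}) = c_1 \sigma_1^2 + c_2 \sigma_2^2, \qquad c_1 = \frac{1}{n_1}, \quad c_2 = \frac{1}{n_1^2} \sum_{j=1}^{n_2} \left( \sum_{i=1}^{n_1} S_{ij} \right)^2.$$

When $H_0$ is true, the test statistic (4) is approximately t distributed with degrees of freedom

$$\nu = \frac{(c_1 \sigma_1^2 + c_2 \sigma_2^2)^2}{\frac{\kappa_1 (c_1 \sigma_1^2)^2}{\eta_1^2} + \frac{\kappa_2 (c_2 \sigma_2^2)^2}{\eta_2^2}}. \tag{5}$$

Since $\sigma_1^2$ and $\sigma_2^2$ are unknown in most practical situations, the degrees of freedom are approximated by plugging $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$ into (5).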
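The local linear PB estimate can be sketched for a single covariate as follows. This is a sketch under stated assumptions, not the authors' implementation: the kernel formulas are the standard textbook forms of the Epanechnikov, Normal, Logistic and Laplace kernels (the paper may scale them differently), the bandwidth is a simple nearest neighbor rule with a hypothetical fraction `frac`, and all function names are illustrative. The smoother weights are the first row of $(Z_i^T W_i Z_i)^{-1} Z_i^T W_i$ described above.

```python
import numpy as np

# Assumed standard kernel forms. The first has compact support; the other
# three are positive on the whole real line.
KERNELS = {
    "epanechnikov": lambda u: 0.75 * np.clip(1.0 - u ** 2, 0.0, None),
    "normal": lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi),
    "logistic": lambda u: 1.0 / (np.exp(u) + 2.0 + np.exp(-u)),
    "laplace": lambda u: 0.5 * np.exp(-np.abs(u)),
}


def nn_bandwidth(x0, x, frac):
    """Distance from x0 to its k-th nearest point in x, k = ceil(frac * n)."""
    k = max(3, int(np.ceil(frac * len(x))))
    return np.sort(np.abs(x - x0))[k - 1]


def local_linear_weights(x0, x, h, kernel):
    """First row of (Z'WZ)^{-1} Z'W for a local linear fit at x0."""
    Z = np.column_stack([np.ones_like(x), x - x0])
    w = kernel(np.abs(x - x0) / h)
    ZtW = Z.T * w  # Z'W with W = diag(w)
    return np.linalg.solve(ZtW @ Z, ZtW)[0]


def local_linear_pb(x1, y1, x2, y2, kernel="normal", frac=0.7):
    """Return D_i, their mean, and the multipliers c1, c2 of the variances."""
    W = KERNELS[kernel]
    # n1 x n2 smoother matrix: row i predicts the majority mean at x1[i].
    S = np.array([
        local_linear_weights(x0, x2, nn_bandwidth(x0, x2, frac), W)
        for x0 in x1
    ])
    D = y1 - S @ y2            # D_i = Y_1i - m2_hat(X_1i)
    col = S.sum(axis=0)        # sum over i of S_ij, one value per majority j
    n1 = len(y1)
    c1 = 1.0 / n1
    c2 = (col @ col) / n1 ** 2
    return D, D.mean(), c1, c2
```

With variance estimates for the two groups, the variance of the average differential would then be approximated by `c1 * var1 + c2 * var2`. Swapping the `kernel` argument is enough to compare kernels on the same data.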
The nearest neighbor bandwidth, which fixes the fraction of the data contributing to each local fit (Cleveland & Devlin 1988, Loader 1999), is used in fitting the local linear regression model. Hikawa et al. (2010b) used the Epanechnikov kernel as the weight function, but we recommend considering other kernels. The kernel function $W(u)$ is a non-negative, real-valued, integrable function that satisfies the usual requirements of normalization, $\int_{-\infty}^{\infty} W(u) \, du = 1$, and symmetry about zero, $W(-u) = W(u)$.

Since the sample sizes of 16 males and 15 females are too small to fit a local linear regression model, we augmented the data set following the method used by Bhattacharya (1989) and Bhattacharya & Gastwirth (1999) in their analysis of data from Berger v. Iron Workers Local 201. Fitting a Gamma distribution to the seniority data yielded Gamma(4, 20) for males and Gamma(2, 33) for females. We then considered two scenarios for augmenting the observations. In the first, we generated additional salary data for 34 males and 35 females according to the fitted models in (6) and (7); in the second, we generated additional salary data for 84 males and 85 females from the same models. Thus the male and female sample sizes are equal within each scenario: 50 per group in the first and 100 per group in the second. The error variances were set equal to $\hat{\sigma}_m^2 = 7494.5$ and $\hat{\sigma}_f^2 = 8757.1$ in order to match the estimated variances of the error terms from the fitted regression models. Because the data were simulated with unequal variances for males and females, we compute the variance of $\bar{D}$ under the assumption of unequal variances and approximate the degrees of freedom of the test statistics using Welch's approximation approach. Tables 4 and 5 report the results.

In Tables 4 and 5, the negative values of $\bar{D}$ indicate that female employees were underpaid on average compared to their similarly qualified male counterparts. The bias and standard error of the average pay differential are also estimated. In Table 4, the value of $\bar{D}$ for local linear PB with the Epanechnikov kernel differs from that of the other methods, and its variance is greater than that of the other local linear methods. Parametric PB, however, has the minimum variance. In Table 5, the results of the methods are similar, but parametric PB still has the smallest variance and, among the nonparametric methods, the real kernels have smaller variances. These results suggest that using an appropriate kernel can be effective in reducing bias and standard error. However, one cannot draw general conclusions from a single example. We therefore conducted a further simulation study, discussed in the next section. Since the estimated pay differential from all the methods is quite large, the p-values of all test statistics are very small. Hence, all methods would reject the null hypothesis of no pay differential and confirm the court's conclusion that the female employees were discriminated against in their pay, with an average differential of about \$129-134 per pay period for sample size 50 and \$122-125 for sample size 100.

Simulation
In this section, we illustrate the role of kernel functions in local linear PB regression using simulated data. The simulation models were taken from Bura et al. (2012). Consider a company that has three stores with employees of both sexes. In the first store, pay is the same for the two groups (male and female). The data for this store were simulated from equation (8).
In stores 2 and 3, pay differs between the two groups: women are underpaid relative to comparable men in both stores, but the system leading to the disparity differs between the two stores. In the second store, men and women start at the same salary but men receive better raises over time, while in the third store men start at a slightly higher salary and also receive higher raises over time. The data for these two stores were simulated from equations (9) and (10), respectively. The values of the seniority predictor variable were generated from the Gamma distribution with scale parameters 3 and 2 for females and males, respectively, and shape 2 for both sexes. The simulations were run for three choices of the group sample sizes, applied to all three stores: 40 males and 30 females in the first, 80 males and 60 females in the second, and 60 males and 80 females in the third. The error variances were the same in both groups, with errors generated from the normal distribution with mean 0 and standard deviation 300. Ten thousand replicates were used in the simulation, and a significance level of $\alpha = 0.05$ was used.
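The data-generating step of this design can be sketched as below. The salary equations (8) to (10) are not reproduced here, so the mean functions are passed in by the caller and any concrete choice is a placeholder rather than the paper's model. The function name `simulate_store` and its arguments are hypothetical. The Gamma parameters, the default sample sizes (the first choice, 40 males and 30 females) and the error standard deviation follow the description above.

```python
import numpy as np


def simulate_store(mean_f, mean_m, n_f=30, n_m=40, sd=300.0, rng=None):
    """Generate one replicate of seniority and salary data for one store.

    mean_f, mean_m: callables mapping seniority to mean salary for females
                    and males (placeholders for the paper's equations).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Seniority: Gamma with shape 2; scale 3 for females, 2 for males.
    x_f = rng.gamma(shape=2.0, scale=3.0, size=n_f)
    x_m = rng.gamma(shape=2.0, scale=2.0, size=n_m)
    # Salaries: mean function plus normal errors with a common spread.
    y_f = mean_f(x_f) + rng.normal(0.0, sd, size=n_f)
    y_m = mean_m(x_m) + rng.normal(0.0, sd, size=n_m)
    return (x_f, y_f), (x_m, y_m)
```

Repeating this step ten thousand times and recording how often each test rejects $\delta = 0$ at level 0.05 would give empirical size (when the two mean functions coincide) or power (when they differ).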
Table 6 reports the results for the first choice of sample sizes. In the first store, where pay is equal for the two groups, parametric PB rejects the null hypothesis $\delta = 0$ in 5.75% of the replicates, local PB with the Epanechnikov kernel rejects it in 5.96%, and local PB with the real kernels Normal, Logistic and Laplace rejects it in 5.35%, 5.28% and 5.34%, respectively. Therefore, the size of all tests is close to the nominal level of 5%. In store 2, by using parametric PB the null hypothesis $\delta = 0$

Conclusion
In this paper, we show that, with an appropriate kernel, local linear PB regression gives results similar to those of the parametric method. Therefore, in most cases where the constraints of parametric PB regression are a concern, local linear PB regression can be used instead. From the data of EEOC v. Shelby County Government and the simulation study, we conclude that local linear PB regression is effective in determining the extent of a disparity between groups, and that using real kernels in the local PB method can reduce the bias and variance of the estimators and increase the power of the test. Comparing the three real kernel functions, we found that the Logistic and Laplace kernels give better results: with these kernels the bias is the lowest and the variance of the estimator is relatively small, while the power of the test is acceptable. The Normal kernel gives weaker results than the other two real kernels. However, all three real kernels perform better than the Epanechnikov kernel. We also investigated the performance of some other kernels, such as the Triangular and Cosine kernels, and found their results to be similar to those of the Epanechnikov kernel. Therefore, we recommend using real kernels in local linear PB regression.
The kernel function $W(u)$ is symmetric about zero.

Fitting an ordinary regression to each gender separately yields the fitted models (6) and (7), where the subscripts $m$ and $f$ stand for male and female. The estimated variances of the error terms are $\hat{\sigma}_m^2 = 7494.5$ and $\hat{\sigma}_f^2 = 8757.1$, and the sample means and variances of seniority are $\bar{X}_m = 81$, $S_m^2 = 1603.3$, $\bar{X}_f = 69.3$, and $S_f^2 = 2195.5$.
Tables 4 and 5 summarize the results from applying the five methods: (1) Parametric PB, (2) Local Linear PB with Epanechnikov kernel, (3) Local Linear PB with Normal kernel, (4) Local Linear PB with Logistic kernel and (5) Local Linear PB with Laplace kernel.
Hikawa et al. (2010b) introduced local linear regression in PB. Local linear regression fits a linear regression in the neighborhood of the covariate values of each minority member. The method is well suited for equal pay cases since the prediction of the salary of a minority employee is based on majority employees whose qualifications are closest to those of the minority employee and who thus should receive the greatest weight. Furthermore, the similarity of this method to matched pairs is expected to make the results more understandable to judges and juries. Local linear regression is similar in spirit to the bandwidth matching introduced by Bhattacharya (1989). However, the weight given to each majority observation decreases with the distance of its covariate values from those of the target minority member. As in Bura et al. (2012), suppose we have $d$ covariates and the data consist of $n_1$ minority observations, $(X_{11}, Y_{11}), \ldots, (X_{1n_1}, Y_{1n_1})$, and $n_2$ majority observations, $(X_{21}, Y_{21}), \ldots, (X_{2n_2}, Y_{2n_2})$, where $X$ is a vector of $d$ fixed covariate values. The response values of minority and majority members are generated by $Y_{1i} = m_1(X_{1i}) + \varepsilon_{1i}$ and $Y_{2j} = m_2(X_{2j}) + \varepsilon_{2j}$ (Bura et al. 2012).

Table 3 summarizes the ANOVA tables for both the male and female models.

Table 2: Shelby County pay discrimination case data.

Table 3: ANOVA tables for male and female models from the Shelby County data.

Table 4: Analysis of the augmented Shelby County Pay Discrimination data (n = 50).

Table 5: Analysis of the augmented Shelby County Pay Discrimination data (n = 100).

Table 8: Analysis of the simulated data (Scenario 3).