Field Experimental Evidence on Gender Discrimination in Hiring: Biased as Heckman and Siegelman Predicted?

Correspondence studies are nowadays viewed as the most compelling avenue to test for hiring discrimination. However, these studies suffer from one fundamental methodological problem, as formulated by Heckman and Siegelman (The Urban Institute audit studies: Their methods and findings. In M. Fix and R. Struyk (Eds.), Clear and convincing evidence: Measurement of discrimination in America, 1993): their results are biased when groups differ in the variance of unobserved determinants of hiring outcomes. In this study, the authors empirically investigate this bias in the context of gender discrimination. They do not find significant evidence for the predicted bias.
JEL J16 J71 M51 J41 C93


Introduction
During the last decade, economists have attempted to estimate hiring discrimination against women in the labour market by means of correspondence experiments. Within these experiments, pairs of fictitious job applications, differing only in the gender of the candidate, are sent to real job openings. Discrimination is then identified by means of standard probit regressions of the subsequent callback from the employer on the gender of the candidate. The correspondence testing methodology is the gold standard for estimating hiring discrimination in the labour market, since employer discrimination is disentangled from supply-side determinants of labour market outcomes and since selection on gender differences in unobservable characteristics is not an issue, as all the candidates' individual characteristics are under the control of the researcher (Riach and Rich 2002).
However, a major critique of this methodology can be formulated based on Heckman and Siegelman (1993). They show that not controlling for group differences in the variance of unobservable productivity determinants (and ipso facto of unobservable determinants of positive callback) can lead to spurious evidence of discrimination. The robustness of ethnic discrimination to the Heckman and Siegelman critique (henceforth "HS critique") has been tested by three earlier contributions to the empirical discrimination literature (Baert et al. 2015, on Belgian data; Carlsson et al. 2014, on Swedish data; Neumark 2012, on US data). These studies show that the HS critique is relevant. The bottom line of their results is that a higher (perceived) variance in unobservable determinants of positive callback among ethnic minorities (compared to the ethnic majority) leads to an underestimation of the level of discrimination against them when ethnic group differentials in this variance are not controlled for (footnote 2). At the same time, as argued by Azmat and Petrongolo (2014) in their overview of experimental advances in the study of gender differences in the labour market, "it should be stressed that existing [...] correspondence evidence on gender discrimination is [...] still open to this criticism." The only attempt to fill this gap that we are aware of is Carlsson et al. (2014), who apply Neumark's (2012) econometric framework to a number of already published correspondence studies, one of which targets gender discrimination.
In the present study, we complement their evidence by an empirical investigation of the HS critique in the context of gender discrimination, using the same framework but another (and in our opinion theoretically more convincing) identifying assumption.
Footnote 2: The results presented by Carlsson et al. (2014) deviate to some extent from this empirical pattern.

Methods

Heckman and Siegelman's critique
As argued above, correspondence studies adequately address concerns of individual differences in unobservable determinants of productivity. Heckman and Siegelman (1993) show, however, that group differences in the variance of these unobservable determinants may still lead to spurious evidence of discrimination.
To see this more clearly for the case of gender discrimination in hiring, assume that both the average observed and the average unobserved determinants of productivity are the same for male and female candidates for an unfilled vacancy, but that the variance of unobservable job-relevant characteristics is, at least in the perception of the employer, higher for females than for males. In addition, suppose that the employer considers the observed determinants of productivity, inferred from the CV and the motivation letter, to be relatively low compared to the job requirement. In that case it is rational for the employer to invite the female and not the male candidate, since the sum of observed and unobserved productivity is more likely to exceed the job requirement for the female candidate. A correspondence test that detects discrimination against females could therefore underestimate the extent of discrimination. Neumark (2012) explicitly addresses this critique and provides a statistical procedure to recover unbiased estimates of discrimination. In what follows, we succinctly describe Neumark's approach applied to gender discrimination.
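The variance mechanism behind the critique can be illustrated with a small numerical sketch. It assumes, purely for illustration, that unobserved productivity is normally distributed with mean zero, that the employer invites a candidate whose total productivity exceeds a threshold, and that the perceived standard deviations are 1 for males and 2 for females. None of these numbers, nor the function name, come from the study.

```python
from statistics import NormalDist


def callback_probability(observed, threshold, sigma_unobserved):
    """Return P(observed + e > threshold) with e ~ N(0, sigma_unobserved^2)."""
    unobserved = NormalDist(mu=0.0, sigma=sigma_unobserved)
    return 1.0 - unobserved.cdf(threshold - observed)


# Hypothetical values chosen only to illustrate the mechanism.
THRESHOLD = 0.0                        # job requirement
SIGMA_MALE, SIGMA_FEMALE = 1.0, 2.0    # perceived spread of unobservables

for label, observed in [("low-quality CV", -1.0), ("high-quality CV", 1.0)]:
    p_male = callback_probability(observed, THRESHOLD, SIGMA_MALE)
    p_female = callback_probability(observed, THRESHOLD, SIGMA_FEMALE)
    print(f"{label}: male {p_male:.3f}, female {p_female:.3f}")
```

Under these assumptions the higher-variance group is more likely to be invited when the observed quality lies below the requirement, and less likely when it lies above, even though there is no difference in means and no taste-based discrimination in the sketch.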

Neumark's empirical framework
It is well known that in a standard probit model only the ratio of the coefficients to the standard deviation of the unobserved residual is identified.
In estimations, the standard deviation is usually arbitrarily set to one. In our case this means that the variance of unobservable job-relevant characteristics is implicitly assumed to be equal for males and females, which, for the reasons stated above, may bias the measure of discrimination. Neumark (2012) shows, however, that if the researcher observes job-relevant characteristics that affect the male and female populations' callback propensities in the same way, the ratio of the standard deviations of the unobserved productivity components of these groups can be identified. Implementing Neumark's (2012) idea in the context of gender discrimination boils down to estimating a heteroskedastic probit model in which the variance of the error term is allowed to vary with gender.
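In notation that is ours rather than the original's, the model described above can be sketched as follows. Here C_i denotes positive callback, F_i is a dummy for a female candidate, X_i collects the remaining observed characteristics, and the male standard deviation is normalised to one; all symbols are assumptions introduced for this illustration.

```latex
\begin{align}
  \Pr(C_i = 1 \mid F_i, X_i)
    &= \Phi\!\left( \frac{\beta_0 + \gamma F_i + X_i'\delta}{\exp(\omega F_i)} \right), \\
  \frac{\sigma_{\text{Female}}}{\sigma_{\text{Male}}}
    &= \exp(\omega).
\end{align}
```

The standard probit corresponds to the restriction \omega = 0. Identification of \omega rests on the assumption that at least one element of \delta is common to both genders.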

Identification strategy
As mentioned in the previous subsection, identification of the group-specific variance in unobservable determinants of positive callback within the heteroskedastic probit framework requires experimental data with variation in observable job-relevant characteristics that affect the groups' (in our case, the genders') callback propensities in the same way. The variables used by Baert et al. (2015), Carlsson et al. (2014) and Neumark (2012) in their applications of the Neumark framework to ethnic discrimination were education level, personality traits, work experience, type of neighbourhood, sport activities and application quality. In the context of gender discrimination, Carlsson et al. (2014) assumed equal returns for both genders from variation in educational degree, international mobility, work experience, employment status and job tenure. Their choice can be criticised on theoretical grounds. All the aforementioned variables used for identification in the Neumark procedure result from variation in choices and outcomes on the employee side and may, ipso facto, be correlated with ethnicity or gender in reality. The variable we assume to have the same return across groups is the distance between the candidate's place of residence and the workplace. On the one hand, it is clear that this variable has the potential to affect employers' hiring decisions, since they may prefer workers with a social network in the neighbourhood of the firm and since they may expect a higher commitment from workers living close to the firm (who therefore lose little time commuting). On the other hand, by using this variable we actually exploit employer variation instead of employee variation, as the place of residence of the employee is constant. As a result, there is no reason why this variable would be rewarded more for members of a particular sex (footnote 7). Both considerations are confirmed empirically (see the Results section).

Data
We use data from Baert et al. (Forthcoming), a correspondence study investigating the importance of employer preferences in explaining sticky floors, i.e. the pattern that women are, compared to men, less likely to start climbing the job ladder. Within the experiment, 1152 job applications of male and female candidates were sent to real vacancies in Belgium. In view of our identification strategy, we extend their data with the distance between the workplace announced in the vacancy and the candidate's residence.
Footnote 7: One could argue that applications to employers located very far away from the residence of the applicant reflect a willingness to be mobile, which may be correlated with female sex. However, if we redo our estimations using only observations with distances below 30 minutes by car, the results are very comparable to the ones presented in the main text.

Results
Table 1 presents the results of our empirical analysis. In Panel A we report the degree of gender discrimination that comes out of a standard analysis of the data of Baert et al. (Forthcoming). We reproduce their main findings by conducting basic probit estimations with positive callback as the outcome variable. Positive callback is defined as getting an invitation for an interview concerning the announced job in models (1) and (2), and as getting any positive reaction from the employer in models (3) and (4). On the one hand, in models (1) and (3), we regress positive callback on a dummy indicating female sex of the candidate and on the distance between the workplace and the residence of the applicant. On the other hand, in models (2) and (4), the effect of female sex is broken down by whether the vacancy concerns a job implying a promotion in occupational level compared with the current job of the candidate. By doing that, we get results that are very similar to those presented in columns (1) and (2) of Table 4 and Table 5 of Baert et al. (Forthcoming). More concretely, the regression results indicate that, overall, the tested employers did not discriminate based on sex.
However, if the research sample is broken down by the occupational level of the posted job, we find that a female name lowers the probability of positive callback by four to five percentage points when the candidate applies for a job implying a promotion.

Interestingly, the estimation results for the variable "distance between the workplace and the candidate's residence" (not presented in Table 1 but available on request) are, for all of the mentioned models, highly significantly different from zero (p < 0.01) and have the expected (negative) sign.
Moreover, based on a Wald test applied to the estimation results of a probit model with an additional interaction between female sex of the candidate and the distance between workplace and residence, we cannot reject that this variable is rewarded equally for males and females (see Table 1 for the test results).
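For a single restriction, such a Wald test reduces to comparing the squared ratio of the interaction coefficient to its standard error with a chi-squared distribution with one degree of freedom. A minimal sketch follows; the function name is ours, and the coefficient and standard error are hypothetical placeholders, not values from Table 1.

```python
from statistics import NormalDist


def wald_test_single(coefficient, standard_error):
    """Wald statistic and p-value for H0: coefficient = 0 (one restriction)."""
    z = coefficient / standard_error
    wald_statistic = z ** 2
    # With one degree of freedom, P(chi2 > z^2) equals the two-sided
    # p-value of a standard normal variable evaluated at |z|.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return wald_statistic, p_value


# Hypothetical interaction estimate (female x distance) and standard error.
statistic, p_value = wald_test_single(coefficient=0.002, standard_error=0.004)
print(f"Wald statistic = {statistic:.2f}, p-value = {p_value:.3f}")
```

A large p-value means that equal returns to distance for males and females cannot be rejected, which is the condition the identification strategy relies on.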
Panel B reports the results of a re-estimation of models (1) to (4) by means of a heteroskedastic probit model in the spirit of Neumark (2012), allowing the variance of the error term to vary with the gender of the candidate.
By doing that, we get unbiased results that are very comparable to those in Panel A. In other words, we find no evidence for a bias in the sense of the HS critique. This finding is related to the fact that the estimated male and female standard deviations of the error term (σ_Male and σ_Female) are very comparable. Therefore, our results seem to indicate that the tested employers do not perceive a (gender) group difference in the variance of unobserved determinants of productivity. Last, we decompose, in the spirit of Neumark (2012), the unbiased estimates into an effect through level (keeping group differences in the variance of the error term constant) and an effect through variance (keeping differences in unbiased parameters constant). Interestingly, and in contrast to the findings of Carlsson et al. (2014), we find that the effects through level are, although not significantly different from zero, of more or less the same magnitude as the total unbiased effect, while the effects through variance are rather close to zero.
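The logic of this decomposition can be sketched as follows. Treating the female dummy F as if it were continuous, the derivative of the callback probability Φ((a + γF)/exp(ωF)) with respect to F splits into a term driven by γ (level) and a term driven by ω (variance). The function and parameter names are hypothetical and the parameter values are illustrative rather than estimates from Table 1. The sketch is our reading of the approach, not a reproduction of the exact procedure in Neumark (2012).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def hetero_probit_probability(index_other, gamma, omega, female):
    """Callback probability Phi((index_other + gamma*F) / exp(omega*F))."""
    scale = math.exp(omega * female)
    return STD_NORMAL.cdf((index_other + gamma * female) / scale)


def decompose_effect(index_other, gamma, omega, female=0.0):
    """Split dP/dF into an effect through level and one through variance."""
    scale = math.exp(omega * female)
    latent = index_other + gamma * female
    density = STD_NORMAL.pdf(latent / scale)
    level = density * gamma / scale               # shift of the latent index
    variance = -density * latent * omega / scale  # rescaling of the index
    return level, variance


# Illustrative parameters: small level effect, nearly equal variances.
level, variance = decompose_effect(index_other=-0.8, gamma=-0.10, omega=0.02)
print(f"level = {level:.4f}, variance = {variance:.4f}, "
      f"total = {level + variance:.4f}")
```

When ω is close to zero, as the comparable estimates of σ_Male and σ_Female suggest, the variance term nearly vanishes and the total effect is close to the level effect.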
Our finding of no important perceived gender difference in the variance of unobserved variables contrasts with the more substantial ethnic group differences in this respect reported in Baert et al. (2015), Carlsson et al. (2014) and Neumark (2012). One explanation is that perceived group differences in the variance of unobserved variables can be thought of as a form of statistical discrimination in the sense of Altonji and Blank (1999), where employers believe that the same observable signal is more precise for one group than for another.
This theory seems to be more applicable to ethnic groups than to gender groups.

Conclusion
In this study, we addressed the research gap indicated by Azmat and Petrongolo (2014). This gap boils down to the fact that standard analyses of correspondence testing data aimed at investigating hiring discrimination do not control for group differences in the variance of unobservable productivity determinants and, as a consequence, may be biased. While the robustness of ethnic discrimination to the HS critique has been tested by three earlier studies, Azmat and Petrongolo (2014) stress that correspondence studies on gender discrimination are still open to this critique. Estimating the bias predicted by Heckman and Siegelman (1993) in the context of gender discrimination was the aim (and the contribution) of this study.
We used Belgian correspondence data aimed at measuring hiring discrimination against young females. We employed the empirical framework introduced by Neumark (2012) and proposed an original identifying assumption. By doing that, we found no significant evidence for the bias predicted by Heckman and Siegelman (1993), which is related to the fact that the estimated (perceived) variance of unobservables is very comparable for male and female job candidates.