The Gender Employment Gap among Refugees and the Role of Employer Discrimination: Experimental Evidence from the German, Swedish and Austrian Labor Markets

Compared to their male counterparts, refugee women exhibit low employment rates in many countries. Discrimination by recruiters could possibly explain this phenomenon, but thus far, there is little direct evidence on this. This study addresses this gap. We develop a set of hypotheses about the effects of gender and family status on refugees’ labor market integration, and then test these hypotheses using data from an original survey experiment administered in 2019 to online panels of recruiters in three major refugee-receiving countries (Germany, Austria, and Sweden). We find that recruiters indeed prefer female over male refugees across different job types, all else equal. However, we also find evidence of a disadvantage connected with motherhood among refugees. Overall, our findings raise doubts about the relevance of discrimination as an explanation for the observed employment gap between male and female refugees.


Introduction
Gainful employment is a central dimension of immigrant integration into host countries (Harder et al. 2018). Of course, simply having a job is not equivalent to successful integration (e.g., Ballarino and Panichella 2015), but not having a job will very likely impede other forms of social integration (e.g., Gallie 1999). Unfortunately, immigrants in many Western societies struggle to find employment, and this is particularly the case for refugees and other humanitarian immigrants (Hooijer and Picot 2015; De Vroome and van Tubergen 2010). In addition, within this already disadvantaged group, female refugees' employment rates are significantly lower than those of their male peers (Bloch 2007; Cheung and Phillimore 2017). In this paper, we focus on this gender employment gap among refugees.
Different explanations exist for why female refugees are less likely to be employed than their male peers. One potential explanation is that female refugees are discriminated against both by virtue of being refugees and immigrants (Zschirnt and Ruedin 2016) and because they are women (González, Cortina and Rodríguez 2019). This added disadvantage in recruitment might explain at least a part of the gap between refugee men and women that is empirically observed in Western economies. However, the case for the "double burden" hypothesis is far from open-and-shut because refugee women might have an advantage in recruitment over their male peers. This is because any negative attributes applied to refugees (e.g., unreliable, untrustworthy; see Kotzur et al. 2019) are applied primarily to refugee men (Eagly and Kite 1987) and less so to women. In addition, stereotypes commonly attached to women (e.g., warmth, communality; see Ellemers 2018) could counteract negative stereotypes often applied to refugees. If that were the case, and if refugee women accordingly hold an advantage over male refugees in recruitment, this would imply that the employment gap among refugees results from factors other than discrimination in recruitment. Such other factors could include traditional beliefs about proper gender roles held in immigrant and refugee communities, which lead women in these communities to adopt the homemaker role instead of seeking to join the labor force (Fernández and Fogli 2009; Koopmans 2016; but see also Breidahl and Larsen 2016), or institutional factors, such as a lack of affordable childcare and other supportive family policies in their destination countries (Dumper 2002).
Empirically, the question of whether discrimination or other factors are more important determinants of the refugee gender employment gap is not yet settled. Existing research has largely relied on indirect evidence such as testimonies from refugees (e.g., Bloch 2007; Dumper 2002) or survey or registry data in which employment gaps that cannot be attributed to differences in productivity serve as indicators of discrimination (Bevelander 2011). An arguably superior approach is to directly study recruiters' perceptions and hiring behaviors, but this has only rarely been done (but see Lundborg and Skedinger 2016; Vernby and Dancygier 2019).
In this contribution, we provide new evidence for the role of discrimination in the gender employment gap among refugees. To do so, we first develop a set of hypotheses about the likely patterns of recruiter discrimination that refugees of different genders and with different family backgrounds might face, building on theories of discrimination and stereotypes from economics and psychology. We then test these hypotheses using data from an original survey experiment administered to an online panel of recruiters from Austria, Germany, and Sweden in 2019. 1 Our main finding is that, all else equal, recruiters indeed prefer female refugees to their male peers. But we also find that this female advantage disappears once children are present. With children, female refugees are on par with their male peers. This finding is robust across two distinct job types and all three countries.
These findings have implications for our understanding of the mechanisms of refugee integration and public policy. For one, our findings suggest that recruiter discrimination against women might not actually be the main mechanism behind female refugees' lower employment rates. Our findings also suggest that any antidiscrimination initiatives to support refugees should not bracket out the situation of refugee men.
The remainder of this article proceeds as follows: The next section develops hypotheses about hiring discrimination against refugees based on a discussion of theories about stereotyping in recruitment. The third section then presents our experiment, data collection, and estimation method. The fourth section presents the results and the final section concludes.

Hiring Discrimination and the Role of Stereotypes
There is a wealth of evidence for the presence of discrimination against immigrants, women, and other groups in job recruitment as well as for the relevance of stereotypes as a driver of such discrimination (e.g., Ellemers 2018; Midtbøen 2014; Zschirnt and Ruedin 2016). Therefore, it is plausible, but not self-evident, that stereotyping is also the mechanism behind refugee women's low employment rates. In this section, we discuss different scenarios of how stereotypes could affect male and female refugees in recruitment procedures. We define stereotypes as oversimplified (and frequently false) expectations or beliefs about social groups and their characteristic attributes. Stereotypes can be positive ("Germans are punctual") or negative ("Immigrants are criminals") (see also Bordalo et al. 2016; Cuddy et al. 2009; Ellemers 2018).

The "Double Burden" Hypothesis
Whatever labor market discrimination refugees in general face (Zschirnt and Ruedin 2016), it is possible that refugee women suffer an additional disadvantage connected to their female gender and the stereotypes attached to it (Cejka and Eagly 1999; Ellemers 2018). Such a "double burden" for female refugees could result from two related forms of stereotyping. First, recruiters might see refugee women, because they are women, as likely to be less committed to their jobs and less reliable than men due to current or potential future caring responsibilities. Second, prescriptive stereotypes about the "proper role" of women in society might also lead recruiters to discriminate against female applicants in general (González, Cortina and Rodríguez 2019; see also Phelps 1972). In both cases, stereotyping based on gender would create an additional disadvantage for female refugees.
This leads to the expectation that the employment gap between male and female refugees might result from negative stereotypes attached to both female refugees' gender and their immigration status, creating the type of "double burden" or "double jeopardy" that has also been observed for women from other disadvantaged groups (King 1988; Taylor, Charlton and Ranyard 2012). Specifically, if the "double burden" hypothesis holds, we would expect to find the following:
H1: Female refugees will be considered as less employable than male refugees are, with all other factors held equal.

The "Female Advantage" Hypothesis
An alternative scenario is that female refugees in fact are perceived as more employable than their male peers because negative stereotypes about refugees in general are applied more strongly to males than to females.
Stereotypes about refugees are indeed often negative. For example, in their study of perceptions of refugees and asylum seekers conducted in Germany, Kotzur et al. (2019; see also Cuddy et al. 2009; Kotzur, Forsbach and Wagner 2017) found that "generic" refugees were seen as low in both likability and trustworthiness (warmth) and in ability and independence (competence). Froehlich and Schulte (2019) obtained similar results when they compared stereotypes toward different immigrant groups in Germany. They found that immigrants from countries associated with recent refugee outflows such as Syria or Afghanistan were rated as low in both warmth and competence compared to Germans, but the same held true for immigrant groups with a longer presence in Germany such as Turks.
Yet, research has also found that any negative stereotypes about foreigners are unequally applied to men and women. Notably, Eagly and Kite (1987) found that negative traits associated with a given foreign nationality were mostly attributed to men of that nationality rather than to women. They suggested that this was because stereotypes about foreigners were mostly informed by "indirect contact," for example, portrayals in news media. And this is where women, due to their generally lower profile in public life, are less likely to be shown. For example, women are less likely to be seen in negative or problematic news reports (e.g., irregular border crossings that involve scuffling with security forces), which means they are less likely to be associated with behavior that is judged as problematic. 2 In addition, it is possible that the stereotypes attached to refugee women due to their gender (warmth, compassion; Ellemers 2018) can counteract the stereotypes attached to them due to their refugee status, and thus make them appear less threatening than their male peers. This reduced threat would be particularly advantageous in recruitment processes for jobs that are relatively undesirable, low in status, and that involve boring, repetitive, or dirty tasks. In these types of jobs, "tractability" is an important attribute, and these job types are most likely to be open to applicants with few or no formal credentials or networks, such as refugees (e.g., Bonoli and Hinrichs 2012; Waldinger and Lichter 2003; Zamudio and Lichter 2008). In other words, because female refugees are less likely to be perceived as being "threatening" or having "the wrong attitude," they might have an advantage compared to their male peers.
H2: Refugee women are considered as more employable compared to male refugees, with all other factors held equal.
2 Systematic differences in how men and women from the same country of origin are depicted have indeed been observed in the media coverage of the recent (but also past) refugee inflows into Western countries (e.g., Wilmott 2017; Wigger 2019; Berry, Garcia-Blanco, and Moore 2015). Both newspaper articles and photographs have been shown to use different frames to portray refugees (Wilmott 2017), and media portrayals of (young) (Muslim) refugee men are more likely to elicit negative stereotypes.
It is important to point out the further implications of this last hypothesis: If female refugees indeed are seen as no less or even more employable than male refugees are, this would imply that recruiter discrimination might not explain the observed employment gap between male and female refugees. A potential alternative explanation could be an adherence to traditional gender roles within refugee communities that leads female refugees to drop out or stay out of the labor force, as mentioned above (Koopmans 2016).
However, we also consider the possibility that the stereotyping dynamics described above work differently once children are present, and that women might then suffer from a motherhood penalty whereas men can potentially benefit from a fatherhood advantage. 3 If present, such effects might change the overall picture again, especially if refugees come from origin countries with high fertility rates and retain these after migrating (e.g., Scott and Stanfors 2011).

The Motherhood Penalty Hypothesis
Existing evidence points strongly to a significant penalty for mothers in the labor market in many (but not all) advanced economies (Budig, Misra and Boeckmann 2016). Objective factors, such as differences in human capital or the loss of work experience suffered by women who leave the labor force after having children, can explain part of this motherhood penalty (Anderson, Binder and Krause 2002; Budig, Misra and Boeckmann 2016). However, studies also find a remaining penalty that observable characteristics cannot explain, and this remaining penalty may result from discrimination against mothers in recruitment (Anderson, Binder and Krause 2002; Budig and England 2001; Correll, Benard and Paik 2007). This, in turn, may also be due to stereotyping of mothers. For example, mothers are commonly perceived as warmer, but also as less competent (Cuddy, Fiske and Glick 2004).
If this also applies to refugee women, then this would imply the following:
H3: All else equal (including human capital), refugee mothers are considered as less employable compared to childless refugee women.

The Fatherhood Advantage Hypothesis
Finally, we also consider that any potential disadvantage male refugees experience might be reduced if they are married and/or have children because being a husband and father can reduce the extent to which they are perceived as "threatening." Paralleling the evidence for a motherhood penalty in many labor markets, there is also considerable evidence for a fatherhood advantage or bonus, meaning that men benefit professionally from having children. More importantly, research again suggests that at least a part of this effect works through factors other than objective ones such as behavioral changes (Bygren and Gähler 2012; Killewald 2013; but see also Bygren, Erlandsson and Gähler 2017). Cuddy, Fiske and Glick's (2004) research is again illustrative: they found that men with children were seen as warmer than men without children, but equally competent.
If this also applies to refugee men, then it is possible that they can compensate for their otherwise lower perceived warmth by becoming fathers.
H4: Refugee fathers are considered as more employable than refugee men without children, with all other factors held equal.

Research Design
To evaluate our hypotheses, we used data from an original survey experiment in which we studied how a sample of recruiters evaluates refugee job applicants who differ in gender, family status, and several other attributes.
Specifically, we conducted a so-called factorial or vignette survey experiment. In this type of experiment, participants evaluate brief descriptions of fictional persons or objects ("vignettes") that vary randomly on a set of attributes (Auspurg and Hinz 2015; Jasso 2006; Wallander 2009). In our experiment, participants read brief vignettes of refugee jobseekers (described in more detail below) and evaluated the employability of each fictional vignette person.
In our vignette experiment, we randomized treatment assignment in two respects. First, each vignette displayed a random combination of jobseeker attributes. In other words, the fictional persons shown were equally likely to be male or female, to have children or be childless, to come from different origin countries, etc. This random vignette composition ensured that the unique and unbiased effect of each individual attribute on the participants' evaluations could be identified. In addition, the randomized composition of vignettes, together with the fact that each vignette included information about several different attributes at a time, made it more difficult for participants to identify which particular attribute was of central interest in the experiment, which helped to reduce social desirability bias (Auspurg and Hinz 2015, 11). Especially when studying discrimination, this is an important precondition for gathering valid measurements. Second, vignettes were also randomly assigned to participants, which meant the experiment controlled for potential confounding effects of respondent-level variables.
The vignettes varied along the following set of attributes. First, we included as attributes our core variables of interest, namely gender (male or female) and family status (single, married without children, or married with one child aged five). In addition, each vignette description included information about the fictional refugee's age (24, 37, or 48 years), country of origin (Afghanistan, Syria, or Turkey), year of arrival (2015 or 2018), language proficiency (local language A2, local language B2, or local language A2 and English B1), former occupation in the country of origin (elementary school teacher, medical doctor, administrative assistant, cleaner, or temporary worker), and integration measure participation (integration course, work practice private sector, work practice public sector, wage subsidy, or volunteering) (for more details, see Table S3, Supplementary Material). 4
The refugees were presented as applicants for two distinct job types. The first position was a job as an administrative assistant, which involved basic office duties such as delivering internal mail, sorting office material, and copying documents. The second position was a janitor/caretaker job and consisted of tasks such as cleaning offices and taking care of the establishment's outside area. The job descriptions (see experimental protocol, Tables S2a and S2b and Figure S2) were limited to lower-skilled occupations because refugees, even if they are highly qualified, often only have access to lower-skilled jobs due to a lack of language skills, formal credentials, or networks (e.g., Bloch 2007). We presented each respondent with four vignettes of applicants for each of the two positions (i.e., eight vignettes in total), one vignette at a time. Each time, participants evaluated how likely they were to invite the candidate for a job interview on a scale from 1 (not at all likely) to 10 (very likely). To avoid order effects, we randomized both the order of the positions described in the survey and the order of the four vignettes per job. Participants were instructed that all applicants had been officially recognized as refugees and had valid work permits to prevent diverging assumptions regarding the bureaucratic hurdles to hiring these workers and/or uncertainty about their ability to stay in the country.
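The design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the attribute levels and the "four vignettes per job, job order randomized" structure come from the text, whereas the dictionary keys, the function names, the seed, and the choice to draw every attribute independently and uniformly (with replacement) are our own simplifications and not a description of the survey software actually used.

```python
import random

# Attribute levels as listed in the text; the dictionary keys are illustrative labels.
ATTRIBUTES = {
    "gender": ["male", "female"],
    "family_status": ["single", "married without children", "married with one child"],
    "age": [24, 37, 48],
    "origin": ["Afghanistan", "Syria", "Turkey"],
    "arrival_year": [2015, 2018],
    "language": ["local A2", "local B2", "local A2 and English B1"],
    "former_occupation": [
        "elementary school teacher", "medical doctor",
        "administrative assistant", "cleaner", "temporary worker",
    ],
    "integration_measure": [
        "integration course", "work practice private sector",
        "work practice public sector", "wage subsidy", "volunteering",
    ],
}
JOBS = ["administrative assistant", "janitor/caretaker"]
VIGNETTES_PER_JOB = 4


def vignette_universe_size():
    """Number of distinct attribute combinations (per job type)."""
    size = 1
    for levels in ATTRIBUTES.values():
        size *= len(levels)
    return size


def draw_vignette(rng):
    """Assumption: each attribute is drawn independently and uniformly."""
    return {name: rng.choice(levels) for name, levels in ATTRIBUTES.items()}


def build_questionnaire(rng):
    """Eight vignettes: two job blocks in random order, four vignettes each."""
    jobs = JOBS[:]
    rng.shuffle(jobs)  # randomize which position is shown first
    deck = []
    for job in jobs:
        for _ in range(VIGNETTES_PER_JOB):
            vignette = draw_vignette(rng)
            vignette["job"] = job
            deck.append(vignette)
    return deck


questionnaire = build_questionnaire(random.Random(2019))  # arbitrary seed
```

Under these assumptions there are 8,100 possible attribute combinations per job type, so any single participant sees only a tiny random slice of the vignette universe.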
By declaring that all fictional applicants were refugees, we focused specifically on the refugee population's recruitment experience, and thus our experiment does not allow us to compare refugees to other immigrant groups or to members of the majority population. This was for two reasons. First, and most important, we are interested in the gender employment gap within the group of refugees, hence the central comparison was between refugees of different genders. Second, while in principle it would be interesting to include natives as a comparison group, there is a risk that the recruiters would have a strong preference for native applicants (e.g., Zschirnt and Ruedin 2016), and that their tendency to engage in "attention discrimination" against less appealing groups of candidates in recruitment (Bartoš et al. 2016) might drown out the finer gender differences within the group of interest, namely refugees.
We included Syria and Afghanistan as countries of origin because these were the two largest groups of recent refugees in all three countries in which we conducted the experiment in 2019 (see Table S1, Supplementary Material). However, by having Turkey as a third country of origin, we also included one group that has a longer immigration history and a larger diaspora in Western Europe, but for which there is still a valid cause of refugee emigration in the form of the failed coup d'état in 2016 and the generally eroding civil liberties in the years before and after this event (e.g., Esen and Gumuscu 2016). This allowed us to account for the possibility that stereotypes could differ between immigrant groups depending on the size of their diaspora and their historical presence in a destination country (Froehlich and Schulte 2019).
We also want to point out an important potential drawback of our experimental setup, which is that this type of experiment can only approximate a real hiring situation and can only capture the hiring intent, but not actual hiring behavior. However, as Hainmueller, Hangartner and Yamamoto (2015) have recently shown, intentions stated in vignette experiments correspond well to observed real-world behavior, which testifies to this type of experiment's external validity. An added advantage is that vignette experiments do not have the same ethical issues as alternative approaches such as audit studies or correspondence tests. Finally, vignette experiments are particularly appropriate when studying recruitment for low-skilled positions where recruitment practices are less structured and application processes do not require highly formalized, individualized, and extensive application documents, which is precisely what our experiment captures.

Survey Setup and Country Case Selection
The experiment was embedded within a survey questionnaire in which participants were first asked a set of screening questions (see below) before they proceeded to the experiments.
The survey was fielded in three European countries: Austria, Germany, and Sweden. The rationale behind selecting these three countries was as follows. First, these countries received above-average numbers of refugees during 2015-2017, making refugee socioeconomic integration an important and salient topic (see Figure S1, Supplementary Material). Moreover, the composition of the refugee population was similar across the three countries. Most refugees were of Syrian, Iraqi, or Afghan nationality, and most were young (between 18 and 34 years of age) and male (e.g., Martin et al. 2016).
However, these three countries also differ significantly in their labor market and welfare state institutions, which are factors that influence the labor market integration of migrants in general, and migrant women in particular (e.g., Ballarino and Panichella 2018; Kogan 2007; Reyneri and Fullin 2011). Germany and Austria have prototypical conservative welfare systems with a strong focus on status maintenance and a bias toward the "traditional" family model, while Sweden has a Nordic or social democratic welfare state with a more egalitarian orientation, including with regard to the gender dimension (Esping-Andersen 1990). In addition, these countries also differ in their immigration history. Continental countries such as Germany or Austria are traditional guest-worker countries that relied heavily on foreign workers to meet the demands of their booming labor markets after WWII. Conversely, Sweden took a different route and opted to expand female labor force participation to reduce the labor market reliance on foreign workers (Afonso 2019). By conducting our survey and experiment in different countries with different labor market and migration traditions, we can account for the possibility that different policy regimes might affect recruiter evaluations of refugees, for example, that recruiters are less likely to penalize refugee women where policies allow for an easier reconciliation of work and family life (e.g., Budig, Misra and Boeckmann 2012).

Participant Recruitment
We sought to select participants with substantial recruitment experience who therefore could deliver a realistic assessment of the fictional refugee job candidates. Obviously, obtaining such a specific participant sample is not a straightforward task. Our approach was to select participants from an online panel operated by a large survey and market research company (Qualtrics), using a set of requirements and quotas.
Upon starting the survey, we asked all initially recruited participants if they had any experience with job recruitment processes. We selected those participants who indicated they had been involved in hiring processes during the 12 months prior to the survey; all other participants were screened out. We then applied quotas based on the participant's age (50 percent of participants had to be older than 35), gender (50 percent had to be female), and firm size (60 percent had to work in firms with up to 250 employees) to obtain a diverse sample. Overall, we had 368 participants from Germany, 228 from Austria, and 363 from Sweden, who together rated a total of about 7,600 vignettes. We present more detailed descriptive statistics for our respondent sample in Table S5 in the Supplementary Material. 5
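As a quick consistency check on these sample figures, the country counts can be summed and multiplied by the eight vignettes each participant was shown. The sketch below assumes, purely for illustration, that every participant rated all eight vignettes; the variable names are ours.

```python
# Participant counts by country, as reported in the text.
participants = {"Germany": 368, "Austria": 228, "Sweden": 363}
VIGNETTES_PER_PARTICIPANT = 8  # four vignettes for each of the two job types

total_participants = sum(participants.values())
# Upper bound on the number of ratings if nobody skipped a vignette (assumption).
max_ratings = total_participants * VIGNETTES_PER_PARTICIPANT

print(total_participants, max_ratings)  # 959 7672
```

The upper bound of 7,672 ratings is in line with the reported total of about 7,600; the text does not state why the realized number is slightly lower.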

Empirical Model
Given that our dependent variable was metric and our data had a hierarchical structure, with multiple vignette evaluations pooled within many participants, the recommended empirical model is a linear random effects multilevel model (see e.g., Auspurg and Hinz 2015, Chapter 5), which we chose as our main specification. Our models included dummy variables for all the various vignette attributes listed above as the main predictors; interaction terms were added to some models, as described in detail below. Unless otherwise indicated, our models were estimated on the pooled data that included observations from all three countries in which we conducted our survey. These models included country dummies and in some specifications, interactions to capture between-country differences. We also estimated the main model separately for each country (reported in Table A1), and we re-estimated both the pooled and the country-specific models using a fixed-effects specification and cluster-robust standard errors (see Table S9 in the Supplement) to check if our results were stable across estimation techniques (Bryan and Jenkins 2016). The results did not change substantively across specifications.
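For readers who prefer a formula, a linear random effects (random-intercept) model of the kind described above can be written as follows. The notation is ours and is only a sketch: $i$ indexes vignettes, $j$ indexes participants, $\mathbf{x}_{ij}$ collects the vignette-attribute dummies (including the job type), and $\mathbf{c}_j$ collects the country dummies. The normality of the two error components is the conventional assumption for this model class and is not stated explicitly in the text.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}
          + \mathbf{c}_{j}^{\top}\boldsymbol{\gamma} + u_j + \varepsilon_{ij}, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2).
\end{aligned}
```

Here $y_{ij}$ is the rating on the 1-10 scale and $u_j$ is a participant-specific intercept that absorbs the tendency of some recruiters to rate all candidates higher or lower. In the fixed-effects robustness check, $u_j$ is treated as a separate constant for each participant rather than as a random draw.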

The Main Effects of Gender and Family Status
We start the analysis by inspecting the main coefficient estimates for all vignette attributes from the main pooled model, which are shown in Figure 1 (the detailed estimation results are reported in Table A1, under "Model 1").
The first main result is the overall small magnitude of the vignette attributes' effects. The largest effect estimate (the dummy for "Janitor" as the position applied for) is around 0.5 points on the 10-point response scale, and other coefficients are half of that or less. We suspect this might result from the focus on refugees alone in this experiment, and that recruiters may have found it difficult to differentiate more strongly between candidates within this narrow group. Had the experiment pitted refugee candidates against other immigrant groups or a native-born group, stronger differences might have emerged. Nevertheless, we do find statistically significant effects of several vignette attributes, which indicate that recruiters do make some systematic distinctions within the refugee group.
Among these, we first point to the significant and negative effect of being male compared to being female on recruiters' evaluations of employability (holding constant other characteristics, including the type of job applied for). This finding is consistent with our second hypothesis, which predicts that men with a refugee background are perceived as less employable because the negative stereotypes attached to their group are applied more strongly to them. By implication, the "double burden" hypothesis is rejected.
The results also show that married individuals with a child are, on average, evaluated less positively compared to married applicants without children or single applicants. Single applicants are evaluated more favorably than married applicants without children, but this difference is not statistically significant.
When looking at the other vignette attributes, we do not find significant age-related discrimination or country-of-origin effects. The latter might be because all fictional candidates were presented as having been awarded refugee status, and our participants might in this case have trusted that their residence in the destination country was legitimate, regardless of their country of origin. Humanitarian concerns might then play into recruiters' evaluations (Bansak, Hainmueller and Hangartner 2016), leading them to treat Afghans, Syrians, and Turks equally. However, we stress that this does not mean that such individuals would not be discriminated against if compared to other low-skilled individuals on the labor market (Bevelander 2011; Bloch 2007).
Regarding other migration-related variables, the results show that, on average, speaking the local language (Swedish or German) at a B2 level or a combination of knowledge of the local language at the A2 level and English at the B2 level does not lead to a statistically significant improvement in recruiter evaluations compared to having only A2 skills in the local language. Refugees who held lower-skilled positions in their country of origin (i.e., were cleaners or had only temporary jobs) were evaluated significantly worse than both former teachers and medical doctors. 6 Refugees who were assigned to a second integration measure in addition to a mandatory integration course were also evaluated significantly better compared to those who only participated in a mandatory course (the reference category). However, there are no discernible differences between the effect estimates of the different integration programs. This suggests that it does not matter what kind of measure or volunteering activity refugees take part in (see also Fossati and Liechti 2020), as long as they do something in addition to a simple integration program.
Finally, applicants for a janitor position are evaluated significantly more favorably than those applying for a position as an administrative assistant. As mentioned above, this is also the effect with the largest magnitude. We interpret this as an additional indication for the relevance of stereotypes in hiring processes. Stereotypes, or more specifically what Fiske and Taylor (2013) described as "jobholder schemas," provide a plausible explanation for why refugees are consistently sorted into jobs that involve less prestigious work. That is, recruiters have expectations about which profile is most likely to work in such a position, and they find refugees to be more suited for positions involving "dirtier" and less prestigious occupations (see also Auer et al. 2019; Pager, Bonikowski and Western 2009).

The Conditioning Effect of Family Status
Thus far, we found that refugee women are preferred overall in recruitment compared to refugee men. However, we also expected that this effect might differ depending on whether refugees have children. To evaluate our third and fourth hypotheses about how a refugee's family status moderates the effects of gender, we add an interaction between gender and family status to our main specification and compute predicted ratings for each combination of family status and gender from the estimation results. These ratings are shown in Figure 2. 7 First, we find that the estimates for men with different family statuses are not significantly different from each other. Being male has a consistent negative effect, which is the same irrespective of whether an applicant is married or has children. This absence of a conditioning effect of fatherhood runs counter to our fourth hypothesis, that is, the idea that refugee men can compensate for their disadvantage by becoming fathers.
Conversely, we do find support for the hypothesis that being married with a child makes a difference for women-and here the effect is negative, as expected.Thus, we do find that refugee women experience a setback in recruiter evaluations once they have children.Interestingly, the predicted rating for refugee mothers is essentially the same as the ratings for refugee men are across the different family statuses.In other words, it seems recruiters cluster applicants into two groups: in one group are childless women, who are preferred; in the other group are mothers and all male applicants.
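To make the logic of the predicted ratings concrete, the following minimal sketch shows how cell-specific predictions are assembled from the coefficients of a model with a gender by family status interaction. The coefficient values and the category labels (`single`, `married`, `married_child`) are hypothetical and chosen only to mimic the qualitative pattern described above; they are not the estimates from our models.

```python
# Hypothetical coefficients for illustration only; NOT the paper's estimates.
# The reference cell is assumed to be a single woman without children.
coef = {
    "intercept": 5.0,
    "male": -0.4,
    "married": 0.0,
    "married_child": -0.4,
    "male:married": 0.0,
    "male:married_child": 0.4,
}


def predicted_rating(male: bool, status: str) -> float:
    """Sum the main effects and the interaction term that apply to one cell."""
    rating = coef["intercept"]
    if male:
        rating += coef["male"]
    if status != "single":
        rating += coef[status]
        if male:
            rating += coef[f"male:{status}"]
    return rating


# Print one predicted rating per combination of gender and family status.
for status in ("single", "married", "married_child"):
    for male in (False, True):
        label = "male" if male else "female"
        print(f"{label:6s} {status:14s} {predicted_rating(male, status):.2f}")
```

With these illustrative values, the interaction term for fathers offsets the child penalty, so all male cells and the mother cell end up at the same predicted rating, while childless women are rated higher.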

Further Analyses and Robustness Checks
To further probe the stability of the negative main effect of being male compared to female, we also tested if women may be considered more suitable for an administrative assistant position whereas men may be seen as better janitors (Cejka and Eagly 1999). We did not find such an effect (see Table S6 and Figure S3 in the Supplementary Material). In addition, we considered whether the gender effect differs across countries because it is conceivable that recruiters in societies with strong gender-equality norms, such as Sweden, are less prone to discriminate than those elsewhere. Again, we failed to find such differences (see Table S7 and Figure S4 in the Supplementary Material).
Finally, we also investigated if the interaction effects between gender and family status that we uncovered above differ between the three countries. The reasoning here is that the better availability of employment-friendly childcare and family policies in Sweden might reduce the penalty mothers experience (e.g., Budig, Misra and Boeckmann 2016). To analyze if this is the case, we estimated a model using the pooled data that includes a three-way interaction between gender, family status, and country (Austria, Germany, or Sweden; see Table S8, Supplementary Material) and tested for the joint significance of all interaction terms in the model using a Wald test.⁸ All interaction terms were jointly insignificant (χ² = 9.16, df = 10, p = 0.512). We also tested for the joint significance of only the three-way interaction terms between gender, family status, and country and again cannot reject the null hypothesis (χ² = 2.13, df = 4, p = 0.712). This indicates that the motherhood penalty does not differ between the three countries we study.
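As a rough check on the arithmetic behind these Wald tests, the sketch below converts a chi-square statistic and its degrees of freedom into a p-value. The function name is ours and purely illustrative; it relies on the closed-form upper-tail probability of the chi-square distribution that holds for even degrees of freedom, which covers both tests reported here (df = 10 and df = 4). Because the reported statistics are rounded to two decimals, the computed values can differ slightly in the third decimal from those given in the text.

```python
import math


def chi2_sf_even_df(stat: float, df: int) -> float:
    """Upper-tail probability P(X >= stat) for a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!,
    which is valid only for positive even degrees of freedom.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires positive even df")
    half = stat / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Statistics as reported in the text (rounded to two decimals).
p_three_way = chi2_sf_even_df(2.13, 4)   # three-way interaction terms only
p_all_terms = chi2_sf_even_df(9.16, 10)  # all interaction terms jointly

print(f"three-way terms: p = {p_three_way:.3f}")
print(f"all terms:       p = {p_all_terms:.3f}")
```

For the three-way interaction test, this gives a p-value of about 0.712. For the joint test of all interaction terms, it gives a value slightly above 0.5. In both cases the null hypothesis of no interaction is clearly not rejected at conventional levels.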

Conclusion
We investigated whether discrimination during recruitment could explain the widely observed employment gap between male and female refugees in most Western economies. Our findings suggest that this is not the case. On average, female refugees are preferred over males in the job recruitment process. This finding is robust across occupations and countries. We suggest this is because negative stereotypes often attributed to refugees (lacking reliability and trustworthiness) are primarily attributed to male refugees. We also found that children change this pattern because refugee mothers are no longer evaluated better (but also not worse) than their male counterparts.
Our findings contribute to the literature on the economic integration of immigrants and refugees in Western countries (e.g., Van Tubergen, Maas and Flap 2004; De Vroome and van Tubergen 2010; Koopmans 2016), and we believe that, in particular, our finding that male refugees never enjoy an advantage over their female peers in recruiting has an interesting implication for this literature. Specifically, it suggests that it is unlikely that gender-based discrimination in recruitment is behind the employment gap between refugee men and women, at least in the three countries we studied. Hiring discrimination could only be a contributing factor if female refugee applicants were at some point actively disadvantaged compared to their male counterparts, but that is not what we found. Instead, we found a female advantage (for non-parents) or the absence of differences (when children are present). This suggests that the factors that produce the employment gap between male and female refugees are more likely to be located on the supply side than on the demand side of the labor market.
One such supply-side factor that could keep female refugees from joining the labor force is insufficient access to childcare and generally weak policies to support the reconciliation of work and family life (Budig, Misra and Boeckmann 2016; Bonoli 2013). Another could be an adherence to more traditional gender roles in refugee communities (e.g., Koopmans 2016). Additional research is of course needed to investigate this further.
Our study has limitations, and these open avenues for future research. A first limitation is that we consider only a small set of refugee origin countries, and it would be worthwhile to see if the discrimination against male refugees we found also occurs for refugees from different backgrounds and from earlier refugee waves (e.g., refugees from the Balkan wars of the 1990s or from Iran in the 1970s). Indeed, although the origin countries we consider (Afghanistan, Syria, and Turkey) are different in many cultural, geographic, and economic respects, they are all majority-Muslim countries that experienced refugee emigration recently. In this context, it is important to point to Eagly and Kite's (1987) study, which found that although stereotypes about nationalities are more strongly applied to males than to females, this does not apply equally to all nationalities, but mostly to those associated with less gender equality and the absence of democracy and liberal values. Therefore, it would be relevant to study whether there is less between-gender discrimination in recruitment regarding refugees from countries that are perceived as more gender-equal and liberal than Afghanistan, Syria, or Turkey.
A related limitation of our study is that we covered only recruiter perceptions in three West European countries, two of which belong to the Continental European or Conservative welfare state regime (Esping-Andersen 1990). A natural way to extend our analysis would be to include further countries, in particular non-European countries with welfare state models that are classified as liberal (in the sense of a strong free-market orientation, low taxes, and little regulation and redistribution), such as Canada, the United States, Australia, or New Zealand, but potentially also countries in the Mediterranean area with overall less developed welfare states, such as Italy or Greece (Esping-Andersen 1990; Ferrera 1996).
Furthermore, an obvious limitation of our study is that we focus on differences within the group of refugees and do not also include a comparison to natives or other immigrant groups. As already mentioned above, including natives or other immigrant groups in the experimental design would create a risk that any differences within the group of refugees would be drowned out due to recruiters' "attention discrimination" (Bartoš et al. 2016). Still, conducting this type of experiment (ideally with a large sample of participants to ensure sufficient statistical power to detect even small effects) would be worthwhile because this would allow researchers to test the "double burden" hypothesis more explicitly and directly than we have been able to do here.
We also want to point out that our study's focus, labor market integration, is only one of several core dimensions of host country integration, next to social, political, or psychological integration (Harder et al. 2018). In addition, and as mentioned already in the introduction, simply having a job is not equivalent to being successfully integrated because jobs vary in quality and in the extent to which they are appropriate given workers' skills and abilities. In this area, immigrants, including immigrant women, have faced significant disadvantages (Ballarino and Panichella 2015; 2018). Although this particular form of disadvantage has not been our focus here, our finding that refugee applicants are strongly preferred if they apply for the position of a janitor instead of the (more prestigious) position of an administrative assistant is in line with the finding from other studies that recruiters tend to channel applicants with foreign or minority backgrounds into less prestigious positions (Auer et al. 2019; Pager, Bonikowski and Western 2009). Of course, this pattern might look different if one were to study recruitment patterns in countries other than those we have studied. Based on the results of previous research, we would expect this to be less pronounced in liberal countries, such as the United Kingdom, but even more pronounced in Southern Europe (Ballarino and Panichella 2015). This again highlights the value of extending our analysis to these contexts.
Our results have implications for policymaking, in particular that when addressing labor market discrimination against immigrants and refugees in general, a major focus should be on countering stereotypes employers hold (see also Vernby and Dancygier 2019). Stereotypes about male refugees in particular should be countered. Existing research suggests a number of potential interventions, such as de-biasing training, fostering positive contact, or institutional reforms like mandatory quotas or the implementation of diversity offices (see Kalev, Dobbin and Kelly 2006; Paluck and Green 2009).
De-biasing interventions involve exposing decision makers, such as recruiters, to stimuli and exercises that reduce any biases they may have, including biases that are subconscious or "implicit" (Greenwald and Banaji 1995). In particular, "perspective taking" and exposing individuals to counter-stereotypes, namely images of famous and admired representatives of the minority group, have proven effective in reducing biases (e.g., Dasgupta and Greenwald 2001; Finnegan, Oakhill and Garnham 2015; Lai et al. 2014).
A second type of intervention draws on the logic of the contact hypothesis (Pettigrew and Tropp 2006), which predicts that promoting contact between different groups can help reduce any biases and prejudices that group members might hold about each other, provided that the experiences are meaningful, positive, and constructive. In a labor market context, subsidized temporary employment or internship programs could be used to foster such contact between employers and refugees (Hirst et al. 2019; but see also Ortlieb et al. 2020). Similarly, mentoring programs in which refugees obtain advice and are introduced into professional networks might be relevant as well (Liechti 2020).
Finally, changing rules and institutions can help include disadvantaged groups such as (male) refugees. One measure in this category would be to create structures within companies that assign direct responsibility for countering bias and increasing diversity, such as diversity managers or affirmative action plans (Kalev, Dobbin and Kelly 2006). A second institutional measure would be mandatory quotas or affirmative action policies. Albeit controversial, these types of policies have proven effective in helping different minority groups across different contexts because they trigger changes in recruitment procedures more generally, which in turn result in fairer and more formalized processes (Miller 2017).
Table A1. Effects of vignette variables on recruiters' ratings of refugee candidates.

Figure 1. The influence of vignette variables on refugee applicants' ratings by recruiters (both occupations and all countries).

Table A2. Interaction between gender and marital status, all countries and both jobs.