Learning to be Unbiased: Evidence from the French Asylum Office

Abstract: What determines whether some asylum seekers are granted refugee status while others are rejected? I draw upon archival records from a representative sample of 4,141 asylum applications filed in France between 1976 and 2016 to provide new evidence on the determinants of asylum decisions. I find that applicants who are Christian (rather than Muslim) are more likely to be granted refugee status, controlling for all other individual characteristics available to the asylum officers making the decisions. However, linking archival records to detailed administrative data, I also show that bureaucrats at the French Asylum Office stop discriminating after about a year on the job. These findings have implications for strategies to curtail discrimination in courtrooms and administrations. Verification Materials: The materials required to verify the computational reproducibility of the results, procedures, and analyses in this article are available on the American Journal of Political Science Dataverse within the Harvard Dataverse Network, at: https://doi.org/10.7910/DVN/HNVBZG.

The arrival of over three million refugees in Europe between 2015 and 2019 1 triggered an unprecedented political crisis in the European Union (EU). Anti-immigrant parties campaigning on exclusionary policies are gaining ground across Europe. Several European countries reintroduced border controls, leading to the de facto suspension of the open-border Schengen area. The conflict among countries over the distribution and processing of the influx of asylum seekers is at the root of this crisis. All EU member states are bound by the Convention Relating to the Status of Refugees (hereafter the Geneva Convention), but despite decades of efforts to harmonize the European asylum system, member states continue to handle asylum applications very differently (European Commission 2016).
At the center of the asylum process, decision-makers in asylum courts and offices face the difficult task of determining whether asylum seekers are providing truthful and substantiated claims of persecution for reasons of "race, religion, nationality, membership of a particular group or political opinion." 2 For asylum seekers, these decisions are very consequential: those granted refugee status will be able to stay in Europe, while those who are not have to return to their country of origin. However, the subjectivity of the definition of persecution and the possibility that applicants are falsifying claims give judges and bureaucrats significant discretion to determine the veracity of these claims. On what basis do asylum officers decide? Is asylum granted based on the credibility of the claims of persecution, or do bureaucrats discriminate in the attribution of refugee status?
The lack of microlevel data has thus far limited researchers' ability to analyze the determinants of asylum decisions. In this study, I overcome previous data limitations by taking advantage of an unprecedented effort by the French Office for Refugee Protection and Stateless Persons (hereafter the French Asylum Office) to increase transparency. In 2009, the office opened its administrative archives and made nearly 1.5 million asylum applications available to researchers, which included hours of filmed interviews with some of the main actors in the French asylum system. I digitized a total of 4,141 asylum applications, a representative sample of all applications filed at the French Asylum Office between 1976 and 2016. 3 I collected all the information recorded on the application form (e.g., age, gender, marital status, education, employment, and religion) and transcribed applicants' personal narratives, in which they explain why they need political asylum. Crucially, for each application, I also know the decision and the anonymized identifier of the bureaucrat who decided the case.
By comparing accepted and rejected applicants, I analyze how individual characteristics affect the probability of obtaining asylum in France, holding the country of origin and year of arrival constant. Since I had access to the same information as the bureaucrats who decided the cases, omitted variable bias is unlikely to be a concern. I employ three different strategies to control for the effect of the personal narrative. First, I combined hand coding and supervised machine learning to measure and predict whether the narrative contains credible claims of persecution. Second, I developed a list of substantive features of the text that should be relevant to the decision. Third, I applied the supervised Indian Buffet Process to a random subset of the data to identify the text features that are the most relevant in explaining the decision, which I then predict in the rest of the sample (Fong and Grimmer 2016).
This study yields two main findings. First, I demonstrate evidence of discrimination in the attribution of refugee status in France since 1976. While asylum decisions should only be based on assessments of the applicant's claims that they will be persecuted if forced to return to their country of origin, I instead find that Christians are substantially more likely to be granted refugee status than Muslims. I show that these differences cannot be fully explained by variations in other individual characteristics or by differences in their personal narratives.
Moreover, these gaps are unlikely to be driven by differences in unobservable characteristics revealed during the interview process or by the selective assignment of particular cases to certain bureaucrats. Second, I show that French bureaucrats stop discriminating based on these characteristics after about a year on the job: Muslims are much more likely than Christians to be discriminated against when an inexperienced bureaucrat is assigned to their case than if an experienced case officer makes the decision. The pattern is similar, though less robust, for educational attainment and skill level. Overall, this suggests that bureaucrats learn on the job not to discriminate.
This study makes three main contributions. First, it contributes to the "refugee roulette" literature by analyzing the first microlevel dataset on a representative sample of asylum applications. 4 Several studies have examined country-level data and concluded that both humanitarian and strategic interests explain variation in acceptance rates (Holzer, Schneider, and Widmer 2000; Keith, Holmes, and Miller 2013; Neumayer 2005; Rosenblum and Salehyan 2004; Rottman, Fariss, and Poe 2009; Salehyan and Rosenblum 2008; Schneider and Holzer 2002; Toshkov 2014). However, these country-level analyses are merely suggestive since they do not account for the possibility that variation in the composition of asylum seekers between countries of origin could confound these results.
Second, this study provides credible evidence that on-the-job experience can mitigate discrimination using a fine-grained measure of experience. Arnold, Dobbie, and Yang (2017) uncover a similar pattern when comparing discrimination by bail judges across different courts in the United States who have different average levels of experience. Third, the study broadens our understanding of discrimination within bureaucracies. It adds to a growing number of empirical studies which demonstrate that bureaucrats discriminate on the basis of ethnicity and religion (Butler and Broockman 2011; Hemker and Rink 2017; McClendon 2016; Neggers 2018; Olsen, Kyhse-Andersen, and Moynihan 2020; White, Nathan, and Faller 2015). Most notably, by linking self-reported religious affiliation with administrative decisions, a first in the French context, this dataset provides a unique opportunity to study religion-based bureaucratic discrimination in France using real-world data.

Asylum Process in France
To apply for refugee status in France, asylum seekers first need to fill out a standardized application form that elicits demographic and socioeconomic information and to provide a personal narrative that describes, in French, their motives for seeking political asylum. Applications submitted to the French Asylum Office in Paris are directed to the relevant geographic division and assigned to a bureaucrat. Before 2006, this official decided whether to interview the applicant after reading her application; starting in 2006, interviews became mandatory. The bureaucrat advises his supervisor on whether to grant the applicant refugee status, and the supervisor makes the final decision. Those who obtain asylum receive a 10-year renewable residency permit (or a 1-year residency permit in the case of subsidiary protection, described below). Those who are rejected can appeal this decision to the National Court of Asylum, a three-judge panel that publicly reexamines asylum cases. If their appeal is rejected, asylum seekers can resubmit an application to the French Asylum Office if they have new information to provide.
What determines whether a person will receive political asylum in France? The Geneva Convention requires all signatories, including France, to grant asylum to individuals with a well-founded fear of being persecuted for reasons of "race, religion, nationality, membership in a particular group or political opinion." 5 In 1998, France introduced another form of protection called territorial asylum, which was replaced in 2003 by subsidiary protection, which is granted to those who do not meet the Geneva Convention's definition of persecution but who would be subject to the death penalty, torture, or indiscriminate violence in the context of an internal or international armed conflict in their country of origin.
Bureaucrats working at the French Asylum Office have substantial discretion to decide who receives refugee status for two main reasons: (1) the Geneva Convention's definition of persecution is open to subjective interpretation, and (2) some applicants may be economic migrants who falsify persecution claims. As early as 1970, the asylum office's annual activity report noted that "the pace of arrivals remains high, although the Office is striving to exclude some elements - Yugoslav in particular - who are in reality, economic refugees, in search of better life and employment conditions." Since the 1980s, when large-scale fraud by applicants from the Republic of Zaire and countries in South East Asia was revealed, bureaucrats at the French Asylum Office have been advised to use caution when assessing claims of persecution. A former director of the French Asylum Office (1996-2000) illustrated the challenge associated with making asylum decisions: "You are gold diggers. There is a huge stack of rocks. In this stack, there are a few gold nuggets. You have to find them, but there are a lot of rocks." Moreover, asylum officers often lack the time, space, training, and documentation needed to make informed decisions. The 1978 activity report accessed for this study describes their disastrous working conditions: "Four officers usually worked in a 12 square meter office with two typists typing, while the other two officers try to assist asylum seekers in the remaining space." Information provided in the 1986 activity report suggests that asylum officers decided on average 3.5 cases per day, twice as many as their German counterparts resolved at the time. While their working conditions have improved over time, these bureaucrats still report working under very stringent time constraints. As recently as 2013, an asylum officer described how a lack of time and documentation significantly impaired her ability to "discover the truth" (Aho Nienne 2013).
Employment at the French Asylum Office has also been precarious. Until 1993, it had no permanent employees, relying mostly on temporary workers to deal with frequent fluctuations in the number of applications (Figure D.1, p. 18, in the online supporting information). In 1993, the unions successfully negotiated the conversion of temporary contracts into permanent positions. The share of temporary workers dropped radically afterward, but in the early 2000s, the office hired more temporary workers to deal with a sudden uptick in applications.
In many respects, officials at the French Asylum Office resemble Michael Lipsky's notion of street-level bureaucrats, who often respond to tight time constraints by developing "routines of practice and psychologically [simplify] their clientele and environment" (Lipsky 2010, xii), a process that he argues can give rise to favoritism and stereotyping. While there is evidence from multiple contexts that bureaucrats in various capacities discriminate on the basis of race and ethnicity, this research overwhelmingly relies on experimental methods (Butler and Broockman 2011; Hemker and Rink 2017; McClendon 2016; Olsen, Kyhse-Andersen, and Moynihan 2020; White, Nathan, and Faller 2015); very few studies have identified discrimination using real-world data (Neggers 2018).
In this study, I leverage unique access to archival records from France to study discrimination (i.e., unequal treatment) on the basis of religion, education, skill level, and proficiency in French of otherwise equal applicants. Prior research has shown that citizens in Europe and the United States tend to prefer immigrants who are Christian rather than Muslim, high skilled over low skilled, more educated, and who are fluent in the host-country language (Adida, Laitin, and Valfort 2016; Adida, Lo, and Platas 2019; Bansak, Hainmueller, and Hangartner 2016; Hainmueller and Hiscox 2007, 2010; Hainmueller and Hopkins 2015; Helbling and Traunmüller 2020; Valentino et al. 2017). Yet, these characteristics should not be relevant to decisions about whether to grant asylum, since such determinations should only be based on an assessment of the credibility of the claims of persecution.

Sources of Discrimination
Economists distinguish between two sources of discrimination to explain group-based differential treatment. Differential treatment arises in taste-based discrimination because decision-makers derive utility from favoring one group over another (Becker 1971). Statistical discrimination stems from decision-makers' beliefs about the distribution of the relevant outcome within a certain group (Arrow 1973). Bohren et al. (2019) and Phelps (1972) further distinguish between cases in which the latter is based on accurate beliefs (which they label accurate statistical discrimination) or inaccurate beliefs (inaccurate statistical discrimination).
I examine how these different sources of discrimination might affect French Asylum Office decisions as follows. A bureaucrat evaluates the level of persecution faced by an asylum seeker who has observable characteristics (religion, skill level, educational attainment, and proficiency in French) $g \in \{A, B\}$ and faces persecution $\omega$. The bureaucrat cannot directly observe the level of persecution $\omega$ that the asylum seeker faces at home; he must rely on the applicant's written narrative. The credibility of the narrative, $q = \omega + \varepsilon$, is related to the level of persecution $\omega$, but with some noise introduced by (1) the fact that the applicant needs to write about her experiences in French (which is unlikely to be her native language) and (2) the fact that the asylum seeker can get help from family, friends, and volunteers from migrants' associations. For simplicity, I assume $\varepsilon \sim N(0, \sigma_\varepsilon^2)$ is an independent random shock. When reading the narrative, the bureaucrat extracts a noisy signal of its credibility, $s = q + \eta$. The noise $\eta$ comes from the fact that he can only imperfectly assess the narrative's credibility. I model the bureaucrat's ability $\eta$ to do so as a random noise drawn from a normal distribution $N(0, \sigma_\eta^2)$. The bureaucrat receives $-(v - (\omega - c_g))^2$ as the payoff from evaluation $v \in \mathbb{R}$ for an asylum seeker with characteristic $g$ and persecution $\omega$. The introduction of a taste parameter, $c_g$, in the bureaucrat's payoff function allows for the possibility that the official derives utility from favoring one group over another. Normalizing $c_A = 0$, a bureaucrat has a taste-based partiality for group A if $c_B \geq 0$. In that case, the bureaucrat has higher standards for asylum seekers from group B than for those from group A, even when they face similar levels of persecution. This opens up the possibility of taste-based discrimination.
Statistical discrimination instead arises from how bureaucrats reach evaluation $v$. I assume that bureaucrats have prior beliefs about the level of persecution faced by group $g$, which are distributed according to the normal distribution $N(\hat{\mu}_g, \sigma_\omega^2)$. The bureaucrat has a belief-based partiality toward group A if he considers asylum seekers from that group to be more persecuted than those from group B ($\hat{\mu}_A \geq \hat{\mu}_B$). These beliefs can be accurate, in which case they are equal to the true mean $\mu_g$, but they can also be inaccurate ($\hat{\mu}_g \neq \mu_g$).
The bureaucrat updates his beliefs about the level of persecution faced by the asylum seeker after reading their narrative and chooses the evaluation v that maximizes his expected payoff with respect to this updated belief (see Appendix A, p. 3, in the online supporting information, for more details on the derivation of Equation 1).
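The displayed equation is not reproduced here, so the following is only a sketch of the form such an evaluation takes under the stated assumptions; the exact expression and notation in Appendix A may differ. It assumes that the prior on persecution and the two noise terms are normal and mutually independent, writes $\hat{\mu}_g$ for the bureaucrat's prior mean for group $g$, and uses the variance labels $\sigma_\varepsilon^2$ and $\sigma_\eta^2$ for the narrative noise and the reading noise (these labels are an assumption). Standard normal updating on the signal $s = \omega + \varepsilon + \eta$ then gives a posterior mean that is a weighted average of the signal and the prior mean, and the quadratic payoff is maximized by that posterior mean net of the taste parameter:

```latex
\begin{align*}
\mathbb{E}[\omega \mid s, g]
  &= \frac{\sigma_\omega^2}{\sigma_\omega^2 + \sigma_\varepsilon^2 + \sigma_\eta^2}\, s
   + \frac{\sigma_\varepsilon^2 + \sigma_\eta^2}{\sigma_\omega^2 + \sigma_\varepsilon^2 + \sigma_\eta^2}\, \hat{\mu}_g, \\
v^{*}(s, g)
  &= \arg\max_{v}\; \mathbb{E}\!\left[-\bigl(v - (\omega - c_g)\bigr)^2 \,\middle|\, s, g\right]
   = \mathbb{E}[\omega \mid s, g] - c_g .
\end{align*}
```

The noisier the signal relative to the prior, the more weight the evaluation places on the group-level belief $\hat{\mu}_g$ rather than on the individual narrative.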
Equation (1) provides an important insight for the empirical strategy. To estimate the extent of discrimination involved in decisions about whether to grant refugee status, it is important to control for the signal $s$ that the bureaucrat receives to distinguish between the signal effect and the discrimination effect. In practice, however, I can only estimate $q$, the credibility of the narrative, since I do not observe $\eta$. Moreover, the estimation inevitably introduces some noise $\lambda$. As a result, I control for a proxy, $\hat{s} = q + \lambda$, rather than for $s$ itself. However, as long as $\lambda$ and $\eta$ are uncorrelated with group characteristics, I can recover the effect of individual characteristics on the decision. I return to this point in Section 5.
Discrimination occurs when two asylum seekers with the same signal, one from group A and one from group B, receive different evaluations. Equation (2) illustrates that disparate decisions, for example, in favor of group A, can arise from two sources: taste-based discrimination, which results from the official's preference for group A ($c_B \geq 0$), or statistical discrimination, the result of the bureaucrat's belief that group A is more persecuted on average than group B.
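To make the two channels concrete, here is a sketch of the decomposition under the same normality and independence assumptions as the model above. It is not necessarily the paper's own Equation (2). The symbol $\kappa$ for the weight on the prior, the hat notation $\hat{\mu}_g$ for the bureaucrat's prior mean, and the variance labels $\sigma_\varepsilon^2$ and $\sigma_\eta^2$ are introduced here for illustration. If the optimal evaluation is the posterior mean of persecution minus the taste parameter, the gap between two applicants with the same signal $s$ is:

```latex
\begin{align*}
v^{*}(s, g) &= (1 - \kappa)\, s + \kappa\, \hat{\mu}_g - c_g,
\qquad
\kappa = \frac{\sigma_\varepsilon^2 + \sigma_\eta^2}{\sigma_\omega^2 + \sigma_\varepsilon^2 + \sigma_\eta^2}, \\
D(s) &= v^{*}(s, A) - v^{*}(s, B)
      = \underbrace{\kappa\,(\hat{\mu}_A - \hat{\mu}_B)}_{\text{statistical (belief-based)}}
      \;+\; \underbrace{(c_B - c_A)}_{\text{taste-based}}
      = \kappa\,(\hat{\mu}_A - \hat{\mu}_B) + c_B
      \quad \text{with } c_A = 0 .
\end{align*}
```

In this sketch the signal cancels out of $D(s)$, which is why holding the signal constant isolates the two discrimination terms. The belief-based term shrinks as the signal becomes more precise ($\kappa \to 0$), while the taste term does not.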

Data
The sample for this study consists of 4,141 asylum applications filed at the French Asylum Office between 1976 and 2016. The sampling design was complicated by the fact that the office regularly destroys a subset of rejected applications to free up space in the archives. To deal with this issue, I proceeded in two steps. The French Asylum Office first extracted a random sample of 100,000 applications from all those filed since 1952, 25,118 of which had already been destroyed. I then selected applications from among those remaining for in-depth data collection using the inverse of the probability of still being in the archives as the selection probability. To correct for remaining imbalance between the data collected and the administrative sample, I used entropy balancing to reweight the sample for the analyses (Hainmueller 2012). Appendix B (pp. 4-9 in the online supporting information) contains more information on the sampling design and the construction of the weights.
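The second sampling step can be illustrated with a minimal sketch. It is not the procedure documented in Appendix B: it assumes, for illustration only, that the probability of still being in the archives is estimated as the surviving share within a stratum (for example, decision by filing year), and the names `selection_probabilities`, `draw_sample`, `stratum_of`, and the `"destroyed"` field are hypothetical. It does not implement the entropy-balancing reweighting.

```python
import random
from collections import Counter


def selection_probabilities(records, stratum_of, target_n):
    """Return (survivors, probs).

    survivors: records still in the archive.
    probs: selection probabilities proportional to 1 / P(still archived),
    where that probability is estimated (an assumption of this sketch)
    as the surviving share of the record's stratum.
    """
    filed = Counter(stratum_of(r) for r in records)
    kept = Counter(stratum_of(r) for r in records if not r["destroyed"])
    survivors = [r for r in records if not r["destroyed"]]
    # Inverse of the estimated survival probability in each record's stratum.
    inverse = [filed[stratum_of(r)] / kept[stratum_of(r)] for r in survivors]
    # Rescale so the expected sample size is target_n, unless the cap at 1 binds.
    scale = target_n / sum(inverse)
    probs = [min(1.0, w * scale) for w in inverse]
    return survivors, probs


def draw_sample(survivors, probs, seed=0):
    """Independent Bernoulli draw for each surviving application."""
    rng = random.Random(seed)
    return [r for r, p in zip(survivors, probs) if rng.random() < p]
```

Applications from strata where many files were destroyed receive a proportionally higher chance of selection, so the drawn sample approximates the composition of all filed applications rather than of the surviving ones.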
The outcome of interest is whether an applicant was granted refugee status upon first examination at the French Asylum Office. On average, 15.3% of applicants in the sample were granted political asylum in France based on the Geneva Convention between 1976 and 2016 (Figure 1, gray bars). By construction, this rate is lower than the overall first-time acceptance rate at the Asylum Office (Figure 1, dashed line) for two reasons. First, I exclude cases of family unification and resettled refugees from the sample because these cases are not evaluated exclusively on the basis of their application. Second, I focus on the decision to grant refugee status based on the Geneva Convention, which presents a unique feature crucial to the research design. Indeed, the Geneva Convention is applied in France based solely on examining individual claims of persecution rather than on belonging to a particular group (Cohen 2000). This practice rules out the possibility that the observed group differences may be driven by the fact that bureaucrats know that some groups are more persecuted, since in theory this information should not matter. The same assumption does not hold for the attribution of subsidiary protection, since it is based, among other things, on the security situation in the applicant's region of origin, which helps determine whether she needs protection.
In France, asylum cases are examined sequentially, thus only those that did not qualify for protection under the Geneva Convention are examined for subsidiary protection. I thus recode applicants who received refugee status under subsidiary protection as having been rejected for the Geneva Convention protection. Table 1 lists the main independent variables used in the analysis. In this study, I focus on identifying discrimination on the basis of religion, education, skill level, and language proficiency. Overall, close to 80% of applicants identify as either Christian or Muslim, but there is significant variation over time: Muslims represented 60% of applicants in 2016, up from 5% in 1976. Half of all applicants had started secondary education, and 14.2% reported starting university. Highly skilled jobs including academics, doctors, engineers, high-level executives, lawyers, journalists, and students represent 16.6% of the applicants. Blue-collar workers, civil servants, mechanics, farmers, guards, and other such employees represent 39.7% of applicants (Middle), and drivers, hairdressers, sales clerks, and homemakers 18.6% (Low). An applicant was coded as proficient in French if she listed French as a native or spoken language (26.8% of applicants did). In all specifications, I also control for age, gender, marital status, and time spent in France at the time of the application.
For each application, I also transcribed the personal narrative, the bulk of which were handwritten. Of the 4,141 applications included in the final sample, 93.4% included a narrative in French. Table 2 provides summary statistics on a number of features of the narratives that, based on my informal discussions with bureaucrats at the French Asylum Office, should affect how credible they are perceived to be. The number of words (including stop words) and the number of dates and locations mentioned (extracted using Stanford CoreNLP) are proxies for the narrative's level of detail. The narratives varied greatly in length, but on average were 777 words long and mentioned seven dates and seven locations. To understand the extent to which the narratives are personal and individualized, I counted the number of first-person pronouns used ("je," "j'ai," "me," "mon," "mes," and "moi"). On average, the narratives used first-person pronouns 35 times. To control for the originality of the narrative, I also computed each narrative's average Euclidean distance to the narratives of other asylum seekers from the same country of origin. Moreover, to gain insight into the topics covered in the narratives, I also estimated a structural topic model with 20 topics using country of origin as a covariate (Roberts et al. 2014). See Appendix C (pp. 9-17 in the online supporting information) for more information on the construction of these features.
For confidentiality reasons, I was not authorized to collect any individual-level information about the bureaucrats working at the French Asylum Office. However, I was able to construct a fine-grained measure of their experience using administrative data on a large random sample of asylum applications. To do this, I complement the list of 100,000 applications I used to sample asylum applications for archival data collection with an additional random sample of 500,000 applications filed between 1989 and 2014.
Together these two administrative samples provide information on 502,997 unique asylum applications filed between 1989 and 2014, more than half of all applications filed at the Asylum Office during that period. The information contained in this database is sparse, but for the 309,913 applications filed after 2000, these records include the decision, as well as the date and the identifier of the bureaucrat who made the decision. After excluding bureaucrats who made at least one decision in 2000 (and for whom I cannot rule out the possibility that they started working for the office before that year) and those who made their first decision in 2015 or after, I can infer the start date of each remaining bureaucrat using the date of their first decision and computing the order in which they made each of their decisions. I am then able to match this information with the main sample, allowing me to use the number of past decisions made by the bureaucrat in charge of the case as a proxy for his level of experience at the time of the decision.
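The construction of the experience proxy can be sketched as follows. This is an illustrative reading of the description above, not the replication code. The function name `prior_decisions`, the tuple layout of the input, and the tie-breaking rule for decisions made on the same day are assumptions.

```python
from collections import defaultdict
from datetime import date


def prior_decisions(decisions, censor_year=2000, last_start_year=2014):
    """Map each case to the number of earlier decisions by the same officer.

    decisions: iterable of (case_id, officer_id, decision_date) tuples.
    Officers with any decision in `censor_year` are dropped because they may
    have started earlier; officers whose first decision falls after
    `last_start_year` are dropped as well.
    """
    by_officer = defaultdict(list)
    for case_id, officer_id, decided_on in decisions:
        by_officer[officer_id].append((decided_on, case_id))

    experience = {}
    for rows in by_officer.values():
        rows.sort()  # chronological; same-day ties broken by case_id (assumption)
        first_decision = rows[0][0]
        if any(d.year == censor_year for d, _ in rows):
            continue
        if first_decision.year > last_start_year:
            continue
        for n_before, (_, case_id) in enumerate(rows):
            experience[case_id] = n_before
    return experience


decisions = [
    ("c1", "A", date(2003, 1, 10)),
    ("c2", "A", date(2003, 2, 1)),
    ("c3", "B", date(2000, 5, 1)),  # officer B is dropped (active in 2000)
    ("c4", "B", date(2001, 3, 1)),
]
print(prior_decisions(decisions))  # {'c1': 0, 'c2': 1}
```

Because the count is computed on the large administrative sample rather than on the 4,141 digitized files, it undercounts each officer's true caseload by roughly the sampling fraction, but it preserves the ordering of decisions within an officer's career.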

Research Design
This unique individual-level dataset allows me to estimate discrimination in the attribution of refugee status in France. I regress an indicator variable for whether an applicant was granted asylum on her characteristics, while holding constant the credibility of her narrative, her country of origin, and the year of application. The selection on observables assumption is well supported because I have access to the same information as the bureaucrats making the decisions. However, the fact that interviews became mandatory in 2006 raises the concern that some decisions could have been based on unobservable characteristics revealed during the interview. Fortunately, close to 40% of the applicants in the sample were not interviewed, which allows me to check that the results hold in this subsample. Moreover, while we do not know the exact mechanism used to assign cases to bureaucrats, we have good reason to believe it is based on observable characteristics: Cases are first dispatched to different divisions, each of which handles a different set of nationalities. The division head then assigns cases to the officials in their division on a monthly basis. Since this assignment happens before the interview takes place, we can assume that this assignment is based exclusively on observable characteristics. To address this concern, I show that the results hold when controlling for bureaucrat fixed effects. Finally, one necessary condition for the estimation is overlap, which is often difficult to achieve with highly collinear variables like religion and country of origin. I check for overlap between religion and country of origin in the data by plotting countries of origin in the sample as a function of the share of applicants from that country who are Christian on the x-axis, and the proportion who are Muslims on the y-axis (Figure D.2, p. 19, in the online supporting information). While a number of countries cluster on the bottom-left and upper-right corners of the graph, a nonnegligible proportion is spread out on the diagonal, suggesting substantial overlap.
The efficacy of this research design also hinges on the ability to control for the signal s that the bureaucrat receives from the narrative, as outlined in Section 3. Moreover, the descriptive patterns reveal that the narratives differ significantly by religion, education, skill level, and proficiency in French, further emphasizing the need to control for the narrative's effect on the decision (Tables C.2, C.3, C.4, and C.5, pp. 11-12 in the online supporting information). I use three different methods to control for the narrative. First, in the main specification, I control for the credibility of the narrative, which I predict using a combination of human coding and automated classification. Due to the limitations of this method (outlined below), I check that the results are robust to two alternative strategies: controlling for substantive features of the narrative that should be relevant in predicting the decision (listed in Table 2) and controlling for text features found to be relevant to the decision (using the supervised Indian Buffet Process). It is important to note here that I use these measures to control for the credibility of the narrative, q, not the applicant's level of persecution, ω. But this is not a concern since bureaucrats also only observe a signal of persecution, not the persecution itself, before making their decision. I now discuss all three methods in turn.
The first method I use to control for the narrative relies on a combination of hand coding and supervised learning to measure the credibility of the personal narrative, that is, the extent to which the narrative (1) meets the criteria outlined in the definition of refugee in the Geneva Convention and (2) sounds authentic and convincing in its claims of persecution. Three coders read 350 unique narratives, 59 of which were triple coded. These coders were selected from the pool of research assistants who helped with the digitization process and who had master's degrees in law or political theory. This selection process ensured the confidentiality of the narratives, a first-order parameter in this study, and that coders were knowledgeable about the Geneva Convention. By the time they read and coded these narratives, they had each transcribed several hundred, an important requirement for reliable hand coding (Krippendorff 2004). Importantly, they were not informed of the decision reached by the bureaucrat.
For each narrative, they first coded whether it mentioned any form of persecution, and if so, what type (race, political opinion, religion, nationality, or social group). Second, they assessed whether the claims were believable, convincing, detailed, individualized, and coherent, using a 4-point scale for each. Third, they coded whether the narrative referred to a historical event and whether the applicant mentioned members of her family who were already living in France. Finally, research assistants were asked to determine whether the applicant made a reasonable claim to political asylum, as defined by the Geneva Convention, by answering the question, "In your opinion, is this person entitled to claim the right to asylum according to the Geneva Convention?" (Disagree/Somewhat disagree/Somewhat agree/Agree). On average, coders "agreed" with the statement 15% of the time, which is comparable to the average acceptance rate in the sample, and "somewhat agreed" 35% of the time (Table C.6, p. 13, in the online supporting information). I coded narratives as credible if research assistants answered either "Agree" or "Somewhat agree" because the intercoder reliability, as measured by Krippendorff's α coefficient, 6 is much higher (0.48) when combining these two responses than when considering narratives to be credible only when research assistants answered "Agree."

6 Krippendorff's α is computed as 1 − (number of disagreements actually observed)/(number of disagreements expected by chance). This ratio is used to determine intercoder reliability, where 0 indicates agreement no better than chance and 1 perfect agreement. For more information, see Krippendorff (2004, pp. 221-27). Agreements ranging from 0.4 to 0.6 are considered a sign of moderate agreement between coders (Landis and Koch 1977).

Using the set of hand-coded narratives, I then compare the performance of three classification algorithms to predict the credibility of the narrative on a left-out sample. Using Random Forest, I am able to accurately predict the credibility measure 75% of the time in the left-out sample of 85 narratives, a substantial reduction in error compared to a baseline of 48% (Table C.9, p. 1, in the online supporting information), and to ensure that the predicted probabilities are well calibrated (Figure C.1, p. 16). This predicted measure correlates as we would expect with the features of the narrative (Table C.10, p. 17) and, as detailed in Section 6, with the decision as well, which suggests that this measure is capturing the credibility of the narrative, even if imperfectly. Far from claiming that research assistants are better than civil servants at their jobs, or even that we should replace bureaucrats with computers, this procedure is an attempt to summarize, in a single variable, important variation among applicants' narratives that could explain officials' decisions whether to grant or withhold refugee status.

Although this procedure inevitably introduces some noise (λ), the first key identification assumption is that this noise is uncorrelated with the characteristics of asylum seekers I examine (religion, education, skill level, and proficiency in French). This assumption would be violated, for example, if research assistants were more likely to find the same narrative more credible if the asylum seeker is Christian than if she is Muslim. To help mitigate this concern, I check the robustness of the results by controlling for the effect of the narratives in two additional ways. First, I directly control for features of the narratives that should be relevant in explaining the decision (listed in Table 2). Second, I use the supervised Indian Buffet Process developed in Fong and Grimmer (2016) to discover the features of the text that are relevant in explaining the decision. The model takes a document-term matrix as its input and learns from a training set (30% of the narratives chosen at random [N = 1180]) a set of latent binary features that are predictive of both the text and the outcome. I set the number of features to eight and search using a range of parameters to select the model that ranked the highest on a quantitative measure of model fit (Fong and Grimmer 2016). I use this model to infer the latent treatments for the test set (the remaining 70% of narratives), which I use as additional controls in the analysis. By using a split-sample design, this procedure solves the identification and estimation problems that arise from using the same documents to discover treatments and estimate causal effects (Egami et al. 2018).
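The split-sample logic described above can be illustrated with a minimal sketch. It only shows the random partition into a discovery (training) set and an estimation (test) set; it does not implement the supervised Indian Buffet Process. The function name, the seed, and the toy document identifiers are assumptions made for illustration.

```python
import random

def split_sample(doc_ids, train_share=0.30, seed=0):
    """Randomly partition document ids into a discovery (training) set
    and an estimation (test) set that share no documents."""
    rng = random.Random(seed)        # fixed seed so the split is reproducible
    shuffled = list(doc_ids)
    rng.shuffle(shuffled)
    n_train = round(train_share * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Toy example with 1,000 hypothetical narrative ids (not the study's data).
train_ids, test_ids = split_sample(range(1000))

# Text features would be discovered on train_ids only, and the
# regressions would then be estimated on test_ids only.
print(len(train_ids), len(test_ids))
```

Because no document appears in both sets, the features used as controls are never learned from the same narratives on which the effects are estimated.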
The second identifying assumption is that the bureaucrat-specific noise that proxies for his ability (η), which I do not observe, is uncorrelated with the characteristics of asylum seekers I examine. This would be a concern if bureaucrats were better able to assess the credibility of the same narrative when the asylum seeker belongs to one group rather than another: if, for example, the bureaucrat were more knowledgeable about one group than another, such that the same narrative would appear more credible if it were written by an applicant from a particular group. The fact that bureaucrats specialize by nationality rather than by any of the characteristics that I examine helps mitigate this concern for education, skill level, and proficiency in French. The concern, however, is not implausible for religion, since religion is also one of the motives defined in the Geneva Convention. To address it, I show that the results on religion are robust to restricting the analysis to the sample of asylum seekers who did not claim persecution on the basis of religion.

Discrimination in the Attribution of Refugee Status
What are the determinants of asylum decisions in France?
In Table 3, I analyze how the applicants' religion, educational attainment, skill level, and proficiency in French affect asylum decisions. In addition to country-of-origin and year-of-application fixed effects, all specifications include a limited set of controls: age, gender, marital status, time spent in France prior to applying for refugee status, and the credibility of the narrative. This analysis reveals that religion is an important predictor of asylum decisions (column 1). Muslim applicants are 6.2 percentage points less likely to obtain refugee status than Christians who are similar across all other characteristics. This represents a substantial difference (41%) given that the average acceptance rate was 15% during the study period. The effect of educational attainment and skill level on the decision is large as well. Compared to those who reported starting university, those who reported secondary- or primary-level education were 2.4 and 6.3 percentage points less likely to be granted refugee status, respectively, though only the difference between primary and postsecondary education is statistically significant. Those with middle- or low-skill levels were both significantly less likely (6.1 and 8.2 percentage points, respectively) to obtain refugee status. Compared to those who speak French, not speaking French has a small negative effect (3.3 percentage points) on the decision. However, when restricting the sample to applicants who were not interviewed, even though the estimates of the coefficients are similar in sign and magnitude, only the difference between Christians and Muslims remains statistically significant (Table 3, column 4).

Notes (Table 3): Point estimates and standard errors in parentheses. The dependent variable is a dummy variable indicating whether the applicant received refugee status upon first examination at the French Asylum Office. All regressions include demographic characteristics (age, gender, and marital status), an indicator variable for whether they spent more than 1 year in France before applying, country of origin, and year of application fixed effects. Reference category for "Religion" is Christian. Reference category for "Education" is university. Reference category for "Skill Level" is high. Reference category for "Speaks French" is yes. † p < 0.1, * p < 0.05, ** p < 0.01.
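The relative magnitude reported for religion (41%) is the percentage-point gap expressed as a share of the average acceptance rate. A short worked calculation, using only the two figures quoted in the text (the function name is an illustrative assumption):

```python
def relative_effect(gap_pp, baseline_rate_pct):
    """Express a percentage-point gap as a share of the baseline rate."""
    return gap_pp / baseline_rate_pct

muslim_gap_pp = 6.2         # gap between Muslims and Christians, from the text
avg_acceptance_pct = 15.0   # average acceptance rate, from the text

share = relative_effect(muslim_gap_pp, avg_acceptance_pct)
print(f"{share:.0%}")  # prints 41%
```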
To alleviate concerns that the difference between Christians and Muslims is driven by omitted variables, I perform four sets of robustness checks. First, I show in Table D.1 in the online supporting information (p. 21) that this result is robust to including additional covariates reported in the application form (column 1), including the number of children, whether the application was expedited, whether the applicant declared a passport or a diplomatic laissez-passer, whether the applicant has a family member who obtained refugee status in France, whether the applicant completed military service, and whether her entry to the territory was legal (see Table B.1, p. 9, for summary statistics on these additional variables). I also show that this result is robust to including country of origin-year of application interactions (column 2), to pruning the sample using coarsened exact matching (Iacus, King, and Porro 2012) (column 3), and to reestimating effect sizes using a logistic regression model (Table D.2, p. 22). Finally, in column 4, I show that the coefficient is very similar when the credibility measure is omitted, which suggests that the results do not hinge entirely on this particular measure.
Second, I show that the result is robust to alternative ways of controlling for the effect of the personal narrative. In columns 2 and 3 of Table 3, in addition to the credibility of the narrative, I control, respectively, for substantive text features (described in Table 2) and for text features discovered by the supervised Indian Buffet Process in the left-out one-third of the sample. The coefficient is less precise in this last specification, which could be a function of the smaller sample size. Overall, the result is not sensitive to these alternative ways of controlling for the narrative's effect on the decision.
Third, by showing that the result holds after controlling for bureaucrat fixed effects, I reject the possibility that these differences result from a process in which asylum applications submitted by Muslims are systematically assigned to stricter bureaucrats. This analysis is complicated by the fact that information about bureaucrats is incomplete for applications filed before 2000, both in the administrative database and in the paper applications I digitized. Overall, the identifier of the bureaucrat who decided the case is missing for 17% of the total sample and for 53% of applications filed in 1999 or earlier. To deal with these missing values, I binned all applications for which the identifier of the bureaucrats was missing into an additional "missing" category (Table 3, column 5).
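The treatment of missing identifiers can be sketched as follows. This is a minimal illustration, assuming identifiers are stored as strings with `None` or an empty string marking a missing value; the function name and the example identifiers are hypothetical.

```python
def bureaucrat_fe_levels(officer_ids, missing_label="missing"):
    """Map each application to a fixed-effect level, pooling all
    applications with an unknown officer into one extra level."""
    levels = []
    for oid in officer_ids:
        if oid is None or str(oid).strip() == "":
            levels.append(missing_label)   # unknown officer: shared category
        else:
            levels.append(str(oid))        # known officer: own category
    return levels

# Made-up identifiers for five applications.
cases = ["B012", None, "B007", "", "B012"]
levels = bureaucrat_fe_levels(cases)
print(levels)
```

Pooling keeps every application in the estimation sample, at the cost of treating all unidentified officers as if they shared a single fixed effect.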
Finally, I perform a set of robustness checks to ensure that the difference between Christians and Muslims is not entirely driven by the fact that asylum officers have access to background information regarding the persecution experienced by these different groups. To rule out the possibility that Muslims are less likely to be granted refugee status because they are, in fact, less persecuted than Christians, I first show that the results hold when I control for whether the applicant belongs to a minority religious group in her home country (Table 3, column 6). The World Religion dataset of the Association of Religion Data Archives estimates the percentage of the population that has identified with Christianity or Islam since 1945 for most countries in the world. Using these estimates, I generate a binary variable that indicates, for each applicant in the sample, whether she belongs to a religious group that comprises less than 20% of the population in her home country. 7 I then show that differences between Christians and Muslims also hold in the subset of applicants who did not claim persecution on religious grounds. The rationale for this test is that if Christian applicants are more likely to receive asylum than Muslims because they are more likely to be persecuted than Muslims, then the Muslim gap should not hold in the subsample of applicants who did not claim persecution on religious grounds. Only 6% of narratives claimed persecution on religious grounds according to the coding completed by the research assistants on a representative sample of applicants (Table C.6, p. 13 in the online supporting information), and the proportion of the text dedicated to the topic Religion calculated by the structural topic model exceeds 20% in only 2% of the narratives (Table B.1, p. 9). But identifying applicants who claim religious persecution is not straightforward. About 1,021 of the narratives (22%) mention at least one religious keyword from a relatively short list. 
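The two screening variables described in this paragraph can be sketched as below. The 20% threshold comes from the text; everything else is an assumption for illustration, in particular the keyword list, which is a made-up English placeholder and not the list used in the study.

```python
import re

MINORITY_THRESHOLD = 0.20  # from the text: group is under 20% of the population

def is_religious_minority(group_share):
    """True if the applicant's religious group is below the threshold in her
    home country; None when no population estimate is available."""
    if group_share is None:
        return None
    return group_share < MINORITY_THRESHOLD

# Hypothetical keywords, for illustration only.
HYPOTHETICAL_KEYWORDS = ("church", "mosque", "christian", "muslim", "religion")

def mentions_religion(narrative, keywords=HYPOTHETICAL_KEYWORDS):
    """True if the narrative contains at least one keyword as a whole word."""
    words = set(re.findall(r"[a-z]+", narrative.lower()))
    return any(k in words for k in keywords)

print(is_religious_minority(0.05), mentions_religion("They burned our church."))
```

A keyword match only flags a narrative for closer reading; it does not by itself establish that the applicant claims persecution on religious grounds.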
To identify narratives that claim persecution on religious grounds (one of the five motives listed in the Geneva Convention), I read 292 narratives, a random sample of the 1,021 narratives that contained at least one religious keyword, and for each I coded whether the applicant was claiming persecution on religious grounds. Restricting the sample to narratives that either did not contain a religious keyword or contained one but were not coded as religious persecution, I find that the Muslim penalty holds (Table 3, column 7). Overall, these additional tests suggest that the difference between Christians and Muslims is unlikely to be explained entirely by the fact that Christians are more persecuted than Muslims.

7 A total of 165 observations did not merge due to unequal overlap in coverage by country. For example, Armenia, Bosnia and Croatia appear in the World Religion data only after 1995.

FIGURE 2 Discrimination by Bureaucrats' Experience
Notes: This figure shows the estimated conditional marginal effect, along with the 95% confidence intervals based on standard errors clustered by bureaucrats, and the number of observations in each bin in parentheses. The specifications include covariates and fixed effects for year of application and country of origin.
In short, these results show that applicants who are Christian are much more likely to be granted asylum than Muslim applicants. In a series of robustness tests, I rule out the possibility that this difference is entirely explained by observable or unobservable omitted variables, by the selective assignment of cases to bureaucrats, or by the fact that Muslims are less persecuted than Christians.

Bureaucrats' Experience Mitigates Discrimination
I next examine the influence of bureaucrats' experience (measured as the number of past decisions) on discrimination on the basis of religion, education, and skill level. I restrict the sample to the 1,821 asylum applications filed between 2000 and 2014 that were decided by a bureaucrat who started working between 2001 and 2014 and for whom I know the number of past decisions. In this sample, the average number of past decisions is 457, the median 312, and the 90th percentile is 1,103 decisions. To estimate the conditional marginal effects, I use the binning estimator proposed by Hainmueller, Mummolo, and Xu (2019), which has two main advantages over the classical linear multiplicative interaction model. First, the binning estimator allows conditional marginal effects to vary across bins of the moderator by relaxing the linear interaction-effect assumption. Second, the binning estimator ensures common support to reliably estimate the conditional marginal effects by constructing bins based on the support rather than on values of the moderator. I thus first divide the number of past decisions into three equally sized groups, namely a low (1–184), middle (185–486), and high (487–3,220) number of past decisions, and pick the median number of past decisions as the evaluation point within each bin. I then estimate a model that includes, in addition to the covariates from the main specification (Table 3, column 1), interactions between the indicator variables for each of the three bins, the individual characteristics of interest, and the number of past decisions minus the evaluation point picked within each bin, as well as triple interactions of these. I conduct this analysis separately for religion (comparing Muslims to Christians), education (comparing secondary and primary to postsecondary), and skill level (comparing middle and low to high).
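The construction of the bins and of the interaction terms can be sketched as follows. This is a simplified illustration under stated assumptions, not the replication code: bins are formed by rank (so tied values may fall in adjacent bins), the data are made up, and the function names are hypothetical. The resulting terms would be entered into an OLS regression alongside the covariates and fixed effects.

```python
from statistics import median

def tercile_bins(moderator):
    """Assign each observation to one of three rank-based bins of (nearly)
    equal size and return the median of the moderator within each bin."""
    n = len(moderator)
    order = sorted(range(n), key=lambda i: moderator[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = min(3 * rank // n, 2)
    eval_points = [median(moderator[i] for i in range(n) if bins[i] == j)
                   for j in range(3)]
    return bins, eval_points

def binning_terms(d, x, bin_j, eval_points):
    """Regressors for one observation. For each bin j: the bin dummy G_j,
    G_j * D, G_j * (X - x_j), and G_j * (X - x_j) * D, where D is the
    characteristic of interest and x_j the evaluation point of bin j."""
    row = []
    for j, xj in enumerate(eval_points):
        in_bin = (bin_j == j)
        g = 1.0 if in_bin else 0.0
        dist = (x - xj) if in_bin else 0.0
        row += [g, g * d, dist, dist * d]
    return row

# Made-up numbers of past decisions for nine applications.
past = [10, 50, 120, 200, 300, 450, 500, 900, 2000]
bins, eval_points = tercile_bins(past)
row = binning_terms(d=1, x=200, bin_j=bins[3], eval_points=eval_points)
print(bins, eval_points)
```

In this parameterization, the coefficient on `G_j * D` is the conditional marginal effect of the characteristic at the evaluation point of bin j, because the distance term is zero there.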
Figure 2 plots the conditional marginal effects of these three characteristics at low, middle, and high values of the moderator, with 95% confidence intervals. These analyses reveal a consistent pattern for all three characteristics: the marginal effect is the most negative in the lowest bin, and it is reduced in subsequent bins, suggesting that discrimination is most pronounced at lower levels of experience. The pattern is similar when constructing bins based on the values of the moderator (Figure D.3, p. 24, in the online supporting information) and when estimating the conditional marginal effects using generalized additive models (Figure D.4, p. 25). To test whether this reduction in discrimination is statistically significant, I further restrict the sample to the first 486 applications (second tercile of the moderator) of each bureaucrat in the sample and report in Table 4 estimates from the main specification (column 1) and from the binning estimator described above (columns 2 to 4). While Muslims in this subsample are overall 2.7 percentage points less likely to be granted asylum (SE: 4.0 pp.) compared to Christians (column 1), I find a substantial and statistically significant difference depending on whether the application was decided by an inexperienced or experienced bureaucrat. Muslims are 7.8 percentage points (SE: 4.2 pp.) less likely to receive refugee status than Christians when their case is decided by an inexperienced bureaucrat. But while switching from an inexperienced to an experienced bureaucrat does not improve Christians' chances of receiving asylum (coefficient: −0.011, SE: 0.023), it increases Muslims' chances by 9.7 percentage points (SE: 3.5 pp.), such that the difference between Christians and Muslims is small and insignificant among experienced bureaucrats (column 2).
To ensure that this pattern is not driven by the changing composition of bureaucrats over time, I further restrict the sample to applications examined by bureaucrats who decided at least 486 applications (columns 3 and 4 in Table 4). The results are robust to restricting the sample in this way and to including bureaucrat fixed effects (column 4).
The pattern is similar for educational attainment and skill level (Table 5). Compared to highly educated and highly skilled asylum seekers, those with lower levels of education and skills are less likely to be granted refugee status when their case is examined by an inexperienced bureaucrat (the relevant coefficients are all negative although not all reach statistical significance at conventional levels), but, consistent with a reduction in discrimination among more experienced bureaucrats, the coefficients of the interaction terms are all positive (although only one reaches statistical significance). In Table D.3 in the online supporting information (p. 23), I show that applications decided by experienced and inexperienced bureaucrats do not differ systematically, which allows me to mitigate concerns that this result reflects the fact that bureaucrats are assigned different types of decisions during the course of their tenure.

Notes (Table 5): This table shows point estimates and standard errors clustered by bureaucrats in parentheses from 8 OLS regressions with individual covariates and fixed effects for year of application and country of origin. The dependent variable is whether the applicant was granted refugee status. The sample is restricted to bureaucrats' first 486 decisions in columns 1, 2, 5 and 6 and to their first 486 decisions if they made more than that in columns 3, 4, 7 and 8. "Characteristics" refers to "Education" in columns (1) to (4) and to "Skill Level" in columns (5) to (8). Reference category for "Characteristics" is high. † p < 0.1, * p < 0.05, ** p < 0.01.
Overall, these patterns suggest that bureaucrats discriminate less as they get more experience, but this reduction is stronger and more robust for religion than for education and skill level. To investigate the time frame during which this reduction is taking place, I also report in Table D.3 in the online supporting information (p. 23), for each group of applications, the average number of months the bureaucrat who decided the case had spent on the job at the time of the decision, simply counting the number of months between each decision and his first decision. The reduction in discrimination seems to be taking place over the course of roughly a year: in the first bin, applications are being decided by bureaucrats who had spent roughly 9 months on the job, while in the second bin they had spent at least 2 years.
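The tenure measure described above can be sketched as a count of months between a bureaucrat's first decision and the decision at hand. The text does not say how partial months are treated; the sketch below assumes completed calendar months, and the dates are invented for illustration.

```python
from datetime import date

def months_on_job(first_decision, decision):
    """Completed calendar months between a bureaucrat's first decision
    and the current decision (assumed convention: partial months dropped)."""
    months = ((decision.year - first_decision.year) * 12
              + (decision.month - first_decision.month))
    if decision.day < first_decision.day:
        months -= 1   # the last month is not yet complete
    return months

# Hypothetical bureaucrat whose first decision was on 15 March 2005.
first = date(2005, 3, 15)
print(months_on_job(first, date(2005, 12, 20)))  # prints 9
print(months_on_job(first, date(2007, 3, 14)))   # prints 23
```

Averaging this quantity over the applications in each experience bin gives the kind of summary reported for the bins.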

Discussion
What explains this reduction in discrimination as bureaucrats get more on-the-job experience? If discrimination results primarily from the fact that bureaucrats have a preference for different types of applicants, then the reduction could be explained by a change in bureaucrats' preferences on the job, for example, if $c_B$ is positive for inexperienced bureaucrats but close to zero for experienced bureaucrats. Contact theory hypothesizes that personal contact across social lines can reduce prejudice (Allport 1974). According to this logic, as bureaucrats interview more applicants, they could become less prejudiced against Muslims, leading to a reduction in taste-based discrimination. It is important to note, however, that the interview (which is the only point of contact between the bureaucrat and the asylum seeker) does not fit the traditional assumptions of contact theory, which maintain that contact should be cooperative, have a shared goal, and be endorsed by communal authority, and that participants should have equal power status. Contact during an interview quite strongly violates all of these assumptions, with the possible exception of communal authority. Indeed, recent research suggests that while collaborative contact reduces own-caste favoritism in the context of cricket leagues, adversarial contact does not lead to a similar reduction in favoritism (Lowe 2021).
If discrimination is instead driven by bureaucrats' prior beliefs about the level of persecution faced by different groups, a change in these beliefs (or in the weight they place on them) over time could also explain the reduction in discrimination observed. Several first-hand accounts of decision-makers at the French Asylum Office and the appeals court suggest that the narratives are relatively uninformative for bureaucrats when they start working at the Asylum Office, but that officials hone their ability to identify credible narratives over time. Reflecting on her time as an asylum officer, Aho Nienne reports: "My colleagues assured me that, with experience, I would acquire a gift that is crucial to our profession: intimate conviction. That indescribable feeling when an asylum seeker lies" (2013). Bureaucrats working at the appeals court confirm that they learn how to infer a sharp signal over time. "Being experienced means to be able to recognize when an applicant is lying or when he is a 'fake refugee,'" one of them testified. As a result, she continued, "At first you do not know what to look for, what to base [decisions] on" (Greslier 2007, p. 8). The intuition is simple: it takes time to be able to single out truly authentic narratives that distinguish themselves through original storytelling and language. As they read more and more applications, bureaucrats become better able to distinguish fake narratives from true ones. A bureaucrat working at the French Asylum Office powerfully illustrated this intuition in anonymous testimony: "When you see for the 80th time the same story written with the same font, the same line spacing, in which just a few details change… Sometimes, there is even the name of another asylum seeker in the story at one place" (Aubry and Le Löet 2019). Bureaucrats' experience thus increases the precision, that is, decreases the variance $\sigma^2_\eta$ of the signal they receive when reading narratives.
According to Equation (1), as $\sigma^2_\eta$ decreases, bureaucrats place more weight on the signal, $\frac{\sigma^2_\omega}{\sigma^2_\omega + \sigma^2 + \sigma^2_\eta}$, than on their prior beliefs, $\frac{\sigma^2 + \sigma^2_\eta}{\sigma^2_\omega + \sigma^2 + \sigma^2_\eta}$.
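The comparative static in the two weights above can be checked numerically. The sketch below simply evaluates the two ratios; the variance values are arbitrary illustrations, and `var_other` stands for the variance term whose subscript is not shown in the expression above.

```python
def signal_weight(var_omega, var_other, var_eta):
    """Weight on the signal: var_omega / (var_omega + var_other + var_eta)."""
    return var_omega / (var_omega + var_other + var_eta)

def prior_weight(var_omega, var_other, var_eta):
    """Weight on prior beliefs: (var_other + var_eta) / (sum of all three)."""
    return (var_other + var_eta) / (var_omega + var_other + var_eta)

# Arbitrary illustrative values: only var_eta changes with experience.
var_omega, var_other = 1.0, 0.5
for var_eta in (2.0, 1.0, 0.5):
    w_signal = signal_weight(var_omega, var_other, var_eta)
    w_prior = prior_weight(var_omega, var_other, var_eta)
    print(f"var_eta={var_eta}: signal={w_signal:.3f}, prior={w_prior:.3f}")
```

The two weights always sum to one, so any reduction in the bureaucrat-specific noise shifts weight from prior beliefs to the narrative.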

Conclusion
Are European countries abiding by the Geneva Convention and granting asylum to those who fear persecution for reasons of "race, religion, nationality, membership of a particular group or political opinion"? Until now, the lack of microlevel data on asylum decisions has limited researchers' ability to answer this question. Yet 10 years ago the French Asylum Office opened its archives. I exploited this unprecedented degree of transparency to digitize a representative sample of more than 4,000 asylum applications filed over the last 40 years to compile the first in-depth, microlevel dataset of asylum decisions. This dataset facilitates an in-depth examination of whether France is indeed granting asylum to those in need of protection. I compared accepted and rejected applicants to identify the effect of applicants' personal characteristics on the probability of being granted asylum in France. This study provides empirical evidence that Christian applicants are much more likely to be granted refugee status than Muslim applicants. Moreover, I show that these effects are unlikely to be driven by either unobserved differences revealed during the interview process or by the selective assignment of cases to bureaucrats. Importantly, the findings also reveal that bureaucrats can self-correct their established discriminatory behaviors. As they gain more on-the-job experience, they discriminate less on the basis of religion, and to a lesser extent based on education and skill level. Anecdotal evidence suggests that bureaucrats learn over time to distinguish authentic narratives, which is consistent with statistical discrimination. While the narratives provided by applicants are relatively uninformative at first, bureaucrats gradually learn how to assess their credibility, allowing them to place more weight on the narratives than on their prior beliefs. 
However, the available data do not allow me to rule out the possibility that the reduction in discrimination instead comes from a change in bureaucrats' preferences.
The results of the study have important policy implications for strategies to reduce discrimination at the French Asylum Office. For instance, exposure to past narratives during bureaucrats' training could help them learn to better assess the credibility of persecution claims once they are faced with actual applicants. Moreover, increasing bureaucrats' tenure could reduce the overall level of discrimination. But more research is needed to test the effectiveness of these interventions. This research also opens up a new avenue of study on the effectiveness of possible interventions to reduce the influence of bureaucrats' beliefs on decisions in order to curtail discrimination within administrations.
These findings unambiguously call for increased scrutiny in the attribution of refugee status in France, and within the EU more broadly. Since the Geneva Convention constitutes the common framework for granting asylum across Europe, asylum officers elsewhere on the continent are afforded similar levels of discretion as those in the French office. However, we currently lack the data needed to compare working and training conditions across European asylum offices. Similar data collection in other countries, either by researchers or even the offices directly, should be undertaken to determine whether the discrimination patterns identified in France apply more broadly.