Racial Bias in Fans and Officials: Evidence from the Italian Serie A

Recent scholarship studying the impact of race-based prejudice has emphasized its rampant persistence throughout all aspects of modern society, including the world of sports. Prior research from American leagues has shown that even referees, trained officials intended to enact neutral judgements, are subject to bias against Black and dark-skinned players. To extend these studies and inform policies aimed at combating racial bias in public spaces more broadly, we report results from a unique dataset of over 6500 player-year observations from the Italian Serie A to examine whether these biases persist in European football. Our results show that darker-skinned players receive more foul calls and more cards than lighter-skinned players, controlling for a range of potential confounders and productivity-relevant mediators. By exploiting an absence of fans induced by the COVID-19 pandemic, we also present preliminary evidence that fans may play a key role in inducing poor calls against darker-skinned players.


Introduction
Recent high-profile incidents of racial discrimination and harassment in European football have drawn the attention of activists and researchers as the abuse observed in these cases reaches millions of fans across the globe (Alrababa'h et al., 2021; Campbell and Bebb, 2022). While evidence suggests that race-based biases are widespread in the sport despite efforts to combat their impact, two central questions remain: do racial biases directly impact the performance of dark-skinned players, and, if so, what is the mechanism sustaining these public forms of discrimination?
In the hope of contributing to the causal identification of both the direct outcomes of racial biases and their antecedents, we report results from a unique dataset of over 6500 observations pertaining to player demographic and performance data from the Italian Serie A between 2009 and 2021. Unlike previous studies, which have often relied solely on either region of origin or self-coded binary distinctions between racial groups, we instead develop and deploy a novel dataset of player demographics created using an extensive set of skin tone data developed to improve the verisimilitude of an online interactive game, Football Manager, which incorporates the input of over 1300 real-world football scouts from around the globe to ensure the accuracy of its gameplay. The game provides detailed data on player skin tones on a 20-point scale, allowing us to examine the influence of racial biases on a far more granular level than has previously been possible. In combination with performance data taken from sources reporting the outcome of the over 8000 matches that took place in the Serie A between the 2009-2010 and 2020-2021 seasons, we examine differential treatment from referees based on skin colour.
Using this extensive dataset, we find that darker skin tones are associated with an increase in disciplinary action taken by match referees. Specifically, we find that after controlling for potential confounders and productivity-related mediators, players with skin tones in the 85th percentile (mean + 1 SD) receive four more foul calls against them per year than players in the 15th percentile (mean − 1 SD), which corresponds to a staggering 20% more fouls. These results carry over to more punitive outcomes, including being disciplined with yellow cards and game ejections (red cards), though evidence is weaker for the latter, less common events. As such, our results show evidence of systematic racial bias against dark-skinned players, which we extend by drawing from the theoretical foundations of the colourism literature to detail this understudied type of public discrimination. We contribute to existing work by providing extant theories of colourism with evidence from a unique setting while making additional theoretical contributions by highlighting potential mechanisms sustaining the persistence of skin tone biases in public settings. In addition to providing evidence of bias, we show, using the absence of fans from stadiums during the 2020-2021 season, that these deleterious outcomes may not be entirely the result of individual referee bias. Rather, our findings suggest that crowds might be responsible for exacerbating the occurrence of racial discrimination in public spaces by inducing more punitive actions by referees. Our results support studies from real-world settings, which have linked race-based discrimination in official judgements to adverse outcomes in the fields of healthcare (Hall et al., 2015) and legal decisions (Breger, 2019). Additionally, we illustrate how novel datasets drawn from the growing gaming industry can be used to elucidate real-world effects. Finally, we contribute to a growing literature that has drawn on the ubiquity of sports to elucidate the persistence of discrimination based on national (Back et al., 2001; Carrington, 2010; Coupe et al., 2018; Dawes and Rubenson, 2021; Rosenzweig and Zhou, 2021; Zitzewitz, 2006) and racial (Farrington et al., 2017; Parsons et al., 2011; Price and Wolfers, 2010) identities.

Background
Racist abuse in Italian football stadiums remains a serious problem. In 2014, Carlo Tavecchio, former head of the Italian Football Federation, referred to a fictitious African player as a 'banana eater' in a speech held at an assembly of Italy's amateur leagues. On numerous occasions, fans have been recorded shouting monkey chants at Black players, and players have been targeted with racial abuse online. In response to these incidents, in September 2020 the Lega Serie A and UNAR (National Anti-Discrimination Office of the Italian Government) launched the 'Keep Racism Out' campaign. The campaign is a first step towards combating all forms of discrimination in sports, and includes the creation of a permanent Corporate Social Responsibility (CSR) Commission, tasked with identifying the lines of action to eradicate racism from stadiums. Thus far these initiatives have failed to minimize the problem, as evidenced by an incident in October 2021 in which Napoli defender Kalidou Koulibaly was called a 'monkey' by Fiorentina supporters in Florence. In the aftermath, the Lega Serie A reiterated its desire to take a tougher stance on racism in stadiums, promising new measures that will ban racist fans from all football events in all stadiums across Italy indefinitely. Though the final decision remains with individual clubs, Juventus, one of Italy's premier teams, recently demonstrated how this policy may look in practice when it indefinitely banned one of its supporters for racially abusing AC Milan goalkeeper Mike Maignan.
Though these cases and the patchwork of disconnected policies that they have inspired have drawn attention to issues of race and football in Italy, similar incidents of racial discrimination and abuse have occurred in nearly all of Europe's top leagues. This includes the continent's premier competition, the Champions League, in which a 2020 match between Paris Saint-Germain (France) and Istanbul Basaksehir (Turkey) was suspended mid-game due to the racial abuse of one of Basaksehir's Black assistant coaches. Cases of racial abuse against players have been documented during matches in England, Germany, France, Italy, Scotland and Belgium during the most recent season of play alone. To scholars of racial politics, the continued occurrence of racist abuse by fans as well as opposing players and coaches throughout Europe both reflects and reinforces the persistence of racial language and discrimination in public life.
These incidents are not limited to the realm of football in Italy. Two of the main Italian political parties, the League and Brothers of Italy (respectively polling at 19% and 20% as of November 2021), continuously campaign on an 'Italian first' platform, reflecting pervasive societal racism. More than half of the Italians surveyed in a SWG poll from 2019 said that racist acts were either sometimes or always 'justifiable', a finding that came after a series of high-profile racist and antisemitic incidents across the country (Giuffrida, 2019). Furthermore, a Pew Research survey from 2019 suggests that negative views towards minority groups, in particular Muslims, persist in Southern Europe, especially in Italy, where a solid majority of respondents have unfavourable views of these groups (Wike et al., 2019). Given this context, the prevalence of racist sentiments in sports is unsurprising.
Previous research has aimed to use the contained ecosystems of sporting competitions to illustrate how racial bias can affect the decisions of officials trained to promote fair play. The rigorous qualifications required to become a Serie A official should make the league a difficult case for the identification of racial bias, with requirements including the passage of a famously difficult exam, persistent performance evaluations by the Italian Football Federation and an average of 10 years of in-game experience. Despite similar levels of training to ensure neutrality, several articles from the USA have shown how officials have allowed racial biases to alter their performance in favour of lighter-skinned players (Foy and Ray, 2019; Macri, 2012; Robst et al., 2011). These biases against out-group members are evident in European football, where none of Italy's first division referees are Black, Asian or Minority Ethnic (BAME). These discrepancies carry over to other top leagues, with fewer than 2000 referees out of approximately 28,000 in the English Football Association identifying as BAME. While these articles and descriptives suggest bias is likely, previous works examining performance bias have often been limited due to complications with data and the categorization of players (Eiserloh et al., 2010).
To extend this work we examine whether these highly trained professionals operating in contained environments conducive to neutral arbitration exhibit biases against darker-skinned players. Our results have implications for the management of officiating and the impetus of anti-racism campaigns in professional sports, but also for everyday interactions off the pitch. In our view, high-level European football should represent a difficult case for the identification of racial bias given the safeguards in place and the scrutiny placed upon every decision made by match officials. By examining whether bias persists despite these checks we contribute to ongoing conversations taking place in supporters' pubs, stadiums and the halls of European parliaments regarding the persistence of racial biases in modern life.

Theory
Several studies have attempted to examine the effects of racial bias on performance in modern football (Buraimo et al., 2012; Campbell, 2013; Gallo et al., 2013), though we know of no studies that have analysed these effects in the Italian context. While certain limitations are discussed in the next section, the underlying theories informing these cases have informed the policy prescriptions developed to combat these biases. As a study of racism in the English Premier League noted, evidence of on-field bias, or disparities in players' statistical outcomes arising from differential treatment based on their race, could hypothetically be linked to 'racist referees' if adequate data were available (Chu et al., 2014: 2929), harkening back to the overt racism associated with the 'hooliganism' culture prevalent throughout the 1970s and 1980s (Back et al., 2001: 23). This focus on personal biases has shifted both the attention and responses of anti-racism activists towards individualized actions (Petersen and Wichmann, 2021). This emphasis makes sense given the autonomy given to referees to adjudicate football games, as it is estimated that referees make between 200 and 250 'foul/no foul' decisions per game (Plessner et al., 2009), requiring a decision roughly every 22 seconds. In such environments, though blatant discrimination would likely be detected by league officials, implicit biases, or biases that are unconscious and unintentional, may be sustaining an unequal playing field in favour of white players.
Contributing to these potential biases are well-documented historical stereotypes linking dark skin with physical dominance, which is then often juxtaposed with white intellectual superiority (Haslerig et al., 2020). As expertly traced by Carrington (2010), these connotations became more prominent in part due to the dominance of Black athletes such as Jack Johnson in the early 20th century, whose world title solidified existing narratives regarding the 'innate' talent of Black athletes (Miller, 1998: 125). Similar strains of social Darwinist thought have persisted (Campbell and Bebb, 2022), aiding our understanding as to why referees may be primed to interpret the actions of darker-skinned players as more aggressive and less strategic than those of their white counterparts. Evidence that individual officials exhibit related biases against darker-skinned athletes has been found in the United States in studies of baseball (Hamrick and Rasp, 2015; Parsons et al., 2011), professional basketball (Price and Wolfers, 2010) and collegiate basketball (Dix, 2019). Relatedly, there is evidence that bias is common in managerial decisions, with racial discrimination impacting decisions related to hiring, playtime and career length (Campbell, 2020; Ducking et al., 2015). These quantitative examinations of bias are reflected in several illuminating pieces from the sociology of sport detailing the lived experiences of athletes facing discrimination both within and outside the game (Campbell, 2020; Carrington, 1998; Long et al., 2009). Given the extensive evidence of bias across sports and processes, we acknowledge the role of personal biases in the judgement of even highly trained officials and expect this pattern to persist in the case of Italian football.
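As a quick check on the decision rate quoted above, the following minimal sketch divides regulation playing time by the cited range of 200 to 250 decisions per game. It assumes a 90-minute match and ignores stoppage time; the function name is purely illustrative.

```python
# Assumption: 90 minutes of regulation time, stoppage time ignored.
MATCH_SECONDS = 90 * 60


def seconds_per_decision(decisions_per_match: int) -> float:
    """Average number of seconds between 'foul/no foul' decisions."""
    return MATCH_SECONDS / decisions_per_match


# The cited range of 200 to 250 decisions per game.
for n in (200, 250):
    print(n, "decisions ->", round(seconds_per_decision(n), 1), "seconds apart")
```

Under these assumptions the upper end of the range (250 decisions) gives an interval of about 22 seconds, and the lower end (200 decisions) about 27 seconds.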
We further expect these biases to extend beyond binary categorizations of race. Colourism, or skin tone 'stratification' (Keith and Herring, 1991: 760), is a type of discrimination that places additional value on the lightness of individual skin tones (Strmic-Pawl et al., 2021). Evidence of colourism has linked darker skin with deleterious outcomes in areas as disparate and consequential as household income and wealth (Bodenhorn, 2006; Dixon and Telles, 2017; Goldsmith et al., 2006), educational outcomes (Blake et al., 2017; Keith and Herring, 1991; Ryabov, 2016) and health (Bodenhorn, 2002; Hargrove, 2019; Monk, 2021). Unsurprisingly, given the ubiquity of this pernicious form of bias, evidence of colour-related biases has also been found to induce prejudices in the world of sports (Foy and Ray, 2019; Furley and Dicks, 2014; Mills et al., 2018). While recent studies have attempted to isolate the impact of colourism, endogeneity issues, including correlations with other forms of capital, have complicated causal studies (Foy and Ray, 2019: 730). As a result, we hope our research provides additional insight into biases related to skin tone that are less easily distinguishable in alternative settings by leveraging our unique dataset and the extreme levels of quantification associated with contemporary professional sports. These theoretical underpinnings collectively lead us to our first hypothesis:
Hypothesis one: bias against darker-skinned players has likely resulted in unfair patterns of refereeing, including the distribution of a greater number of foul calls, yellow cards and ejections (red cards).
For those unfamiliar with football, fouls occur when a player is judged to have conducted himself in a manner deemed beyond the bounds of fair play. Fouls can be given for several infractions, including excessive physical contact with opposing players, dissent (a form of insubordination towards the officials) and illegal contact with the ball, among others. Yellow cards are given as a warning to players that their conduct may result in expulsion from the game, while red cards are given to show that a player has been ejected from the game in question and may face a longer suspension. For each of these actions, which are often linked to interpretations of 'aggressive' play, we expect to find evidence of bias in the form of a disproportionate number called against darker-skinned players. As previous work has shown that attention to in-game biases can lead directly to changes in fairness (Pope et al., 2018), we hope that identifying persistent gaps will lead to improvements in how the game is officiated.
While previous studies have relied on individualized explanations for bias, a separate field of study has examined how a variety of factors can influence the advantages given to 'home' teams by officials. Unlike the explanations for racial biases, this literature has emphasized the influence the crowd can have over an official's decisions (Dohmen and Sauermann, 2016). Evidence that home crowds bias referees towards home teams comes from football (Goumas, 2014), basketball (Boudreaux et al., 2017) and hockey (Guerette et al., 2021). Though the mechanism by which crowds influence referee decision making continues to be debated, the most widely held view is that crowd noise can tip referees towards favourable decisions in high-leverage situations (Allen and Jones, 2014; Ribeiro et al., 2016). Given the extent to which racial incidents include members of the crowd and groups of fans, we expect that crowds also play a role in biasing on-field decisions against darker-skinned players, particularly when tied to both recent incidents of racial abuse and historical overlap between supporters' groups and white nationalist thought (Back et al., 2001). This discussion leads us to our second hypothesis:
Hypothesis two: officiating bias against darker-skinned players is exacerbated by the influence of crowds.
We expect home supporters to exhibit patterns of bias against darker-skinned players similar to those they exhibit in favour of home teams. Because we unfortunately lack data separating home and away crowds, we rely instead on the absence of fans due to the COVID-19 pandemic during the 2020-2021 season. As studies outside of sports have shown, racial biases are prevalent in decision-making processes in the fields of health (Marcelin et al., 2019; Wildeman et al., 2014), education (Huang and Cornell, 2017; Nance, 2017) and housing (Fennell, 2017; Greenberg et al., 2016). If trained officials are unable to maintain a fair balance between players due to biases against dark-skinned players (Hypothesis one), we find it likely that untrained, partisan crowds also bias their conduct against dark-skinned players, as many visceral examples of deplorable racism led by fans suggest.
As with our first hypothesis, we view the implications of this second question as reaching beyond sports. If crowds are implicated in biased decision making against darker-skinned players, this would support investment in a set of policies, both in football and beyond, that can augment the work being done to limit the influence of individual 'bad' actors and implicit bias by extending their focus to the collective influences that induce discrimination. As studies of racism in groups and crowds have shown, rather than tamping down biases, the anonymity provided in crowds may heighten erstwhile unactioned biases against members of other racial groups (Arnold and Veth, 2018). We discuss the real-world implications of both hypotheses in more detail in the discussion section below.

Data
Several previous studies on racial bias in sports in both the American and European contexts have relied on incomplete data due to issues related to availability and difficulties with case selection, as noted by Eiserloh et al. (2010). One example is the difficulty of collecting longitudinal data: studies have typically looked at no more than a single season. Complicating matters further, many studies have also relied on broad categorizations of players' ethnicities based on either country or even region of birth (with all South Americans being classified as non-white and all Europeans classified as white, for instance).
To account for these issues, our data contain information for each player in the Serie A from the 2009/2010 season to the 2020/2021 season. Whenever a player played for more than one team in a season, we assigned him to the team for which he played the most minutes, so that our unit of observation is player-team-season, for a total of 6533 observations that encompass every player to have played in the league during this period.
We used three versions of the Football Manager videogame (Football Manager, 2011, 2018, 2021) to collect data on player skin tones. Football Manager's skin tone data are collected, reviewed and ratified by Sports Interactive's global network of 1300 researchers to ensure the accuracy of its virtual gameplay. Using these data, we matched each player present in our dataset from 2009 to 2021 to the same player in the Football Manager Editor and recorded their skin tone. This skin tone variable is a continuous variable that ranges from 1, the lightest skin tone, to 20, the darkest skin tone. Figure 1 shows the distribution of skin tone in the total dataset, where the majority of players have lighter skin.
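The assignment rule described above (one team per player per season, chosen by minutes played) can be sketched as follows. This is a minimal illustration rather than the authors' code: the record layout, the function name `assign_primary_team` and the sample values are all hypothetical, and ties are broken arbitrarily in favour of the first team encountered.

```python
from collections import defaultdict


def assign_primary_team(stints):
    """Map each (player, season) to the team with the most minutes played.

    `stints` is an iterable of (player, season, team, minutes) tuples;
    this layout is a hypothetical stand-in for the real data.
    """
    minutes = defaultdict(int)
    for player, season, team, mins in stints:
        minutes[(player, season, team)] += mins

    best = {}
    for (player, season, team), mins in minutes.items():
        key = (player, season)
        # Keep the team with strictly more minutes; ties keep the first seen.
        if key not in best or mins > best[key][1]:
            best[key] = (team, mins)
    return {key: team for key, (team, _) in best.items()}


# Hypothetical example: Player A moved clubs mid-season.
example_stints = [
    ("Player A", "2015-2016", "Team X", 900),
    ("Player A", "2015-2016", "Team Y", 1500),
    ("Player B", "2015-2016", "Team X", 2000),
]
primary = assign_primary_team(example_stints)
print(primary)
```

Each (player, season) pair then contributes exactly one player-team-season observation.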
For our dependent and control variables, we used multiple sources of data. For red and yellow cards, we used data from Footystats (2021). Fouls were available from WhoScored (2021) and from FBREF (2021). We run our models with both sources and the results do not differ significantly (we use WhoScored data in the main models, while models with FBREF data are only reported in the Online Appendix in Table A9). The number of attempted tackles for the entire period of our analysis was only available from WhoScored (2021). The use of tackle data allows us to plausibly control for differences in playing styles as well as propensity towards violent conduct, as discussed in Miguel et al.'s (2011) excellent article. For players' position we use data from FBREF, and the four broad categories we use are: goalkeepers, defenders, midfielders and forwards. In all our models we exclude goalkeepers, as they have a far lower probability of committing fouls and are also very unlikely to tackle other players, which is our proxy for aggressive play. Figure 2 shows the distribution of fouls, yellow cards and red cards, our three dependent variables. To test whether part of the racial bias effect may be driven by crowds, we also looked separately at the pre-COVID period (Seasons 2009-2010 to 2018-2019), when there were fans at the stadium, and the post-COVID period (Season 2020-2021), without fans. We excluded Season 2019-2020 from this analysis since, out of a total of 38 matches, between 12 and 14 were played with no fans, while in 2020-2021 all matches were played with no fans. Since our data are at the player level, we do not know in which specific matches a player received red and yellow cards or committed fouls, hence we cannot break down 2019-2020 between matches with and without fans (for the same reason we unfortunately do not have access to home and away data separately, as the data we have are at the player level, not the match level). Table 1 shows a snapshot of what the final dataset looks like.
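The season split used for the crowd analysis can be written down as a small classification rule. The sketch below is illustrative only: the function name and the season-label format (for example '2018-2019' or '2018/2019') are assumptions, while the cut-offs follow the periods stated in the text.

```python
def season_period(season: str):
    """Classify a season label such as '2018-2019' or '2018/2019'.

    Returns 'pre' for seasons played with fans (2009-2010 to 2018-2019),
    'post' for the season played without fans (2020-2021), and None for
    2019-2020, which is excluded because only some matches had no fans.
    """
    start_year = int(season.replace("/", "-").split("-")[0])
    if 2009 <= start_year <= 2018:
        return "pre"
    if start_year == 2019:
        return None  # mixed attendance, dropped from the comparison
    if start_year == 2020:
        return "post"
    raise ValueError(f"season outside the study window: {season}")


for label in ("2009-2010", "2018/2019", "2019-2020", "2020-2021"):
    print(label, "->", season_period(label))
```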

Methods
For our main analyses, to investigate the effect of skin tone on referee calls for fouls, yellow cards and red cards, we use ordinary least squares (OLS) regression, and we control for player's position (since players playing in certain positions may be more likely to commit fouls and get cards), attempted tackles (as a proxy for aggressive play) and minutes played (since a player that plays longer incurs a higher chance of committing fouls and getting a card). We also include season, team and country of origin fixed effects. Team fixed effects account for unobserved, time-invariant team-level heterogeneity. Season fixed effects account for common shocks affecting all teams. Finally, country of origin fixed effects reduce concerns that higher numbers of infractions for darker-skinned players might be linked to other types of discrimination based on one's country of origin; this would be the case if players with darker skin were more likely to come from countries against which there is more prejudice for reasons other than solely skin colour. This means that comparisons are within season-team-country cells: effectively we are comparing two players playing in the same position (e.g. defenders), in the same team (e.g. Atalanta), in the same season (e.g. 2009-2010), from the same country (e.g. Italy), while holding minutes and tackles constant and only varying skin tone. Since the data on yellow and red cards are count data and most of these data are concentrated on a few small discrete values, in the Online Appendix we also report results using a Poisson regression instead of OLS. All standard errors are clustered at the player level since many of the same players appear in multiple football seasons, and the error terms would hence not be independent.
To test whether part of the racial bias effect may be driven by crowds, we compare the pre-COVID and post-COVID periods using two separate OLS regressions rather than one regression with an interaction term between skin tone and pre-post COVID-19, since it is possible that the effect of some of the covariates (such as aggressiveness, proxied by total attempted tackles) may also vary differentially in response to the presence of fans. These models feature the same control variables as the main analyses.
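To make the logic of fixed effects concrete, the sketch below computes an OLS slope with a single group fixed effect by demeaning the outcome and the regressor within each group (the 'within' estimator). It is a deliberately simplified illustration, not the specification estimated in the paper: it uses one fixed effect and one regressor, omits the controls and the player-clustered standard errors, and the data and names (`within_slope`, `toy`) are hypothetical.

```python
from collections import defaultdict


def within_slope(rows):
    """OLS slope of y on x with group fixed effects, via within-group demeaning.

    `rows` is an iterable of (group, x, y) tuples.
    """
    rows = list(rows)
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # sum of x, sum of y, count
    for group, x, y in rows:
        t = totals[group]
        t[0] += x
        t[1] += y
        t[2] += 1

    sxy = sxx = 0.0
    for group, x, y in rows:
        sum_x, sum_y, n = totals[group]
        dx = x - sum_x / n  # deviation from the group mean of x
        dy = y - sum_y / n  # deviation from the group mean of y
        sxy += dx * dy
        sxx += dx * dx
    if sxx == 0:
        raise ValueError("no within-group variation in x")
    return sxy / sxx


# Hypothetical toy data: (team, skin tone, fouls). The two teams have
# different baseline foul levels but the same within-team slope of 0.5.
toy = [
    ("Team A", 2, 10), ("Team A", 6, 12), ("Team A", 10, 14),
    ("Team B", 4, 21), ("Team B", 8, 23), ("Team B", 12, 25),
]
print(within_slope(toy))
```

Because the team means are removed before the slope is computed, differences in baseline foul levels between teams do not contaminate the estimate; only variation among players within the same team identifies it.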

Results
Findings from our OLS models suggest that skin tone does affect referee decisions, especially with respect to fouls committed and yellow cards, and more weakly with respect to red cards. Specifically, Figure 3a suggests that referees are more likely to call more fouls for players with darker skin. Figure 3a shows the predicted number of fouls for a player during an average season. Moving from a lighter-skinned player (three out of 20, which corresponds to the mean skin tone − 1 SD, or the 15th percentile) to a darker-skinned player (17 out of 20, which corresponds to the mean skin tone + 1 SD, or the 85th percentile), while keeping everything else constant, leads to four more fouls being called by the referee, in this specific case from 21 to 25 total fouls, or an increase of roughly 20%.
Looking at yellow and red cards tells a similar story, although the coefficient for red cards is not statistically significant at the conventional 95% level. Substantively, keeping everything else constant as above, moving from a lighter-skinned player to a darker-skinned player leads to an increase in yellow cards of 0.4 and an increase in red cards of 0.03 (see Figures 3b and 3c for predicted values of cards). For yellow cards, this would translate into a lighter-skinned player getting 3.5 yellow cards, while his darker-skinned counterpart would be getting 3.9 cards, or 11% more cards. In terms of red cards, while the lighter-skinned player would get 0.19 red cards, his darker-skinned counterpart would get 0.22, or 16% more. Since we are dealing with count data and most of these data are concentrated on a few small discrete values, we also run Poisson regressions and report results in Table A5 of the Online Appendix, which paints a very similar picture, as darker-skinned players are more likely to get both yellow and red cards, although the coefficient for the latter is again not statistically significant.
Next, we turn to the results of our regressions pre- and post-COVID, to ascertain the role of crowds in racial bias. Findings suggest that before COVID, with full attendance of matches, there is evidence of racial biases, especially with respect to fouls and yellow cards. After COVID, the size of the effect shrinks and loses significance. None of the three coefficients, for fouls, yellow or red cards, can be distinguished from zero. One concern may be that post-COVID the effects of racial biases disappear merely due to the sizeable inflation of standard errors, likely resulting from the much smaller sample size, since, although we have the entire population of players, we only have one season without fans. However, the results suggest that post-COVID, it is not just the confidence intervals that become wider due to larger standard errors, but the effect sizes also change. Figures 4-6 illustrate this well. The change in the number of fouls called against a player with darker skin, compared with one with lighter skin, is lower by over 30% post-COVID, when no fans were present during matches: from 3.9 pre-COVID to 2.7 post-COVID. Moreover, the change in the numbers of both yellow and red cards received by a player with darker skin compared with one with lighter skin becomes negative post-COVID, while they are positive pre-COVID.
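The contrast reported above can be unpacked with a few lines of arithmetic. In a linear model the difference between two predictions equals the slope times the difference in the regressor, so the rounded figures in the text imply an approximate per-point slope. The sketch below uses only those rounded figures; the implied slope is a back-of-the-envelope value, not a coefficient reported by the authors, and the variable names are illustrative.

```python
def pct_increase(low: float, high: float) -> float:
    """Percentage increase when moving from `low` to `high`."""
    return 100 * (high - low) / low


# Rounded figures quoted in the text.
tone_low, tone_high = 3, 17     # mean - 1 SD and mean + 1 SD on the 20-point scale
fouls_low, fouls_high = 21, 25  # predicted fouls per season

fouls_pct = pct_increase(fouls_low, fouls_high)

# Assumption: implied slope recovered from rounded predictions, so approximate.
implied_slope = (fouls_high - fouls_low) / (tone_high - tone_low)

print(f"increase in fouls: {fouls_pct:.1f}%")
print(f"implied fouls per skin tone point: {implied_slope:.3f}")
```

On these rounded inputs the increase is about 19%, in line with the approximate 20% quoted, and the implied slope is a little under 0.3 fouls per season for each point on the skin tone scale.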
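The percentages quoted in this section can be reproduced from the rounded values given in the text. The short sketch below does so; it relies only on those rounded figures, so small discrepancies with the authors' unrounded estimates are to be expected, and the helper name `pct_change` is illustrative.

```python
def pct_change(before: float, after: float) -> float:
    """Percentage change when moving from `before` to `after`."""
    return 100 * (after - before) / before


# Predicted cards for lighter- versus darker-skinned players (rounded values).
yellow_pct = pct_change(3.5, 3.9)
red_pct = pct_change(0.19, 0.22)

# Darker-versus-lighter gap in fouls, with fans (pre) and without fans (post).
foul_gap_pct = pct_change(3.9, 2.7)

print(f"yellow cards: {yellow_pct:+.1f}%")
print(f"red cards: {red_pct:+.1f}%")
print(f"foul gap, pre to post: {foul_gap_pct:+.1f}%")
```

The yellow and red card figures come out at roughly 11% and 16%, and the foul gap shrinks by just over 30%, matching the magnitudes described above.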

Discussion and Conclusions
The permeation of football by racial biases is unfortunately unsurprising given the prevalence of race-based discrimination across many other settings.By constructing a comprehensive dataset of player skin tone and performance data we advance the existing literature to examine racial biases against players in the context of a European league for which discrimination and racial abuse have become notorious.In addition to examining this issue with new data in a novel context, we contribute additional evidence that elucidates potential mechanisms sustaining biases against darker-skinned players despite the sport and the league's efforts to promote equality, while illustrating how novel datasets drawn from the gaming industry can be used to extend analyses interested in examining real-world phenomena.
Our results suggest that the effects of racial biases in the Italian Serie A are striking.Darker-skinned players are judged to have been guilty of both minor and major infractions at higher rates than their lighter-skinned teammates.Furthermore, controlling for country of origin, we effectively show that this is about skin colour and not just nationality.This should not come as a surprise to those knowledgeable of the racist incidents that striker Mario Balotelli -born in Italy to Ghanaian parents -was subject to while playing in both Serie A and the Italian national team, which includes being subjected to the crowd at a 2018 match between Italy and Saudi Arabia revealing a racist banner that read: 'My captain has Italian blood.'The grey line shows the 95% confidence interval, while the black line shows the 90% confidence interval.White markers show non-significance at the 95% confidence level.
Our findings have several implications for the policies of the Italian Football Federation and European football more broadly.These results show that despite the expansion of both FIFA (Fédération Internationale de Football Association) and UEFA (Union of European Football Associations) anti-racism campaigns, these race-based biases have persisted both in the stands and on the pitch.Though the focus of these efforts has been on a reduction of racial incidents, our study suggests that there remains bias in the way the game is adjudicated, which has disadvantaged darker-skinned players.Given the importance of fouls and card decisions to game outcomes and manager player selections, it is highly likely that these biases have altered both game results as well as the subsequent salary negotiations of darker-skinned players.Furthermore, as most of what happens in football is not recorded directly in statistics measures, our results suggest that it is likely that bias has also played a role in decisions related to highleverage offside calls as well as out-of-bounds and goal decisions.Finally, we hope these findings contribute to the wider literature on the impact of colourism by exposing the persistence and nuances associated with biases related to skin tone across contemporary society.
Our results also go beyond previous studies by probing these biases for the underlying mechanism. Several previous studies of referee bias have focused on individual forms of bias, which were assumed to be at fault given the supposedly individualized nature of in-game decisions. As a result, the prescriptions for change include programmes such as referee bias training. We have theorized that rather than racial bias resulting solely from individual actions, this form of bias may include a collective component. The outputs of our models suggest that bias against players could be driven at least in part by crowd participation in inducing biased calls by referees, even after controlling for style of play. Given the incidents that have occurred both off the pitch and in the crowd during football games across Europe, we find this unsurprising. If trained referees are unable to fully limit their biases against darker-skinned players, we find it unlikely that untrained and unmotivated crowds can constrain their biases.
While our results are not conclusive, they illustrate a central channel through which collectives can induce biased actions against darker-skinned players, one that is likely to persist beyond the pitch, and they provide strong motivation for further research into the role of crowds in perpetuating racial bias. This result also suggests that Serie A officials should prioritize bans against fans implicated in racist chants and harassment, as the league has attempted to do in recent seasons. Moreover, it suggests that programmes such as UEFA's 'three step procedure', which focuses on improving how responsive referees are to racial harassment, may need to be broadened to create a role for specialized officials away from the on-field environment to identify and respond to racial discrimination. As previous studies have shown that attention to in-game biases can lead directly to improved fairness in referee decision making (Pope et al., 2018), we hope meaningful attention directed towards attenuating this gap on behalf of European football will lead to sustained changes in how the game is officiated, ultimately improving conditions for darker-skinned players in the Serie A, other football leagues, and other sports.
10. Even if 2019-2020 data were available, one concern about analysing the matches with fans against the matches with no fans within the same season is that the two parts of the season may not be comparable, as matches at the end of the season may be very different from those at the beginning and in the middle (for instance, with regard to parallel obligations related to participation in European cups or Coppa Italia matches).
11. In the Online Appendix we also account for the possibility that the relationships between skin tone and fouls and cards may not be linear.
12. Since our data are at the player-season level, we do not have information on referees by match. Referees in the Serie A are not randomly assigned to matches. However, since all referees in the Serie A are white, for this to be a concern for our results, more racist referees would have to be assigned with a higher probability to matches with higher numbers of darker-skinned players. Although this is possible, we do not find this to be a plausible explanation. We do not control for variables that are likely to be post-treatment and that would hence bias our coefficients. These include tackles won, which, as opposed to total attempted tackles, could be affected in part by the referee's biased judgement in calling fouls, and salary or transfer market value, which could be affected by racial bias. In the Online Appendix we also address an alternative explanation: that darker-skinned players may play more aggressively, possibly as a result of racist chants against them. Contrary to this alternative hypothesis, we find suggestive evidence that darker-skinned players may play less, not more, aggressively, possibly due to their higher likelihood of being called for infractions (as detailed by our results).
13. To alleviate the concern that the post-COVID season with no fans (2020-2021) may be different due to the introduction of VAR (video assistant referee) rather than due to the absence of fans, in the robustness checks section of the Online Appendix we run the same OLS models for the two post-VAR years (2017-2018 and 2018-2019), but before COVID-19. We find that the effect of racial bias is as large and significant for these two years.
14. See Table A1 in the Online Appendix for tabular results.
15. We use R's function ggeffect to compute predicted values; ggeffect holds numerical variables, like minutes and total attempted tackles, constant at their means, while it computes a kind of average value for factors, which represents the proportions of each factor's category.
16. See Tables A6 and A7 in the Online Appendix for results on yellow and red cards using Poisson regression. Findings are not significantly different.
17. See Tables A2 and A3 in the Online Appendix for tabular results.
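The predicted-value procedure described in note 15 can be sketched as follows. This is a minimal Python illustration, not the authors' R code and not a reimplementation of ggeffect's internals: the coefficients, the toy rows, the `defender` dummy and the function names `reference_profile` and `predict_over` are all hypothetical. It assumes a linear model, in which holding a 0/1 dummy at its sample mean is the same as weighting that category by its share of observations.

```python
from statistics import mean

# Hypothetical fitted coefficients for fouls per season (NOT the paper's estimates).
COEFS = {"intercept": 5.0, "skin_tone": 0.5, "minutes": 0.01,
         "tackles": 0.2, "defender": 3.0}

# Toy player-season rows; `defender` is a 0/1 dummy standing in for a factor.
ROWS = [
    {"skin_tone": 3, "minutes": 1000, "tackles": 20, "defender": 1},
    {"skin_tone": 10, "minutes": 2000, "tackles": 40, "defender": 0},
    {"skin_tone": 17, "minutes": 3000, "tackles": 30, "defender": 0},
    {"skin_tone": 8, "minutes": 2000, "tackles": 30, "defender": 1},
]


def reference_profile(rows, focal):
    """Average every covariate except the focal one.

    Numeric covariates are held at their sample mean; a 0/1 dummy averaged
    the same way becomes the share of rows in that category.
    """
    names = [name for name in rows[0] if name != focal]
    return {name: mean(row[name] for row in rows) for name in names}


def predict_over(rows, coefs, focal, values):
    """Linear prediction at each focal value, other covariates at the profile."""
    profile = reference_profile(rows, focal)
    base = coefs["intercept"] + sum(coefs[k] * v for k, v in profile.items())
    return {value: base + coefs[focal] * value for value in values}


# Predicted fouls across the 20-point skin tone scale.
predictions = predict_over(ROWS, COEFS, "skin_tone", range(1, 21))
```

Plotting `predictions` against skin tone would give a line of the kind shown in Figure 3, although the numbers here are invented and carry no confidence interval.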

Figure 3. Predicted number of (a) fouls, (b) yellow cards and (c) red cards per season by skin tone, holding everything else constant, with 95% confidence interval.

Figure 4. Change in number of fouls called against a player with darker skin (17 out of 20: mean skin tone + 1 SD) compared with one with lighter skin (three out of 20: mean skin tone - 1 SD) in seasons with fans (2009-2010 to 2018-2019) and without fans (2020-2021). The grey line shows the 95% confidence interval, while the black line shows the 90% confidence interval. White markers show non-significance at the 95% confidence level.

Figure 5. Change in number of yellow cards received by a player with darker skin (17 out of 20: mean skin tone + 1 SD) compared with one with lighter skin (three out of 20: mean skin tone - 1 SD) in seasons with fans (2009-2010 to 2018-2019) and without fans (2020-2021). The grey line shows the 95% confidence interval, while the black line shows the 90% confidence interval. White markers show non-significance at the 95% confidence level.

Figure 6. Change in number of red cards received by a player with darker skin (17 out of 20: mean skin tone + 1 SD) compared with one with lighter skin (three out of 20: mean skin tone - 1 SD) in seasons with fans (2009-2010 to 2018-2019) and without fans (2020-2021). The grey line shows the 95% confidence interval, while the black line shows the 90% confidence interval. White markers show non-significance at the 95% confidence level.
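The quantity plotted in Figures 4 to 6, the change in an outcome when skin tone moves from 3 to 17, together with its 95% and 90% intervals, can be sketched as below. This is an illustrative Python sketch under stated assumptions rather than the authors' procedure: it assumes a linear model with a single skin tone coefficient, uses a normal approximation for the intervals, and the coefficient and standard error values, as well as the function names `contrast_with_ci` and `summarize`, are made up for the example.

```python
from statistics import NormalDist


def contrast_with_ci(beta, se, high=17, low=3, level=0.95):
    """Change in the outcome when skin tone moves from `low` to `high`.

    In a linear model the contrast is beta * (high - low), and its standard
    error scales by the same factor. Intervals use a normal approximation.
    """
    width = high - low
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    estimate = beta * width
    half_width = z * se * width
    return estimate, estimate - half_width, estimate + half_width


def summarize(beta, se):
    """Collect the point estimate, both intervals and a significance flag."""
    estimate, lo95, hi95 = contrast_with_ci(beta, se, level=0.95)
    _, lo90, hi90 = contrast_with_ci(beta, se, level=0.90)
    return {
        "estimate": estimate,
        "ci95": (lo95, hi95),
        "ci90": (lo90, hi90),
        # Not significant at 95% when the interval contains zero.
        "significant_95": lo95 > 0 or hi95 < 0,
    }


# Made-up coefficients and standard errors, for illustration only.
with_fans = summarize(beta=0.10, se=0.02)
without_fans = summarize(beta=0.05, se=0.06)
```

With these invented inputs, the 95% interval for `with_fans` excludes zero while the one for `without_fans` does not, which corresponds to a filled versus a white marker in the figures' convention.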
Global Politics from the London School of Economics and an MA in Political Science from the University of Washington. His research interests focus on the challenges posed to democratic consolidation efforts by new communications technologies, interpersonal violence and deficient public goods provision. His research has appeared in journals such as Nature Human Behavior and the Journal of Quantitative Description as well as popular outlets such as The Conversation.
Date submitted: January 2022. Date accepted: October 2022.

Table 1. Snapshot of the final dataset, including all the variables of interest.