Do Experts and Laypersons Differ? Some Evidence from International Classical Music Competitions

Do expert juries and laypersons differ in their judgement of quality? We investigate this question in the context of classical music performance, taking advantage of the fact that in many international music competitions, lay audiences as well as expert juries award prizes. This allows us to proxy for shared preference as a situation in which the audience prize and the jury’s first choice are awarded to the same person or group. Using data on 370 competition-events held between 1979 and 2021 across nine competition-types (e.g., piano, strings, woodwinds and brass, voice, etc.) hosted in 22 different countries, we find that jury and audience preferences match 38 percent of the time, and that the match between audience and jury varies across competition-types and countries. We then explore gender bias and host country bias as possible explanations for the divergence between expert jury and audience preferences by comparing juries’ first ranked participants and audience prizewinners with other competition finalists. We find that being female and domestic (i.e., from the competition host country) reduce the likelihood of being a jury’s top choice but have no impact on the likelihood of being an audience prizewinner. We also find heterogeneity in the extent of jury bias across competition-types. Our findings are suggestive of the possibility that expert juries are more biased than lay audiences along the dimensions of nationality and gender.


Introduction
For many goods and services where quality is unobservable ex ante, the opinions of experts and laypersons may matter substantially for those whose products are being judged. Before going to the theatre or bookstore, we typically ask friends for their opinion, and read what the critics have to say. We often consult Yelp as well as the Michelin Guide before dining at a new restaurant.
In the labor market, recruiting a new employee may involve interviews conducted by experts in the field, as well as their more generalist colleagues. But do expert and non-expert opinions differ? And do their opinions reflect hidden biases or prejudices, biases that may be especially easy to indulge when quality is inherently subjective? A substantial literature documents the existence of bias along the dimensions of gender, race, and nationality in settings as varied as labor markets, housing, and sports (e.g., Page 1995; Neumark 1996; Goldin and Rouse 2000; Bertrand and Mullainathan 2004; Krumer, Otto and Pawlowski 2021). However, evidence on the nature and extent of expert versus non-expert bias is limited, perhaps because it is not easy to measure the difference in expert and non-expert opinion.
We investigate these questions in the context of international classical music competitions, an environment in which experts and non-experts have strong opinions about quality, the benefits of winning are potentially large (Ginsburgh and Van Ours 2003), and the assessment of quality is highly subjective. Our study exploits the fact that in some music competitions, experts (juries) as well as non-experts (audiences) award prizes to their favored performers. Juries rank the finalists and select a first-place winner. Audiences, meanwhile, are given an opportunity in these competitions to select a performer for an "audience prize." This setting provides us with a natural way to measure expert-layperson agreement or disagreement: experts and laypersons "agree" when the first-place prizewinner is also the audience prizewinner; if not, they "disagree." Additionally, because audience prizewinners and first-place prizewinners are chosen from the same pool of competitors (i.e., finalists), we can compare these prizewinners with other competitors to see if they differ along observable margins in which bias may manifest itself, namely competitor nationality and gender. This allows us to investigate the possibility of bias on the part of experts and audiences. We focus on nationality and gender bias because studies suggest that these are important margins of discrimination in markets for cultural goods like sports and music (e.g., Goldin and Rouse 2000; Krumer, Otto and Pawlowski 2021).
Our key findings are as follows. Using data on 370 competition-editions held between 1979 and 2021 across nine competition-types (piano, organ, strings, voice, wind, chamber, conducting, composition, and percussion) hosted in 22 different countries, we estimate that audience and jury preferences match 38 percent of the time, and that this match varies significantly across competition-types and competition host-countries. We then explore gender and host-country bias as possible explanations for the divergence between audience and jury preferences by comparing audience prizewinners and first-place winners with other competition finalists. We find that being female and being domestic (i.e., from the competition host country) each reduce the likelihood of being ranked first by the jury but have no impact on the likelihood of winning an audience prize. These effects are heterogeneous across different competition-types. Taken as a whole, our findings are suggestive of the possibility that expert juries are more biased than lay audiences.
Our work is related to several literatures that concern the value added of winning an international music competition (Ginsburgh and Van Ours 2003); the gender, nationality, or regional biases of experts in contexts as varied as orchestra auditions (Goldin and Rouse 2000), football (Sutter and Kocher 2004), and ski jumping (Krumer, Otto, and Pawlowski 2021); the possibility of home bias and/or home field advantages in sports (Singleton, Reade, Rewilak, and Schreyer 2021) and music (Ginsburgh and Noury 2008; Kokko and Tingvall 2012); and differences in the judgments of experts and non-experts in music (Haan, Dijkstra and Dijkstra 2005) and punitive damages (Hersch and Viscusi 2004). Consistent with these studies, we find that experts have biases along the dimensions of gender and nationality. However, the direction of the bias we uncover is slightly different. While our evidence is consistent with expert bias against females (e.g., Goldin and Rouse 2000), we do not find that nationality bias on the part of experts operates in favor of the competition host country, nor do we find evidence of non-expert bias against women. Additionally, while many studies find that non-experts prefer participants from their own country, we find that classical music competition audiences have no such nationalistic bias.
Accordingly, our study suggests that the extent and nature of bias among experts and non-experts will depend on the context in which competition takes place.
The remainder of our paper is structured as follows. In section 2 we briefly discuss the history and structure of international classical music competitions. We present our data set in section 3. In section 4 we estimate the audience-jury match rate (i.e., the share of competition-editions in which the audience prizewinner is the same contestant as the jury's first choice), and we show that the match rate varies across countries and competition-types. In section 5 we investigate the possibility of gender and host-country bias among juries and audiences. This is followed by a conclusion in which we interpret our findings and discuss their implications.

2. International competitions in classical music: history and structure
Music competitions have been a feature of the European cultural landscape since ancient times. Greek shepherds in antiquity participated in a type of singing competition known as amoebaean singing. In the 13th and 14th centuries, troubadours took part in competitions held in towns and courts in southern France and northern Spain. In the 18th century, virtuoso keyboardist-composers like Scarlatti, Handel, Bach, Mozart, and Beethoven were known to compete with other musicians to demonstrate their technical mastery and skill at improvisation. However, formalized competitive performance at an international level did not arise until the late nineteenth century, with the advent of the Anton Rubinstein Competition, the first semi-annual competition for piano performance and composition, which took place in five editions between 1890 and 1910 (McCormick 2015; Kwok and Dromey 2018).
While music competitions are heterogeneous structurally, they do share certain common features. 2 First, they are open to all potential applicants, regardless of nationality, subject to an age restriction (usually between 16 and 35 years old), with the upper age limit depending on the specific competition (some are geared toward younger participants) and type (piano and string competitions generally have a lower upper age limit than voice and wind competitions). A "typical" applicant is a young professional musician in her early to mid 20s, recently graduated (or soon to graduate) from a conservatory or music college. Applicants are required to submit a recording, DVD, or video along with an application fee (often around $200, as of 2019). A jury then scrutinizes these recordings (sometimes blinded, sometimes not) and selects a subset of the applicants to participate in live rounds, which take place in the competition host country.
While some competitions cover travel expenses, participants typically pay for their travel to the competition and are billeted by a host family.
In general, there are two to four live rounds, although two rounds are most typical, with competitors eliminated after each round. Live rounds are usually open to the public. After the final round, a jury deliberates in secret and ranks finalists (i.e., first place, second place, third place, etc.). The number of finalists can vary, depending on the competition. Some finalists are unranked; sometimes ranks are shared; sometimes nobody is awarded the first rank. The value of the prizes awarded by the jury depends on rank in the usual way, with higher ranked participants winning disproportionately larger shares of the total pool, and the total pool depending on the competition. For the 2022 Cliburn Competition, the first prizewinner will receive $100,000 USD, the second $50,000, and the third $25,000, with other finalists receiving $10,000 each and an audience prize of $2,500. 3 For the 2022 Busoni Competition, the first, second, third, fourth, fifth, and sixth prizewinners will receive €22,000, €10,000, €5,000, €4,000, €3,000, and €2,500 respectively, with €3,000 being awarded to the audience prizewinner. 4 In addition to prize money, higher ranked participants may also gain access to representation by a musicians' agency, recording contracts, and performance opportunities. If an audience prize is awarded, the audience prizewinner is selected from the pool of finalists and voted upon while the jury deliberates. Accordingly, the audience prizewinner is selected by the audience from the same pool as the first prizewinner, in principle without knowledge of the jury's ranking (and vice versa).
Jurists at international classical music competitions are selected by competition organizers and consist of a mix of prominent performers and pedagogues (usually music professors) in the discipline of the competition. A jury usually consists of 7-13 individuals. They are often former competition winners and finalists, many years after the fact. Jurists are drawn from all countries, but typically a plurality are from the competition host country. 5 How jurists' preferences are aggregated varies by competition. Additionally, while secret, their deliberations are sometimes highly fraught and dramatic, with individual jurists taking strong stands for or against a particular performer (Kwok and Dromey 2018). 6
Competition audiences include classical music fans, concert organizers, musicians' agents, and other industry insiders, as well as local volunteers who are committed to the organization. Accordingly, while most audience members are not experts, per se, they are nevertheless highly informed aficionados. At highly prestigious competitions, many audience members are from abroad; high profile competitions can be a tourist attraction for some cities. However, in most competitions, the bulk of the audience is from the competition host country (McCormick 2015, 204-213).
3 See https://cliburn.org/2022prizes/ (accessed on 13 March 2022).
4 See https://www.busoni-mahler.eu/competition/en/busoni-prizes/ (accessed on 13 March 2022).
5 For the competitions in our sample, which are all affiliated with the World Federation of International Music Competitions (WFIMC), a majority must be from outside of the host country. Additionally, the WFIMC recommends that juries have at least seven members. See Kwok and Dromey (2018).
6 For examples, see Isacoff (2015).

Our data set
To our knowledge, there is no directory of the universe of international classical music competitions. Accordingly, we base our sample on competitions that are affiliated with the World Federation of International Music Competitions (WFIMC). For each competition that is currently affiliated with WFIMC, we manually collect data on each competition-edition, taken from each competition's website as well as other online sources.
We classify each competition-edition by type (e.g., strings, piano, organ, chamber) and by competition host-country. Additionally, within each competition-edition, we gather the names, nationalities, gender, and rank of all competition participants. In the few cases in which nationality is not listed, we found nationality by searching online sources. In cases where gender is not listed, we used an algorithm to identify gender based on the competitor's first name. We also took note if a competitor won an audience prize. For this project, we restrict attention to competition-editions in which an audience prize was awarded. Additionally, because audience prizewinners in our sample are selected from the pool of finalists, we restrict attention to competition participants who made it to the final round.
Our sample includes 2,007 finalists in 370 competition-editions held between 1979 and 2021 (see Table 1A in the appendix for a listing of the competitions and their host countries). We sorted competition countries into 13 country-groups: Australia and New Zealand, Austria and Germany, Canada, France, Italy, Japan, Netherlands and Luxembourg, Norway and Sweden, Spain, Switzerland, United Kingdom, United States, and an "other" category that includes a heterogeneous mix of countries that only had one competition. 7 Some countries were grouped together based on geography or linguistic/cultural similarity, but also because they had only one or two competition-editions with an audience prize. Finally, we sorted competition-editions into nine competition-types: piano, organ, strings, wind instruments (i.e., woodwinds and brass), chamber, percussion, conducting, composition, and voice. While some of these categories include multiple instruments that often compete independently (for example, strings include violin, viola, cello, and double bass, while wind instruments include flutes, oboes, clarinets, bassoons, horns, trumpets, trombones, and tubas), there were very few competitions for some instruments (for instance, there were no viola competitions in our sample and only one cello and one double bass competition). Accordingly, we grouped competitions for these instruments into larger categories, depending on the family of instruments to which they belong.
Figure 1 shows the number of competitions with audience prizes in each year from 1979 to 2021. Audience prizes were uncommon in music competitions prior to the turn of the century, with fewer than five per year until 2000. However, their popularity has increased dramatically in the last two decades, reaching a peak of 30 in 2019. Figures 2 and 3 show the growth of competitions with audience prizes by decade for the 13 country-groups and nine competition-types. Table 1 summarizes this information for the full sample period.
Competition-editions held in Austria and Germany represent over 18 percent of the sample with France and Japan following closely at 15 and 14 percent respectively. Twenty-seven percent of all competition-editions with audience prizes were for piano; almost 20 percent were for string instruments (principally violin), followed by chamber music competitions (16 percent), voice competitions (13 percent), and wind competitions (9 percent). Consistent with Figure 1, the average year of a competition-edition in our sample was 2010, reflecting the fact that competitions with audience prizes are a relatively new phenomenon. Accordingly, our data encompass substantial variation across countries and competition-types.

The Audience-Jury Match Rate
How often do expert juries and lay audiences agree? For each competition-edition in our sample, we created a variable called audience-jury match, which equals one if the audience prizewinner is also the first prizewinner, and zero otherwise. The mean of this variable, across all competition-editions, is our measure of audience-jury agreement, which we call the audience-jury match rate. It tells us the fraction of competition-editions in which the audience's most favored performer is also the jury's first choice. As shown in Table 1, in the full sample the audience-jury match rate is 0.381 (s.d. 0.481). Audience and jury concur just under 40 percent of the time. Using one-tailed t-tests we can decisively reject that the match rate is equal to zero or one. 8 To gain a sense of the divergence between the audience's and jury's preferences, we also report the share of audience prizewinners who were ranked second through sixth, as well as the share that the jury did not rank. Almost 30 percent of audience prizewinners were ranked second, 13 percent were ranked third, 3 percent ranked fourth, 2.5 percent ranked fifth, and 1.3 percent ranked sixth. Two out of three audience prizewinners were ranked first or second, four out of five were in the top three, and the likelihood of winning an audience prize declines as rank increases, facts that suggest some similarity in audience-jury preferences. That said, almost 14 percent of audience prizewinners were unranked by the jury; accordingly, audience-jury preferences diverge substantially one-seventh of the time.
Table 2 shows the audience-jury match rate for each country group (column 1) and each competition-type (column 2). To obtain these estimates, we regressed audience-jury match on binary indicator variables for each of the 13 country-groups (column 1) or for each of the nine competition-types (column 2), in both instances omitting the constant term. Across countries, the audience-jury match rate is highest in Sweden and Norway (0.67) and lowest in Japan (0.21), while across competition-types, the match rate is highest for conducting competitions (0.75) and lowest for piano (0.27). As shown in the bottom row of Table 2, F-tests reject the null hypothesis that the match rate is the same across countries or across competition-types at the 5 percent significance level. Accordingly, our data show that the rate of agreement between audience and jury varies substantially across countries and across competition-types.
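The construction of the match variable and the group-level match rates can be sketched in a few lines. The records below are hypothetical toy data, not drawn from our sample, and the function names are illustrative only. The sketch relies on a standard result: regressing a binary outcome on a full set of mutually exclusive group indicators with no constant term returns the group means, so the grouped averages below are what such a regression would report as coefficients.

```python
from collections import defaultdict

# Hypothetical toy records (NOT the paper's data):
# (country_group, audience_prizewinner, jury_first_choice)
editions = [
    ("Japan", "A", "B"),
    ("Japan", "C", "C"),
    ("France", "D", "D"),
    ("France", "E", "E"),
    ("France", "F", "G"),
    ("Japan", "H", "I"),
]


def match_indicator(audience_winner, jury_first):
    """Return 1 if the audience prizewinner is also the jury's first choice."""
    return 1 if audience_winner == jury_first else 0


def match_rate(records):
    """Mean of the match indicator across competition-editions."""
    flags = [match_indicator(a, j) for _, a, j in records]
    return sum(flags) / len(flags)


def match_rate_by_group(records):
    """Group means of the match indicator.

    These equal the OLS coefficients from regressing the indicator on
    mutually exclusive group dummies with the constant omitted.
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [matches, editions]
    for group, a, j in records:
        totals[group][0] += match_indicator(a, j)
        totals[group][1] += 1
    return {g: m / n for g, (m, n) in totals.items()}


overall = match_rate(editions)
by_group = match_rate_by_group(editions)
```

The same grouping function could be applied to competition-types by replacing the first field of each record; the F-tests reported in Table 2 are not reproduced in this sketch.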

Are juries and audiences biased?
Audiences and juries may differ in part because their assessments of quality incorporate different biases. We now investigate the role that biases about gender (whether a competitor is male or female) and nationality (whether a competitor is from the competition host country) may play in influencing the preferences of juries and audiences. As mentioned earlier, we investigate these potential sources of bias because the literature suggests that they are important margins of discrimination in cultural settings. To estimate jury and audience bias, we compare first prizewinners and audience prizewinners with other finalists and ask whether being female or being from the host country affects the probability of winning either the first prize or the audience prize. For this analysis, the unit of observation is an individual finalist in a competition-edition.
To determine the effect of these factors on the likelihood of winning either an audience prize or being a first prizewinner, we estimated two sets of regressions. In the first set of regressions, the dependent variable is an indicator equal to one if a finalist is a first prizewinner in a given competition-edition, and zero otherwise. In the second set of regressions, the dependent variable is an indicator equal to one if a finalist won the audience prize in a given competition-edition, and zero otherwise. For both sets of regressions, we include the same right-hand-side variables. These are an indicator equal to one if a finalist is female and zero if male; an indicator equal to one if a finalist is from the competition host country and zero otherwise; as well as fixed effects for the competition host country, the competition-type, the competition-year, the specific competition, and the competition-edition. The competition host country fixed effect holds constant factors that may influence whether a participant wins either prize that are specific to the competition host country (for instance, audiences or juries in some countries may have certain tastes or preferences that influence the likelihood that a given competitor wins); the competition-type fixed effect holds constant factors that are unique to the competition discipline (for instance, the extent of bias may vary depending on the instruments involved); year fixed effects sweep out any temporal influences that operate across all competition-editions and competition-types (for instance, the changing pool of musicians); the competition fixed effects hold constant factors that are unique to each competition (for instance, the method of aggregation used within a competition), while the competition-edition fixed effects hold constant anything that is specific to a particular competition-edition (for instance, the composition of a specific jury).
When competition-edition fixed effects are included, identification is coming from within-competition variation in the gender and host country status of individual competitors.
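To make the within-edition identification concrete, the following is a minimal sketch of a linear probability model with competition-edition fixed effects, estimated by subtracting edition means from the outcome and from both indicators and then running least squares on the demeaned data. It is an illustration under stated assumptions rather than the estimation code behind our tables: the finalists are hypothetical toy data, the function names are invented for the example, and standard errors (clustered or otherwise) are not computed.

```python
from collections import defaultdict

# Hypothetical toy finalists (NOT the paper's data):
# (edition_id, ranked_first, female, host_country)
finalists = [
    ("E1", 1, 0, 0),
    ("E1", 0, 1, 0),
    ("E1", 0, 0, 1),
    ("E1", 0, 1, 1),
    ("E2", 1, 0, 0),
    ("E2", 0, 1, 0),
    ("E2", 0, 1, 1),
]


def within_transform(rows):
    """Subtract competition-edition means from the outcome and regressors.

    Demeaning within edition is equivalent to including one fixed effect
    per competition-edition in a linear model.
    """
    groups = defaultdict(list)
    for edition, y, female, host in rows:
        groups[edition].append((y, female, host))
    demeaned = []
    for members in groups.values():
        n = len(members)
        means = [sum(col) / n for col in zip(*members)]
        for obs in members:
            demeaned.append(tuple(v - m for v, m in zip(obs, means)))
    return demeaned


def ols_two_regressors(data):
    """Solve the 2x2 normal equations for y on (x1, x2) with no constant."""
    s11 = sum(x1 * x1 for _, x1, _ in data)
    s22 = sum(x2 * x2 for _, _, x2 in data)
    s12 = sum(x1 * x2 for _, x1, x2 in data)
    s1y = sum(x1 * y for y, x1, _ in data)
    s2y = sum(x2 * y for y, _, x2 in data)
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("regressors are collinear within editions")
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2


# Coefficients on the female and host-country indicators for the toy data.
b_female, b_host = ols_two_regressors(within_transform(finalists))
```

An edition in which all finalists share the same gender or the same host-country status contributes nothing to the corresponding coefficient, because the demeaned indicator is zero for every finalist in that edition.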
In these regressions, the coefficients on the gender and the host-country indicators are our estimates of jury or audience bias (depending on whether the dependent variable is being ranked first by the jury or winning an audience prize). A negative coefficient on the gender indicator suggests bias against female competitors (in favor of male competitors). A positive coefficient on the host-country indicator suggests bias in favor of domestic competitors (against foreign competitors). The literature suggests that the coefficient on the host-country indicator should be positive for the audience prize regressions (studies using data from sports and music find that audiences often have a nationalistic bias) but offers no clear prediction for the gender indicator with respect to audiences. For the first prizewinner regressions, the evidence from Goldin and Rouse (2000) suggests that the coefficient on the gender indicator should be negative (orchestra auditions tend to be biased against women when not blinded) and that the coefficient on the host-country indicator should be positive (in many settings, experts have a nationalistic bias), with the important caveat that juries in music competitions are an international body, with only a plurality from the competition host country.
Our sample consists of 2,007 individual finalists drawn from the 370 competition-editions with audience prizes. For all finalists, we have data on their nationalities. Accordingly, we can estimate the effect of being from the host country using the full sample (n = 2,007). However, the number of observations for which we have gender is smaller (n = 1,731). This is for two reasons.
First, for chamber groups like string quartets or piano trios, many of which include men and women, it is unclear how to assign gender. Accordingly, our gender analysis drops competitors from chamber music competition-editions. Additionally, within other competition-types, there are some competitions that involve men and women playing together. For instance, within voice competitions, there are sometimes competitions for lied duo (German art song written for voice and piano), which may involve two persons of different genders (e.g., a male singer and a female pianist, or vice versa) and who are judged as an ensemble. We therefore drop these from the gender analysis as well. Finally, it is important to note that the size of our sample will depend on which fixed effects we include. For one competition, we only had data on a single finalist.
Accordingly, we will lose this observation when we include competition-specific fixed effects.
Additionally, a few competitions in our sample only had one edition where an audience prize was awarded. These will be dropped when we include competition-edition fixed effects.
As shown in Table 3, audience prizewinners comprise 18.4 percent of the sample while first place winners comprise 18.3 percent. This slight difference reflects the fact that a handful of competitions that awarded audience prizes did not award a first prize. Almost 20 percent of the sample were ranked second, 19 percent were ranked third, and five percent were ranked fourth. Unranked finalists constitute almost 30 percent of the sample. Eighteen percent of finalists were from the host country of the competition in which they competed. Of the subsample of finalists for which we have data on gender, almost 40 percent were women. Pianists comprise the lion's share (31.5 percent) of all finalists in our sample, followed by string players (19.5 percent), singers (14.1 percent), chamber ensembles (12.2 percent), and wind instrumentalists (9.3 percent). These shares are roughly in line with the frequency with which competition-editions of these different competition-types occur within the sample (compare Table 3 with Table 1).
Before turning to the regression analysis, it is worth comparing the prevalence of women and competition host-country participants in our sample of finalists with the frequency with which female and host-country participants are first prizewinners and audience prizewinners. As discussed earlier, 18 percent of finalists are from the competition host country and 40 percent are female. As shown in Table 4, finalists from the competition host-country constitute 20 percent of audience prizewinners but only 11 percent of first prizewinners. Meanwhile, 37 percent of audience prizewinners and 33 percent of first prizewinners are female. While t-tests cannot reject the null hypothesis that female or host country participants win audience prizes with the same frequency as they appear as finalists, the difference between female or host country representation among first prizewinners and among finalists is statistically significant at conventional levels. Accordingly, a first look at our data suggests that while host country participants and females win audience prizes roughly in proportion with their representation among finalists, they are under-represented among first prize winners.
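The comparison of shares can be illustrated with a simple large-sample test of whether a group's share among prizewinners equals its share among finalists. The sketch below uses a z-test against a fixed benchmark share, which is a simplification of the t-tests referred to above, and the counts are rough figures implied by the rounded percentages in the text (for example, about 40 host-country winners among roughly 367 first prizewinners). They are assumptions for illustration, not the exact counts in our data.

```python
import math


def proportion_z_test(successes, n, benchmark):
    """Two-sided z-test that a sample proportion equals a benchmark share.

    The benchmark (e.g., the share among all finalists) is treated as a
    fixed number, which ignores its own sampling variability.
    """
    p_hat = successes / n
    se = math.sqrt(benchmark * (1 - benchmark) / n)
    z = (p_hat - benchmark) / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Approximate, illustrative counts implied by the rounded shares in the text.
HOST_SHARE_AMONG_FINALISTS = 0.18

# Host-country finalists among first prizewinners (about 11 percent).
z_first, p_first = proportion_z_test(40, 367, HOST_SHARE_AMONG_FINALISTS)

# Host-country finalists among audience prizewinners (about 20 percent).
z_audience, p_audience = proportion_z_test(74, 369, HOST_SHARE_AMONG_FINALISTS)
```

With these illustrative counts, the host-country share among first prizewinners lies well below the benchmark, while the share among audience prizewinners is statistically indistinguishable from it, which mirrors the pattern described in the text.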

Fixed-effect regression results
To examine these issues more systematically, we now turn attention to Table 5, which displays the coefficients from our fixed effect regressions when the dependent variable indicates whether an individual finalist in a given competition-edition was ranked first by the jury (i.e., was a first prizewinner). Each column in this table represents a different regression. The columns are sorted into three groups (1, 2, and 3), depending on which fixed effects are included. In group 1, we include fixed effects for competition host country, competition-type, and year. In group 2, we add fixed effects for each competition. In group 3, we add fixed effects for each competition-edition, which is the most restrictive and preferred specification. Recall that the number of observations will decline as we include more fixed effects, as well as when we include the gender indicator. In all regressions we cluster standard errors at the host country, competition-type, and year levels.
As shown in Table 5, we find that being female and being from the competition host country have a negative impact on the likelihood of being ranked first by the jury. The magnitude of the estimates is remarkably stable, regardless of the configuration of fixed effects. Additionally, they retain statistical significance at conventional levels in all specifications. Taking the results at face value, we find that being female reduces the probability of being ranked first by approximately 4 percent, while being from the host country reduces the probability by approximately 10 percent.
How do these same factors affect the likelihood of winning an audience prize? To investigate this question, we estimated the same set of regressions, using the indicator for whether a finalist won the audience prize as the dependent variable. The coefficient estimates from these regressions are displayed in Table 6, which presents the results in the same manner as Table 5.
Strikingly, we find that neither being from the host country nor being female has a statistically significant impact on winning an audience prize. The magnitude of the point estimates is also substantially smaller (in absolute value) relative to those shown in Table 5. Our results would therefore suggest that audiences, unlike expert juries, are not biased along the dimensions of gender or nationality.

Heterogeneity of bias by competition-type
Our findings so far include competition types that vary substantially in terms of the presence of women. Women are well represented among the ranks of string players and pianists; however, there are relatively few female conductors. Additionally, it is unclear whether a competitor's gender is relevant in some contexts. In voice competitions, for instance, women compete against men, but there are no reasons to believe that juries should prefer male voices to female voices. 9 While the inclusion of competition-type fixed effects in the previous regressions should control for these instrument-specific factors, our findings likely conceal substantial heterogeneity in the extent of bias across different competition-types. Accordingly, we re-estimated the jury and audience regressions separately for four competition-types, namely piano, strings, voice, and wind instruments. In each of these regressions, we include fixed effects for host country, year, competition, and competition-edition.
The results from this exercise are shown in Table 7. We find that the extent of jury bias against women is strongest and most significant for pianists; being female reduces the likelihood of being ranked first in a piano competition by 8.6 percent, an effect that is twice as large as that found using the full sample (approximately 4 percent). Jury bias against finalists from the host country was strongest for singers (15.1 percent), followed by pianists (11.5 percent) and string players (10.3 percent). Consistent with our intuition, being female does not affect the likelihood of being ranked first by the jury in singing competitions. Additionally, we find no statistically significant evidence of jury bias against women in string competitions or wind competitions.
Audiences, meanwhile, are remarkably unbiased regarding either host country participants or women across all competition types, except for strings, where being a woman reduces the probability of winning an audience prize by 5.6 percent.

Robustness checks
To probe the robustness of our findings regarding jury bias against women and host country competitors, we first investigate the impact of these factors on the rank the jury assigns to each competitor. To do so, we restricted attention to those participants who were ranked by the jury (i.e., first, second, third, etc.) and estimated a set of regressions where the rank of a finalist in a given competition-edition was the dependent variable. The number of observations for these regressions will be smaller (n = 1,151) because many finalists were not ranked. When we include the gender indicator, host country indicator, and all five fixed effects as right-hand-side variables, we find the coefficient on the gender indicator to be 0.36 (s.e. 0.20) and the coefficient on the host country indicator to be 0.47 (s.e. 0.17). Because first place corresponds to a rank of one, a positive coefficient indicates a worse placing. Accordingly, being female worsens one's rank by 0.36 places (marginally significant), and being from the host country worsens one's rank by almost 0.5 places (statistically significant at the 5 percent level). If we disaggregate by competition-type, we once again find the strongest results for pianists. Being female worsens one's rank in a piano competition by 0.72 places (s.e. 0.24) and being from the host country worsens one's rank by 0.69 places (s.e. 0.31). Both effects are statistically significant at the 5 percent level. The evidence from jury rank is therefore consistent with jury bias against women and host country participants, especially in piano competitions.
As an additional robustness check, we re-estimated our baseline regressions investigating jury and audience bias, restricting attention to those competitors who were ranked (i.e., dropping unranked finalists). It is possible that the performances of those who were not ranked differ from those who were ranked in a way that is observable to audiences or juries but not to us, and that is nevertheless correlated with gender or nationality. Accordingly, we check whether our results are sensitive to excluding these competitors from the sample. The findings from this exercise are similar to those reported in Tables 5 and 6. Being from the host country or being female does not affect the likelihood of winning an audience prize, but both reduce the likelihood of being ranked first by the jury. Disaggregated by competition-type, we also find results like those shown in Table 7.

Conclusions and discussion
We uncover two key results in this study. First, in the context of international classical music competitions, expert juries and lay audiences agree approximately 40 percent of the time. The extent of agreement varies significantly across competition-types and countries. Second, this divergence in opinion may be partly attributed to biases on the part of expert juries. Comparing audience prizewinners and first-ranked prizewinners with other finalists, we find that being female and being from the competition host country significantly reduce the probability of being ranked first by the jury. In contrast, these same factors have no impact on the likelihood of winning an audience prize. The extent of gender and host-country bias by juries varies across competition-types, with female pianists and host country vocalists suffering the worst penalties. Taken at face value, our estimates suggest that expert juries have biases that audiences do not.
Is it possible that our findings are driven by omitted variables and that female and host country finalists are indeed weaker musicians? On the host country front, perhaps because it is costly to attend a competition held in another country, only the strongest musicians compete in competitions that are held abroad. If this is the case, international participants may, on average, be stronger than participants from the host country, which would explain why juries would appear to be biased against host country participants. While we accept this is possible, three factors militate against this view. First, to our knowledge, the competitions within our sample do not have a host-country quota (i.e., there is no requirement that competitions admit a minimum number of participants from the host country, either as initial round competitors or finalists). Accordingly, if host country players are indeed sub-standard, they should simply be eliminated from the pool of finalists and will not bias our estimates. Second, it is unlikely that a difference in average quality between host country and international finalists would be perceptible to jurists but imperceptible to audiences, who seem to show no preference on this basis. Third, there is no obvious reason why international participants should be stronger than domestic ones for strings, piano, and voice, but not for wind instrument competitions.
On the gender front, it is possible that gender may be a proxy for weaker performance, if, for instance, female musicians are more likely to choke under pressure or to lose stamina during performance, perhaps because they are, on average, physically smaller or weaker. However, there are reasons to be skeptical of this view as well. First, there is no systematic evidence that women are more likely to choke in competitive settings. In fact, the evidence, at least from tennis and basketball, suggests the opposite (Cohen-Zada, Krumer, Rosenboim, and Shapir 2017; Silverberg, Tran, and Laue 2018). Second, there are no apparent reasons why women should be weaker or more likely to choke in piano competitions, but not in wind or string competitions, which can be no less physically or emotionally exhausting.
Assuming our results are correct, it is worth asking why juries in classical music competitions are biased against host-country participants, while referee bias in sports operates in the opposite direction. In part, this may be because, as discussed earlier, juries in classical music competitions are internationally diverse, with only a plurality from the host country. The apparent bias against host-country competitors may, in fact, be a bias in favor of competitors from the home countries of some subset of jurists, although it is an open question how jurists from a foreign country are able to exert so much influence over the rest of the jury, given that multiple nationalities are present and that host-country jurists are usually the largest contingent. The difference in nationality bias between sports and music may also be attributable to a preference for the "foreign" or "exotic" in classical music as opposed to a preference for "home" in sports. We also need to consider the different incentives faced by sports referees and music competition jurists. In sports, host country referees face considerable spectator scrutiny if they make calls against the home team. In contrast, host country jurists in a music competition are unlikely to face any backlash for failing to award the first prize to a domestic competitor.
Finally, why might juries be more biased than audiences? We suspect that two factors are at work. First, audiences and jury members have different objectives and incentives. Individual audience members primarily attend competitions to hear music played at the highest level or to spot new talent. The identity of the performer may therefore not be particularly important in their decision about whom to award the audience prize. In contrast, while jurists are tasked with identifying and rewarding the best players, they may have other objectives, which might include maintaining cordial relations with other jurists, advancing the careers of specific competitors, or rewarding a specific style or school of playing. To the extent that these goals are correlated with a competitor's gender or nationality, they may manifest themselves as bias against women or host-country participants.
Second, juries may be more biased than audiences simply because juries are composed of a small number of individuals, whereas a competition audience numbers in the hundreds, or even a thousand or more for the highest-profile competitions. In a small group, one or two individuals with idiosyncratic but forcefully articulated opinions can be pivotal. In contrast, within an audience, no individual or small group of individuals with strongly held views is likely to be pivotal.
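The group-size argument can be illustrated with a simple back-of-the-envelope calculation. The sketch below assumes that a group's verdict is the plain average of its members' scores and that idiosyncratic opinions are independent across members; both are simplifying assumptions of ours, not a description of how juries or audiences actually vote. The group sizes (9 jurors, 900 audience members) are hypothetical round numbers.

```python
import math


def mean_shift_from_one_member(bias, group_size):
    """Shift in the group's average score when one member adds `bias` to their own score."""
    return bias / group_size


def sd_of_group_mean(individual_sd, group_size):
    """Standard deviation of the average of independent idiosyncratic errors."""
    return individual_sd / math.sqrt(group_size)


# Hypothetical group sizes, for illustration only.
groups = {"jury": 9, "audience": 900}

for label, n in groups.items():
    shift = mean_shift_from_one_member(bias=1.0, group_size=n)
    spread = sd_of_group_mean(individual_sd=1.0, group_size=n)
    print(f"{label} (n = {n}): one member's unit bias moves the average by {shift:.4f}; "
          f"sd of the average idiosyncratic error = {spread:.4f}")
```

Under these assumptions, a single opinionated member moves the average of a nine-person jury one hundred times as much as that of a 900-person audience, and the idiosyncratic noise in the jury's average is ten times as large.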
Idiosyncratic opinions are more likely to be "averaged out" the larger the group. If juries were larger, it is possible that they might be as unbiased as audiences.

Figure 1