Comparing Voting Methods : 2016 US Presidential Election

This paper presents data from a survey leading up to the 2016 US presidential election. Participants were asked their opinions about the candidates and were also asked to vote according to three alternative voting rules, in addition to plurality: approval voting, range voting, and instant runoff voting. The participants were split into two groups, one facing a set of four candidates (Clinton, Trump, Johnson, and Stein) and the other a set of nine candidates (the previous four plus Sanders, Cruz, McMullin, Bloomberg, and Castle). The paper studies three issues: (1) How do US voters use these alternative rules? (2) What kinds of candidates, in terms of individual preferences, are favored by which rule? (3) Which rules empirically satisfy the independence of eliminated alternatives? Our results provide evidence that, according to all standard criteria computed on individual preferences, be they utilitarian or of the Condorcet type, the same candidate (Sanders) wins, and that evaluative voting rules such as approval voting and range voting might lead to this outcome, contrary to direct plurality and instant runoff voting (which elect Clinton) and to the official voting rule (which elected Trump).


Introduction
Since November 2016, both the US and worldwide media have emphasized that the sitting Republican president Donald Trump did not get a majority of the votes cast. Indeed, he lost the "popular vote" by almost three million votes to Hillary Clinton, the Democratic Party nominee. The popular vote is defined as the nationwide sum of all ballots in the official election. The Electoral College elects the US president via a different route: it is an indirect winner-take-all system that adds up votes state by state. Those state-level results are used to determine electors (based on Senate and House seats within each state) who are then expected to vote for their respective candidates. These two distinct approaches explain why the outcomes might differ. And 2016 was not the first time this discrepancy has occurred. Neubauer et al. (2012) stressed that, since 1780, at least four elections of this kind have occurred, with two cases in the last five elections. This issue has been widely discussed by political scientists and social choice theorists (among others, see Abbott and Levine 1991, Miller 2014, Barthélémy et al. 2014, Kurrild-Klitgaard 2018).
While the official US voting method is often criticized for its two-tier aspect, questions regarding its "mono-nominal" character (that a voter can vote for only one candidate) are less often raised, despite the current movement towards the use of different voting rules. For instance, the states of Maine in 2017 and Alaska in 2020 adopted variants of ranked-choice rules and the city of Fargo adopted Approval Voting in 2018, followed by the city of Saint Louis in 2020 (Vox 2018, Alaska Division of elections 2020, AP News 2020a, 2020b). In this paper we will be chiefly interested in evaluative voting, that is, "multi-nominal" rules where the voter evaluates the candidates one by one, independently. Such processes are commonly used for decision making in practical settings like schools, sports clubs, online applications, et cetera. It has been argued that these kinds of rules would enable voters to be more flexible in casting their ballots and to vote more sincerely if they so wished, thus increasing their overall satisfaction. But only a few voting theorists have analyzed the possible effects of these evaluative processes in general (see Hillinger 2004ab, Gaertner and Xu 2012, Pivato 2013, Macé 2018). Among the various evaluative rules, relatively more attention has been paid to approval voting, a particular case where the voter grades each candidate on a two-level scale (0 or 1). The winner of an approval voting election is the candidate who gets more 1-grades than any other candidate (see notably Brams and Fishburn 1978, 1983, Laslier and Sanver 2010).
The game-theoretical literature has devoted a lot of attention to the "divided majority problem", that is, the coordination game among voters of the same camp, who will lose the election if they do not coordinate their (plurality) votes on the same candidate. This is a typical pathology of plurality rule that approval voting might be immune to (see Brams and Fishburn 1983, Forsythe et al. 1993, Bouton et al. 2016). In fact, it is very easy to find such examples, but counter-examples have also been found: see De Sinopoli et al. (2014). In any case, since Fiorina (1976) the empirical literature has clearly established that voters' behavior in practice conforms to neither the purely rational nor the purely expressive model. On this point, see Spenkuch (2018) for Germany or the collection Stephenson et al. (2018) for various countries.
A stream of research has conducted experimental tests of approval voting and other multi-nominal rules alongside large-scale official elections. Such studies have been conducted in France since 2002 in parallel with presidential elections (Baujard and Igersheim 2010, Grofman et al. 2011, Baujard et al. 2014), and similar protocols have been used in Germany (Alos-Ferrer and Granic 2010). Several large-scale voting experiments have also been organized online, in France, in Canada, and throughout Europe; see Laslier et al. 2015 and the website vote.imag.fr. The lessons drawn from these studies are as follows: (i) The principles of multi-nominal voting are easily understood by voters, who show a slight preference for range voting over approval voting. (ii) Such voting methods can indeed modify the aggregate ranking of candidates and can lead to different winners.
The U.S. is one of a rather small number of countries with a presidential regime that elect their president under universal 1 (albeit indirect) suffrage. Another such country is France, where, as noted above, many relevant studies have been conducted. But, to the best of our knowledge, there is no research that aims to test multi-nominal voting rules empirically in the context of a US national election. Compared to countries in Europe and elsewhere, the U.S. political system is rather unique, so that existing results from elsewhere might not apply to the U.S. This study thus fills this gap by analyzing how US voters react to multi-nominal voting rules and assessing what the impact of these rules might be on election results. To do so, we use data from a survey conducted in November 2016 in which over 2,000 participants were asked to vote according to the three following alternative voting rules in addition to plurality voting:
• Approval voting (AV): Voters can approve of any number of candidates they want; the winning candidate is the one with the largest number of approvals.
• Range voting 2 (RV): Voters score each candidate on a six-level scale (from 0 to 5 points, inclusive); the winning candidate is the one who receives the largest total number of points.
• Instant-runoff voting (IRV): Voters rank their three favorite candidates; the candidates with the fewest first-place rankings are sequentially eliminated with their next-preference votes being transferred until a single candidate remains. 3
Additionally, participants in the survey were asked to honestly assess the candidates on the same six-level scale. We refer to this evaluation as HA for "honest assessment".
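The counting rules for plurality, AV, and RV defined above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the ballot encodings, and the toy ballots with hypothetical candidates X, Y, and Z are assumptions made for the example, not the survey data or the authors' code, and ties are not handled.

```python
from collections import Counter

def plurality_winner(votes):
    """votes: list with one candidate name per ballot."""
    return Counter(votes).most_common(1)[0][0]

def approval_winner(ballots):
    """ballots: list of sets of approved candidates (a set may be empty)."""
    counts = Counter(c for ballot in ballots for c in ballot)
    return max(counts, key=counts.get)

def range_winner(ballots):
    """ballots: list of dicts mapping each candidate to a grade in 0..5."""
    totals = Counter()
    for ballot in ballots:
        for cand, grade in ballot.items():
            totals[cand] += grade
    return max(totals, key=totals.get)

# Hypothetical toy ballots (not survey data).
toy_plurality = ["X", "X", "Y", "Z", "Z", "Z"]
toy_approval = [{"X", "Y"}, {"X", "Y"}, {"Y"}, {"Z"}, {"Z", "Y"}, {"Z"}]
toy_range = [{"X": 5, "Y": 4, "Z": 0}, {"X": 0, "Y": 3, "Z": 5}]
```

In this toy profile, candidate Y has few first-place votes but is widely accepted, so the plurality winner and the approval winner can differ.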
The participants of the survey were randomly split into two groups. The first group was presented with a "short set" of four candidates (Clinton, Trump, Johnson, and Stein), and the second group was presented with a "long set" of nine candidates (the previous ones plus Sanders, Cruz, McMullin, Bloomberg, and Castle). These two sets were designed to mimic the actual 2016 election, and that same election within a more competitive context. 4

1 Universal suffrage is for those aged 18 years or older and not currently in prison. Only prisoners in Maine and Vermont may vote while in prison. Most states require former felons to wait for an additional period, such as parole or probation, or for the subjective decision of a designated official, before voting rights can be restored. This point is especially relevant given that the US has the highest per capita incarceration rate in the world.
2 The term "range voting" is often met in the literature, but in the survey we used the phrase "score voting", which, we believe, better conveys the definition.
3 We here use the term "instant-runoff", which is often met in the US literature, but the survey used "ranked choice voting", which might be more descriptive for a US audience. The term "alternative vote", used in Ireland, is poorly descriptive.
4 The idea of varying the number of candidates as an experimental trick to study a theoretical point also appears in Crowder-Meyer et al. (2020), in their study of which cues voters use to decide their vote when they have relatively little information about the candidates.

Our analyses focus on three issues. (1) How do US voters use these alternative rules? (2) How do candidate options, including the presence of little-known candidates, influence the election outcome? (3) Do alternative voting methods affect the outcome compared to plurality voting? According to our data, Bernie Sanders stands out according to voters' true preferences, in the sense that, as will be seen in section 3.3, if one considers the honest assessment of the candidates by the voters (HA), he wins according to several aggregation rules, whether of the majoritarian or the utilitarian tradition.
Given this, the main conclusions of this study are the following: (1) Multi-nominal voting rules such as range voting and potentially approval voting are able to elect such "best" candidates: range voting would clearly elect Sanders, and approval voting would elect either Sanders or Clinton. (2) Such is not the case for direct plurality and instant runoff voting (both would elect Clinton), or the indirect, official, voting rule (that elected Trump).
The next section describes the survey and data. A third section presents the global statistics and results per candidate depending on the voting methods tested. From these, one can draw some lessons regarding the first issue at stake: how do voters use these rules? A fourth section develops further analyses focused on the two remaining issues: the fate of minor candidates and the election outcome. The fifth section concludes. Additional tables and figures as well as the survey questions are to be found in the on-line appendix.

Survey and statistical strategy
The Center for Election Science designed and contracted the survey with the international firm Growth from Knowledge (GfK). The survey ran from November 3, 2016, to the day of the US presidential election on November 8, 2016 (noon). GfK used a panel of 4,181 members, of whom 2,552 responded and 2,367 completed the survey. The survey had six questions defined by the Center plus additional socio-demographic questions (such as age, gender, education, race, etc.). The survey was conducted electronically on a panel of representative American participants, but with no specific target beyond that: general population adults (18+), English- and Spanish-language survey-takers. In the on-line appendix B, we describe the socio-economic profile of the sample and provide evidence that it is indeed representative. Like almost all published surveys at that time, ours over-estimated Clinton's final vote tally: at the national level she finished ahead of Trump by 2%, while most surveys estimated this margin at between 2 and 6%. 5
Participants were also separated into two groups. While a first group had to give their vote and opinion on a short set of four candidates, a second group was asked to do the same for a set of nine candidates. The four candidates of the short set were the nominees of their respective party: Hillary Clinton for the Democratic Party, Gary Johnson for the Libertarian Party, Jill Stein for the Green Party, and Donald Trump for the Republican Party. The five remaining candidates of the long set were either popular candidates in the Republican and Democratic primaries (such as Ted Cruz and Bernie Sanders) or independent candidates. Evan McMullin was only on one state ballot and Michael Bloomberg decided not to run because he feared splitting the left-leaning vote with Hillary Clinton. Finally, Darrell Castle was the nominee of the Constitution Party, although he was not on enough state ballots to theoretically win enough electoral votes. 6 In total, 1,198 participants were asked to answer the long-set survey and 1,169 were asked to answer the short-set survey. For the two sets, the four different voting rules were presented in random order, as were the candidate names. In all, 2,367 participants filled in the survey. Upon cleaning, a small number of responses (10 for the long set and 10 for the short set) had to be deleted (see appendix B for details), so what follows is based on a set of 1,188 + 1,159 = 2,347 respondents.
To check the statistical validity of our conclusions in a unified manner, we report the standard errors or confidence levels obtained from bootstrapping the dataset 10,000 times. But note that extrapolating the observations made with this particular survey to statements about what the 2016 presidential election would have been in other circumstances (under different voting rules) is a different question, and not one of a statistical nature. First, the campaign was not over at the time of the survey and such a survey does not intend to forecast the actual election. Second, and more importantly, the campaign dynamics (and even the set of candidates) might be different depending on the rule.
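A bootstrap of this kind can be sketched as follows, assuming that respondents are resampled with replacement and that the statistic of interest is recomputed on each resample. The function names and the toy vote lists are hypothetical; the paper does not spell out its resampling code, so this is one plausible reading rather than the authors' procedure.

```python
import random
import statistics
from collections import Counter

def leader(votes):
    """Candidate with the most votes in a sample (ties broken arbitrarily)."""
    return Counter(votes).most_common(1)[0][0]

def bootstrap_confidence(votes, statement, n_boot=10_000, seed=0):
    """Share of bootstrap resamples in which `statement(resample)` is true."""
    rng = random.Random(seed)
    hits = sum(
        bool(statement(rng.choices(votes, k=len(votes))))
        for _ in range(n_boot)
    )
    return hits / n_boot

def bootstrap_share_se(votes, candidate, n_boot=10_000, seed=0):
    """Bootstrap standard error of a candidate's vote share."""
    rng = random.Random(seed)
    shares = []
    for _ in range(n_boot):
        resample = rng.choices(votes, k=len(votes))
        shares.append(resample.count(candidate) / len(resample))
    return statistics.stdev(shares)

# Hypothetical toy samples (not survey data).
lopsided = ["A"] * 900 + ["B"] * 100
balanced = ["A"] * 200 + ["B"] * 200
```

A call such as `bootstrap_confidence(lopsided, lambda s: leader(s) == "A")` then plays the role of the "statistical confidence" reported for statements of the form "candidate A wins".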

Overall findings and assessment of the tested voting rules
Overall summary statistics offer us a quick overview of the participants' behavior under AV, RV, and HA. Under AV, for the participants who gave one or more approvals, the average number of approvals per ballot is 1.24 (out of four candidates) with the short set, and 1.73 (out of nine candidates) with the long set. 7 In a 1984 AV experiment where 300 Pennsylvanian college students were asked whether to support candidates from lists of eight or nine names (thus close to our long set), Koc (1988) obtained an average number of approvals of 1.8 and observed similarly that it was "low given the results of other AV experiments where 2.0 votes per ballot has been the norm", referring to Nagel (1984). European experiments have had more candidates: the average numbers of approvals observed in France in similar experiments were 3.15 for sixteen candidates in 2002, 2.23 for twelve in 2007, 2.63 for ten in 2012, and 2.42 for eleven in 2017 (Laslier 2019).
There is a general (and quite intuitive) correlation between the number of candidates running and the average number of approvals. But beyond this observation, compared to US data as a whole, it also seems that French voters are more inclined to give their support to several candidates. This might be due to the different voting rules used in these countries. In France, the official rule to elect the president is a two-round system, and elections often include more than two serious candidates in the first round, so voters are used to facing many significant candidates. That contrasts with the US in both the voting method and the number of candidates. In any case, these relatively low observed averages imply that a fraction of the voters approve of only one candidate ("bullet voting"). When facing the short set, almost three participants out of four approved only one candidate and, for the long set, one out of two approved only one and one out of four approved two candidates. Again, this contrasts with the French experiments, where the percentages of participants who approved of only one candidate never exceeded 25% (Table A1 in the on-line appendix A). Still, even with the relatively high frequency of "bullet voting" in the US experiment, those voters who approved of more than one candidate had a large impact on the winner and on the support measured for other candidates, as will be seen.

6 US ballot access varies drastically state-by-state based on whether the candidate is within a major party, minor party, or independent. Some states require a filing fee while others can require over 100K signatures for that state alone. Major party requirements are always equal to or easier than for other independent or third-party candidates.
7 With those who abstain from voting for any candidate, these averages fall to 1.15 and 1.61, respectively.
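The summary statistics discussed here (average approvals per ballot, with and without abstainers, and the share of "bullet" ballots) can be computed as in the sketch below. The function name, the ballot encoding, and the choice to measure the bullet-voting share among non-abstaining ballots are assumptions made for illustration; the toy ballots are hypothetical.

```python
def approval_summary(ballots):
    """ballots: list of sets of approved candidates; an empty set is an abstention."""
    sizes = [len(b) for b in ballots]
    voters = [s for s in sizes if s > 0]  # ballots with at least one approval
    return {
        # Average over all ballots, abstentions included.
        "mean_all": sum(sizes) / len(sizes),
        # Average over ballots with one or more approvals.
        "mean_non_abstaining": sum(voters) / len(voters),
        # Share of non-abstaining ballots approving exactly one candidate.
        "bullet_share": sum(1 for s in voters if s == 1) / len(voters),
    }

# Hypothetical toy ballots (not survey data).
toy_ballots = [{"A"}, {"A", "B"}, set(), {"C"}]
summary = approval_summary(toy_ballots)
```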
Turning to range voting, the participants took advantage of the increased opportunities to express their electoral preferences.The average grades given to candidates are very close for the two sets: 1.78 for the long set and 1.77 for the short one.Further, the distributions of grades for both sets are also remarkably similar (Table A2 and Figure A2 in the online appendix A).This would support the theory that voters behave the same way whether they are facing a 4 or 9-candidate list under RV; this is an important observation that backs up the idea that actual votes under RV might be essentially "independent of eliminated alternatives".We will come back to this idea when studying the results at the candidate level.
One can further compare the distributions of grades under Range Voting (RV) with the ones obtained as an answer to the evaluation question regarding the running candidates ("honest assessment", HA), with the same range from 0 to 5. Under HA, the average grades per candidate are 1.41 for the long set and 1.58 for the short one, and the grade distributions are almost identical (Table A3). Also remarkable is the observation that the grade distributions under RV and HA are very similar (Figures A2 and A3). This highlights the fact that participants do not, overall, behave according to the rational theory. Indeed, fully strategic voting under RV in a large election would involve giving either maximal or minimal grades but no intermediate ones (Núñez and Laslier, 2014). Here, strategic voting under RV would thus require participants to give only the grades 0 or 5; the fact that the distributions of RV and HA are similar and are not degenerate supports the view that respondents voted sincerely with little, or no, strategic consideration.
These statistics help explain how voters behave under the multi-nominal rules tested. The vote frequency distributions for AV and RV and the average number of approvals/grades per ballot show that voters have used both voting methods to more widely express their electoral preferences. Some remarks are in order: The fact that the number of approvals per ballot is lower than in previous experiments might be due to the way the AV instructions are stated (Koc 1988). In our survey, however, the wording of the AV ballots ought to have encouraged the participants to vote for more than one candidate, since they were asked "to select as many candidates as [they] want". It might well be the case that framing effects are not that important, and that "bullet voters" do so for deeper reasons, such as the perceived election dynamics or familiarity with candidates. For instance, a 2016 Gallup Poll 8 indicated that two-thirds of Americans didn't know who Gary Johnson and Jill Stein were. This voter ignorance is likely due to a number of causes, including the fact that the Commission on Presidential Debates 9 excluded these two candidates from every presidential debate. 10 In any case, we observe a nontrivial number of voters who chose more than one candidate. And, as we will see, these votes can have important consequences.

Results per candidate for the short and long sets
Table 1 provides the observed scores of the candidates for plurality and approval voting as percentages of the population and for range voting as average grades. Standard errors estimated by bootstrapping are provided in parentheses. The candidates are ranked according to their plurality scores.

[...] respondents might either put unknown candidates at the bottom of their ranking, or put them in the middle (the latter move being suggested by the "feeling thermometers" from the ANES). Anyhow, both behaviors lead to noise in the answers to the survey. In spite of this limitation, and since one aim of this paper is to study the independence of eliminated alternatives, we have decided to keep all the candidates of the long set in our analyses.

a) Plurality
Under plurality with the short set of candidates, Clinton is the winner with 47.73%. Trump is second with 40.52% (statistical confidence 11 for the statement "Clinton wins" is actually 100%). With the long set of candidates, Clinton wins with 31.38%, Trump is second with 27.78%, and Sanders is ranked third with 19.98% of votes (statistical confidence for the statement "Clinton wins" is 96%). No other candidate reaches 10%. There is no contradiction between the rankings of the long and the short sets. The results of plurality are thus in accordance with those of the "popular vote", providing further evidence that the sample is representative, 12 not only with respect to socio-economic variables but also with respect to politics.

b) Approval
The exact phrasing used in the survey for approval voting was the following: "If the US presidential election were held today, and your ballot asked you to select all the candidates you wanted, which candidate or candidates would you select? In the approval voting method, the candidate selected the most would win."

c) Range voting
The last two columns of Table 1 display the average grades obtained by the candidates under RV. The exact phrasing used in the survey was the following: "The next method is called score voting. If the US presidential election were held today, and your ballot asked you to score each candidate on a scale from 0-5, how would you score them? Please enter a number, where '0' means 'Worse' and '5' means 'Better'. In the score voting method, the candidate with the highest total score would win." Here the differences between the short and the long sets are even more obvious than under plurality: the winner, indeed, is different. For the short set, Clinton scores highest with a substantial advantage over Trump. For the long set, Sanders beats Clinton with an average grade of 2.72, while Clinton is second with 2.32. Note that, when present, Ted Cruz reaches the level of Donald Trump: their scores are respectively 2.00 ± .07 and 1.93 ± .06, and the confidence in the statement "Trump is ahead of Cruz" is only .85 from our sample. Another interesting trait of this voting rule, especially compared to AV, is that the average grades of the four candidates present in both sets are rather similar from one set to the other, thus showing a kind of "absolute" preference ("independence of eliminated alternatives"). In other words, the way participants assess these four candidates under RV was not modified by the inclusion of five more candidates. Such was not exactly the case under AV even though the ranking of candidates by approval was consistent between the two sets.

d) Instant runoff
For instant runoff voting (IRV), the exact phrasing used in the survey was the following: "The next method is called ranked choice voting. If the US presidential election were held today, and your ballot asked you to rank your top three candidates, how would you rank them? Use a '1' for your top choice, a '2' for your second choice, and a '3' for your third choice. In the ranked choice voting method, the winner requires a majority of first-choice votes. If no candidate had this majority, then the candidate with the fewest first-choice votes would be eliminated and those votes would transfer to the next-preferred choices. This would repeat until a candidate had a majority." Table 2 reproduces the last rounds of elimination. 13 The detailed percentages of votes obtained by each candidate after each round of elimination are provided as supplementary material in the appendix (Tables A7 and A8, Figures A3 and A4). See the note on the statistical robustness of these findings in the on-line appendix.

Table 2: Last rounds of elimination under IRV (recoverable entries: Sanders 25, 28; Cruz 13)
The main transfer to Clinton comes from Sanders's voters, while Trump benefits from transfers from Cruz's voters.At the first round, as under plurality, Clinton is ranked first, Trump second, and Sanders third.Then, after smaller candidates are eliminated, Sanders is eliminated, letting Clinton win against Trump.
IRV has, in our data, the property that very small candidates cannot spoil the election: once these candidates are eliminated in the course of the counting process, the preferences stated by the voters as to the main candidates remain and are counted as if the eliminated candidates were not present. This explains the remarkable feature that Clinton beats Trump by an identical 54-46 margin under the short and long set conditions of IRV.
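The counting procedure described in the survey wording can be sketched as follows, for ballots that rank at most three candidates. This is a minimal sketch under stated assumptions: the function name is hypothetical, ballots whose ranked candidates have all been eliminated are simply dropped from the count, ties for elimination are broken alphabetically, and the toy profile with candidates A to D is invented for illustration.

```python
from collections import Counter

def irv_winner(ballots, candidates):
    """ballots: list of lists of candidates, best first (truncated rankings allowed).

    Returns the winner and the tally of the final round.
    """
    remaining = set(candidates)
    while True:
        tally = Counter({c: 0 for c in remaining})
        for ballot in ballots:
            # Each ballot counts for its highest-ranked remaining candidate.
            for cand in ballot:
                if cand in remaining:
                    tally[cand] += 1
                    break
        total = sum(tally.values())  # exhausted ballots are not counted
        leader, top = max(tally.items(), key=lambda kv: kv[1])
        if top * 2 > total or len(remaining) == 1:
            return leader, dict(tally)
        # Eliminate the candidate with the fewest votes (alphabetical tie-break).
        loser = min(sorted(remaining), key=lambda c: tally[c])
        remaining.remove(loser)

# Hypothetical toy profile: D is a very small candidate.
long_ballots = [["A", "C"]] * 6 + [["B", "C"]] * 4 + [["C", "B"]] * 3 + [["D", "B"]] * 2
# Same voters when D is not on the ballot.
short_ballots = [[c for c in b if c != "D"] for b in long_ballots]

long_result = irv_winner(long_ballots, ["A", "B", "C", "D"])
short_result = irv_winner(short_ballots, ["A", "B", "C"])
```

In this toy profile the final round is identical with and without the small candidate D, which mirrors the property noted above.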
The subtlety of the run-off system lies in the way candidates are sequentially eliminated. The vote transfers that are sequentially observed here under IRV appear akin to the mechanism that favors "exclusive" candidates under plurality, i.e., candidates who receive strong support from some voters, but are also often rejected by others. The same process is often seen at work under two-round majority voting when a more "inclusive" candidate that would win any second round is eliminated at the first round by two more "exclusive" candidates. An "inclusive" candidate can be defined as follows: they get widespread support from the voters but with no strong feeling of rejection or attachment (on this issue, see the well-documented "squeezing of the center" often observed in French politics; Blais and Indridason 2007; Baujard et al., 2014). This is a reminder that, even though instant runoff voting allows the individual voter rich expression, the rule's algorithm does not necessarily use all the information. As such, the observation that Clinton wins over Sanders under IRV could be explained simply by the mechanics of IRV. But we can also see that a psychological effect may be at work, because the IRV data and HA data produce different Condorcet winners 14 (on these two effects, see Duverger 1951). Indeed, IRV and plurality voting show similar results for the first round in both the long and short candidate sets.

The "honest assessment" exercise
Any voting rule may be subject to strategic or "tactical" voting. To elicit sincere preferences, we therefore simply asked the following question: "Regardless of their chance of being elected, how much do you honestly want the following to be elected? Please use a '0' for 'Do not want this person elected' to '5' for 'Very much want this person to be elected.'" The answer to this question is the "honest assessment" variable (HA). Table A3 in the appendix shows the average HA grades. As under RV and for the long set, Sanders is rated first, followed by Clinton and Trump. Sanders can thus be seen as the candidate who maximizes voters' utilities/assessments. As a whole, the average grades given to the different candidates under HA are a little bit lower than under RV, but the tendencies are remarkably similar, as were the global statistics for these two.
Taking the HA variable at face value, one can compute any social welfare criterion. For instance, the average scores in Table A3 correspond to a simple utilitarian computation, and Bernie Sanders appears as the "utilitarian optimum" among the long set of candidates. The overall domination of Sanders over all other candidates is clear in Figure 1, which shows the distributions of the grades obtained by each candidate with their de-cumulative functions. Denote by $n_1(x), n_2(x), \dots, n_5(x)$ the number of voters who assign grade $g = 1, 2, \dots, 5$ to candidate $x$. Writing $N$ for the total number of voters and $n_0(x)$ for the number who assign grade 0 to $x$, the figure reports the percentage of voters who evaluate a given candidate $x$ at level $g$ or above, that is:

$$D_g(x) = \frac{100}{N} \sum_{h=g}^{5} n_h(x), \qquad g = 1, \dots, 5.$$

Recall the standard result: if the distribution for $x$ first-order-dominates the distribution for $y$, meaning that $x$'s de-cumulative curve is above $y$'s, then for any increasing function $f$:

$$\sum_{g=0}^{5} f(g)\, n_g(x) \;\ge\; \sum_{g=0}^{5} f(g)\, n_g(y).$$

As seen in Figure 1, the candidates are almost perfectly ordered by first-order stochastic dominance. Therefore this ordering is robust to almost any distortion of the utility scale. The only important exception occurs for Clinton and Sanders, at the upper end of the scale. This means that a distorted utility function that put all its weight on grade 5 would conclude in favor of Clinton instead of Sanders. This observation is in line with the observation that Clinton beats Sanders according to plurality.
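The de-cumulative curves and the dominance comparison described above can be checked numerically with a short sketch. The function names and the toy grade lists are hypothetical, and shares are returned as fractions rather than percentages; this is not the authors' code.

```python
def decumulative(grades, max_grade=5):
    """Share of voters grading a candidate at level g or above, for g = 1..max_grade."""
    n = len(grades)
    return [sum(1 for x in grades if x >= g) / n for g in range(1, max_grade + 1)]

def dominates(grades_x, grades_y, max_grade=5):
    """True if x's de-cumulative curve is weakly above y's at every level."""
    dx = decumulative(grades_x, max_grade)
    dy = decumulative(grades_y, max_grade)
    return all(a >= b for a, b in zip(dx, dy))

# Hypothetical toy grades given by four voters to candidates x and y.
x = [5, 4, 3, 0]
y = [4, 3, 2, 0]
# Two candidates whose curves cross: s is "inclusive", c is "exclusive".
s = [3, 3, 3, 3]
c = [5, 5, 0, 0]
```

When the curves cross, as for `s` and `c`, the ranking depends on the utility scale: a scale that puts all its weight on grade 5 favors `c`, while the plain average favors `s`.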

Figure 1: Comparing the distributions of honest assessments
Another social criterion, which rests on a different, if not contrary, philosophical basis (Riley 1990), is the notion of a Condorcet winner. From the HA grades one can infer strict preferences: for a given participant, Candidate x is strictly preferred to Candidate y if the HA grade for x is strictly larger than the HA grade for y. Because the grade scale {0, 1, ..., 5} is not absolutely precise, it may be that a participant strictly prefers x to y but still grades them alike. Up to that technical difficulty, building the pair-wise comparison matrix among candidates from the HA grades leaves little doubt (Table A8 in the appendix); the honest preferences profile has a Condorcet winner, and it is Sanders.
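The construction of the pair-wise comparison matrix from grades can be sketched as below, assuming that equal grades count for neither candidate. The function names and the toy profiles with candidates A, B, and C are hypothetical.

```python
def pairwise_matrix(grades, candidates):
    """grades: list of dicts {candidate: grade}.

    wins[x][y] is the number of voters grading x strictly above y.
    """
    wins = {x: {y: 0 for y in candidates if y != x} for x in candidates}
    for ballot in grades:
        for x in candidates:
            for y in candidates:
                if x != y and ballot[x] > ballot[y]:
                    wins[x][y] += 1
    return wins

def condorcet_winner(grades, candidates):
    """Candidate beating every other one in pairwise comparisons, or None."""
    wins = pairwise_matrix(grades, candidates)
    for x in candidates:
        if all(wins[x][y] > wins[y][x] for y in candidates if y != x):
            return x
    return None

# Hypothetical toy profiles (not survey data).
names = ["A", "B", "C"]
profile = [{"A": 5, "B": 3, "C": 0}, {"A": 0, "B": 4, "C": 5}, {"A": 1, "B": 2, "C": 0}]
cycle = [{"A": 3, "B": 2, "C": 1}, {"A": 1, "B": 3, "C": 2}, {"A": 2, "B": 1, "C": 3}]
```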
Computing other social evaluation criteria on the same HA data leads to the same conclusion. Sanders is also ranked first under the Borda and the Bucklin rules. It thus seems fair to qualify him as the "best" candidate according to voters' true preferences.
For some voting rules, it is also possible to mechanically compute the outcome of the rule using the HA variable as input, thus net of strategic voting. It turns out that "Honest Plurality" elects Clinton (Table A11), and so does "Honest Instant runoff" (Table A12). Notice that it is not possible to do the same exercise for approval voting since the HA data does not split the candidates into two categories (approved/disapproved) in a straightforward manner.
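One way to compute an "honest plurality" score from grades is sketched below. The treatment of ties is an assumption made here for illustration (a voter with several top-graded candidates splits one vote equally among them); the paper does not state how Table A11 handles such cases, and the function name and toy grades are hypothetical.

```python
from collections import defaultdict

def honest_plurality(grades):
    """grades: list of dicts {candidate: grade}. Returns vote totals per candidate."""
    scores = defaultdict(float)
    for ballot in grades:
        top = max(ballot.values())
        favourites = [c for c, g in ballot.items() if g == top]
        # Assumption: one vote, split equally among tied favourites.
        for c in favourites:
            scores[c] += 1 / len(favourites)
    return dict(scores)

# Hypothetical toy grades (not survey data).
toy_grades = [
    {"A": 5, "B": 5, "C": 0},
    {"A": 1, "B": 0, "C": 4},
    {"A": 2, "B": 3, "C": 3},
]
toy_scores = honest_plurality(toy_grades)
```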
Conversely, the RV and IRV data can provide pairwise comparisons. There, Sanders is the Condorcet winner using RV data and Clinton is the Condorcet winner using IRV data. 15 Table 3 sums up our previous remarks and results. This shows the two main dimensions where our extended analyses will proceed. These two dimensions are (1) the chosen set of candidates, which implicitly raises questions about the primaries system, and (2) the chosen voting method with a more specific question regarding the mono-nominal vs. multi-nominal issue.
From our sample, the approval rates of Clinton and Sanders are not statistically different, thus we cannot predict the winner.

On strategic voting
Taking at face value the "honest assessment" exercise proposed to the participants, we can track down the sincere and strategic features of their votes. Although this issue is not this paper's main goal, we briefly report on the following tests of strategic voting. We work under the reasonable assumption that the two front runners considered by the participants were Clinton and Trump.
Plurality: The fraction of voters who vote for a candidate whom they do not rank first under HA is 3.7% ± 0.6% in the short set and 5.2% ± 0.7% in the long set. This direct measure shows that insincere voting under plurality is a rather marginal phenomenon. Strategic voters should vote for the one they prefer between the two front runners. Among the participants who strictly prefer Clinton to Trump we find that X% do not follow the strategic recommendation. Among the participants who strictly prefer Trump to Clinton we find that Y% do not follow the strategic recommendation. One can see that for most voters, our data show no contradiction between "sincere" and "strategic" behavior. The most interesting strictly "strategic" effect we can observe is voters who prefer Sanders to Clinton but vote Clinton under plurality. And the effect is small: it occurs with an estimated frequency of 3.8% ± 1.3% relative to the number of voters who honestly rank Sanders above Clinton.
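The direct measure of insincere plurality voting can be sketched as follows, assuming that a vote counts as sincere whenever the chosen candidate is among the voter's top-graded candidates under HA, and that abstentions have already been excluded. The function name, the record format, and the toy records are hypothetical.

```python
def insincere_rate(records):
    """records: list of (plurality_vote, ha_grades) pairs, one per voter."""
    insincere = 0
    for vote, grades in records:
        best = max(grades.values())
        # Insincere: the chosen candidate is graded strictly below the voter's best.
        if grades[vote] < best:
            insincere += 1
    return insincere / len(records)

# Hypothetical toy records (not survey data).
toy_records = [
    ("A", {"A": 5, "B": 2, "C": 0}),  # sincere
    ("B", {"A": 1, "B": 4, "C": 4}),  # sincere (B is tied for first)
    ("A", {"A": 3, "B": 0, "C": 5}),  # insincere (prefers C, votes A)
    ("C", {"A": 0, "B": 0, "C": 2}),  # sincere
]
toy_rate = insincere_rate(toy_records)
```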
IRV: The fraction of voters who put first a candidate that is not ranked first under HA is 3.0% ± 0.6% in the short set and 4.4% ± 0.6% in the long set. With respect to the "Clinton-Sanders" dilemma, we note that, among the voters who honestly rank Sanders above Clinton, only a small fraction rank Clinton first under IRV (3.9% ± 1.0%). Due to the sequential process of IRV ballot counting, it is not clear how to qualify in theory the behavior of this small group. A more demanding test of sincerity is the following: the fraction of voters who do not vote for their three preferred candidates in the same order is 11.7% ± 1.0% in the short set and 24.7% ± 1.4% in the long set. These 24.7% can be broken down as follows:
o 13.3% involve minor candidates that the voter treats differently with IRV and HA, in one way or the other.
o 4.3% can be interpreted as simple strategic moves with respect to the Clinton/Trump contest (putting the candidate they prefer in a higher position or the one they dislike in a lower position).
o 2.0% introduce the candidate they dislike among the pair (Clinton, Trump) into their top three. The natural interpretation here is that these voters fear that their rejection of a main candidate is not taken into account if that disliked candidate is not on their ballot. This shows a misunderstanding of the way IRV ballots are counted.
o 0.5% rank only Clinton, Sanders and Trump, but do it sincerely.
o 4.5% are various other, hardly interpretable, cases.
Approval: A demanding rational recommendation here is: "among Trump and Clinton, approve the one you prefer and do not approve the other." The fraction of voters who do not follow this recommendation is 5.6% ± 0.7% in the short set and 19.4% ± 1.2% in the long set. These 19.4% can be described as follows:
o 12.9% approve neither Clinton nor Trump, and approve exactly one other candidate.
o 5.4% approve neither Clinton nor Trump, and approve several other candidates.
o 0.9% approve both Clinton and Trump plus one or several other candidates.
o 0.1% approve both Clinton and Trump and no others.
o 0.1% approve one of Clinton and Trump in contradiction with their HA.
Here a conclusion can be drawn: in most cases (about 80%) there is no contradiction between sincere and strategic behavior under AV, and when there is a contradiction, it is almost always in the same direction: voters use their approval ballot to express that they approve neither of the two front runners.
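The approval-ballot breakdown above can be sketched as a simple classification. The encoding is an assumption made for illustration: each ballot is a set of approved names, and `prefers` is the front runner the voter strictly prefers under HA. The label for an empty ballot is our own addition, since the text does not list that case.

```python
FRONT_RUNNERS = {"Clinton", "Trump"}

def follows_recommendation(approved, prefers):
    """True if the voter approves the preferred front runner and not the other."""
    other = "Trump" if prefers == "Clinton" else "Clinton"
    return prefers in approved and other not in approved

def deviation_type(approved, prefers):
    """Classify an approval ballot relative to the strategic recommendation."""
    if follows_recommendation(approved, prefers):
        return "follows"
    front = FRONT_RUNNERS & approved
    others = approved - FRONT_RUNNERS
    if not front:
        if len(others) == 1:
            return "neither, one other"
        if len(others) > 1:
            return "neither, several others"
        return "neither, no other"  # empty ballot: hypothetical extra label
    if len(front) == 2:
        return "both, plus others" if others else "both, no others"
    # Exactly one front runner approved, and it is not the preferred one
    return "contradicts HA"

print(deviation_type({"Sanders"}, "Clinton"))
print(deviation_type({"Clinton", "Trump", "Johnson"}, "Trump"))
```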

Range voting: We analyze the deviations from sincerity by computing differences in the average score of a given candidate between Range Voting and Honest Assessment. The HA grades are generally low (no candidate reaches the middle grade 2.5/5) and the RV grades are slightly larger on average (see Table A14 in the Appendix). One can see here traces of the theory that dictates overstating the evaluation of the candidates of one's own camp and downgrading the opponents (Núñez and Laslier 2014). Starting from low HA grades, the upgrading mechanically tends to be more important than the downgrading. In particular, Sanders and Cruz gain .37 and .40 points, respectively.
To go deeper, we ask whether two candidates of the same camp are treated alike in this respect. Consider for instance the voters who do not give the maximum grade (5) as their honest assessment of Clinton. On average, in this population, the RV grade of Clinton is .27 points higher than her HA grade, and the same computation made on the subset of this population who give Sanders the perfect HA grade (5) yields .33 instead of .27. Table A15 in the Appendix provides the same figures for similar cases. One can see that the figures here are not small: up to .65 points (out of 5) as the average of a difference is a sizable effect. By offering fine-tuned expressive possibilities, Range Voting offers more occasions of observing deviations from sincere evaluations (Baujard et al. 2020). These deviations occur in the direction of the "overstatement" strategic theory.
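The conditional averages discussed above can be sketched as follows. The data layout (one dictionary per voter, with "ha" and "rv" grade dictionaries) and the toy values are hypothetical; only voters who provided both grades for the candidate are used, as stated in the Appendix note.

```python
def mean_shift(voters, candidate, keep=lambda v: True):
    """Average RV grade minus HA grade for `candidate` over voters selected by `keep`."""
    diffs = [
        v["rv"][candidate] - v["ha"][candidate]
        for v in voters
        if keep(v) and candidate in v["rv"] and candidate in v["ha"]
    ]
    return sum(diffs) / len(diffs) if diffs else None

# Toy voters (hypothetical grades on the 0-5 scale)
toy_voters = [
    {"ha": {"Clinton": 2, "Sanders": 5}, "rv": {"Clinton": 3, "Sanders": 5}},
    {"ha": {"Clinton": 4, "Sanders": 1}, "rv": {"Clinton": 4, "Sanders": 2}},
    {"ha": {"Clinton": 5, "Sanders": 5}, "rv": {"Clinton": 5, "Sanders": 5}},
]

# Voters who do not give Clinton the maximum HA grade
not_max = lambda v: v["ha"]["Clinton"] < 5
# Among them, those who give Sanders the perfect HA grade
not_max_and_sanders_5 = lambda v: not_max(v) and v["ha"]["Sanders"] == 5

print(mean_shift(toy_voters, "Clinton", not_max))
print(mean_shift(toy_voters, "Clinton", not_max_and_sanders_5))
```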
All this is about individual behavior. The general conclusion at this level is in line with what we know from previous studies (see Laslier 2019). Strategic and non-sincere behavior can be found; the phenomenon is rather marginal for Approval Voting, in particular with respect to the main candidates. But strategic voting is more easily detected under Range Voting, by the nature of the rule, which is based on fine evaluations. Still, even under Range Voting the striking fact is the great similarity between the "Honest Assessment" grades and the votes. Notice that an aggregate measure of the consequences of insincere voting is directly provided by the tables that count HA ballots using different rules (Table A3 for RV, A8 for pairwise majority, A11 for plurality, A12 for IRV). One can see that non-sincere voting is of little consequence at the aggregate level in this data set.
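The "±" margins reported in this section can be read as standard errors of proportions. The sketch below assumes the usual binomial formula sqrt(p(1-p)/n); the group size of 1,000 respondents and the list of flags are hypothetical values chosen for illustration, not figures taken from the survey.

```python
import math

def share_with_se(flags):
    """Return the share of True flags and its binomial standard error."""
    n = len(flags)
    p = sum(flags) / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se

# Hypothetical group: 37 insincere plurality votes out of 1,000 respondents
toy_flags = [True] * 37 + [False] * 963
p, se = share_with_se(toy_flags)
print(f"{100 * p:.1f}% ± {100 * se:.1f}%")
```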

Two interconnected issues: which set of candidates for which voting rule?
4.1 From the short set to the long set: the question of "eliminated alternatives"
Our results make clear that the voters' expressed preferences for a given candidate might differ depending on the number and nature of other candidates running. Obviously, this could have direct consequences on the outcome. Here, while the winner under AV, RV, and HA is Clinton with the short set, Sanders is ranked first with the long set under RV and HA, and he is in a statistical tie with Clinton under AV. Recall that these rules, unlike plurality and IRV, give all voters the possibility to express their support for any candidate independently of the other candidates. To further study this point, consider Figure 2, which is built from the figures in Table 1 and the results of IRV and HA. Figure 2 also shows how normalized scores 17 vary when going from the long to the short set of candidates.

Figure 2: Scores gaps between the short and the long sets
Under plurality and IRV, all the candidates have lower scores with the long set than with the short one. Under AV the effect still exists, but to a much lesser extent. Under RV and HA, the scores remain more or less identical. While one can easily explain why under plurality a longer set of candidates might decrease the number of votes for each candidate, since a voter can give her support to only one of them, the same logic does not apply to AV, where participants can approve of any number of candidates. Our data show that all four candidates, and Clinton in particular, have lost around 10% of their support with the long set compared to their scores with the short set. This is explained by the fact that even with the long set we still observe that about half of the participants approve of one candidate only under AV. Therefore, both under plurality and AV, the four candidates present in both sets have suffered from a dispersion of the participants' support in the long set.
From a theoretical perspective this issue is very important and is widely discussed in the literature. It refers to a condition termed "independence of eliminated alternatives". Concretely, this condition can be stated as follows: "a candidate that wins an election with m candidates must not lose the election if another candidate is no longer available" (Dougherty and Edward, 2011, p. 79). 18 Clearly, vote splitting makes it impossible for plurality to satisfy this condition, and this is what our data confirm. According to Brams and Fishburn (1983), AV should be less affected by this effect, for it allows voters to vote for as many candidates as they want. Accordingly, AV outcomes would be more stable and sincere than plurality outcomes.
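The vote-splitting argument can be made concrete with a toy example. The sketch below assumes fixed, complete rankings and sincere plurality voting (each voter supports the top-ranked available candidate), which is a simplification: the survey observes actual ballots on two different candidate sets. Function names and the toy electorate are hypothetical.

```python
from collections import Counter

def plurality_winner(rankings, available):
    """Plurality winner when each voter supports the top-ranked available candidate."""
    tally = Counter(
        next(c for c in ranking if c in available) for ranking in rankings
    )
    return tally.most_common(1)[0][0]

def removals_that_change_winner(rankings, candidates):
    """List the non-winning candidates whose removal changes the plurality winner."""
    full_winner = plurality_winner(rankings, set(candidates))
    changed = []
    for removed in candidates:
        if removed == full_winner:
            continue
        reduced = set(candidates) - {removed}
        if plurality_winner(rankings, reduced) != full_winner:
            changed.append(removed)
    return changed

# Toy electorate: A and B split a 60% majority, so C wins the three-way race
toy_rankings = (
    [("C", "A", "B")] * 40 + [("A", "B", "C")] * 35 + [("B", "A", "C")] * 25
)
print(plurality_winner(toy_rankings, {"A", "B", "C"}))
print(removals_that_change_winner(toy_rankings, ["A", "B", "C"]))
```

In this toy case C wins with all three candidates but loses as soon as either A or B is removed, which is exactly a failure of the condition quoted above.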

17 RV average scores are divided by the maximum grade, 5, and multiplied by 100. IRV "scores" are the percentages of first preferences. See Table A5.
18 […] alternatives" condition (Arrow 1963). Although similar in spirit, the conditions are not equivalent. According to Dougherty and Edward (2011, p. 79), Arrow's condition "requires that the social ranking between any two alternatives should be independent of individual rankings between one or more alternatives that are not part of the pair," thus "it does not refer to the addition or elimination of alternatives."
But our results (see Table 1) show that some voters change their decision to approve or not a candidate depending on who the other candidates on the ballot are. How important and consequential is this effect? In the present case, AV does not directly violate the independence of eliminated alternatives condition because the presence of other candidates, while changing absolute approval percentages, does not change the candidates' winning order compared to the original short list results. Our results would suggest that AV is largely if not totally independent of eliminated alternatives. In the 1984 experiment conducted in Pennsylvania, Koc (1988) made this point and noted a difference when one adds or deletes an important candidate (like Sanders in our case). The differences in the support greatly depended on the "nature of the race" and the "nature of that [added] candidate" (Koc 1988, 704). Contrary to Koc, it appears here that under AV the dispersion of the support is homogeneous for the four candidates of the short set, who all lose comparable percentages of support, as is clear in Figure 2.
The results for the short and long sets differ markedly under plurality. For instance, the negative effects are clear on both Clinton's votes (due to Sanders) and Trump's votes (due to Cruz). Indeed, under plurality, the dispersion of votes between the two sets is highly heterogeneous. The same is also true of IRV's first-round preference results.

From plurality to multi-nominal voting rules: a change in the winner
In this subsection we focus our analysis on the long set to examine why RV, IRV, and AV produce different winners compared to plurality. More generally, we examine the kind of candidates favored by certain voting rules. Figure 3 represents the gaps in the scores between plurality, AV, and RV for the nine candidates of the long set with normalized scores as in Figure 2. For instance, Clinton got 31.38% of votes under plurality and was approved by 39.78% of participants under AV. Her score gap between these two voting rules is thus equal to 8.40. 19
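The score gap in this example is a plain difference of normalized scores. A minimal sketch, using the two Clinton figures quoted above and, for RV, the normalization described in footnote 17 (average grade divided by the maximum grade 5 and multiplied by 100); the RV average of 2.1 is a hypothetical value used only to show the formula.

```python
MAX_GRADE = 5

def normalized_rv_score(average_grade):
    """Normalize an average RV grade to a 0-100 scale."""
    return average_grade / MAX_GRADE * 100

def score_gap(score, plurality_score):
    """Gap in points between a rule's normalized score and the plurality score."""
    return score - plurality_score

clinton_plurality = 31.38  # from the text
clinton_approval = 39.78   # from the text
print(round(score_gap(clinton_approval, clinton_plurality), 2))

hypothetical_rv_average = 2.1  # illustrative value, not from the survey
print(round(normalized_rv_score(hypothetical_rv_average), 2))
```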

Figure 3: Scores gaps compared to plurality
These elements clearly demonstrate that Sanders benefits the most by switching from plurality to evaluative multi-nominal voting rules such as AV and RV. The previously unpopular candidates also make significant improvements under these rules. Conversely, the candidate who is least advantaged by evaluative rules is Trump, thus showing that the latter did not attract much support beyond the supporters who voted for him under plurality. The same applies to Clinton. As previously stressed, the scores of plurality and IRV first choices are very similar. 20 The next stage is to shed light on the characteristics of the candidates who benefited the most and least. We do this to define "types" of candidates beyond their party affiliation. Here, we follow Baujard et al. (2014) and consider the distribution of grades under RV for Trump, Clinton, Stein, and Sanders, four candidates that provide interesting examples of grade profiles (Figure 4).
20 Another way to compare the scores of plurality, IRV, AV and RV is presented in the Appendix (Figure A6). Figure A6 shows an alternative normalization of the scores so that the total score is 100 for each rule ("normalized relative scores"). For plurality this is just the vote percentage and for IRV the percentage of voters who place the candidate first, but for approval it is the percentage, among all approvals, of approvals received by each candidate and for RV, similarly, the percentage of all assigned points. In Figure A6, as in Figure 3, one can see that for all candidates, IRV first choices and plurality scores are hardly distinguishable (within statistical confidence). Above all, Figure A6 shows that RV and, to a lesser extent, AV produce a flatter image of the political landscape.
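The "normalized relative scores" of footnote 20 rescale each rule's totals so that they sum to 100. A minimal sketch with hypothetical totals (approval counts for three unnamed candidates):

```python
def relative_scores(totals):
    """Rescale raw totals (votes, approvals, or points) so that they sum to 100."""
    grand_total = sum(totals.values())
    return {name: 100 * value / grand_total for name, value in totals.items()}

# Hypothetical approval counts, not survey data
toy_approvals = {"A": 50, "B": 30, "C": 20}
toy_relative = relative_scores(toy_approvals)
print(toy_relative)
```

Because approvals and points are spread over several candidates per ballot, this normalization tends to compress the differences between candidates, which is one way to read the "flatter image" noted in the footnote.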
First, the profiles of the two main candidates of the 2016 election, Clinton and Trump, share the same characteristics: a U-shaped profile, with a very high number of grades 0, few intermediate grades, and a rather high number of maximal grades 5. These are the characteristics of a kind of candidate one might term "divisive" or "exclusive" as defined above: they receive strong support, but are also often rejected. In principle, exclusive candidates might be exclusive for at least two different reasons: (1) they might be candidates who generate intense feelings among sincere voters who thus grade them in an "extremist" way; or (2) they might be candidates who have a chance of winning and therefore are voted for in an "extremist" way by strategic voters, even though these strategic voters might otherwise have moderate feelings about them. Note that this second explanation does not fit well in our study, for HA shows the exact same tendencies as RV.

Figure 4: Grade profiles
Sanders's profile exhibits the reverse features, showing a flatter grade profile. Sanders represents another kind of candidate, whom one may call "inclusive": as said above, he does benefit from widespread support among the voters, but with lower levels of strong rejection or attachment. This kind of candidate is the one who loses the most under plurality voting, since this method is unable to take into account moderate support, contrary to the additive multi-nominal voting methods.
Finally, one can observe a third kind of candidate, illustrated by the Stein profile. Candidates such as Stein receive a large number of minimal grades, without benefiting from strong support otherwise. Thus, their grade profile is decreasing from bad to good grades. This third category of candidates typifies the "minor" candidates, who have no chance of winning and do not elicit much expressive support. Contrary to the exclusive and inclusive kinds of candidates, these minor candidates are often unknown to many voters and do not receive significant media exposure. This is especially common in the US because the characteristics of the plurality voting system and the Electoral College lead to the media neglecting minor candidates. In this third category, in addition to Stein, it would be fair to include all the candidates of the long set except the "exclusive" Clinton and Trump and the "inclusive" Sanders.
To go further, one can describe the proximity between candidates at the voter level. The "agreement" matrix based on AV data is shown in Table 5. 21 This provides the candidates' ability to attract voters who also support other candidates. For an election with the nine candidates of the long set, the agreement matrix for AV has 81 values. Each row in the matrix displays the percentage of the supporters of a specific candidate who approve of another (column) candidate. Consequently, the diagonal is always equal to 100%, but the matrix is not symmetrical. For instance, the proportion of Clinton's supporters who also support Bloomberg is 13%, while the proportion of Bloomberg's supporters who also support Clinton is 46%.
The columns of the table show the cross-over voter support of a candidate, i.e., a candidate's ability to attract voters who also support other candidates. The rows, on the other hand, show the dispersion of a candidate, i.e., a candidate's propensity to share supporters with other candidates. From this, one can compute synthetic measures of average cross-over voter support and dispersion for each candidate as the simple average of the columns and rows outside the diagonal. The candidate with the highest cross-over voter support is Sanders with a measured cross-over support of 35%, consistent with his "inclusive" type. Regarding the dispersion, the lowest averages are obtained by Trump and Clinton (9 and 10%, respectively). Again, this is consistent with their "exclusive" type: their supporters are heavily focused on them and tend not to share support with many other candidates. In all, corroborating Baujard et al.'s results (2014), mono-nominal rules such as plurality favor "exclusive" candidates, compared with multi-nominal, evaluative, methods such as AV and RV that benefit "inclusive" candidates. 22 In the 2016 US election case, it appears that the latter rules with a large set of candidates would indeed allow the election of the candidate Sanders, whom this study finds to be favored according to several different criteria. Neither direct plurality nor the official result, however, would have led to Sanders winning. One explanation for this situation is that, while Sanders did not win the actual Democratic Party primary, our data indicate that he had large approval voting crossover support with Clinton, Johnson, and Stein. Further, Sanders's crossover support from Trump voters was nontrivial at 13% (more than double the Trump crossover support that Clinton got).
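The agreement matrix and the two synthetic measures described above can be sketched as follows. Approval ballots are encoded as sets of approved names, which is an assumption about the data layout; the function names and the toy ballots are hypothetical.

```python
def agreement_matrix(ballots, candidates):
    """matrix[row][col] = % of row's supporters who also approve col."""
    matrix = {}
    for row in candidates:
        supporters = [b for b in ballots if row in b]
        matrix[row] = {
            col: (
                100 * sum(1 for b in supporters if col in b) / len(supporters)
                if supporters
                else float("nan")
            )
            for col in candidates
        }
    return matrix

def crossover_support(matrix, candidate):
    """Average of the candidate's column, excluding the diagonal."""
    values = [matrix[row][candidate] for row in matrix if row != candidate]
    return sum(values) / len(values)

def dispersion(matrix, candidate):
    """Average of the candidate's row, excluding the diagonal."""
    values = [v for col, v in matrix[candidate].items() if col != candidate]
    return sum(values) / len(values)

# Toy approval ballots (hypothetical)
toy_names = ["A", "B", "C"]
toy_approval_ballots = [{"A"}, {"A", "B"}, {"B", "C"}, {"B"}]
toy_matrix = agreement_matrix(toy_approval_ballots, toy_names)
print(toy_matrix["A"]["B"], crossover_support(toy_matrix, "B"), dispersion(toy_matrix, "A"))
```

In the toy data, half of A's supporters approve B while only a third of B's supporters approve A, which illustrates why the matrix is not symmetrical.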

22 In this comparison, IRV, despite the fact that it allows the voter to rank the candidates, appears closer to plurality. This is a reminder that the IRV counting process is a "single transferable vote" process whose effect might be close to the effects of two-round majority voting.

Concluding remarks
Based on a representative sample of more than 2,000 respondents, we found that, according to their stated true electoral preferences, before the 2016 US election Bernie Sanders appears both as the utilitarian optimum and the Condorcet winner. Our results suggest that multi-nominal evaluative voting rules such as RV and, potentially, AV would have led to the election of Sanders among a large set of candidates. This is in contrast to direct plurality and to the official voting rule.
In this study we have focused our inquiry on three main questions: (1) How do US voters use these alternative rules? (2) What is the influence of the chosen set of candidates (short vs. long sets) on the outcome of the election? (3) Do different voting methods differ in their election outcomes? First, our results are consistent with the interpretation that the respondents really did use the tested multi-nominal rules to express their preferences more widely and sincerely. This is particularly true for range voting, whose distribution of grades presents significant similarities with voters' honest assessments (HA). Second, we showed that the number of candidates running might have a large effect on the outcome depending on the voting method used. The multi-nominal rules, and RV in particular, are less sensitive to other "eliminated" candidates. This makes those methods closer to voters' true preferences and thus closer to sincere outcomes. Plurality voting, on the other hand, is highly impacted by the number of candidates because of its mono-nominal feature. Third, the multi-nominal rules could indeed elect the "best" candidate according to voters' true preferences. While AV and RV favor the election of inclusive candidates who are appreciated by a large proportion of voters, plurality tends to prefer exclusive candidates who are either strongly supported or strongly rejected. In the 2016 US case, Sanders can be seen as an inclusive candidate while both Clinton and Trump were exclusive. Further, our data show RV and AV capturing much more support for minor party candidates compared to plurality and IRV (Table A5). AV was still able to capture that minor party support despite most voters choosing only one candidate (Figure A1).
The present study is based on a single poll, made in specific circumstances, so that, as explained above, it does not allow an explicit inference on what American politics would be under other circumstances. That said, all the observations made are consistent with the empirical literature. It is thus legitimate to take stock of the received knowledge and to state in which direction our results point with respect to the question of electoral reform. According to our results, it appears that changing the official voting rule in the US in favor of multi-nominal methods would be a good reform for American voters because the outcome would be more faithful to their true electoral preferences.
Of course, many important issues remain unaddressed at this stage. One could legitimately wonder whether the American people are prepared to opt for another voting rule. One question in our survey deals with the likelihood of participating in a vote under one or the other voting method, and might be of some help to clarify this point. Figure 5 shows the percentages of voters who would agree to take part in a vote under plurality, AV, RV, and IRV. As one clearly sees, plurality incontestably remains the favorite voting method of the respondents. Complementary studies might put the emphasis on trying to understand the nuances of this phenomenon, which is in tension with the fact that some US cities have indeed adopted alternative voting rules following education campaigns. Given that, it appears that voters need to be educated about voting rules beyond just learning their mechanics.
Defining what it means to "understand a voting rule" is by no means trivial. In a survey on voting experiments, Laslier (2011) distinguished between voters' difficulties in completing the task (choosing, ranking, grading…), understanding of the voting rule per se (how ballots are counted…) and understanding the political consequences of using one rule or the other. He argues that there is generally no problem with the first point, but that the two other points must be treated with care.
The present paper offers some hints there. We indeed spotted no practical difficulties. We found cases that seem to imply that the voter misunderstands the rule itself; these cases are the 2% of IRV ballots which are both non-sincere and irrational, due to the subtle way IRV iterative counting works. One cannot exclude that these cases are just due to inattentive respondents, so that strong conclusions should be avoided on that point.
AV ballots in general seem both sincere and rational, but we observed a large number of single-name approval ballots (about half of the ballots on the long set of candidates). This is more than what has been described elsewhere (as for the experiments on AV conducted during the French presidential elections from 2002 to 2017, the proportions of single-name approval ballots ranged from 11.1% to 26.2%, see notably Baujard et al. 2016), but is not surprising in the American context with its traditionally small supply of candidates. Lastly, the question of understanding political consequences is beyond the scope of this work. Clearly, further studies in these directions are needed.

Appendix A: Complementary statistics
In the tables, standard errors are in parentheses and the pictures show the standard error bars.
Note: The difference RV-HA is computed on the voters who provided both grades.
The data set provided by GfK Custom Research was of good quality, but it is important, independent of the quality of the survey, to check to what extent the questions themselves were correctly filled in by the respondents. To do so, the consistency of the voters' answers has been assessed across the four tested voting rules plus HA. Accordingly, voters were split into five groups: (1) OK (auto): the data on the voters were automatically checked (all pairwise covariances between two voting rules were nonnegative); (2) OK (manual): the data on the voters were manually checked and no correction was necessary; (3) Corrected: some simple correction was applied and the result makes sense with no doubt; (4) Dubious: whether some correction was applied or not, the data remain dubious; (5) Excluded: the voter is completely excluded.
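The automatic check in category (1) can be sketched as below. The text does not say how ballots were turned into numbers, so the encoding is an assumption: each ballot becomes a score vector over candidates where a higher value means more support (1/0 for plurality and approval, grades for RV and HA, and points decreasing with rank for IRV).

```python
from itertools import combinations

def covariance(x, y):
    """Population covariance between two equal-length score vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

def auto_consistent(score_vectors):
    """True if all pairwise covariances between rules are nonnegative."""
    return all(
        covariance(x, y) >= 0 for x, y in combinations(score_vectors.values(), 2)
    )

# Hypothetical voter over four candidates, with consistent answers
consistent_voter = {
    "plurality": [1, 0, 0, 0],
    "approval": [1, 1, 0, 0],
    "range": [5, 4, 0, 1],
    "irv_points": [3, 2, 0, 1],
}
# Same voter with the IRV ranking entered upside down
reversed_irv_voter = dict(consistent_voter, irv_points=[0, 1, 3, 2])

print(auto_consistent(consistent_voter), auto_consistent(reversed_irv_voter))
```

A voter failing this test is not necessarily inconsistent; in the procedure described above, such cases go on to manual checking.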
All in all, as shown in Table B1, more than 95% of the voters belong to categories (1), (2) and (3). To maximize the quantity of data, we have decided to include Category (4) in our analyses, while a very small percentage of our data has been excluded: 10 voters for the short set and 10 for the long set, which corresponds to about 1% of our observations. Let us offer a few examples of each category, to give a better understanding of the data, their meaning and scope. Table B2 illustrates the case of a voter in the long set whose votes are consistent across the four tested voting rules plus HA, with no need for manual checking or extra corrections. This voter supports Clinton under plurality, approves of Clinton and Sanders under AV, ranks Clinton first, Sanders second and Bloomberg third under IRV, and gives the maximum grade to Clinton and Sanders under RV and HA. Table B3 also shows a consistent voter of the long set, but with manual checking. There is no need for corrections, for this voter's approval of McMullin under AV and ranking under IRV might be caused by strategic considerations. Table B4 illustrates a voter of the short set whose electoral choices required some corrections. Indeed, the maximum grades this voter gave to Clinton and Stein under RV and to Clinton under HA show that the voter misunderstood the instructions of IRV. Thus, we corrected the IRV vote by reversing the ranking, which makes all the responses consistent. Table B5 presents a dubious case. Here, the voter's choice of Cruz under plurality (and AV) might seem odd, but remains plausible. Thus, the data from this voter are kept with no further correction. Finally, Table B6 illustrates the case of a voter of the short set who has been excluded from our analyses on the grounds that the responses are systematically inconsistent.
Now, regarding the importance and the kind of corrections we made, Table B7 shows that some voters encountered difficulties understanding the instructions of IRV, RV, and HA, for they reversed their rankings and grades, as in Table B4 for IRV. Under RV and HA, they gave for instance the grade 1 to their favorite candidates instead of 5. Under IRV, a single ballot gave the rank 3 twice; we removed both entries, for it seems to be a simple mistake. Besides, some ballots were difficult to understand, but this does not appear to be a misunderstanding of the instructions, so we kept the data as they were. All in all, Table B7 shows that almost all of our data are reliable, which led to a very small percentage of exclusions.

Figure A1: Number of candidates approved per ballot
Note on the statistical robustness of IRV results: Working on the long set by bootstrapping our data set, one finds the following. With confidence .962, the first two eliminated candidates are Castle and Stein. With confidence .988, the first three eliminated are Castle, Stein and McMullin: they are followed with full confidence by Bloomberg and Johnson or by Johnson and Bloomberg, and then by Cruz. Therefore, the last three remaining candidates are always Clinton, Trump, and Sanders. Three endings occur: (i) elimination of Sanders then Trump, victory of Clinton (frequency .981); (ii) elimination of Clinton then Trump, victory of Sanders (frequency .151); (iii) elimination of Sanders then Clinton, victory of Trump (frequency .039). The confidence for the statement "Clinton wins under IRV" is therefore .981.
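The bootstrap procedure behind such confidence figures can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: ballots are tuples of candidates in order of preference (possibly truncated), ties for elimination are broken alphabetically, and the number of resamples and the seed are arbitrary.

```python
import random
from collections import Counter

def irv_winner(ballots, candidates):
    """Eliminate the candidate with the fewest top choices until one remains."""
    remaining = set(candidates)
    while len(remaining) > 1:
        tally = Counter({c: 0 for c in remaining})
        for ranking in ballots:
            top = next((c for c in ranking if c in remaining), None)
            if top is not None:  # exhausted ballots are ignored
                tally[top] += 1
        # Alphabetical tie-break (an assumption made for this sketch)
        loser = min(sorted(remaining), key=lambda c: tally[c])
        remaining.remove(loser)
    return next(iter(remaining))

def bootstrap_win_frequencies(ballots, candidates, draws=1000, seed=0):
    """Resample ballots with replacement and record how often each candidate wins."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(draws):
        resample = rng.choices(ballots, k=len(ballots))
        wins[irv_winner(resample, candidates)] += 1
    return {c: wins[c] / draws for c in candidates}

# Toy ballots (hypothetical)
toy_irv_ballots = [("A", "B")] * 4 + [("B", "A")] * 3 + [("C", "B")] * 2
print(irv_winner(toy_irv_ballots, ["A", "B", "C"]))
toy_frequencies = bootstrap_win_frequencies(toy_irv_ballots, ["A", "B", "C"], draws=200)
```

By construction the win frequencies of the candidates sum to one over the resamples, since each resample produces exactly one winner.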

Figure A4: Instant Runoff results, short set
Figure A6: Normalized relative scores