Do Voting Advice Applications Affect Party Preferences? Evidence from Field Experiments in Five European Countries

Abstract
Voting advice applications (VAAs) are online tools that provide voters with personalized information on the extent to which their policy views match those of political parties or candidates. These tools have proliferated across advanced democracies in recent years and become integral parts of electoral campaigns, especially in multi-party systems. However, it remains unclear to what extent voters actually make use of VAAs to inform their voting preferences. We present new field-experimental evidence on the short-term effects of VAAs on party preferences from five European countries. We find consistent evidence that exposure to VAA advice leads voters to update their party preferences in line with the information provided. Furthermore, we find partial evidence that VAAs more strongly influence less politically interested and undecided voters. Overall, our results point to the potential value of VAAs as a mechanism to strengthen democratic representation and accountability.


Introduction
Among the most established findings in political science is that voters tend to have low levels of political information (e.g., Delli Carpini & Keeter, 1996). A worrying implication is reduced democratic accountability: effective democratic control requires that citizens conduct detailed evaluations of parties' and candidates' policy platforms and then cast their votes for the party or candidate whose issue positions are closest to their own (Enelow & Hinich, 1984). However, acquiring information about the policy stances of parties and candidates is costly and many citizens do not have strong incentives to incur this cost. As a result, many voters do not vote for the party or candidate which best matches their policy interests (Lau et al., 2014). In this article, we study a proposal for a partial remedy to the problem of the uninformed voter: online voter information tools known as voting advice applications (VAAs).
VAAs match voters with parties or candidates based on their policy views. They are typically launched during election campaigns, and their stated mission is voter education and information. As such, VAAs perform a function similar to the traditional mass media during election campaigns. However, VAAs go beyond newspapers, TV, and radio because they provide voters with personalized information on the congruence between their policy preferences and the programs of political parties or candidates. In that sense, VAAs are more similar to campaigning materials, such as leaflets or political ads. Yet contrary to the latter, VAAs are nonpartisan and their developers tend to strive for scientific accuracy. For example, VAA developers often spend considerable energy on the identification of relevant policy issues (cf. Walgrave et al., 2009) and the coding of party or candidate positions (cf. Garzia et al., 2017; Gemenis, 2015).
VAAs have proliferated across democracies in recent years and become integral parts of electoral campaigns, especially in multi-party systems. Given their significant popularity, it is no surprise that VAAs are increasingly attracting the attention of political scientists, communication scholars, psychologists, and even computer scientists (for a recent review of VAA research cf. Garzia & Marschall, 2019). Nevertheless, the answer to one of the most foundational questions in VAA research, namely whether voters actually use them to inform their voting preferences, remains unclear.
In this article, we report new evidence from a series of field experiments designed to test the short-term effects of VAAs on party preferences. Contrary to most existing studies, we integrated our experiments directly into actual VAAs, which allows us to study the effects of real-world VAA usage. Furthermore, our samples are much larger compared to prior studies, thus alleviating concerns related to statistical power. While most prior experimental studies focused on a single case context, our study covers a total of five European countries (Bulgaria, Greece, Romania, Spain, and the UK) and two electoral contexts (supranational and national).
Our results suggest that VAA usage leads voters to update their party preferences in line with the information provided, at least in the short term. Notably, this finding holds across all case contexts we study. Furthermore, we find that VAAs more strongly influence less politically interested and undecided voters. However, the latter findings do not emerge in all cases and we fail to find support for several other sources of effects heterogeneity proposed in the literature. As discussed in the conclusion, this suggests that individual-level causal heterogeneity remains poorly understood. Still, our results provide clear evidence that voters are influenced by VAAs, at least in the short term. Overall, our study therefore points to the potential value of VAAs as a mechanism to strengthen democratic representation and accountability.

Voting Advice Applications
The first VAA was developed in the Netherlands in 1989. It took the form of a paper-and-pencil test and was targeted at high school students (de Graaf, 2010). In the years since, VAAs have broadened their target audience to citizens in general, moved online, and diffused around the world. VAAs are now regularly made available before elections in many advanced democracies and have achieved considerable popularity, especially in multi-party systems (Cedroni & Garzia, 2010). For example, between 10% and 20% of eligible voters turned to VAAs in the run-up to recent elections in Belgium, Canada, Finland, Germany, Greece, New Zealand, and Switzerland. Even larger numbers of voters turned to VAAs before recent elections in Denmark and the Netherlands. In other countries such as Australia, France, Italy, Romania, Spain, the UK, and the U.S., VAAs have not reached the same level of popularity, but still often attract tens or even hundreds of thousands of voters (Germann & Gemenis, 2019; Marschall, 2014). In many of these countries multiple, competing VAAs are now developed for the same elections (Garzia & Marschall, 2019).
While there are differences between VAAs in terms of their design (e.g., Gemenis & van Ham, 2014; Germann & Mendez, 2016; Germann et al., 2015; Louwerse & Rosema, 2014; Walgrave et al., 2009), all VAAs share the same theoretical foundation, basic functionality, and principal goal. VAAs are rooted in issue voting theory, a strand of social choice theory which broadly posits that voters' electoral choices are informed by the match between their own positions on policy issues and those of political parties or candidates (Downs, 1957; for a recent review cf. Walgrave et al., 2020). More specifically, most VAAs are inspired by proximity voting theory, though there are also some VAAs that draw inspiration from directional models (Mendez, 2012, 2017). To our knowledge, there are currently no VAAs which directly incorporate other strands of issue voting theory, such as issue ownership theory (Budge & Farlie, 1983) or discounting theory (Grofman, 1985). More generally, there are to our knowledge currently no VAAs which incorporate any theory of voting other than issue voting.
Models of issue voting and, in particular, proximity voting theory are often argued to be normatively desirable: if voters choose the candidate or party that is closest to them on policy issues, this is likely to strengthen democratic accountability (Enelow & Hinich, 1984). However, while there is a significant body of evidence suggesting that voters are to some extent guided by issue positions when they make electoral choices (Ansolabehere et al., 2008; Jessee, 2010; Westholm, 1997), there is also ample evidence that many voters have low levels of political interest and information. As a result, many voters are unable to identify the party or candidate that best represents their policy views (Achen & Bartels, 2017; Dassonneville et al., 2020; Lau & Redlawsk, 2006; Lau et al., 2014). VAAs' principal goal is to help voters with the identification of the parties or candidates that best match their own policy preferences (Anderson & Fossen, 2014). First, VAA users indicate their positions on a range of policy issues. Then, the application compares their positions with those of the different parties or candidates; and, finally, the application presents the results of this comparison to the users in the form of a rank-ordered list or a graph indicating how close the different parties or candidates are to the user.
Proponents of VAAs have pointed to several important potential benefits of these tools. First, VAAs may increase voters' political knowledge and, in particular, their knowledge about the positions of political parties and candidates (Anderson & Fossen, 2014). Second, VAAs may increase voters' ability and willingness to make electoral choices based on policy preferences and, thus, increase issue-voting. Indirectly, VAAs may therefore contribute to issue-based political representation and democratic accountability (Walgrave et al., 2008). Third, VAAs may remind citizens of the upcoming election and, by providing personalized voting advice, reduce the cost of voting. In addition, VAAs may also heighten voters' awareness of differences between parties or candidates and, therefore, of how much is at stake in an election. As a result, VAAs have been argued to increase electoral turnout (Germann & Gemenis, 2019; Marschall & Schmidt, 2008).
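The three steps just described (collect user positions, compare them with party positions, present a rank-ordered list) can be illustrated with a minimal sketch. The scoring rule below is a hypothetical proximity measure chosen for illustration only: it assumes five-point items coded 1 to 5 and rescales mean agreement to a range of -100 to +100. It is not the formula used by any particular VAA, and the names `congruence` and `rank_parties` are our own.

```python
from typing import Dict, List, Optional, Tuple

# statement id -> position on a 1..5 scale, or None for "no opinion"
Positions = Dict[str, Optional[int]]


def congruence(user: Positions, party: Dict[str, int]) -> Optional[float]:
    """Illustrative (assumed) proximity score in [-100, 100]."""
    # Only compare statements the user answered and the party is coded on.
    shared = [s for s, pos in user.items() if pos is not None and s in party]
    if not shared:
        return None
    # Per-item distance is 0..4; 1 - d/2 maps it to 1 (identical) .. -1 (opposite ends).
    agreement = [1 - abs(user[s] - party[s]) / 2 for s in shared]
    return 100 * sum(agreement) / len(agreement)


def rank_parties(user: Positions,
                 parties: Dict[str, Dict[str, int]]) -> List[Tuple[str, float]]:
    """Return (party, score) pairs sorted from closest to most distant."""
    scored = [(name, congruence(user, pos)) for name, pos in parties.items()]
    valid = [(name, score) for name, score in scored if score is not None]
    return sorted(valid, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    user = {"q1": 5, "q2": 1, "q3": None}
    parties = {
        "A": {"q1": 5, "q2": 1, "q3": 3},
        "B": {"q1": 1, "q2": 5, "q3": 3},
        "C": {"q1": 3, "q2": 3, "q3": 3},
    }
    for name, score in rank_parties(user, parties):
        print(name, score)
```

Statements answered with "no opinion" are simply skipped in this sketch; real tools may weight or treat missing answers differently.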

Existing Research
A large number of prior studies investigated the extent to which VAAs are able to live up to their various promises, and many reported promising results. For example, usage of the German VAA Wahl-O-Mat has been found to be associated with higher knowledge about party positions. Similar evidence is reported by Kamoen et al. (2015) and Heinsohn et al. (2016). Several other studies found indications that VAAs increase issue-voting, i.e., that voters adapt their voting preferences after VAA usage and vote for the party or candidate that was recommended by the VAA (Alvarez et al., 2014; Kleinnijenhuis et al., 2019; Ruusuvirta & Rosema, 2009; Walgrave et al., 2008). Finally, several studies found evidence for an association between VAA usage and higher electoral turnout (Gemenis & Rosema, 2014; Kruikemeier et al., 2014; Marschall & Schultze, 2012), especially among younger voters (Germann & Gemenis, 2019). However, most of the existing evidence in favor of VAA effects is based on observational research designs which cannot establish causal effects.
A variety of observational designs have been proposed in the literature, but endogeneity concerns have remained a key issue. VAA usage is known to be nonrandom: for example, it is well-established that VAA users tend to skew younger, have high educational attainment, and have above-average political interest (Marschall & Schmidt, 2008; Marschall, 2014). Especially in early studies, a common strategy has been to counter the resulting selection effects by adjusting for known predictors of VAA usage using regression adjustment or matching in the context of cross-sectional election surveys. However, not all predictors of VAA usage may be known or adequately measured in election surveys. In light of this, several stronger observational designs have been proposed including instrumental variable techniques (Pianzola, 2014a, 2014b) and panel set-ups (Alvarez et al., 2014; Heinsohn et al., 2019; Kleinnijenhuis et al., 2019; Walgrave et al., 2008). Yet, the instruments that have been suggested (e.g., gender and left-right self-placement) are unlikely to satisfy the exclusion restriction (Germann & Gemenis, 2019). Moreover, while panel data significantly reduces the risk of omitted variable bias by mitigating bias due to time-constant individual-level confounders, concerns about causal identification remain as panel data does not offer a straightforward way to account for time-varying confounders. Another suggestion has been to construct placebo and other plausibility checks, but while such tests can improve confidence in observational estimates, they cannot establish causality (Germann & Gemenis, 2019).
As elsewhere in the political science literature, concerns about omitted variable bias in observational research have motivated a turn to experimental designs in the VAA literature. Crucially, in stark contrast to observational studies, most existing experimental studies found little evidence for an effect of VAA usage on political knowledge, voting preferences, or electoral turnout (Enyedi, 2016; Mahéo, 2016, 2017; Munzert et al., 2020; Pianzola et al., 2019; for notable exceptions see Garzia et al., 2017; Vassil, 2011). The frequent null results in experimental studies have given rise to suggestions that the VAA effects that were reported in the observational literature are owed largely or even entirely to omitted variable bias, and that hopes that VAAs would help to promote informed voting based on policy positions were therefore misplaced (Munzert & Ramirez Ruiz, 2021).
We argue that it is too early to close the case, for two main reasons. First, many existing experimental studies have drawn on comparatively small samples, suggesting a need for better-powered studies. Second, most existing experimental studies have drawn on the same experimental design, the "encouragement design", which raises questions about generalizability from the experimental setting to real-world VAA usage. More specifically, the encouragement design involves randomly assigning an encouragement early on during an election campaign asking people to use a VAA, which may take the form of a verbal appeal, a financial incentive, or a combination of both. The same subjects are then surveyed at a later point about their political knowledge, turnout, or electoral preferences. Because the encouragement is randomly assigned and therefore exogenous, it is possible to disentangle the effects of VAA usage from potential confounders. However, the encouragement design can only establish causal effects on a rather specific group of people, namely, experimental subjects who made use of a VAA solely because they were encouraged to do so by the researchers (i.e., "compliers") (Eckles et al., 2016). This raises two related concerns about generalizability. First, since the encouragement design's target estimand concerns people who would not otherwise have used a VAA, it remains ambiguous to what extent VAAs affect the millions of people around the world who use VAAs without further encouragement by researchers. Second, it is possible that the artificiality of being encouraged to use a VAA in the context of a survey, and potentially being paid to do so, affects estimates of VAA effects.
In this article, we build on an alternative experimental design which addresses these concerns, albeit, as discussed below, at a cost. This design was originally proposed by Garry et al. (2019) in a study of the effects of VAAs on party preferences in the context of a regional election in Northern Ireland. Rather than encouraging VAA usage, the central idea is to integrate an experimental manipulation directly into a real-world VAA. More specifically, the idea is to randomize the time at which VAA users are asked about voting preferences: either before they are exposed to the VAA advice or thereafter. This makes it possible to causally identify the effects of exposure to VAA advice on voting preferences. Because what is varied is the time at which users are asked about their voting preferences, we refer to this design as the "timing design".
An important benefit of the timing design is that it makes it possible to estimate the causal effects of actual, real-world VAA usage. Notably, the experimental manipulation is also much less obvious compared to the encouragement design, which is likely to further strengthen external validity. Furthermore, many VAAs are used by large numbers of citizens. Therefore, the integration of an experimental manipulation directly into a VAA ensures high statistical power, at least assuming the VAA is able to garner a moderate level of popularity. Finally, another benefit of the timing design is that the integration of the experimental manipulation directly into a VAA guarantees straightforward access to the voting advice shown to experimental subjects, which is important when studying the effects of VAAs on voting preferences. By contrast, the advice provided by the VAA is difficult to identify with the encouragement design. As a result, researchers have either relied on recall measures, which could introduce selection effects due to cognitive biases (Walgrave et al., 2008; Wall et al., 2014); or they have circumvented the issue by focusing on outcomes that can be analyzed without access to the VAA advice, such as changes in voting intention over the course of the study. While less problematic, the latter approach does not allow for a comprehensive test of the hypothesis that voters align their voting preferences with the advice provided by VAAs.
It is important to note, though, that the benefits of the timing design come at a cost since it can only establish short-term effects. The reason is simple: ultimately all experimental subjects, including those in the control group, are exposed to the VAA advice. Thus, it is not possible to track causal effects beyond subjects' engagement with the application. This is different in the encouragement design since it randomizes VAA usage and not merely the timing of the outcome questions. Therefore, it is straightforward to experimentally establish both short- and long-term effects using the encouragement design by varying the point in time when experimental subjects are re-interviewed.
The short-term nature of the effects established in the timing design clearly constitutes a significant limitation. In principle, it is possible that VAA effects dissipate quickly. And, in practice, VAAs are only politically relevant if their effects endure over at least the medium term and may therefore affect electoral outcomes. That said, to our knowledge, the timing design is the only design that has been proposed in the literature that makes it possible to identify the causal effects of actual, real-world VAA usage. Furthermore, while there have been several studies using the encouragement design, the timing design has to date been used only once. Notably, that one study did find causal evidence that VAAs affect their users' party preferences, contrary to most other experimental studies (Garry et al., 2019). Yet, the results of the study are limited to the context of a single regional election. Therefore, we believe that there is value in applying the timing design across a broader range of case contexts as the next step in VAA effects research. We do so by integrating experimental manipulations directly into five different VAAs that were deployed before the 2019 European parliamentary elections, and a sixth VAA that was deployed before the 2019 UK general election. Notably, our samples are much larger compared to prior studies, thus alleviating concerns related to statistical power. Finally, we contribute by systematically investigating individual-level causal heterogeneity, which until now has mostly been investigated in observational research and with much smaller samples.

Hypotheses
We test several hypotheses. First, in keeping with prior literature, we expect that the advice provided by VAAs affects their users' electoral preferences (Munzert et al., 2020; Pianzola et al., 2019; Walgrave et al., 2008). The rationale for this hypothesis is simple: VAAs provide information to voters on how close they are to parties or candidates on a large number of political issues. In keeping with standard assumptions from issue voting theory, we expect that VAA users leverage this information to reevaluate their voting preferences (Enelow & Hinich, 1984). If users are informed that they are close to a party or candidate on political issues, they will be more supportive of that party or candidate; if they are informed that a party or candidate is far away from them in terms of political issues, they will be less supportive.

H1: VAA users align their voting preferences with the advice they receive.
However, not all users of VAAs may be equally likely to be influenced by the advice. In this study, we consider a total of five frequently suggested sources of individual-level causal heterogeneity: age, education, political interest, whether voters already have a vote intention, and whether issue positions are an important consideration to voters. First, age has been argued to moderate the relationship between VAAs and electoral preferences because younger voters tend to have lower political knowledge and less solidified voting preferences (Pianzola, 2014b; Vassil, 2011). In keeping with existing literature, we therefore expect that VAAs have stronger effects on the voting preferences of younger voters.
Similarly, education and political interest have been argued to act as moderators because less educated and less politically interested voters often have comparatively low levels of political information (Alvarez et al., 2014). Conversely, more interested and more highly educated voters may be more critical of the information provided by VAAs (Kamoen et al., 2015). Therefore, we expect that less educated and less politically interested voters are more likely to be influenced by VAAs.
Prior research suggests that some VAA users already have a relatively firm voting intention when they use the tool while others do not, be it because they do not have any strong preference or because they are oscillating between different candidates or parties (van de Pol et al., 2014; Wall et al., 2014). As voters who have already made up their mind may have a lower need for new information and more generally may be unlikely to revisit their decision, we follow the existing literature in expecting that VAAs have stronger effects on the preferences of undecided voters (Garry et al., 2019; Kamoen et al., 2015; Kleinnijenhuis et al., 2019). Finally, political issues are likely to be the decisive factor for some voters while others lay greater emphasis on other factors, such as the perceived competence of candidates, their gender, or their ethnic identity. Since VAAs provide information on issue congruence, we expect that they influence voters for whom issues are an important consideration more strongly than voters for whom other considerations are more important (Vassil, 2011).

Experimental Design
To test our hypotheses, we integrated randomized experiments into five VAAs that were deployed in the weeks prior to the May 2019 elections to the European Parliament (EP) in countries from Eastern (Bulgaria, Romania), Southern (Greece, Spain), and Western (UK) Europe. For replication in the context of a national election, we repeated the same experiment using a VAA that was deployed in the run-up to the December 2019 UK general election (GE). Table 1 provides additional information on the VAAs we study. All VAAs were made freely available online and promoted via print, broadcast, online, and social media. The total number of users (see Table 2), after removing repeated attempts by the same people and other invalid entries, was between 4,000 (Bulgaria) and 57,000 (UK EP). However, for reasons we detail below, we analyze only a subset of these responses. In all of the countries analyzed, the same VAAs had been deployed in similar form before previous elections. More generally, their design was similar to that of many other prominent VAAs. Specifically, upon accessing the tools, voters were first asked to answer a few general questions on their demographics and then to indicate their preferences on up to 30 policy statements (e.g., "Privatization leads to a more efficient provision of public services"). Issue statements were carefully selected to reflect important political issues across a number of policy areas (e.g., economy, immigration, climate change) and varied across countries. Users were asked to indicate their issue preferences on five-point scales ranging from "completely disagree" to "completely agree", with an additional "no opinion" option. Voter-party issue congruence was estimated by comparing the answers provided by users to the positions of the various parties as estimated by political scientists who examined primary sources through an expert survey.
The results were shown in the form of a bar chart indicating the degree of congruence between the user and the various political parties. Congruence scores ranged from -100 (complete disagreement) to +100 (complete agreement), and used a traffic light system to convey the degree of voter-party congruence: scores below 0 were shown in red and flagged as negative matches; scores between 0 and 40 were shown in amber and flagged as weak matches; and scores above 40 were shown in green and flagged as strong matches. In practice, scores close to the -100 to +100 extremes were rarely achieved. We provide additional details on the design of our tools including screenshots, the selection of policy issues, the formula used for calculating issue congruence, and the coding of party positions in section 1 of the Supplementary Material.
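The traffic light rule described above can be written down as a small function. This is a sketch rather than the tools' actual code: the function name `match_label` is hypothetical, and treating a score of exactly 40 as amber is an assumption based on the wording "between 0 and 40" versus "above 40".

```python
def match_label(score: float) -> str:
    """Map a congruence score in [-100, 100] to its traffic light category."""
    if not -100 <= score <= 100:
        raise ValueError("congruence scores range from -100 to +100")
    if score < 0:
        return "red (negative match)"
    # Assumption: the amber band includes both 0 and 40.
    if score <= 40:
        return "amber (weak match)"
    return "green (strong match)"


if __name__ == "__main__":
    for s in (-35, 0, 40, 62.5):
        print(s, "->", match_label(s))
```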
The experimental manipulation consisted of the time when users were asked to provide information on their voting preferences. Users were randomly assigned to a control or treatment group upon accessing the online tool. The control group was asked about their support for parties before seeing their issue congruence scores. By contrast, users in the treatment group were asked about their support for parties after seeing their issue congruence scores (see Figure 1). This was achieved by a pop-up window prompt that appeared 30 seconds into the results screen, to which we refer as the opt-in page. We measured voting preferences using a battery of "propensity to vote" (PTV) questions asking how likely it is, on a scale of 0 to 10, that users would vote for the different parties contesting the election (van der Eijk et al., 2006).
[Table note: The null hypothesis in the χ² test is that respondents in the treatment and control groups are equally likely to complete the opt-in questionnaire. CT = control; TR = treated.]
The randomization of the time when VAA users were asked to answer PTVs allows for between-subject comparisons of party preferences depending on whether or not subjects were exposed to new information on issue congruence or, expressed differently, treated with VAA advice. However, an issue we are facing is attrition. Some respondents in the treatment group may have already left the website by the time the opt-in page was shown, while others may have declined to answer the PTV questions. Therefore, we showed an analogous pop-up window prompt to users in the control group featuring an unrelated question about turnout in the upcoming election. In all our analyses, we restrict the sample to users who completed the opt-in pages (groups A and C in Figure 1). This ensures that attrition is random across the experimental groups analyzed and therefore not a threat to the internal validity of the experiment. As Table 2 shows, opt-in completion rates vary from around 20% to 44%, depending on the country. Notably, there are no statistically significant differences in opt-in completion rates between control and treated groups. Furthermore, control and treated opt-in takers are balanced in terms of key demographics and political attitudes (see Table 3). Despite the sample restrictions we need to employ, the relative popularity of our VAAs ensures sufficient statistical power, with the number of analyzable responses ranging from 1,700 (Bulgaria EP) to more than 20,000 (UK EP).
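The comparison of opt-in completion rates between experimental groups amounts to a χ² test on a 2 × 2 table (group by completion). The sketch below shows one way to compute it using only the standard library. It uses the Pearson statistic without continuity correction, which is an assumption on our part, and the counts in the usage example are invented for illustration, not taken from Table 2.

```python
from math import erfc, sqrt
from typing import Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Rows: control, treatment. Columns: completed opt-in, did not complete.
    Returns (statistic, p-value) with 1 degree of freedom, no continuity correction.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts: 1,000 users per group.
    stat, p = chi2_2x2(300, 700, 310, 690)
    print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

A large p-value in such a test is consistent with, though it does not prove, equal completion rates across groups.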
A possible concern with our reliance on opt-in takers is that these could be different from the typical user, limiting the external validity of the experiment. However, as we show in section 3 of the Supplementary Material, the differences between users who completed the opt-in page and those who did not are minor in terms of individual-level attributes including age, gender, education, and political interest. Similarly, the nature of the advice received and whether or not it is consistent with prior party predispositions is no strong determinant of whether or not users completed the opt-in page. The high similarity between our opt-in and non-opt-in samples strengthens confidence in our ability to generalize the results from our opt-in samples to all users of our VAAs.
Finally, our samples tend to over-represent younger, male, highly educated, and more politically interested voters (see Table 3). Therefore, another concern could be that our samples differ from general voter populations. However, our research interest is not in establishing VAA effects among general voter populations but among VAA user populations, and prior research suggests that it is common that VAAs are used disproportionately by, among other things, younger, male, more politically interested, and relatively well-educated voters (Marschall & Schmidt, 2008; van de Pol et al., 2014; Vassil, 2011). Therefore, we would argue that our samples are broadly representative of VAA user populations. In fact, a perhaps more important caveat is that while our VAAs did enjoy a certain popularity, the number of users they were able to attract is clearly below that of some other, more institutionalized VAAs. Therefore, an interesting extension of our study would be to replicate the timing design in the context of a VAA which regularly attracts millions of users, such as Stemwijzer in the Netherlands or Wahl-O-Mat in Germany.

Results
We begin by investigating the average effects of exposure to VAA advice on party preferences conditional on the issue congruence scores shown to users (H1). To do so, we estimate a total of six linear regressions, one for each experiment. The dependent variable in all regressions is the PTV, i.e., the propensity to vote for a given party (0-10). Depending on the electoral context, users were asked to rate between five and nine parties. Accordingly, we perform all analyses on stacked datasets where the unit of analysis is the user x party combination. The number of observations included in the analysis ranges from 5,800 (Bulgaria EP) to 140,000 (UK EP). Because the same users are observed multiple times in our data, we cluster standard errors at the user level.
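To make the stacked data structure concrete, the following sketch reshapes one record per user into one row per user x party combination. The field names (`user_id`, `treated`, `ptv`, `congruence`) and the example values are hypothetical and chosen for illustration; the retained `user_id` is what standard errors would be clustered on.

```python
from typing import Any, Dict, List


def stack(users: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn one record per user into one row per user x party combination."""
    rows = []
    for user in users:
        for party, ptv in user["ptv"].items():
            rows.append({
                "user_id": user["user_id"],   # cluster identifier
                "treated": user["treated"],   # 1 = PTVs asked after seeing advice
                "party": party,
                "ptv": ptv,                   # propensity to vote, 0-10
                "congruence": user["congruence"][party],  # -100..+100
            })
    return rows


if __name__ == "__main__":
    users = [
        {"user_id": 1, "treated": 0,
         "ptv": {"A": 8, "B": 2, "C": 5},
         "congruence": {"A": 55.0, "B": -20.0, "C": 10.0}},
        {"user_id": 2, "treated": 1,
         "ptv": {"A": 3, "B": 7, "C": 6},
         "congruence": {"A": -5.0, "B": 48.0, "C": 30.0}},
    ]
    for row in stack(users):
        print(row)
```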
Our regressions include a constant and three independent variables: (i) a binary indicator of the treatment status (i.e., whether a user was assigned to the treatment or control group); (ii) a user's VAA issue congruence score for a given party; and (iii) the interaction between treatment status and congruence score. Importantly, our interest is not in the VAA congruence score, but rather its interaction with the treatment status. Only users in the treatment group were actually exposed to the VAA congruence scores when they indicated their party preferences. Therefore, a significant positive interaction term indicates that users in the treatment group updated their party preferences in line with the VAA advice. Figure 2 visualizes the results in line with recommendations by Berry et al. (2012). The regression output is reported in section 5.1 of the Supplementary Material. A remarkably consistent picture emerges. Across all six cases we find a positive and statistically significant interaction effect between the treatment indicator and the VAA congruence scores (p < 0.001, except for Romania where p = 0.0021). This suggests that exposure to VAA advice had a causal effect on users' party preferences in all countries and voting contexts examined. Turning to effect sizes, we find that exposure to information that a party constitutes a good (green) match on average led to increases in the propensity to vote for that party by 0.25 to 0.5 points on an 11-point scale, or 5% to 20% of a standard deviation (see Table 4). At the same time, a bad (red) match tended to decrease the propensity to vote by a similar amount, though it is worth mentioning that in Bulgaria and, to a lesser extent, Greece, the effect of bad matches is statistically significant only at relatively extreme values.
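The specification described above can be written compactly as follows. The notation is ours and is introduced only for illustration: i indexes users, p indexes parties, T_i is the treatment indicator, and C_{ip} is the congruence score shown to user i for party p.

```latex
\[
\mathrm{PTV}_{ip} = \beta_0 + \beta_1 T_i + \beta_2 C_{ip}
                  + \beta_3 \,(T_i \times C_{ip}) + \varepsilon_{ip}
\]
% Implied effect of exposure to the advice at a given congruence score c:
\[
\mathbb{E}[\mathrm{PTV}_{ip} \mid T_i = 1, C_{ip} = c]
- \mathbb{E}[\mathrm{PTV}_{ip} \mid T_i = 0, C_{ip} = c]
= \beta_1 + \beta_3\, c
\]
```

Under this formulation, a positive β₃ means that the treatment effect on the PTV grows with the congruence score, which is the pattern H1 predicts.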
These results provide consistent evidence that VAAs affect their users' voting preferences. Notably, this finding holds across a variety of democracies from Eastern, Southern, and Western Europe. Furthermore, in the case of the UK, the effect estimates are virtually identical in the VAA that was deployed before the EP elections and the VAA that was deployed before the general election. It is worth adding that Garry et al. (2019) reported similar effects in a prior study that used the same design to study the impact of VAAs on party preferences in the context of a regional election. Overall, this suggests that VAAs have similar effects on voting preferences independently of the electoral and country context. Of course, it is important to keep in mind that we are measuring effects immediately after exposure to the VAA advice. Furthermore, the effects are moderately sized, suggesting that there are limits to the realignment of preferences that VAAs can cause. Still, in line with H1, our results suggest that voters do take VAAs seriously and use them to inform their voting preferences, at least in the short term.

Sub-Group Analysis
Next, we test whether exposure to VAA advice affects some people more than others (H2). Specifically, we consider whether the effects of exposure to VAA advice are conditioned by users' age (measured in years), education (university degree vs no degree), and political interest (high vs low). Furthermore, we investigate whether the effects of exposure to VAA advice depend on whether users already had a vote intention before seeing the VAA advice; and whether they see themselves as issue voters. We count users as issue voters if they indicated that the reason for their vote intention was that they are close to the party on political issues, as opposed to other reasons including leader competence or tactical voting. All moderators were measured pre-treatment (see Figure 1).
To investigate individual-level causal heterogeneity, we estimate a total of 30 linear regressions, each including a three-way interaction between the treatment indicator, the VAA congruence score, and one of our five moderators (5 moderators x 6 experiments = 30 models). In all models, the dependent variable remains the propensity to vote and the unit of analysis the user x party combination. Standard errors are clustered at the user level. Evidence for causal heterogeneity emerges when the three-way interaction term is statistically significant (p < 0.05). Figure 3 visualizes the results of all models where this is the case. The complete regression output including non-significant results is reported in section 5.2 of the Supplementary Material.

We find only limited evidence for individual-level causal heterogeneity. The most significant exceptions emerge in the case of our hypotheses about political interest (H2c) and undecided voters (H2d). Specifically, consistent with expectations we find that exposure to VAA advice affects users with low political interest more strongly in the cases of Greece and the UK; and that undecided voters are more strongly affected in the cases of Spain and the UK (see Figure 3). Taken together, this suggests that voters with lower ex-ante levels of political information as well as voters with unclear preferences are more likely to adjust their voting preferences as a result of VAA usage. However, it is important to note that these results do not replicate in all cases. Furthermore, it is worth noting that while the point estimates suggest that VAAs affect less politically interested and undecided voters more strongly in both UK cases, the differences are statistically significant only in the case of the VAA we deployed for the UK general election.
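The three-way interaction models can be sketched in the same shorthand, with M_i denoting one pre-treatment moderator. We assume here that all lower-order (constitutive) terms are included alongside the three-way term; this is standard practice for interaction models, but the exact specification is the one reported in the Supplementary Material.

```latex
\begin{align}
\mathrm{PTV}_{ip} &= \beta_0 + \beta_1 T_i + \beta_2 C_{ip} + \beta_3 M_i
   + \beta_4 \,(T_i \times C_{ip}) + \beta_5 \,(T_i \times M_i)
   + \beta_6 \,(C_{ip} \times M_i) \nonumber \\
  &\quad + \beta_7 \,(T_i \times C_{ip} \times M_i) + \varepsilon_{ip}
\end{align}
```

Under this assumed specification, the treatment-by-congruence slope for a user with moderator value m is \beta_4 + \beta_7 m, so a statistically significant \beta_7 indicates that the responsiveness to VAA advice differs across levels of the moderator.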
Table note: This table shows the effects of exposure to VAA advice in terms of a standard deviation of the dependent variable (propensity to vote). n/a = VAA congruence score was never achieved; * p < 0.05, ** p < 0.01, *** p < 0.001.

At the same time, we have to reject all other hypotheses about individual-level causal heterogeneity. Contrary to H2a, we do not find any evidence that younger voters are more affected by VAA advice. Turning to education, we find that the differences between voters with and without a university degree fail to reach conventional levels of statistical significance in 5 of the 6 contexts we study. The only exception is Bulgaria, where we find that exposure to VAA advice had a stronger effect on the preferences of voters with a university degree. The latter result directly contradicts H2b, which predicted that voters with higher education should be less affected by VAAs. Similarly, in 5 of the 6 contexts we study, we do not find statistically significant differences between self-declared issue voters and voters who stated that other considerations, such as leader competence, are more important to them. The only exception emerges in the case of the UK (EP), where we find that exposure to VAA advice had a weaker effect on self-declared issue voters. Based on the existing literature, we expected the exact opposite and therefore have to reject our H2e.

Robustness Checks
We report a series of robustness checks in section 6 of the Supplementary Material. First, we replicate all models while adjusting for a large set of covariates including, among other things, demographics (age, gender, and education), political attitudes (political interest and left-right self-placement), and vote intention in the upcoming election. Second, we repeat all analyses while dropping users who rushed through the VAAs at implausibly high speed. Finally, we also estimate non-linear interaction models, which allow us to relax the assumption of linear interaction effects (Hainmueller et al., 2019). The results remain similar. The most notable exception is that, after accounting for covariates, we find evidence that undecided voters are more strongly affected by the VAA advice in an additional country (Bulgaria), providing additional support for H2d.

Conclusion
VAAs significantly reduce the cost of acquiring information about issue congruence with political parties and candidates. The results of this study suggest that voters from different European countries and in different electoral contexts engage with this information and update their party preferences after VAA usage in line with the advice provided. At 5% to 20% of a standard deviation, the effects we found are modestly sized, though it is worth noting that we are measuring average effects. Notably, the scope for large VAA effects is likely to be more limited among political aficionados, given that their preferences are likely to be already broadly in line with the VAA advice. In keeping with this, we found partial evidence that exposure to VAA advice has significantly stronger effects on users with comparatively low political interest as well as users who are unsure who they should vote for. Furthermore, it is worth noting that small individual-level effects may translate into substantial changes at the aggregate level. In particular, this would apply to highly institutionalized VAAs which, unlike our more research-focused applications, are frequently used by hundreds of thousands or even millions of voters.
Notably, our results differ from most prior experimental studies, which have tended to report null effects. A potential explanation is that most prior experimental studies have estimated the effects of artificially induced VAA usage, which may not generalize to real-world behavior. In this study, we estimated the effects of actual VAA usage and our samples were also much larger, alleviating concerns related to statistical power. However, an important limitation of the timing design we used is that we could only establish short-term effects. Therefore, an alternative explanation for why our results differ from most prior experimental studies is that VAA effects are short-lived. As a result, a key avenue for future research will be to investigate the extent to which VAA effects endure over time. In this context, it is worth noting existing observational evidence from, inter alia, panel studies suggesting that VAA effects can endure over significant time spans (e.g., Heinsohn et al., 2019; Kleinnijenhuis et al., 2019). Given the challenges of establishing causal effects in observational research, however, we identify a need for improved experimental designs which would make it possible to causally identify the medium- and long-term effects of real-world VAA usage.
Finally, another important avenue for future research suggested by our research concerns individual-level causal heterogeneity. First, the differences between users with low and high political interest as well as users who do or do not already have a vote intention did not replicate in all contexts. Second, we found no support for several other common expectations, such as that VAAs impact younger and less educated voters more strongly. Overall, this suggests that individual-level causal heterogeneity remains poorly understood. Future research should therefore theorize in more detail what kind of voters are most likely to be influenced by VAAs. A promising way forward could be to think less about individual characteristics and more about combinations of characteristics. VAAs should be most likely to influence individuals who are both in need of and receptive to information on issue congruence. Accordingly, our ability to capture individual-level causal heterogeneity may profit from a turn to multi-dimensional measures combining indicators of uncertainty, political interest, and political efficacy (cf. van de Pol et al., 2014).

Notes
1. That said, VAA designers often try to balance issues pushed by left- and right-wing parties out of a concern that issues "owned" by a certain party might benefit that party (cf. Walgrave et al., 2009). In that sense, VAA designers have indirectly incorporated insights from issue ownership theory.
2. Of course, power concerns could also be addressed in the context of encouragement designs by collecting larger samples.
3. In two countries (Greece and UK), the VAAs were promoted using paid advertising on Facebook.
4. See section 2 of the Supplementary Material for details.
5. Meanwhile, the VAA congruence score does not have a clear causal interpretation because it is likely correlated with other explanations of party preferences, such as party identification or perceived competence. Note that as a result of randomization, treated and control subjects on average received the exact same issue congruence scores (see section 4 of the Supplementary Material for supporting evidence).