Be there or be square – The impact of participation and performance in the 2017 Dutch TV debates and its coverage on voting behaviour

TV debates are often seen as the most important events that provide the electorate with information about leading candidates and key issues during electoral campaigns. Research provides evidence for various debate effects, showing both a direct and an indirect influence on voting decisions. There is, however, only scant evidence on the relative impact of these effects when they are examined at the same time. To fill this gap, our study analyses whether and to what extent a candidate's participation in a debate, their performance in the debate and the related media coverage influence the electorate when examined simultaneously. We consider the case of the 2017 Dutch general elections, which offers an almost ideal setting due to the broadcast of several TV debates with different formats and candidate compositions throughout the campaign period. To distinguish the effects of single debates, we use original Dutch panel survey data. We find a weak overall influence of the debates; the most significant effects are decreasing vote intentions for the two main competitors (VVD and PVV) after both candidates refused to participate in the first TV debate, and a 'winner effect' for one of the two main candidates in a head-to-head debate.


Introduction
TV debates are an integral part of election campaigns in many countries. They not only receive the most coverage of all televised campaign events, but are often seen as the most important events in any election campaign. TV debates provide information about the leading candidates and important issues that inform the electorate's voting decision, and they are easily accessible within a relatively short period of time (Wiegand and Wagner, 2016; Maier et al., 2014). Televised election debates can thus be considered the "focal point" of election campaigns (Carlin, 1992, p. 263; McKinney and Carlin, 2004).
TV debates may exert a direct influence on the electorate when people actually watch a debate, but they also have an indirect influence insofar as citizens read or hear about them afterwards (e.g. Scheufele et al., 2005). Furthermore, the direct watching effect can be divided into (i) the mere (non-)presence of a candidate in a debate, and (ii) their performance during the discussion with political opponents. So far, most studies have analysed the various debate effects in isolation, without examining their simultaneous impact on citizens' behaviour. The main goal of this study is thus to examine the relative influence of debate effects by answering our overall research question: To what extent is voting behaviour influenced by candidate participation and performance in TV debates, and by the subsequent coverage of those debates in the media? We analyse the influence of these different debate components using the case of the Netherlands. The comparatively high number of TV debates, along with their multi-format system in the run-up to the 2017 Dutch general elections (i.e. several pre-election debates with different candidate compositions), enables a fine-grained analysis of the different aspects of a TV debate. Importantly, the fact that the main contenders refused to participate in some debates allows us to study the influence of a candidate's refusal to debate on citizens' voting behaviour. The resulting differences in the number and composition of debate contenders, the related and more general format differences, and the timing of the debates in the campaign allow for a more general analysis of whether the various debates differ in their overall influence on voting behaviour. For the main analysis we draw on original panel survey data with waves timed to match the debates. We complement the survey data with a content analysis of newspaper coverage of the debates, including evaluations of the candidates.
Our study contributes to the literature in several ways. First, it is among the earliest analyses to capture the various debate effects simultaneously. Second, given the multiparty system of the Netherlands, where many parties receive similar vote shares, the study extends the existing literature by validating previous findings from two-party systems, like the US, or from multiparty systems with two major parties, like Germany. Finally, the panel logic of our data, with repeated measurements of voting behaviour (intended party choice before the election and actual party choice thereafter), enables us to examine at what point in the campaign period TV debates are most influential, if indeed they are at all.

TV debates as campaign events
After the question of whether campaigns matter had been answered in the affirmative, research moved on to the questions of which campaign events matter, for whom and in which contexts (cf. Hillygus and Jackman, 2003). Whereas several studies in the 1970s and 80s attributed only limited influence on voting behaviour to TV debates (Blais and Boyer, 1996), more recent research has revised this view. Today, televised election debates are seen as key components of (presidential) election campaigns, since debates provide the audience with information about the candidates and their positions and are able to reach a broad viewership (Benoit and Hansen, 2004). The media, and television in particular, are often the main source of information for voters (Aalberg and Jenssen, 2007). Thus, TV debates serve an educational function by helping citizens make a more considered or potentially better-informed voting decision (Holbert et al., 2002; Benoit and Hansen, 2004).
Debate viewers can evaluate candidates more or less simultaneously, since they see different candidates elaborate on the same topics and issues. This enables viewers to compare opinion statements immediately, and much more easily than in other campaign events that focus on only one party or candidate (Benoit and Hansen, 2004). While being exposed to candidates' direct responses to their opponents can help voters form a voting intention, other authors argue that such counter-arguing makes TV debates less influential, since the effects cancel each other out (Hillygus and Jackman, 2003).
Following the first ever televised debate in the United States, between Nixon and Kennedy in 1960, research on TV debates and their impact on voters has predominantly been conducted in the United States (Klyukovski and Benoit, 2006; Pattie and Johnston, 2011). With the introduction of TV debates and their growing importance in other countries, a growing body of research today also examines TV debates outside the US (e.g. Blais et al., 2003; Maier et al., 2014; Scheufele et al., 2005; Van der Meer et al., 2016). These studies examine various aspects of TV debates that may matter for voting-eligible citizens. Broadly speaking, one can distinguish between direct debate effects, that is, effects that occur because a citizen watches a debate and afterwards changes her voting intention/decision, and indirect effects through intermediaries, most prominently news coverage of the debates, which usually occur after the debates have taken place (Blais and Boyer, 1996).

Participation and performance in a TV debate (direct effect)
Many US studies conclude that watching debates can influence vote preferences, but mainly by strengthening a given vote preference rather than by changing existing vote intentions or candidate evaluations (e.g. Sigelman and Sigelman, 1984; Katz and Feldman, 1962). In contrast to this long-prevailing view of the rather limited effects of TV debates, a growing number of studies find evidence of a considerable impact of televised campaign events on voters (e.g. Pattie and Johnston, 2011). For instance, research on primary debates in the United States (which are more comparable to the Dutch context, given the presence of multiple candidates with smaller issue differences) shows that debate exposure not only leads voters to learn about the candidates' policies, but also influences candidate evaluations (Yawn et al., 1998). More importantly, primary debates have been suggested to influence vote preferences, as well as confidence in vote choices, after watching the debate. A typical aspect of TV debates in the US, with its two-party system and respective presidential candidates, is the strong focus on a debate winner and a debate loser, possibly because these are easier to identify when only two major candidates debate (Anstead, 2016).
Research from Germany shows that TV debate participation can be beneficial for every participant. Instead of a zero-sum game, as in US presidential elections, candidates can leave a positive impression irrespective of whether they are the overall winner of the debate (e.g. Bachl, 2013; Maier and Faas, 2003; Maier et al., 2014; Maurer and Reinemann, 2003). Maier and Faas (2011b) conclude that TV debates in Germany are more persuasive than debates in the United States. While the strongest influences can be identified for undecided voters or voters without a party attachment, "significant parts of the electorate changed their opinions about the chancellor candidates and even revised their voting decisions" (Maier and Faas, 2011b, p. 77), including voters with a prior party identification. Experimental and panel research finds that up to one third of debate watchers change their vote preferences in response to the debate (e.g. Maier, 2007; Maier and Faas, 2011a; Hofrichter, 2004; Maier et al., 2014).
The direct effect of watching a debate comprises at least two aspects. First, the most obvious fact for a viewer is which candidates participate in the debate. This is far from trivial, since in the European context not all (invited) politicians take part in every debate. For most parties and candidates, particularly the most important ones, participation in a debate may be expected (provided they are invited). Hence, declining a debate invitation in particular may have severe consequences. As Blais et al. (2003) argue, a refusal can be seen as avoiding one of the "most democratic exercise(s)" (p. 49) in a campaign. Thus, a first crucial factor for citizens' voting decisions is a candidate's presence in a debate, and even more so whether the candidate has refused to participate.
A second (related) aspect is the performance of the candidates during the debate. Scholars such as Maier et al. (2014) argue that it is not the sheer exposure to the debate, but the performance of the candidates and the viewers' evaluation of it, that influences the voting decision. For instance, in the context of the 2010 British election, Pattie and Johnston (2011) find that debate performance influences citizens' feelings and attitudes towards party leaders as well as towards the parties themselves, even when controlling for partisanship and pre-election vote intentions. Despite the strong impact of pre-existing voting preferences, the authors further show that debate performance also has the potential to change voting intentions. Relatedly, based on each candidate's performance, debate viewers may also identify a winner or loser of the debate, which is easier in head-to-head debates that include only two candidates. Studies have shown that perceiving a candidate as the debate winner can directly increase the probability of voting for that candidate by up to 30 to 40 percentage points (Maier and Faas, 2011a; Maier et al., 2014).

News coverage of the debates (indirect effect)
According to contagion theories, individuals participate in various communication networks (Van der Meer et al., 2016). As part of these networks, citizens experience election campaigns through interpersonal conversations as well as discussions in different media channels (Blais and Boyer, 1996; Coleman, 2000; Monge and Contractor, 2003; Van der Meer et al., 2016). The debate's impact can therefore be indirect, as individuals are influenced by what they hear about it in the media or from friends and acquaintances, irrespective of whether they watched the debate (Blais and Boyer, 1996). Drew and Weaver (2006) point out that besides attention to televised debates, attention to the news is an important correlate of voters learning about candidates' issue positions and of voter interest in the campaign. For the German case, several scholars show that debate effects strongly depend on contextual factors, among others on the follow-up media coverage (Maier, 2007; Maier et al., 2014; Wiegand and Wagner, 2016). Maier et al. (2014) argue that besides the immediate direct perceptions of the candidates, the media interpretation, and first and foremost who is presented as the winner of the debate, can influence citizens in the formation of their voting decision (see also Fridkin et al., 2008; Tsfati, 2003).
Which candidate counts as 'the debate winner' is determined partly by the viewers' first-hand perceptions while watching the debate, and partly by the subsequent media coverage (Lanoue, 1992; Lang and Lang, 1976; Tsfati, 2003). As shown in the US case, the interpretation of who is the winner or loser may differ strongly between a viewer's subjective impression while watching the debate and the media interpretation afterwards. This is all the more relevant given that most evaluative statements in the press are published in the days after the debates are broadcast (Reinemann and Wilke, 2007). For instance, after the two TV debates in the 2002 German elections, both of which took place on a Sunday, the newspapers of the following Monday and Tuesday contained one third of all evaluative statements about the candidates, Schröder and Stoiber, published in the last four weeks before the election (Wilke and Reinemann, 2003).
Comparing direct and indirect debate effects, Scheufele et al. (2005) find for Germany that both the debates themselves as well as the follow-up media coverage produce immediate effects on candidate impressions and evaluations as to each debate's winner. Further, Blais and Boyer (1996) find both direct and indirect effects of televised debates on voting in the Canadian 1988 election, as voters who did not watch the debate were still influenced in their vote choice by subsequent media coverage. Apart from these previous findings, research on the comparison and relative impact of direct debate effects and indirect effects of the follow-up media coverage is still relatively scarce.
Generally, research on the effects of televised debates is particularly context-specific; that is, applying e.g. US findings to most European parliamentary democracies is problematic. As Anstead (2016, p. 520) shows, televised debates "reflect the institutional logic" of a political system, meaning that environmental factors shape the development of TV debates in each country. The Dutch context provides an interesting angle from which to re-examine the effects of televised electoral debates and related media coverage, especially considering the rather neutral media coverage found for previous debates (Walter and van Praag, 2014b).

The case of the 2017 Dutch National Elections
Dutch televised electoral debates were established in 1963, shortly after the 1960 Kennedy-Nixon debate in the US, and are now an essential part of Dutch electoral campaigns. Compared to, for instance, US presidential elections, Dutch election campaigns are relatively short, lasting around two months (Van der Meer et al., 2016). Notwithstanding the rather short campaign period, several debates (at least two in total) are organized by the public broadcaster and, since 1989, also by a commercial broadcaster, so broadcasting is shared between public and private channels (Walter and van Praag, 2014a, 2014b). The debates include different formats: head-to-head duels between the two main contenders, but also larger discussions comprising candidates from all relevant parties. These can include up to 13 candidates, reflecting the fragmentation of the Dutch multiparty system with its many small and medium-sized parties (e.g. Pellikaan et al., 2018).
In the run-up to the 2017 National Elections (15 March), four TV debates were broadcast live, of which we are able to analyse three. As in the earlier debates of 1971 and 2012, the top candidates refused to participate in some debates (Walter and van Praag, 2014a, 2014b). The 2017 setting thus offers a rare opportunity to study 'refusal' effects. Although refusal is a rather rare phenomenon, a prominent recent international example is Theresa May, who did not take part in the 2017 pre-election head-to-head debates broadcast by the BBC.
In line with the tradition of various debate formats, and due to the aforementioned refusals, the four debates in the Netherlands varied in candidate composition, viewing figures, format and distance to election day. Table 1 presents an overview of the relevant characteristics of each debate. For the first RTL4 debate, the leading candidates of the two biggest parties, Mark Rutte (VVD) and Geert Wilders (PVV), had been invited but refused to attend. They did so because RTL, contrary to an earlier agreement covering the four biggest parties, decided at short notice to invite an additional fifth candidate from GroenLinks (GL), given the party's likely involvement in the multiparty government formation. After the refusals of the VVD and PVV, the debate took place with five smaller parties only. In the second, so-called Carré Debat, Geert Wilders (PVV) again refused to participate because of a published interview with Wilders' brother that portrayed him negatively. Since we do not have data on exposure to this debate, we had to exclude it from our analysis. The third debate, broadcast by EenVandaag, was the first (and only) encounter between the two main contenders, Rutte and Wilders; two days before the election it "brought some life into the previously dull campaign" owing to a preceding diplomatic conflict with Turkey that influenced the debate (Holsteyn, 2018, p. 1367). Besides the more common roundtable and head-to-head formats of the first three debates, the final NOS Slotdebat had a different format. Over one and a half hours, the leading candidates of the eight parties generally expected to be the biggest discussed various topics in several head-to-head rounds, each featuring two candidates (five smaller parties had held a separate discussion among themselves beforehand).
Apart from the specific setting of the 2017 debates, the Dutch context has other, more general characteristics. Unlike in the US presidential system, in parliamentary democracies like the Netherlands party positions are still at the forefront of electoral campaigns, since "the premiership falls to whoever leads the party commanding a majority in the elected assembly" (Coleman, 2000, p. 8). Factors like party platforms, political issues and a party's ability to "stand out" against the others represented in the debate may be more important than the personality and charisma of the contestants, as is the case in more candidate-oriented two-party systems like the United States (Aalberg and Jenssen, 2007, p. 116). Typical of Dutch debates are the important role of the moderating journalist (with little to no possibility for the public to ask questions) and the limited rules during debates, which result in more actual 'debate' between the candidates than in other countries (Walter and van Praag, 2014a, 2014b).

Hypotheses
The variation in candidate composition across the debates allows us to test the direct effect of candidates' participation in TV debates. We particularly expect that an active refusal by parties/candidates to participate reduces citizens' likelihood of voting for the respective candidate/party. More generally, though, being absent from a debate (e.g. smaller parties not being present in the head-to-head meeting of the two main competitors, VVD and PVV) may also have negative effects for smaller parties, since attention is drawn away from them. Our first hypothesis is thus formulated more generally and states:

H1. Citizens are less likely to vote for parties/candidates who are absent from a TV debate.
Secondly, watching the debate and being exposed to the performance of the participating candidates may affect voting behaviour. This most likely happens through debate viewers' evaluations of who was 'good' or 'bad', with better candidate performances, and being considered the winner of the debate, expected to increase the likelihood of voting for the respective party:

H2. Citizens are more likely to vote for parties/candidates who perform well in a TV debate.
Thirdly, we expect indirect effects of the news coverage of the debates. Independent of watching a debate, subsequent exposure to debate coverage in the media may alter people's voting behaviour. For instance, positive media coverage of a candidate's performance, e.g. presenting them as the winner of the debate, should result in more votes for that candidate's party. In turn, more negative media coverage about a candidate's performance should result in lower voting rates for their party. Furthermore, as the media also extensively covered the meta-debate about the withdrawal of the top candidates, Rutte and Wilders, (Holsteyn, 2018), negative effects may also stem from this more general media coverage without a direct link to a particular candidate's performance in a debate. We thus formulate a general hypothesis without specifying the exact content feature of the news coverage:

H3. Exposure to positive TV debate news coverage about a candidate increases citizens' likelihood to vote for this candidate's party, while exposure to negative coverage about a candidate decreases the respective voting likelihood.
Testing all three hypotheses simultaneously enables us to examine to what extent candidates' participation in, performance in and related media coverage of the TV debates influence voting behaviour. Further, we may find these effects for only one of the debates because of their different formats, but also due to their timing. Whereas the literature on the US and British contexts suggests that the first debate of a series is most influential due to its relative novelty (Holbrook, 1999; Benoit et al., 2003; Pattie and Johnston, 2011), this might be different in the multi-format Dutch case, where the first debate took place without the two most prominent candidates, Rutte and Wilders. A German study by Klein (2005) on the 2002 elections shows a stronger influence on voting (intention) of the second debate than of the first, potentially because the second debate was closer to the election itself. A similar effect is possible for the Netherlands, as the debates closer to election day included the two top candidates. Both the one-on-one debate between Rutte and Wilders and the final Slotdebat can be seen as American-style debates. Such head-to-head formats may lead to stronger effects owing to the more confrontational character of the debate. Additionally, a smaller number of parties in a debate may display differences between the parties more clearly. A higher number of parties, and the related higher number of topics, in contrast, may make it harder for the electorate to observe party differences and to form clear voting preferences (Van der Meer, 2017). Given the conflicting expectations regarding the timing and format/candidate composition of the debates, we formulate a research question:

Data and methods
We rely on a four-wave panel survey from the Netherlands, including three pre- and one post-election wave (Van Praag and de Vreese, 2017).
The first wave was collected in October 2016 (12th-20th October), the second at the beginning of 2017 (27th January-6th February), the third pre-election wave in March (2nd-6th March) and the final post-election wave (17th-21st March) just after the 2017 Dutch National Elections, which took place on 15th March. The original sample was drawn from the Kantar database, which consists of 159,000 respondents recruited through multiple recruitment strategies. Light quotas (on age, gender, education, household size, region and party voting) were applied when sampling from the database. The survey was conducted using Computer Assisted Web Interviewing (CAWI). Of the initial 2144 respondents in the first wave (AAPOR RR1 of 0.66), 1351 completed all four waves (retention rate of 0.63) and serve as our final sample of respondents. The panel structure allows us to examine variations in voting behaviour over the whole campaign period, relying on voting intentions in the pre-election waves and the reported final party choice in the post-election wave. Importantly, the third and fourth survey waves include questions about three of the four broadcast TV debates (excluding the Carré Debat) and about exposure to their related media coverage. We thus exclude the first wave from most of our analyses.
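The retention rate reported above follows directly from the two sample sizes. As a minimal arithmetic check (a sketch, not the authors' code), 1351 complete cases out of 2144 first-wave respondents gives 0.63; the AAPOR RR1 cannot be reproduced this way, since it requires the full case dispositions, which are not reported here.

```python
def retention_rate(completed_all_waves: int, initial_respondents: int) -> float:
    """Share of first-wave respondents who also completed every later wave."""
    if initial_respondents <= 0:
        raise ValueError("initial_respondents must be positive")
    if not 0 <= completed_all_waves <= initial_respondents:
        raise ValueError("completed_all_waves must lie between 0 and initial_respondents")
    return completed_all_waves / initial_respondents


# Figures reported in the text: 2144 first-wave respondents, 1351 complete cases
rate = retention_rate(1351, 2144)
print(f"retention rate: {rate:.2f}")  # retention rate: 0.63
print(f"dropped out: {2144 - 1351}")  # dropped out: 793
```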
Furthermore, to gauge the actual media coverage of the TV debates, we conducted a manual content analysis of newspaper articles. In contrast to social media, it was the "traditional mass media that gave the campaign its sounds and dynamics" (Holsteyn, 2018, p. 1366). Moreover, a recent study by Peterson (2019) finds that newspapers (still) contribute significantly to political awareness, even among those with little political interest. Thus, following the examples of Scheufele et al. (2005) and Walter and van Praag (2014b), we focus on newspaper coverage, including seven major Dutch newspapers 1 (Algemeen Dagblad, NRC Handelsblad, Financieel Dagblad, De Telegraaf, Trouw, de Volkskrant, Metro) and covering two periods around the TV debates (12th February-2nd March and 10th-15th March). 2 The content analysis is restricted to news items about any of the TV debates retrieved through LexisNexis. 3 The final content data consist of 187 relevant news items, with the whole article as the unit of analysis, and include variables measuring candidate evaluations and the winner/loser of the debate (see Table 2 in the Appendix for inter-coder reliability). Fig. 1 presents an overview of our data structure:

Operationalization
The dependent variable for our analyses is party choice, measured repeatedly by asking respondents which party they intend to vote for in the upcoming National Elections; in the post-election wave it represents the reported party choice (for all question wordings see Table 3 in the Appendix). We include only voters and, owing to a sufficiently large N, can distinguish the seven biggest Dutch parties (VVD, PvdA, SP, CDA, GroenLinks, D66 and PVV) plus a summary category for all other smaller parties, each chosen by very few respondents. Further, as a robustness check we use a second, related operationalization capturing a switch between two waves, either from one party or from intended abstention, to any of the eight party choices just mentioned.
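The eight-category dependent variable can be sketched as a simple recoding step. This sketch assumes that party answers have already been harmonised to the labels used in the text and that (intended) non-voters arrive as `None`; the function name and input format are hypothetical, not taken from the survey files.

```python
from typing import Optional

# The seven parties distinguished in the analysis; all other parties are pooled
MAJOR_PARTIES = frozenset({"VVD", "PvdA", "SP", "CDA", "GroenLinks", "D66", "PVV"})
OTHER = "Other"


def recode_party_choice(party: Optional[str]) -> Optional[str]:
    """Map a (harmonised) party answer onto the eight-category dependent variable.

    None stands for an (intended) non-voter; such respondents are dropped from
    the models, so None is passed through unchanged.
    """
    if party is None:
        return None
    return party if party in MAJOR_PARTIES else OTHER


answers = ["VVD", "PvdD", None, "GroenLinks", "50PLUS"]
coded = [recode_party_choice(a) for a in answers]
voters_only = [c for c in coded if c is not None]
print(voters_only)  # ['VVD', 'Other', 'GroenLinks', 'Other']
```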
Our key independent variables concern the TV debates. To measure debate exposure, respondents were asked whether they had seen each of the three debates analysed here (RTL4, EenVandaag and NOS Slotdebat). We dichotomized the original answer options into yes (including partly seen) and no (not seen at all). As a measure of candidates' performance, respondents were asked whom they perceived as the winner of the debate. The survey question offered the names of all participating candidates as answer options, and we coded all non-watchers as the reference category. This question, though, was only asked for the last two debates (EenVandaag and NOS Slotdebat). 4 For two of the debates (RTL4 and EenVandaag), the survey further asked about exposure to subsequent media coverage of the debates, which again is measured as a dummy variable (yes = 1). No such question exists for the NOS Slotdebat, since this debate took place on the evening before the elections. As control variables we include gender (female), age (continuous) and education (recoded into low, middle and high).
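The two debate measures described above can be sketched as follows. The answer labels are hypothetical stand-ins for the questionnaire wording in Table 3, and the text does not say how watchers who named no winner were coded, so the separate category used here is an assumption.

```python
from typing import Optional

# Hypothetical answer labels for the debate-exposure question
SEEN = {"yes, completely", "yes, partly"}  # "partly seen" counts as exposure
NOT_SEEN = {"no, not at all"}

REFERENCE = "did not watch"  # non-watchers form the reference category


def exposure_dummy(answer: str) -> int:
    """Dichotomize debate exposure: 1 = (partly) seen, 0 = not seen at all."""
    if answer in SEEN:
        return 1
    if answer in NOT_SEEN:
        return 0
    raise ValueError(f"unexpected answer: {answer!r}")


def winner_category(watched: int, perceived_winner: Optional[str]) -> str:
    """Categorical 'winner' variable with non-watchers as the reference level."""
    if not watched:
        return REFERENCE
    # Assumption: watchers who name no winner get their own category
    return perceived_winner if perceived_winner else "no winner named"


print(winner_category(exposure_dummy("yes, partly"), "Rutte"))  # Rutte
print(winner_category(exposure_dummy("no, not at all"), None))  # did not watch
```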

Method
As an analytical tool we use multinomial logistic regression models. Our main model specification uses party choice as the dependent variable while controlling for the intended party choice in the previous wave, 5 i.e. a lagged dependent variable model. We estimate separate models for waves 2 & 3 and waves 3 & 4, in both cases including only respondents who completed all four waves. A classical panel model is not feasible, as the debate variables are not measured repeatedly: different TV debates took place between different waves. For an easier interpretation of the regression results and the respective debate effects, we rely on plots of average marginal effects, which present changes in the predicted probabilities of voting for a given party.
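To make the quantity behind the marginal effect plots concrete: for a binary regressor such as debate exposure, the average marginal effect on party j is the mean, over respondents, of the predicted probability of choosing j with the dummy set to 1 minus that with the dummy set to 0, keeping each respondent's other covariates (including the lagged party choice) at their observed values. The pure-Python sketch below assumes the multinomial logit coefficients have already been estimated; the coefficient values, covariate names and base category are invented for illustration, the model is far smaller than the one described in the text, and confidence intervals (e.g. via the delta method or bootstrapping) are not computed.

```python
import math
from typing import Dict, List, Sequence

Covariates = Dict[str, float]
Coefficients = Dict[str, Covariates]  # category -> {covariate: coefficient}


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_probs(x: Covariates, coefs: Coefficients,
                    categories: Sequence[str]) -> Dict[str, float]:
    """Multinomial logit probabilities; the base category has no coefficients (score 0)."""
    scores = []
    for cat in categories:
        beta = coefs.get(cat)
        scores.append(0.0 if beta is None
                      else sum(beta.get(name, 0.0) * value for name, value in x.items()))
    return dict(zip(categories, softmax(scores)))


def ame_of_dummy(rows: Sequence[Covariates], coefs: Coefficients,
                 categories: Sequence[str], dummy: str) -> Dict[str, float]:
    """Average discrete change in each category's probability when `dummy` goes 0 -> 1."""
    effects = {cat: 0.0 for cat in categories}
    for x in rows:
        p1 = predicted_probs({**x, dummy: 1.0}, coefs, categories)
        p0 = predicted_probs({**x, dummy: 0.0}, coefs, categories)
        for cat in categories:
            effects[cat] += (p1[cat] - p0[cat]) / len(rows)
    return effects


# Illustrative values only: "VVD" is the base category, "prev_PVV" a lagged-vote dummy
categories = ["VVD", "PVV", "Other"]
coefs = {
    "PVV": {"const": -1.0, "prev_PVV": 3.0, "news_rtl4": 0.2},
    "Other": {"const": 0.5, "prev_PVV": 0.5, "news_rtl4": 0.4},
}
rows = [
    {"const": 1.0, "prev_PVV": 1.0, "news_rtl4": 0.0},
    {"const": 1.0, "prev_PVV": 0.0, "news_rtl4": 1.0},
    {"const": 1.0, "prev_PVV": 0.0, "news_rtl4": 0.0},
]
for cat, effect in ame_of_dummy(rows, coefs, categories, "news_rtl4").items():
    print(f"{cat}: {effect:+.3f}")
```

Because the predicted probabilities sum to one for every respondent, the effects across all party categories sum to zero, so a negative effect for one party is necessarily offset by positive effects elsewhere.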
As robustness checks, we run a second model specification with a switch to a party as the dependent variable. This second specification excludes all respondents with a stable vote intention between waves and is thus more restrictive for finding debate effects. Further, we test whether debate effects are stronger for undecided voters by running additional models that exclude respondents with (very) certain vote intentions/decisions.
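The two robustness samples amount to filters on each pair of waves. The sketch below assumes each respondent is represented by a hypothetical record holding the party choice in the earlier wave (`None` for intended abstention), the choice in the later wave, and a self-reported certainty label; the record type and the certainty labels are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class WavePair:
    respondent_id: int
    choice_before: Optional[str]  # None = intended abstention at t-1
    choice_after: Optional[str]   # None = (intended) non-voter at t
    certainty: str                # hypothetical labels, e.g. "very certain"


CERTAIN = {"certain", "very certain"}


def switch_sample(pairs: List[WavePair]) -> List[Tuple[int, Optional[str]]]:
    """Keep only respondents who moved to a party from another party or from
    intended abstention; the dependent variable is the party switched to."""
    return [(p.respondent_id, p.choice_after) for p in pairs
            if p.choice_after is not None and p.choice_after != p.choice_before]


def undecided_sample(pairs: List[WavePair]) -> List[WavePair]:
    """Drop respondents reporting a (very) certain vote intention/decision."""
    return [p for p in pairs if p.certainty not in CERTAIN]


pairs = [
    WavePair(1, "PVV", "VVD", "fairly certain"),
    WavePair(2, "VVD", "VVD", "very certain"),
    WavePair(3, None, "GroenLinks", "not certain"),
    WavePair(4, "SP", None, "certain"),
]
print(switch_sample(pairs))                                # [(1, 'VVD'), (3, 'GroenLinks')]
print([p.respondent_id for p in undecided_sample(pairs)])  # [1, 3]
```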

Descriptives
To find debate effects, two conditions must be fulfilled. First, a sufficiently large number of citizens have to watch the TV debates and/or expose themselves to the related media coverage. Second, there must be some variation in reported party voting over time. Fig. 2 provides evidence of the first condition, since between 25% and almost 60% of survey respondents reported being exposed to at least one debate under study or any of the related media coverage. Exposure to the EenVandaag debate (47%) and corresponding media coverage (57%) was the highest among the three debates under study.
As context information on the media coverage, our content analysis revealed that Mark Rutte (56.6%) and Geert Wilders (58.8%) were mentioned most often (see Table 4 in the Appendix). In line with previous findings on the neutrality of Dutch news coverage of TV debates in the 2012 campaign (Walter and van Praag, 2014b), relatively few news articles evaluated the candidates in relation to the 2017 debates. 6 The most frequent evaluations were negative ones of Rutte (7% of articles) and Wilders (11%). With regard to the first, RTL4 debate (n = 127), the percentages of negative evaluations of Rutte (9% of the articles) and Wilders (13%) are slightly higher, mostly due to their refusal to participate. Furthermore, none of the candidates was presented as a clear winner of a debate. It thus remains to be seen whether the rather neutral media coverage allows for the effect of exposure to debate coverage expected in our third hypothesis. 7 Fig. 3 provides evidence of the second condition, the variation in party voting over the four waves. Several aggregated party vote preferences differ strongly between waves, e.g. the significant drop of over five percentage points for the PVV between the second and fourth (post-election) wave and an equivalent increase for the VVD, which is substantial given the comparatively low absolute vote shares in the fragmented Dutch multiparty system. The variations are smaller for other parties, but party voting is generally rather volatile over time, and only a few parties, e.g. the PvdA, are more or less stable.
However, we note that volatility in voting behaviour is relatively similar across all waves (proportion of respondents who changed their preference between w1-2: 26.6%, w2-3: 27.2% and w3-4: 24.3%), which indicates that the TV debates may not have increased preference changes. Although these numbers should be interpreted with caution given the (very) different time intervals between the waves, it is interesting that the least change occurred just before the elections and more between waves 2 and 3 (two-tailed t-test: p = 0.070), i.e. around one month before the elections. We further tested whether volatility is higher among respondents who watched a debate or who were exposed to related media coverage (waves 2 to 4 only). Although volatility is higher among debate watchers, the differences from non-watchers do not reach statistical significance. Exposure to media coverage does not result in significant differences in volatility, either. Although these descriptive results suggest that overall the TV debates may not have played a decisive role in voting, there may still be significant effects of specific debate aspects and for certain parties, which we examine in more detail in the following. 8
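The volatility figures above are shares of respondents whose stated choice differs between two waves. The text reports a two-tailed t-test without specifying the variant; since the same respondents are observed in both wave pairs, the sketch below assumes a paired t-test on the per-respondent change indicators and, as a further simplification, uses the normal approximation for the p-value, which is close to the t distribution at roughly 1,350 respondents. The toy data are invented.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import List, Optional, Sequence, Tuple

Choice = Optional[str]  # None = intended abstention


def change_indicators(before: Sequence[Choice], after: Sequence[Choice]) -> List[int]:
    """1 if a respondent's stated choice differs between two waves, else 0."""
    return [int(b != a) for b, a in zip(before, after)]


def volatility(before: Sequence[Choice], after: Sequence[Choice]) -> float:
    """Proportion of respondents who changed their preference between two waves."""
    return mean(change_indicators(before, after))


def paired_change_test(change_a: Sequence[int], change_b: Sequence[int]) -> Tuple[float, float]:
    """Paired t statistic on change indicators, two-tailed normal-approximation p-value."""
    diffs = [a - b for a, b in zip(change_a, change_b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Toy data: one entry per respondent, same order in every wave
w2 = ["VVD", "PVV", "SP", None, "D66", "PVV"]
w3 = ["VVD", "VVD", "SP", "CDA", "D66", "PVV"]
w4 = ["VVD", "VVD", "GroenLinks", "CDA", "D66", "PVV"]
print(volatility(w2, w3), volatility(w3, w4))
t, p = paired_change_test(change_indicators(w2, w3), change_indicators(w3, w4))
print(f"t = {t:.2f}, p = {p:.3f}")
```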

Regression models
We now turn to the results of our regression models and start with the effects of the first RTL4 debate, which took place between the second and third wave. Fig. 4 displays average marginal effects (changes in predicted probabilities including 95% CI) on voting for any of the seven major parties or another smaller party, excluding intended non-voters (regression results in Table 5 in the Appendix). As the left graph shows, debate watching did not significantly increase or decrease voting for any party, except for a decrease for "other" parties that did not participate in the debate. The right graph shows the effect of related news exposure, where a significant negative effect (p = 0.038) can be observed for the VVD. This is in line with our expected "punishment" by citizens, as Mark Rutte (VVD) refused to participate in this debate. However, we do not find a similar effect for Geert Wilders (PVV), who also refused.
We subsequently repeated the analysis for wave 4. We ran models with and without the RTL4 debate to detect potential long-term effects of this earlier debate. As including this debate does not affect the results, we present the full model including all debates under study and the related coverage. Fig. 5 displays the five resulting marginal effect graphs (regression results in Table 8 in the Appendix). Considering long-term effects from the RTL4 debate, panels a) and b) show mostly negative effects for the two absent parties, stemming from watching the debate for the PVV and from related media exposure for the VVD. However, both effects fail to reach the 0.05 significance level. For the later debates and media exposure, the marginal effects in c), d) and e) do not show significant effects either. An exception is the significant effect for the PvdA of news coverage related to the Een-Vandaag debate (d), although this party did not even participate in this debate, which was the Rutte-Wilders duel.
4 Unfortunately, no other common measure of performance was included in the survey, such as "how well did candidate A perform?".
5 This is particularly important for our "winner" variable, since citizens might perceive as the winner a candidate whom they already wanted to vote for before the debate. By controlling for party vote at t-1, we exclude or at least reduce this problem.
6 In fact, despite our more restrictive content coding (we coded one overall evaluation per article and did not count all separate evaluations within a given article), the 2017 media coverage still included more evaluations than the previous 2012 campaign, with an almost similar N (N = 187 compared to N = 172 in Walter and van Praag, 2014b, for the 2012 campaign).
7 The mostly descriptive character of the articles, and the resulting scarcity of (variation in) evaluations, also prevented linking the content analysis and survey data. Such an approach would have allowed for a more detailed test of H3.
8 See also the methodological criticism of these simple group comparisons by Blais and Boyer (1996, pp. 161-162), who argue that regression models are superior because they allow for the possibility that non-watchers are (indirectly) affected by the debates.
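Figs. 4 and 5 report average marginal effects, i.e. changes in predicted probabilities. The sketch below shows how such an effect can be obtained from a multinomial logit with a lagged-vote control: predict each respondent's probabilities with the treatment set to 1 and to 0, then average the differences. It is a simplified illustration under stated assumptions. The coefficients are invented, the lagged vote is the only covariate (the actual models include further predictors such as media exposure), and confidence intervals are omitted.

```python
import math
from typing import Dict, List

# Hypothetical multinomial logit coefficients. "other" is the reference
# category (linear predictor fixed at 0). For every other party: an intercept,
# a coefficient for a binary treatment (e.g. watched the debate) and optional
# coefficients for the lagged vote ("prev_<party>").
Coefs = Dict[str, Dict[str, float]]


def predicted_probs(coefs: Coefs, treated: int, prev_vote: str) -> Dict[str, float]:
    """Predicted probability of each vote choice for one respondent."""
    scores = {"other": 0.0}
    for party, b in coefs.items():
        scores[party] = (b["const"] + b["treat"] * treated
                         + b.get("prev_" + prev_vote, 0.0))
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


def average_marginal_effect(coefs: Coefs, prev_votes: List[str]) -> Dict[str, float]:
    """AME of the treatment: predicted probability with treatment = 1 minus
    with treatment = 0, averaged over respondents' observed lagged votes."""
    n = len(prev_votes)
    ame = {k: 0.0 for k in ["other", *coefs]}
    for pv in prev_votes:
        p1 = predicted_probs(coefs, 1, pv)
        p0 = predicted_probs(coefs, 0, pv)
        for k in ame:
            ame[k] += (p1[k] - p0[k]) / n
    return ame


# Invented coefficients and sample, for illustration only.
coefs = {
    "VVD": {"const": 0.5, "treat": -0.4, "prev_VVD": 2.0},
    "PVV": {"const": 0.2, "treat": 0.0, "prev_PVV": 2.0},
}
ame = average_marginal_effect(coefs, ["VVD", "PVV", "other", "VVD"])
print({k: round(v, 3) for k, v in ame.items()})
```

Because each respondent's predicted probabilities sum to one, the marginal effects across all vote options sum to zero. A negative effect for one party is therefore always offset by gains elsewhere, which is why a "punishment" of an absent party shows up as small positive effects for other parties.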
In our content analysis of newspaper coverage, we find three articles on this debate that evaluate PvdA leader Asscher (two balanced and one positive evaluation). These mostly discuss a potentially strong electoral performance of Asscher following his win of the party leadership earlier that year (which the media called an 'Asscher effect'). Although not directly related to the specific debate, this generally positive portrayal of Asscher might explain the positive media effect.
For the fourth, post-election wave, we could additionally measure the impact of performance evaluations. For this, we ran models replacing the dichotomous debate watching variable with the perceived winner of the debate (see Blais and Boyer, 1996, for a similar approach). Fig. 6 presents the effects of perceiving a certain candidate as the winner compared to not watching the debate (regression results in Table 9 in the Appendix). In the top graph, we observe a significant 'winner-effect' for Wilders (p = 0.024) in the head-to-head debate, represented by a positive marginal effect for Wilders' PVV. It is important to remember that the underlying models control for vote intention in the previous wave; that is, the effect is not an artefact of PVV supporters seeing Wilders as the winner and voting for him. In the bottom graph, though, we do not see any significant positive effect from the final debate including all major candidates. In this debate setting, we only find some negative effects, in the sense that perceiving a certain candidate as the winner lowers the likelihood of voting for certain parties. This is most obvious for perceiving Wilders as the winner, which lowers the likelihood of voting for the PvdA, CDA and D66.
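Replacing a single watching dummy with a perceived-winner variable, with non-watchers as the baseline, amounts to a recoding step before estimation. The sketch below illustrates one way to do this under stated assumptions. The variable names, the candidate tuple and the extra "winner_none" category (for watchers who name no listed candidate, e.g. a draw) are hypothetical and do not come from the survey.

```python
from typing import Dict, Optional, Tuple

# Illustrative candidate list for the head-to-head duel; for the final debate
# the tuple would list all participating leaders.
CANDIDATES: Tuple[str, ...] = ("Rutte", "Wilders")


def winner_dummies(watched: bool, winner: Optional[str],
                   candidates: Tuple[str, ...] = CANDIDATES) -> Dict[str, int]:
    """Recode debate exposure into one dummy per perceived winner.

    Non-watchers get all zeros and thus form the reference category, so each
    coefficient compares 'saw X as the winner' with 'did not watch'.
    A watcher naming no listed candidate falls into a hypothetical
    'winner_none' category (e.g. a draw or "don't know").
    """
    dummies = {f"winner_{c}": 0 for c in candidates}
    dummies["winner_none"] = 0
    if not watched:
        return dummies
    key = f"winner_{winner}" if winner in candidates else "winner_none"
    dummies[key] = 1
    return dummies


print(winner_dummies(True, "Wilders"))
# {'winner_Rutte': 0, 'winner_Wilders': 1, 'winner_none': 0}
print(winner_dummies(False, None))
# {'winner_Rutte': 0, 'winner_Wilders': 0, 'winner_none': 0}
```

In the models described above, the vote intention from the previous wave is then added as a control. This is what rules out the explanation that supporters simply name their own candidate as the winner.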

Robustness checks
As a first robustness check, we tested the same models using vote switching as our dependent variable. The respective models for wave 3 (see Fig. 7 and Table 6 in the Appendix; note the change in x-axis scale) and wave 4 (see Fig. 8 and Table 10 in the Appendix) mostly confirm the findings. There are two important differences. First, the effect of media exposure about the RTL4 debate on the VVD in Fig. 7 narrowly misses significance (p = 0.060). Second, Fig. 8 shows a significantly positive effect for the VVD (p = 0.033) stemming from media exposure about the Rutte-Wilders duel (b). Although media coverage was mostly neutral in our content analysis, Rutte was mentioned in most of the articles without often being evaluated negatively. Merely being mentioned (in a non-negative way) might already be sufficient to positively influence readers' candidate perceptions.
Due to the exclusion of stable (non-)voters in the vote switching models and the resulting small N (262), replicating the models with the perceived winner variable (Fig. 6) is not feasible, as the number of respondents per winner-vote switch combination is too small. Relatedly, detecting significant effects for single parties is generally demanding given the comparatively large number of parties. As a further robustness check, we therefore tested the effects when merging parties, i.e. combining the five smaller parties that participated in the RTL4 debate vs. the two parties that did not (VVD and PVV). These simple binary logit models (see Table 7 in the Appendix) reveal that exposure to debate coverage indeed increases the probability of switching to a party participating in the debate, rather than to one of the two absent parties, by almost 16 percentage points (p = 0.007).
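The two recoding steps described above (dropping stable respondents, then collapsing switch destinations into participating vs. absent parties) can be sketched as follows. The party groupings follow the text (the seven major parties minus VVD and PVV). The party codes, the hypothetical non-voter label "NV", and the choice to drop switches to non-voting or to smaller parties are our assumptions.

```python
from typing import Optional

# Assumed party groupings for the RTL4 debate: the five participating parties
# among the seven major parties, and the two absentees. "NV" is a hypothetical
# code for an intended non-voter.
PARTICIPANTS = frozenset({"CDA", "D66", "GL", "PvdA", "SP"})
ABSENTEES = frozenset({"VVD", "PVV"})


def switch_destination(prev: str, curr: str) -> Optional[str]:
    """Party switched to between two waves, or None for stable respondents
    (stable voters and stable non-voters are dropped from switching models)."""
    return None if prev == curr else curr


def merged_switch(prev: str, curr: str) -> Optional[int]:
    """Binary DV: 1 = switched to a debate participant, 0 = switched to
    VVD or PVV, None = stable respondent or other destination (excluded here
    by assumption)."""
    dest = switch_destination(prev, curr)
    if dest in PARTICIPANTS:
        return 1
    if dest in ABSENTEES:
        return 0
    return None


pairs = [("VVD", "D66"), ("SP", "PVV"), ("VVD", "VVD"), ("PvdA", "NV")]
print([merged_switch(p, c) for p, c in pairs])  # [1, 0, None, None]
```

Collapsing seven destinations into two raises the number of cases per outcome category, which is what makes a binary logit estimable on the small switcher sample.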
As a final robustness check, we examined whether we find stronger effects when focusing on respondents who were less certain about their voting decision. To this end, we repeated the analyses excluding all respondents with very certain vote intentions or decisions.9 Results remain very stable for this subgroup; that is, even for this group, which is the most likely to be affected, we do not find more systematic and significant debate effects.
In sum, the TV debates under study and the related media coverage did not exert a strong and systematic influence on voting behaviour. Our hypotheses are therefore only partly confirmed. Whereas we expected that voters would tend to vote less for parties absent from a debate (H1), the results do not show such a direct effect from watching the debate. We could, however, partly observe this 'punishment' in relation to media coverage of the absences from the first RTL4 debate. This result is further strengthened by an analysis that uses merged parties as the DV (being present or not being present in this debate), which reveals a significant media coverage effect (H3). Regarding our second hypothesis, that voters prefer parties whose candidates perform well in a debate (H2), we find a positive effect on PVV voting of perceiving Wilders as the winner of the head-to-head debate. The fact that we find this winner effect for only one specific debate format, and not for the other, points to a potentially varying influence of the different TV debates during the campaign. However, as a preliminary answer regarding the differing importance of the timing and formats of the debates (RQ1), the overall rather few significant effects do not clearly indicate that one specific debate (format) stood out as the most relevant one. The following discussion will provide some possible interpretations for our mixed results.
9 We excluded voters with the two highest certainty scores in wave 3 (47%) and the single highest score in wave 4 (38%). Regression results are available upon request.

A.C. Goldberg and C. Ischen
Electoral Studies 66 (2020) 102171

Discussion
The study set out to examine the simultaneous impact of the various potential effects that stem from TV debates in an election campaign. Our aim was to disentangle direct and indirect effects in order to answer the question of to what extent the participation of a candidate or party in a debate, their performance during the debate and the related media coverage matter for the electoral decisions of the voting-eligible population. We tested these effects in the context of the 2017 Dutch general elections, which represented a well-suited case due to the various debates held in different formats at different points during the campaign. The analysis relied on panel survey data accompanied by a manual content analysis. The latter revealed largely neutral media coverage of the debates and candidates, a common finding in the Dutch context (e.g. Walter and van Praag, 2014b).
Overall, we find few significant effects for any of the three mechanisms under study. One significant effect results from the refusal of the two top candidates, Rutte and Wilders, to participate in the first RTL4 debate. People who were exposed to media coverage about this debate tend to vote less for these two parties. This effect holds especially when not distinguishing between the two parties, but considering both as one category of absent parties compared to a second category of parties present in the debate. In this case, exposure to media coverage significantly increases the chances of switching to one of the participating parties instead of switching to the two absentees, VVD and PVV.
A second significant finding is the 'winner-effect' on PVV voting of rating Wilders (PVV) as the winner of the head-to-head duel with Rutte (VVD). This result is in line with our expectation that well-perceived debate performances result in a higher voting likelihood for the respective candidate. As we find no similar 'winner-effect' for the debate including a larger number of candidates, the effect found in the head-to-head duel may be due to the easier identification of a clear winner when there are fewer candidates to evaluate. More generally, the lack of other debate watching effects may be due to the complexity of TV debates in the Dutch multiparty system (see also Van der Meer et al., 2016). The higher number of candidates in the other debates and the partly complex formats, e.g. the various head-to-head duels organised by issue in the final debate, may hinder clearer performance evaluations and subsequent voting effects. Alternatively, the hyper-competitive environment may produce multiple effects that counterbalance each other. The winner-effect we found in the head-to-head debate, by contrast, results from a format that matches those in other countries such as the US or Germany, for which studies have repeatedly reported significant TV debate effects.
Coming back to our research question (RQ1) on whether a certain debate timing and composition is more relevant, we can only provide a tentative answer based on the 2017 Dutch case. At first glance, it seems that the very final debate, which took place one day before the election, had less impact than the earlier debates. The comparatively strongest effects stem from the first RTL4 debate, especially the significant effect on switching to a participating party more generally. The finding that the first debate is the most influential is in line with other studies examining the British and US context (e.g. Holbrook, 1999). On closer inspection, though, the timing and partly also the format may have been less influential in the 2017 Dutch case. Rather, most of the few significant findings appeared in relation to the two main competitors, Rutte (VVD) and Wilders (PVV). Whenever they played a central role in a debate, either by refusing to participate or by being the only candidates in a head-to-head debate, we found significant effects stemming from the respective debates. This suggests that even though the Netherlands has a multiparty system, when it comes to an event like a TV debate, the most interesting or the cognitively easiest thing for a voter to process is still the behaviour of and competition between the two main candidates.
Overall, the identification of rather few TV debate effects matches the character of a rather "dull campaign" for the 2017 Dutch elections (Holsteyn, 2018, p. 1367) and is also in line with studies showing rather few TV debate effects in the previous 2012 campaign (Walter and van Praag, 2014b). Still, the few significant findings may also be due to the limitations of our study. First of all, we could measure the impact of only three of the four TV debates during the campaign. The second, so-called Carré debate, with yet another candidate composition, may have had more or less influence than the three debates we studied, particularly given the repeated refusal of one of the two main competitors to debate. A second limitation is the lack of some important indicators for specific debates. Unfortunately, the winner/loser perception was only available for the last two debates and not for the first one, which excluded the two top candidates. Also, the immediate perception of a winner after watching a debate may have been changed by post-debate coverage during the days before the survey started (but see the conceptual inclusion of both these aspects in Lang and Lang, 1976 and Tsfati, 2003, and the even opposite argument by Walter and van Praag, 2014b, that the media identify a winner based on changes in pre-election polls). In that context, the advantage of a panel survey that enables the study of multiple debates (which are standard in many countries nowadays) among a large sample comes at the price of the more detailed measures of specific debate effects that other methods, such as experiments, allow. A third limitation is the missing information about the actual content of each person's self-reported debate exposure in the media. Our media content analysis focused solely on newspapers and suggests rather neutral coverage, in line with findings from other campaigns in the Netherlands and across Europe (e.g. de Vreese et al., 2006; Walter and van Praag, 2014b).
Whereas we interpret these data as a proxy for the overall media coverage, other media channels such as television or online coverage may be more relevant, particularly as their news reaches citizens much faster after the debate, e.g. through a TV discussion immediately following the debate or even live commenting on online media.
Future research should repeat our analysis in countries with similar party settings and similar debate formats to confirm or contradict our findings. In particular, the importance of head-to-head debates in a multiparty system, as found in our study, would be an interesting area for future research. Ideally, one would link survey data with media content data to delve deeper into this indirect effect of TV debates. Given the low and neutral coverage of the debates, which is common in the Netherlands, such a link was not possible in our study. The 'punishment' effect following candidates' refusals especially needs more research to reach more general conclusions, not least in light of other prominent refusals such as that of Theresa May in Great Britain in 2017. Despite these shortcomings and open questions, we still think that our results add important insights to the literature, in particular as they represent a divergent case with many parties, rather than (mostly) only the two major parties, 'debating' for votes.

Acknowledgement
We wish to thank Katjana Gattermann and Lukas Otto for their help and feedback while preparing the manuscript. For providing us with the data, we thank Claes de Vreese. We would further like to thank the reviewers and the editors for their thoughtful comments and efforts towards improving our manuscript. Part of the research time used for this article was funded by a Grant from the European Research Council (ERC), Grant No. 647316.
Notes to the Appendix regression tables: Standard errors in parentheses; *p < 0.05, **p < 0.01, ***p < 0.001. The reference category is "other party", except for the binary vote switching model, where it is a vote switch to VVD or PVV. In the perceived-winner model, respondents who perceived SP candidate Emile Roemer as the winner were excluded due to too few observations.