Adding Fuel to the Flames? Politicisation of EU Policy Evaluation in National Parliaments

Under which conditions does national parliamentary discourse politicise European Union (EU) policy evaluation? In times of multiple crises and uncertainty, the alleged 'democratic deficit' of the EU, defined as an apparent lack of legitimacy, has regained scholarly and popular attention. The European Commission and academic commentators consider policy evaluation a specialised and targeted tool to improve the 'output legitimacy' of the EU by assessing policy effectiveness and efficiency. While evaluation can strengthen output legitimacy directly via learning, evaluation can be particularly effective when it becomes part of the national communicative discourse on the EU. This discourse is most likely to take place in national parliaments, as they are the forums in which the government can be held to account. This paper relies on an automated content analysis of the share of keywords related to EU policy evaluation in debates in six national parliaments, covering a period of 20 years. The findings show that the combination of popular and party Euroscepticism is crucial in determining parliamentary debate on EU policy evaluation. Pro-European parties generally do not refer to policy evaluation. However, if political parties are critical of European integration, EU policy evaluation is mentioned more frequently. Under these conditions, members of parliament also refer more frequently to EU policy evaluation as the public becomes more Eurosceptic. These findings suggest that EU policy evaluation is used as a tool for domestic political contestation, with potential negative normative implications for the output legitimacy of the EU and for evaluation as a tool for learning.


Introduction
In recent decades, the European Union has been affected by several crises relating to highly salient policy areas, such as the common currency. These crises have had a profound impact on democratic outcomes and economic performance in many member states, and thus the perceived problem-solving capacity of the EU. In this context and against the background of long-standing debates on the alleged democratic deficit, calls to improve the 'output legitimacy' of the EU have been voiced. The emphasis here lies on what the EU delivers to its citizens (Ward 2010, p. 123). Arguably, this form of legitimisation is particularly crucial to the EU multilevel system given the regulatory or efficiency-oriented focus of many policies (Majone 1999). The importance for the EU of 'Doing Less More Efficiently' has been stressed by Commission President Juncker and by national politicians from across the political spectrum, particularly following times of fiscal austerity in several member states (European Commission 2017).
Policy evaluation is a potential tool to demonstrate the effectiveness, efficiency, and hence 'output legitimacy' of the EU. Evaluations provide detailed information on the extent to which the actors involved have reached their policy goals (Corbett et al. 2011, p. 318-319 cited in Zwaan et al. 2016). As such, evaluation has a twofold aim: facilitating policy learning and demonstrating accountability (Versluis et al. 2011, p. 206). While learning can take place within an organisation, accountability requires exposure to a public forum (e.g. Zwaan et al. 2016, p. 678). Because output legitimacy of the multilevel system of the European Union has to be demonstrated by national governments to their respective national publics, national parliaments are likely forums for such justification to take place. However, besides its main purpose of providing accountability and policy learning, evaluation can also be used to manipulate political opportunity structures (Schoenefeld and Jordan 2019, p. 372). By being made public (for example, by being discussed in the plenary), policy evaluation becomes part of national parliamentary discourse and hence becomes politicised. The extent to which politicisation takes place has potentially important normative and policy implications. Therefore, this paper aims to answer the following research question: Under which conditions is EU policy evaluation mentioned in national parliamentary debates?
This paper uses dictionary-based computer-assisted content analysis of parliamentary debates to analyse the extent to which EU policy evaluation is debated in the plenaries of the parliaments of Austria, France, Germany, Ireland, Spain, and the United Kingdom for the time period from the early 1990s until 2012. The findings suggest that, overall, Eurosceptic parties mention policy evaluation more frequently than pro-European parties do. Furthermore, Eurosceptic public opinion can lead to an increase in the number of mentions of EU policy evaluation. However, the extent to which members of parliament (MPs) talk more or less about evaluation in reaction to changes in public opinion is moderated by the composition of parliament. When parties in parliament are generally pro-European, an increase in public Euroscepticism as measured by Eurobarometer surveys does not have an effect on the frequency with which EU policy evaluation is debated. However, when issue entrepreneurs, that is, parties that are Eurosceptic and for which Europe is salient (De Vries and Hobolt 2012), are strongly represented in parliament, policy evaluation is mentioned more frequently in reaction to Eurosceptic public opinion. Arguably, this implies that policy evaluation is not generally used by pro-European parties to underline the output legitimacy of the EU in reaction to Eurosceptic public opinion, but only becomes part of parliamentary discourse when a Eurosceptic party uses it to criticise the EU. Pro-European parties might then be forced to refer to evaluation as well in response.
EU policy evaluation thus seems to be used as a politicised tool in domestic political contestation. The findings have important normative implications. Concerns about how policy evaluation could be used strategically by political actors were first voiced in the 1960s and 1970s. Weiss (1970) argues that with a growing number of social intervention programmes, evaluators would be subject to growing political pressures, and evaluation results themselves might be used for partisan purposes. Potentially, an increase in evaluation activity, intended as a means to counter populist dynamics with technical arguments, might in fact help to further undermine the output legitimacy of the EU if it is used in parliamentary discourse by Eurosceptic parties. Moreover, this form of engagement with evaluation might undermine the credibility of evaluation studies themselves, further weakening evaluation as an instrument for accountability and learning.
The remainder of this paper is structured as follows: The next section elaborates on the role of policy evaluation in improving output legitimacy in the EU and its link to accountability. The third section presents the hypotheses. The fourth section describes the paper's chosen method and independent variables. It also explains the rationale for the selection of country cases. The fifth section contains the model specification and analysis. The final section discusses the results and presents conclusions.

Policy Evaluation, Output Legitimacy, and Accountability in the European Union
The question of legitimacy of the multilevel system of the European Union has been subject to much scholarly debate. At the core of discussions on the 'democratic deficit' of the EU lies the assertion that national parliaments have lost power in the process of European integration, which has strengthened national executives. This is a weakness for which the European Parliament supposedly is unable to compensate, even after its role has been strengthened in subsequent treaty reforms (e.g. as discussed in Moravcsik 1994). These critiques thus relate to a perceived lack of 'input legitimacy' of the European Union, or the extent to which citizens participate and are adequately represented in the political process (Schmidt 2010, p. 17). Moreover, the absence of Europe-wide media, parties, and political competition more generally is said to contribute to this lack of legitimacy (e.g. Decker 2002; Follesdal and Hix 2006). Therefore, forms of legitimation that are not dependent on citizen participation and representation but rather on 'the problem-solving capacity of the multilevel European polity' gain particular importance, but are themselves undermined by the institutional structures and decision-making rules of the EU (Scharpf 2009, p. 198). Ward refers to output legitimacy as legitimacy "through the delivery of results for the citizens" (2010, p. 123). This form of legitimacy is furthermore salient given the historic core competencies of the EU in regulatory policy areas and their increased contestation (Schmidt 2010, p. 11). Thus, the legitimacy of the EU as an organisation with high consensus requirements relies to a large extent on the justification of EU policies in the 'communicative discourse' in the national political arena (Scharpf 2009, p. 189). Policy evaluation as an important factor in evidence-based policy making plays a crucial role in providing output legitimacy in the European Union (Widmer 2009).
Indeed, the use of policy evaluation and the formalisation of evaluation practice have increased in the European Union since the 1980s and 1990s (Widmer 2004, p. 35), even though the coverage of legal acts and policy areas is far from universal (Zwaan et al. 2016). Evaluation can contribute to output legitimacy of the European Union via two mechanisms. Traditionally, evaluation has been seen as a tool for learning or to improve the efficiency and effectiveness of EU policies (Versluis et al. 2011, p. 211-216). Most of the literature to date has focused on the role of policy evaluation in improving legitimacy in this context (e.g. Borras and Højlund 2015; Leeuw and Farubo 2008). However, there is a second avenue through which policy evaluation can have an impact on output legitimacy of the EU, and in particular the public perception thereof. The communication of the results of policy evaluation in public forums can reinforce the role of evaluation in influencing output legitimacy by making it part of the political discourse on the EU. This second mechanism is of particular importance in the EU context as output legitimacy does not rely merely on 'objective' measures of policy success or failure, but also on the effectiveness with which policies are justified by elites and the extent to which this discourse resonates with the citizens (Schmidt 2010, p. 7). In this context, mentions of policy evaluation can help to increase the perceived output legitimacy of the European Union if evaluation is used by political actors to demonstrate the effectiveness and efficiency of EU policies, or can serve to undermine it if parliamentary actors refer to evaluation studies to point out supposed failures of the EU multilevel system in producing adequate policy outcomes. This public use of policy evaluation is thus closely linked to the concept of accountability, i.e. a process whereby an agent is held to account by its principal (e.g. Curtin et al. 2010).
The role of policy evaluation in fostering accountability has been acknowledged (e.g. Versluis et al. 2011, p. 206) but has not been widely discussed or empirically tested in the literature (but see Zwaan et al. 2016).
Arguably, national parliaments are the most suitable institutions for the communication and justification of EU policy outcomes. After all, national parliaments are the venues in which national governments can be held publicly to account by the opposition and by their own backbenchers for activities in European affairs and for their voting behaviour in the Council of Ministers (e.g. Raunio and Hix 2000; Auel et al. 2015). Perhaps more so than the European Parliament, national legislatures are focal points of (national) media attention and public interest in political discourse (Auel and Raunio 2014; Hoerner 2017). In addition, Scharpf's conception of discursive justification of policies focuses in particular on the justification of national representatives in front of national publics, given the absence of a European demos discussed above (Scharpf 2009, p. 173). Demonstrating and challenging the legitimacy of the EU is thus a prime feature of national parliamentary activity in EU affairs. In this context, policy evaluation is arguably a highly pertinent tool in parliamentary discourse.
We would thus expect MPs to mention policy evaluation at least occasionally when debating the European Union and the value of the respective countries' membership in the latter. Discussing policy evaluation reports in the plenary would in part be a strategy to demonstrate accountability, i.e. to demonstrate and justify the behaviour of an agent (the European institutions) to the ultimate principal (the citizens) by referring to the findings of an independent actor (evaluator) along the chain of delegation in a public forum (the national parliament). Policy evaluation reports could support this mechanism primarily by providing information, which is crucial to overcome the information deficit of the legislature vis-à-vis the government in EU affairs; such a deficit is considered one of the underlying reasons for the alleged democratic deficit and the loss of influence of domestic legislatures (e.g. Moravcsik 1994; Raunio and Hix 2000).
Under which conditions, then, is EU policy evaluation mentioned in the plenaries of national parliaments? And what can we infer from these patterns regarding the purpose of its use? So far, only a few studies have focused on the politicisation of policy evaluation in the form of parliamentary debates. Focusing on oral questions in the European Parliament (EP), Zwaan et al. (2016) argue that members of the European Parliament (MEPs) use references to evaluation primarily for agenda-setting purposes rather than to foster accountability. They find that interinstitutional conflict between the European Commission and the EP is the best predictor of a future reference to policy evaluation in a parliamentary question (Zwaan et al. 2016, p. 688). Focusing on the use of reports of the European Court of Auditors in the EP, Stephenson (2017, p. 47) argues that 'MEPs are not objective evaluators; they are biased politically-driven actors who may use ex-post evaluation to advance their political agendas and interests, often for short-term goals, including re-election'. Thus, strengthening the accountability mechanism might not be the only motivation for MPs to refer to policy evaluation in the plenary. They might also want to use EU policy evaluation strategically, to signal a particular message about the European Union to their constituents or to expose the position of the government or other parties on the latter. European Union policy evaluation might become 'politicised', i.e. itself become part of a discourse brought about by 'an increase in polarization of opinions, interests or values and the extent to which they are publicly advanced towards the process of policy formulation within the EU' (De Wilde 2011, p. 560).
We would thus expect MPs to be reactive to public opinion when referring to EU policy evaluation and EU affairs more generally. Moreover, we could expect this pattern to be pronounced in national parliaments given their higher level of media exposure and more established electoral connection. Furthermore, party political considerations might play an important role in determining the MPs' willingness to refer to policy evaluation when discussing the EU. The exact conditions under which MPs could be expected to mention policy evaluation in the plenary are discussed in the next section.

Hypotheses
In recent decades, the 'permissive consensus' said to have long characterised European integration has given way to a 'constraining dissensus' (Hooghe and Marks 2009, p. 13). Attitudes about the EU have become more critical, and citizens' assessments of the EU's performance vary greatly across countries based on a number of socio-economic factors and national benchmarking (De Vries 2018). Thus, if MPs refer to policy evaluation as part of an accountability mechanism, we would expect to see a rise in the number of mentions of evaluation as a reaction to a rise in Eurosceptic public opinion. Evaluation provides the information necessary to demonstrate or challenge accountability, with the aim to highlight or criticise the output legitimacy of the European Union. In particular, when public attitudes towards a country's membership in the European Union are negative, national elites could be more likely to refer to EU policy evaluation in a public parliamentary forum in order to underline or challenge the citizens' perception of the output legitimacy of the European Union. We thus formulate the following hypothesis:

H1: European Union evaluation is mentioned more frequently as public Euroscepticism increases.
However, besides the attitudes of the public towards the EU, the extent to which MPs refer to policy evaluation is also likely to be strongly influenced by party political factors. The EU has become a contested issue in many countries in which most mainstream parties do not openly compete on European issues (Hooghe and Marks 2009, p. 10). Thus, the issue is often exploited by extremist parties on the left and right (De Vries 2007, p. 267; Szczerbiak and Taggart 2008). These parties that seek to put an issue not extensively discussed previously onto the parliamentary agenda have been termed 'issue entrepreneurs' (De Vries and Hobolt 2012, p. 250). Eurosceptic parties often focus on institutional or general EU issues rather than on specific policy-orientated criticisms of the EU (Hoerner 2017; Senninger 2017). Hoerner finds that even when issue entrepreneurs scrutinise specific policies, they still use this as an opportunity to criticise the EU at a more fundamental level (2017, p. 308). They might use policy evaluation to provide information on the weaknesses or failures of specific policies but relate those to more general critiques of the EU and its output legitimacy. Simultaneously, pro-European parties might be inclined to make more reference to policy evaluation as well when issue entrepreneurs are present. As Eurosceptic issue entrepreneurs challenge the legitimacy of the EU, pro-European mainstream parties might in turn refer to policy evaluation to highlight the output legitimacy of the European Union. Hence, a stronger presence of issue entrepreneurs might lead to more mentions of policy evaluation at the aggregate level. 1 Thus:

H2: European Union evaluation is mentioned more frequently when issue entrepreneurs are represented in parliament.
Moreover, we expect the presence of issue entrepreneurs to have a moderating impact on the effect of Euroscepticism. When faced with a Eurosceptic public, MPs of pro-European parties might want to avoid mentioning EU policy evaluation as they face potentially high costs by debating technical issues on which they are likely to diverge from their voters. By contrast, issue entrepreneurs might be particularly incentivised to undermine the output legitimacy of the EU when public Euroscepticism is high. Moreover, mainstream parties might in turn be forced to refer to policy evaluation as well to bolster and underline the output legitimacy of the EU in such a scenario. Overall, policy evaluation might thus be mentioned more frequently when both the public and parliament are politicised on the issue of the EU. Thus:

H3: The effect of public Euroscepticism on the mentions of EU policy evaluation in the plenary is negative at low levels of issue entrepreneurship and positive at high levels of issue entrepreneurship.

Method and Data
Parliamentary debates can be considered a crucial aspect of parliamentary activity in EU affairs because they fulfil a 'communication function' vis-à-vis the citizens (Auel and Raunio 2014, p. 2; Norton 1993). As such, they can also be considered an ideal venue to underline or challenge the output legitimacy of the EU by referring to EU policy evaluation. Legislative debates are generally a useful resource for researchers since they are publicly available and MPs use them for a variety of purposes (Proksch and Slapin 2010, p. 335). Debates thus present an excellent opportunity to observe different preferences and emphases given by MPs to the EU in different countries in a highly visible forum. To assess how the independent variables impact the extent to which EU policy evaluation is debated in the plenary, a content analysis was undertaken (Neuendorf 2016). The advantage of this approach is that it captures mentions in all debates. The rationale is that if more EU evaluation keywords come up in debates, this indicates that the parliament attributes more attention to the output legitimacy of the European Union. 2 In order to improve comparability across countries and parliaments in which debates take place at different frequencies and with different lengths (possibly for linguistic reasons), the proportion of evaluation keywords in all debates in a certain month was calculated instead of the proportion of evaluation keywords in individual debates. For reasons of feasibility, two months per year were analysed: March and October. These months are characterised by strong parliamentary activity in all countries under analysis, and usually no breaks take place in these months. The time frame of the analysis at the level of parliaments is 1992 (ratification of the Maastricht Treaty) until 2012. This time frame was chosen because the Maastricht Treaty has frequently been described as the starting point for significant politicisation and stronger European integration (Börzel and Risse 2009; Marks et al. 1996).
Two dictionaries were constructed for the present analysis: one containing evaluation keywords and one containing general keywords from a variety of policy areas (foreign affairs, taxes, etc.). The dictionaries were then applied to the documents for each country/month using the programme QDA Miner/WordStat (Provalis Research, Montreal, Canada). Examples of the keywords can be found in the online Appendix. The share of evaluation keywords among all keywords was then calculated, and its logarithm was used as the dependent variable. 2 The independent variables themselves are derived from a number of existing data sets.
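The dictionary step described above can be illustrated with a minimal sketch. It is not the WordStat procedure used in the paper: the keyword lists are hypothetical placeholders (the actual dictionaries are in the online Appendix), the denominator is assumed to be the sum of evaluation and general keyword hits, and the handling of a zero share, for which the logarithm is undefined, is likewise an assumption.

```python
import math
import re

# Hypothetical keyword lists for illustration only; the paper's real
# dictionaries are documented in its online Appendix.
EVALUATION_KEYWORDS = {"evaluation", "audit", "impact assessment"}
GENERAL_KEYWORDS = {"tax", "budget", "foreign policy", "agriculture"}


def count_keywords(text, keywords):
    """Count case-insensitive whole-word (or whole-phrase) keyword hits."""
    lowered = text.lower()
    total = 0
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        total += len(re.findall(pattern, lowered))
    return total


def evaluation_share(debates):
    """Return (share, log_share) for all debates of one country/month.

    Assumption: 'all keywords' means evaluation hits plus general hits.
    log_share is None when the share is zero, because log(0) is undefined;
    how the paper treats such months is not stated.
    """
    text = "\n".join(debates)
    n_eval = count_keywords(text, EVALUATION_KEYWORDS)
    n_all = n_eval + count_keywords(text, GENERAL_KEYWORDS)
    if n_all == 0:
        raise ValueError("no dictionary keywords found in these debates")
    share = n_eval / n_all
    log_share = math.log(share) if share > 0 else None
    return share, log_share


debates = [
    "The evaluation of the budget showed the tax worked.",
    "An audit of agriculture spending.",
]
share, log_share = evaluation_share(debates)
print(share, log_share)
```

Pooling all debates of a month before counting mirrors the paper's choice to compute the share per country/month rather than per debate.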
The following countries were chosen as country cases for the analysis: Austria, Germany, France, Spain, Ireland, and the UK. These six countries represent an excellent institutional spread and the highest possible variation regarding the independent variables of the study. The aim was to select a diverse set of cases (Gerring 2017, p. 97). Thus, the analysis includes countries with a very Eurosceptic electorate, such as Austria and the UK, as well as countries with generally more pro-European voters, such as Ireland. Moreover, countries with strong formal scrutiny powers, such as Austria, and those with rather weak formal scrutiny powers, such as Ireland, are included (Winzen 2012). There is also strong variation regarding the average dissent within parties on European integration, with Austria and Germany showing very low values and the UK very high values. The same holds true for the presence of Euroscepticism in the party system and the salience of the EU, as expressed by the issue-entrepreneurship score. Finally, the countries differ in their economic trajectories over the past decades, including the extent to which they were affected by the Euro crisis. Arguably, this implies that the output legitimacy of the EU has been challenged to varying degrees in the national communicative discourse. 3
The independent variables to test the hypotheses are operationalised as follows: Euroscepticism is measured as the share of respondents in Eurobarometer surveys who hold that the membership of their country is 'a bad thing' minus the share of those who think it is 'a good thing' (Eurobarometer, 2012). Issue entrepreneurship is operationalised following the approach of De Vries and Hobolt (2012). The issue-entrepreneur score is generated by multiplying the salience score of each party in parliament by the difference between the mean position on the EU of all parties in parliament and the position of the party itself (De Vries and Hobolt 2012, p. 256).
The salience score and the party position on European integration are both included in the Chapel Hill Expert Survey (CHES) (Polk et al. 2017; Bakker et al. 2015). They are measured on scales of 1-5 and 1-7, respectively, with higher values indicating higher salience and a more positive position on European integration, respectively. The distance between the position of a party on the EU and the mean party position is thus negative when the party is more pro-European than the mean of all parties and positive if it is more Eurosceptic (De Vries and Hobolt 2012, p. 256). For the analysis at the level of parliament, the sum of the issue-entrepreneur values for all parties in parliament was calculated. The issue-entrepreneurship score used here is thus an aggregate measure and a continuous variable. The composition of the different legislatures was derived from the ParlGov database (Döring and Manow 2012).
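As an illustration of the two operationalisations described above, the following sketch computes net public Euroscepticism and the aggregate issue-entrepreneurship score for a fictitious three-party parliament. The party values are invented, the class and function names are hypothetical, and the mean party position is assumed to be the unweighted mean across parties in parliament; the paper does not state whether seat shares are used as weights.

```python
from dataclasses import dataclass


@dataclass
class Party:
    name: str
    salience: float     # EU salience, higher = more salient
    eu_position: float  # EU position, higher = more pro-European


def net_euroscepticism(share_bad, share_good):
    """Share answering 'a bad thing' minus share answering 'a good thing'."""
    return share_bad - share_good


def issue_entrepreneur_scores(parties):
    """Per-party score: salience * (mean position - own position).

    Positive for parties more Eurosceptic than the mean, negative for
    parties more pro-European than the mean.
    Assumption: the mean is unweighted across parties.
    """
    mean_position = sum(p.eu_position for p in parties) / len(parties)
    return {p.name: p.salience * (mean_position - p.eu_position) for p in parties}


def parliament_score(parties):
    """Aggregate score: sum of the per-party issue-entrepreneur scores."""
    return sum(issue_entrepreneur_scores(parties).values())


# Invented example values, not taken from CHES or ParlGov.
parliament = [
    Party("A", salience=2.0, eu_position=6.0),  # pro-European, low salience
    Party("B", salience=4.0, eu_position=2.0),  # Eurosceptic, high salience
    Party("C", salience=3.0, eu_position=4.0),  # exactly at the mean
]
print(issue_entrepreneur_scores(parliament))
print(parliament_score(parliament))
print(net_euroscepticism(share_bad=30.0, share_good=50.0))
```

Because each distance is weighted by salience, the aggregate does not cancel out to zero even though the unweighted distances from the mean do: a parliament with a salient Eurosceptic party receives a positive score.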
The control variable concerning the formal rights of national parliaments in EU affairs is based on Winzen's data (2012, p. 663), which focus on information rights, the involvement of European Affairs Committees (EACs) and sectoral committees, and mandating rights. 4 The strength of the parliaments' formal powers in EU affairs [...] be particularly active in highlighting evaluation reports related to the Presidency's priorities in plenary debates or call the government to action in certain areas. Finally, a monthly time-trend variable is included to account for a potential rise in evaluation output and activity over the period studied. The rationale here is that evaluation has become more prominent in the EU policy process since the 1990s. In the absence of data on the overall number of evaluation studies published per month, a linear time-trend variable can be used as a proxy for an overall increase in evaluation activity, which might drive an increase in references made to EU policy evaluation in national parliaments.
For the statistical analysis, a two-level random-intercept model was applied. A multilevel model was chosen given the highly structured nature of the data, with two monthly observations clustered in each country for each year. A lagged dependent variable was included in the model to account for temporal autocorrelation, as recommended by Beck and Katz (1995). Since the dependent variable was a proportion and highly skewed towards zero, a log transformation was undertaken. Descriptive statistics can be found in Table 1.
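One way to write down a model of the kind described above is sketched below. The notation is an assumption introduced here for illustration and does not appear in the paper: $y_{ct}$ is the share of evaluation keywords in country $c$ and month $t$, $E_{ct}$ is public Euroscepticism, $I_{ct}$ is the aggregate issue-entrepreneurship score, $\mathbf{x}_{ct}$ collects the control variables, and $u_c$ is the country-level random intercept. The interaction term corresponds to H3.

```latex
\begin{align}
\ln y_{ct} &= \beta_0 + \rho \ln y_{c,t-1} + \beta_1 E_{ct} + \beta_2 I_{ct}
             + \beta_3 \left(E_{ct} \times I_{ct}\right)
             + \boldsymbol{\gamma}^{\top}\mathbf{x}_{ct} + \delta t
             + u_c + \varepsilon_{ct}, \\
u_c &\sim \mathcal{N}\!\left(0, \sigma_u^2\right), \qquad
\varepsilon_{ct} \sim \mathcal{N}\!\left(0, \sigma_\varepsilon^2\right).
\end{align}
```

Here $\rho$ is the coefficient on the lagged dependent variable and $\delta$ the coefficient on the linear time trend; the exact specification estimated in the paper may differ from this sketch.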
As Fig. 1 shows, there is some interesting variation between countries and over time. Overall, there seems to be a strong increase in mentions of policy evaluation over time, in line with the documented increase of evaluation activity in the EU. Furthermore, there seems to be a markedly lower level of mentions in Ireland. This is perhaps a puzzling finding, as the country has benefited significantly from EU structural funds, which were at the forefront of policies to be evaluated (e.g. Bachtler and Wren 2006). Nevertheless, the general pro-European attitude at the level of the public and of political parties as well as the status as a recipient country (and with strong support for EU-funded projects) might help to explain this pattern.

Analysis and Results
In the following, the main effects and the interaction of Euroscepticism and issue entrepreneurship are discussed in the order of the hypotheses. As the results of the analysis in Table 2 show, the main effect for Euroscepticism is not significant at conventional levels of significance. This might indicate that MPs do not generally refer to policy evaluation as a tool to demonstrate accountability and highlight the 'output legitimacy' of the EU when faced with a Eurosceptic public. However, an increase in the issue-entrepreneurship score is associated with a higher share of evaluation keywords, with a coefficient of 0.07 significant at the 0.05 level. This indicates that Eurosceptic parties for which EU affairs are salient are much more likely to refer to policy evaluation than pro-European parties are. This seems to indicate that policy evaluation is politicised, i.e. used as a political tool in partisan discourse when Eurosceptic parties are present. As mentioned above, the underlying mechanism could be that Eurosceptic parties refer to policy evaluation in an attempt to demonstrate weaknesses of EU policies; that is, they are more likely to use policy evaluation as a political tool. However, the current method does not allow us to distinguish between the positive and negative mentions of policy evaluation, so the overall increase of mentions of EU policy evaluation could also be due to pro-European parties making more references when faced with Eurosceptic public opinion.
We now turn to the interaction of Euroscepticism and issue entrepreneurship. Figure 2 shows the marginal effects for the interaction of Euroscepticism and issue entrepreneurship. The y-axis shows the marginal effect of Euroscepticism on the share of evaluation keywords out of general dictionary keywords given the issue-entrepreneurship score of the party system. On the x-axis, higher values indicate a higher issue-entrepreneurship score of the party system. At the lowest level of issue entrepreneurship, where all parties represented in parliament are pro-European, the effect of Euroscepticism is negative but not significant. This implies that pro-European parties are not referring to policy evaluation in reaction to changes in public opinion, for example to underline the output legitimacy of the European Union. However, at the highest level of issue entrepreneurship, an increase in Euroscepticism is associated with an increase in the share of evaluation keywords. The effect is significant at the 0.05 level. Substantively, at the highest level of issue entrepreneurship, a 1% increase in public Euroscepticism leads to an increase in the share of evaluation keywords of around 1.64%. Thus, policy evaluation seems to be referred to in national parliaments in reaction to an increase in Eurosceptic public opinion when (Eurosceptic) issue entrepreneurs are present. This finding further underlines the argument that policy evaluation is used as a politicised tool in national parliaments. Rather than being used to underline the output legitimacy of the EU in reaction to critical public opinion, potentially by highlighting the effectiveness of certain policies, policy evaluation only becomes an element of parliamentary debates when it is activated in the presence of political actors (issue entrepreneurs) that might expect political gain from criticising EU policies against the backdrop of a Eurosceptic public that is likely to be receptive to such arguments. Again, this triggering of evaluation mentions by issue entrepreneurs might also reinforce references by pro-European parties to neutralise the issue entrepreneurs' claims, even though a verification of this mechanism and the balance of positive and negative mentions is beyond the scope of this study.
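The logic of a marginal-effects plot such as Figure 2 can be stated compactly. The symbols below are assumptions introduced for illustration, not the paper's notation: $y$ is the share of evaluation keywords, $E$ is public Euroscepticism, $I$ is the issue-entrepreneurship score, $\beta_1$ is the coefficient on $E$, and $\beta_3$ is the coefficient on the interaction $E \times I$ in a model with a logged dependent variable.

```latex
\begin{align}
\frac{\partial \ln y}{\partial E} &= \beta_1 + \beta_3 I, \\
\%\Delta y \;\text{for a one-unit increase in } E
  &= 100 \left( e^{\beta_1 + \beta_3 I} - 1 \right)
  \approx 100 \left( \beta_1 + \beta_3 I \right)
  \quad \text{for small } \beta_1 + \beta_3 I .
\end{align}
```

Under this reading, H3 corresponds to $\beta_1 + \beta_3 I < 0$ at low values of $I$ and $\beta_1 + \beta_3 I > 0$ at high values of $I$, which requires $\beta_3 > 0$. Whether the 1.64% figure reported above was obtained through this transformation is not stated in the text.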
As for the control variables, neither formal scrutiny rights nor agenda control power of the parliaments are statistically significant. Thus, it does not seem to be the case that MPs compensate for a lack of information on EU affairs received via institutionalised channels by referring to policy evaluation. Neither does it seem to matter whether the government has a firm hold on the plenary agenda (for example, by preventing issue entrepreneurs from speaking on controversial issues). By contrast, evaluation keywords seem to be mentioned more frequently when a country holds the Council Presidency. Arguably, this might indicate that MPs may want to highlight the EU's performance in areas that are particularly salient for the government at the particular point in time, or to which it has committed publicly. However, the present approach cannot fully provide support for or against this claim. Finally, the linear time-trend variable is positive and significant at the 0.05 level, although the effect is substantively small. This indicates that overall and controlling for the substantive variables of interest discussed above, the mentions of EU policy evaluation in the plenaries of the national parliaments studied here have increased in recent years, perhaps as the result of an increase in the number of evaluations conducted.

Discussion and Conclusion
This paper has analysed the factors determining the extent to which EU policy evaluation is mentioned in debates in national parliaments. Theoretically, the paper started from the premise that output legitimacy (the effectiveness and efficiency of EU policies) is particularly important in the European Union as more input-oriented pathways to provide legitimacy are arguably underdeveloped, following the classical 'democratic deficit' argument. Policy evaluation can be seen as an important tool to safeguard output legitimacy, for example by facilitating policy learning and improvement. However, evaluation can also contribute to (perceived) output legitimacy of the EU via a second pathway: when evaluation results are communicated publicly to the citizens. This use of evaluation is closely linked to accountability, as evaluation studies can provide information on EU policies that allows political actors to assess, challenge, or support their success or failure. Arguably, in a multilevel system such as the EU, the communication to national publics is particularly important, especially given the alleged lack of a European demos. Thus, national parliaments are likely to be important venues in which output legitimacy can be demonstrated or challenged in front of national publics, with members of parliament holding the government to account over the EU's performance. In this context, MPs might be reactive to public opinion on European integration to demonstrate or challenge output legitimacy in reaction to the preferences of the citizens as their ultimate principal.
Methodologically, this paper provided an original data set on the mentions of EU policy-evaluation-related keywords in the plenaries of six Western European countries, covering a 20-year period. The findings of the paper suggest that the mentions of policy evaluation only increase in reaction to public opinion when issue entrepreneurs (Eurosceptic parties for which the EU is salient) are represented in parliament. When issue entrepreneurs are present, policy evaluation is generally mentioned more frequently than when parties are pro-European. The mechanisms behind this finding might be that issue entrepreneurs themselves mention policy evaluation to criticise EU policies, that pro-European parties are forced to underline the output legitimacy of the EU by referring to evaluation studies, or that there is a reinforcing combination of the two pathways. This indicates that rather than being used as an accountability tool to respond to citizens' preferences in general, EU policy evaluation is used in national parliaments in a politicised way, with references being the result of party-political considerations.
These findings have important implications for debates on policy evaluation in general and its use in a parliamentary context in particular. First, they demonstrate that policy evaluation is indeed used as a political tool by political actors at the national level. The potential for evaluation to demonstrate value for money, safeguard accountability, and foster policy learning has to be seen in a partisan context, with political actors using evaluation studies strategically to react to and influence public opinion. European Union policy evaluation is thus not only, or perhaps indeed primarily, used to demonstrate accountability to the citizens, but it becomes politicised under conditions of partisan conflict. Second, policy evaluation is particularly likely to be referred to in the presence of issue entrepreneurs, parties that are often Eurosceptic or populist. Policy evaluation as a 'technocratic' solution to problems of accountability and legitimacy in the European Union can thus also be understood as a tool that might be intended to demonstrate the output legitimacy of the European Union but can also fuel populist and Eurosceptic critiques of the latter. This might be problematic from a normative point of view and might suggest that the use of policy evaluation to highlight the output legitimacy of the EU bears potential risks. Smismans (2017) sees an increased politicisation of evaluation as problematic and warns against attempting to use it as 'a tool to appease populist discourse' (p. 28). Eurosceptic and populist parties themselves could use evaluation findings to stress what is, from their perspective, the excessive regulatory burden of the EU (ibid.). This could not only further undermine the output legitimacy of the EU but might also erode the perception of policy evaluation as an objective, technical tool that provides trustworthy information on policy success or failure.
A decline in the credibility of evaluation could then also have negative consequences for the ability of EU policy evaluation to provide and facilitate policy learning. Arguably, this could be particularly problematic in times in which strategic disinformation and the trustworthiness of public statements are a major concern.
This study has thus made an original contribution to the analysis of the strategic use and political contestation of EU policy evaluation. However, the study has some limitations. For example, its geographic and temporal scope could be expanded by including more member states and using a longer time period. Moreover, while this study has shed light on the relationship between policy evaluation and public opinion in parliamentary debates, a number of questions remain. For example, the current study was carried out at the level of parliaments, so it could not distinguish between different parties in terms of the frequency with which they mention EU policy evaluation. Moreover, future studies could analyse the extent to which positive and negative mentions of EU policy evaluation occur, i.e. distinguish instances in which evaluation is used to support (supposedly by pro-European parties) or challenge (supposedly by issue entrepreneurs) the output legitimacy of the EU. Such an analysis could get a closer handle on the causal mechanisms outlined in this paper and could be carried out either qualitatively or with more advanced quantitative text-analysis methods. Further questions concern differences between policy areas and the use of evaluation in different venues (for example, in committees compared to the plenary) or institutions (in the European Parliament compared to national parliaments). All of these points provide ample potential for future research at the intersection of public policy, legislative studies, and political behaviour.