Predicting Paris: Multi-Method Approaches to Forecast the Outcomes of Global Climate Negotiations

We examine the negotiations held under the auspices of the United Nations Framework Convention on Climate Change in Paris, December 2015. Prior to these negotiations, there was considerable uncertainty about whether an agreement would be reached, particularly given that the world’s leaders failed to do so in the 2009 negotiations held in Copenhagen. Amid this uncertainty, we applied three different methods to predict the outcomes: an expert survey and two negotiation simulation models, namely the Exchange Model and the Predictioneer’s Game. After the event, these predictions were assessed against the coded texts that were agreed in Paris. The evidence suggests that combining experts’ predictions to reach a collective expert prediction yields significantly more accurate predictions than those of individual experts. The differences in performance between the two negotiation simulation models were not statistically significant.

In the first half of 2015, the global climate negotiations arrived at a crossroads. Would the high expectations for an international agreement by the end of 2015 at Paris be met? And if an agreement were reached, what would be the contents of such a global climate treaty? There was a great deal of uncertainty regarding the answers to these questions before the Paris negotiations were concluded. Amid this uncertainty, we generated forecasts of the negotiation outcomes based on three distinct approaches: an Ex Ante Expert Survey of expected results and two negotiation simulation models. Each of these approaches produced forecasts well in advance of the start of the final round of the Paris negotiations. In this article we report on the relative accuracy of the predictions generated by each of the three approaches.
The global climate regime originates from scientific efforts to elevate the issue of climate change to the global, diplomatic level in the 1980s, which ultimately culminated in the 1992 United Nations Framework Convention on Climate Change (UNFCCC) (Luterbacher & Sprinz, 2001, in press). The UNFCCC enjoys universal support, perhaps because it is mostly declaratory. By contrast, the 1997 Kyoto Protocol to the UNFCCC has been marked by more controversy. The Kyoto Protocol mandated all industrialized countries to manage their absolute emissions and to reduce greenhouse gas (GHG) emissions by about 5% between 1990 and 2012. Developing countries were not subject to mitigation obligations. The USA signed but never ratified the Kyoto Protocol due to fears regarding the impacts on its domestic economy and the lack of emission-reducing obligations for emerging economies. Canada left the Kyoto Protocol just before the end of the first compliance period of 2008-2012. While a second compliance period of the Kyoto Protocol was ultimately agreed in 2012, it obliges only European countries and Australia to reduce emissions until 2020. The limitations of the Kyoto Protocol meant that the urgency of formulating a new global climate agreement grew.
A first attempt to agree on a successor to the Kyoto Protocol with universal participation was scheduled for December 2009 in Copenhagen (Dimitrov, 2010). Those hoping for a global agreement were bitterly disappointed. Before the Copenhagen conference took place, Stokman (2009, 2015) conducted an analysis of the negotiations similar to the one we perform here. He applied the Exchange Model, which correctly predicted that two issues would block a comprehensive agreement in Copenhagen, namely whether or not the proposed treaty would be an extension of the Kyoto Protocol and whether or not developing countries would be obliged to reduce CO2 in a measurable, reliable, and verifiable way. A similarly pessimistic prediction was made by Bueno de Mesquita (2009). Regrettably, these pessimistic predictions were borne out by the 2009 Conference of the Parties at Copenhagen.
The period prior to the Paris conference in December 2015 was characterized by considerable uncertainty about whether more progress would be made this time. There were some signs to warrant optimism. Notwithstanding the failure to reach an agreement in Copenhagen, those talks did lead to a new bottom-up approach, which arguably laid the foundations for a future agreement. Since Copenhagen, countries have been strongly encouraged to develop Intended Nationally Determined Contributions (INDCs), which are essentially national climate policy plans to be shared with the UNFCCC's membership (future national commitments will be laid down in Nationally Determined Contributions, NDCs). Furthermore, the failure at Copenhagen led to an impetus to avoid a repeat. The United States' government also took a markedly different approach in the preparations for the Paris Conference of the Parties (COP) compared to the Copenhagen COP, displaying a greater commitment to making multilateral negotiations work. This stronger commitment to reaching an agreement at Paris was shared with the Chinese government, and embodied in a joint US-China presidential statement in September 2015, in which Presidents Obama and Xi emphasized their personal commitment to finding an agreement. 2 Despite these positive signs, large differences remained between the negotiating positions of the world's largest countries and regions.
It was in this uncertain context that we began a study in early 2015 with a view to predicting the outcomes of the Paris negotiations. We employed three distinct methods for generating these predictions, one based on experts' predictions, and two based on negotiation simulation models, all of which will be described in more detail below. Our research team consists of researchers from two international climate institutes (CICERO, the Center for International Climate and Environmental Research, Oslo, and PIK, the Potsdam Institute for Climate Impact Research) and three universities (New York University, University of Groningen, and University of Strathclyde). To ensure the comparability of the predictions from these three different approaches, it was important to identify and assess a common set of issues, and to design the study in such a way that the analyses could be performed using a common set of inputs into the simulation models. We published our predictions in October and November 2015 on an academic, open access internet platform, well before the final round of global climate negotiations, which concluded on 12 December 2015 (Kallbekken & Saelen, 2015; Sprinz & Bueno de Mesquita, 2015; Stokman & Thomson, 2015). Here, we revisit the methods for predicting the Paris outcomes, which are a combination of a decision of the Conference of the Parties of the UNFCCC and the annexed legally binding Paris international agreement. 3 Arild Underdal's contribution to the study of international climate policy is profound and our work has been clearly influenced by his contributions. In effect, Underdal was "present at the creation" of this article on at least two occasions. First, in the early 2000s, Harold K. Jacobson suggested using simulation models to forecast global climate negotiations, and Underdal, Bueno de Mesquita, and Sprinz were part of the team that further developed the idea; yet progress at the time was stalled by the untimely passing away of Jacobson. 
Second, Underdal served as chief applicant of the Centre for International Climate and Energy Policy (CICEP), and Sprinz approached him in late 2014 with the idea to follow up on the earlier ambition. As a result, Arild Underdal and other CICEP members 4 contributed to the derivation of the scales employed in this article. Employing multiple methods to predict the outcomes of multilateral negotiations is a novel approach to research on global climate change negotiations, which, with some notable recent exceptions (e.g., Genovese, 2014; Michaelowa & Michaelowa, 2012; Weiler, 2012), has been characterized by qualitative case studies.
In the following, we provide brief overviews of the approaches used (Section 2) and the assessment of the results (Section 3), while the final section offers concluding observations.

Three Approaches
In this section, we outline the procedure for identifying the substantive issues to be predicted and the three methodological avenues chosen to make predictions on these issues. In addition, we describe the procedure for obtaining the input information for the negotiation simulation models, which consists of the list of main stakeholders and several key attributes of each of these stakeholders.

The Issues at Stake and Scaling
We identified 13 key negotiation issues that together address the main components of the global climate change regime. The negotiation issues fall under the headings of the mitigation of greenhouse gases (reducing emissions), adaptation (coping with damages due to climate change), and compensation. In addition, the issues address the overarching question of differentiation of obligations, and issues concerning climate finance as well as legal form. For each issue, a range of possible outcomes was identified and placed on a scale from 0-100. This was undertaken based on interviews with UNFCCC negotiators, the initial draft negotiation text for the Paris Agreement as of 25 February 2015, 5 parties' submissions to the negotiations process, consultation with scholars, and the authors' knowledge of the process. The 13 issues were labelled as follows:
• differentiation (of obligations)
• mitigation: monitoring, review, and verification (MRV) as well as compliance arrangements
• the legal form of obligations on mitigation
• the legal form of adaptation
• institutional setup for adaptation
• climate finance: volume
• climate finance: who is obliged to contribute?
• adaptation reserved finance
• loss & damage
• the mechanism to determine future mitigation obligations (progression principle)
• mitigation goal for 2050
• mitigation goal for 2100
• ex ante assessment of future Nationally Determined Contributions
The scaling of possible outcomes on each issue implies that alternatives are ranked on a single dimension (e.g., from least to most ambitious). The numerical difference between alternative outcomes is assumed to be interval scale and related to the political difference between them. All issues and respective scales can be found in Appendix 1.
In the first of the three approaches to generating predictions, we conducted an Ex Ante Expert Survey (see below), in which we asked experts to make straightforward predictions of the outcome on each of these 13 scales. The simulation models, however, require more information. They generate predictions of outcomes using information on the main stakeholders and some of their key attributes, including stakeholders' positions on each of the issues. Our first task was to identify the relevant stakeholders. While we recognize the importance of NGOs in the global governance of climate change, the consensus among the experts and participants we consulted is that the COPs are primarily intergovernmental affairs. We therefore decided to focus on major countries and groups of countries as stakeholders. A range of negotiating groups are formally recognized by the UNFCCC secretariat. 6 We followed these groupings, while recognizing the political reality that major countries have to be included separately from their groups. The resulting 16 stakeholders were chosen to include the most prominent individual countries and negotiating blocks within the UNFCCC. To the list of major emitters, we added country groups based around regional affiliation or shared interests so that virtually every Party to the UNFCCC is represented and overlap avoided. We do not include the G77 as a separate actor, for instance, because its members are represented by other stakeholders, and the G77 does not take a coherent position on all issues. Our stakeholders consist of the following: The selection of these stakeholders implies the assumption that all domestic and transnational actors influence the international negotiations by way of these 16 stakeholders. By necessity, this simplifies the more complex reality, including the fact that each of these stakeholders includes several factions. 
This is a defendable simplification in that each stakeholder can only represent a single negotiating position on each issue. However, the lack of information on each faction's position means that the information is less nuanced than recommended by the proponents of the negotiation simulation models.
We gathered estimates of the negotiation positions of each of the stakeholders on each of the 13 issues, and in doing so placed each of the stakeholders on a position (between 0 and 100) on each of the issues. These position estimates were based on analysis of stakeholders' submissions and statements to the negotiations on the Paris Agreement since the launch of that process in 2011, in total 185 documents. This analysis was supplemented with interviews of key negotiators, and the authors' experience from closely following the negotiations process. Not all stakeholders took a position on each of the issues. For instance, neither Brazil nor China had a clear position on the issues concerning mitigation goals for 2050 and 2100. In their working paper published prior to the Paris conference, Sprinz and Bueno de Mesquita (2015) applied the Predictioneer's Game to the set of issues excluding the 2050 and 2100 issues, arguing that the data on these issues are incomplete. For the purposes of comparison, we include these two issues but note their earlier concern and the fact that the substantive findings are the same regardless of whether these issues are included. We derived estimates of the level of salience that each stakeholder attached to each issue and the flexibility of each stakeholder on each issue. Again, these salience and flexibility estimates were quantified on 0-100 scales. These judgements were derived from assessments by the authors, which were informed by how often and strongly stakeholders had expressed positions in their submissions and statements, and by interviews with negotiators. Finally, the models also require estimates of the relative influence of each of the stakeholders. We formulated two sets of influence scores, which turned out to be highly correlated: one from a team of negotiators and one from a subgroup of the authors based on their own scholarly judgement. 
In the working papers published prior to the Paris conference, Stokman and Thomson (2015) applied the Exchange Model based on the influence scores from negotiators, while Sprinz and Bueno de Mesquita (2015) applied the Predictioneer's Game based on the influence scores from the authors. Here, we compare the predictions using the authors' set of influence scores, but note that the main findings remain the same regardless of which set of influence scores we use.

The Expert Survey
The first approach to prediction was based on a survey of experts, which was held during 9-20 September 2015, more than two months before the Paris conference began on 30 November 2015. We issued an online survey to a convenience sample of 104 experts whom the authors identified through several scholarly projects and events that closely followed the then-current negotiations. Although previous experiments (Tetlock, 2005; Tetlock & Gardner, 2015) have shown that experts perform no better, and sometimes even worse, than amateurs, we selected experts because our survey focused on detailed sub-topics in the negotiations, meaning that a substantial knowledge of the process was required to provide well-formed predictions. A total of 38 respondents (36.5%) provided predictions, and almost all respondents gave predictions on all of the issues. The survey questionnaire used the same issue scales that were used for the input data for the negotiation simulation models (Appendix 1). Respondents were asked to give their expectations on outcomes of the Paris negotiations as positions on each of the 13 scales, employing the ordinal scale points mentioned in Appendix 1. We emphasized that they should enter the outcome they expected even if it deviated from the positions they advocated. We assured the respondents that their responses would be anonymized. We refer to these experts as the "Ex Ante Experts." In addition to the 13 substantive questions, we also asked respondents about their regional affiliation and their role in the global climate negotiations. While we did not expect to obtain a representative sample, it is useful to know whether the responses might be biased in any particular direction. The invited experts were fairly well distributed across regions, but those who responded were primarily (82%) from the UNFCCC region "Western Europe and others (including the USA)". 
Simple tests indicate that responses of the predicted outcomes per issue do not differ between this dominant group and respondents from other regions. One third of respondents were researchers, and one quarter were country delegates (negotiators). The rest consisted of consultants, NGO representatives, former country delegates, and journalists.
The respondents were also given the opportunity to provide comments on each question and on the overall survey. Many of the comments expressed a desire for more nuanced response options. These responses are understandable given that the set of alternatives had to conform to a monotonically increasing scale to ensure comparability with the two simulation models, thereby imposing some limitations on the range of possible alternatives. The questionnaire informed respondents about this limitation and asked them to pick the alternative corresponding most closely to their expectation in cases where none of the labeled scale points fitted perfectly.

The Simulation Models
Collective decision-making is the process in which stakeholders have to transform different preferences into a single collective decision that binds all actors within a social system. In doing so, all actors try to influence the decision outcome, including efforts by some to prevent decision-making and maintain the status quo. The dynamics in collective decision-making processes result from the simultaneous efforts of stakeholders with different policy positions to build coalitions in support of their own positions. This implies that stakeholders may be willing or forced to support positions that differ from those they advocated at the outset of the negotiations. In the literature, such shifts in positions are attributed to three main processes: persuasion, logrolling, and enforcement (Stokman, Van der Knoop, & Van Oosten, 2013), and each of these processes is associated with a specific type of network (Stokman, 2014). Previous research has applied models that are representative of these processes to international negotiations in the context of EU decision-making (Thomson, Stokman, Achen, & Koenig, 2006). The present study extends this work to the global level by applying two such models: the Exchange Model, which represents the logrolling process; and the Predictioneer's Game, which represents the enforcement process.

Exchange Model
The Exchange Model encapsulates the intuitively plausible idea that negotiations are driven by a process of political exchange, whereby stakeholders make concessions on some issues in return for concessions on other issues. The result is that stakeholders are willing to support another position on an issue that is of relatively less importance to them in exchange for support from another stakeholder on an issue that is relatively more important to them. The model formalizes the conditions under which political exchanges take place and provides a tool for analyzing complex negotiations in which many stakeholders and issues are involved.
The Exchange Model assumes that each stakeholder has complete knowledge of the positions, issue saliences, and influence of all other stakeholders. We further assume that all stakeholders share a common view on what the expected outcome on each issue will be if each issue were considered separately. This expected outcome is a variant of the Nash Bargaining Solution (NBS), which is approximated by the average of the initial policy positions of the stakeholders, weighted by the product of each stakeholder's influence and salience (Achen, 2006). This expected outcome can be considered a collectively optimal outcome for all actors if each issue is considered separately. Position exchanges link pairs of issues and provide pairs of stakeholders with opportunities to reach decision outcomes that they prefer to the expected outcome. Therefore, position exchanges allow the actors involved in those exchanges to optimize the expected decision outcomes in line with their own individual interests.
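The weighted average described above can be illustrated with a minimal Python sketch. The function name `expected_outcome` and the three-stakeholder numbers are hypothetical and are not taken from the study's data; the sketch only shows the arithmetic of weighting each position by the product of influence and salience.

```python
def expected_outcome(positions, influence, salience):
    """Average of positions, weighted by influence * salience per stakeholder."""
    weights = [i * s for i, s in zip(influence, salience)]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one stakeholder needs a non-zero weight")
    return sum(w * p for w, p in zip(weights, positions)) / total


# Hypothetical stakeholders on one 0-100 issue scale (illustrative values only).
positions = [0, 50, 100]
influence = [1.0, 0.5, 0.5]
salience = [80, 40, 60]

outcome = expected_outcome(positions, influence, salience)
print(round(outcome, 2))
```

In this illustration the first stakeholder combines high influence with high salience, so the expected outcome is pulled towards its position rather than sitting at the unweighted midpoint of the scale.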
Each stakeholder may have one or more possible exchange opportunities. If a stakeholder has more than one opportunity, it must select the one it tries to realize. A potential exchange is realized only if both stakeholders agree to realize it. This will happen only if neither of them has a better alternative exchange. When an exchange is realized, both stakeholders may make deals with other stakeholders only if the outcomes of such deals have no negative effects for the first exchange partner. This condition, of course, limits future exchange possibilities in the bargaining process. In other words, when stakeholders realize an opportunity for an exchange they enter into a binding commitment, which is what makes the Exchange Model a cooperative bargaining model. Within each round of the simulated negotiations, the model works through each exchange opportunity and calculates the resulting shifts in stakeholders' positions. The round ends after all exchanges have been realized. At the end of a round, there usually remain differences among actors' positions. The expected outcome based on actors' revised positions is taken as the predicted outcome after that round of exchanges. The model assumes that the stakeholders then commence a subsequent round of negotiations, starting with initial positions somewhere between their initial positions in the previous round and their negotiated positions at the end of the previous round. The higher the salience of an issue to an actor, the greater is the weight of the former initial position relative to the negotiated position. Extensive experience in applying the Exchange Model shows that ten rounds give a good estimate of final positions and outcomes in negotiations (Stokman & Van Oosten, 1994).
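The between-round update can be sketched as a salience-weighted blend. The text does not give the exact weighting function, so the linear weight `salience / 100` used here is purely an assumption for illustration, as are the function name `next_round_start` and the numbers.

```python
def next_round_start(prev_initial, negotiated, salience):
    """Blend the previous initial position with the negotiated position.

    ASSUMPTION: the weight on the previous initial position is salience / 100.
    The text only states that higher salience gives that position more weight.
    """
    w = salience / 100.0
    return w * prev_initial + (1.0 - w) * negotiated


# Hypothetical actor who started at 20 and negotiated to 60 on a 0-100 scale.
high = next_round_start(prev_initial=20.0, negotiated=60.0, salience=90.0)
low = next_round_start(prev_initial=20.0, negotiated=60.0, salience=10.0)
print(round(high, 2), round(low, 2))
```

Under this assumed weighting, an actor that cares strongly about the issue starts the next round close to its earlier position, while an actor with low salience starts close to what it negotiated.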
Modeling position exchanges requires careful consideration of the nature of these exchanges. In particular, a choice has to be made about which exchange rate to use. The exchange rate determines the extent to which each stakeholder shifts its position. The present Exchange Model uses an equal, absolute utility gain for both exchange partners. This has the advantage that exchanges have the same utility for both partners, and that the exchanges can be ordered in terms of their relative attractiveness to both exchange partners. The disadvantage of the equal utility gain assumption is that it involves an intersubjective comparison of utility, which is theoretically problematic (Arrow, 1951/1963). Roth and Malouf (1979, pp. 580-581), however, review several studies that report a strong tendency for outcomes of bargaining games to give players equal payoffs when those outcomes differ from the Nash prediction. More recent evidence results from splitting resource pool experiments (Dijkstra & Van Assen, 2008). Furthermore, alternative solutions for the exchange rate lead to different orderings of exchanges for each stakeholder, facing the problem of deadlock, whereby no two stakeholders prefer, and therefore can realize, the same exchange.
Bilateral exchanges also have important side effects or externalities with respect to the utilities of other stakeholders as exchanges result in shifts in the expected outcomes on issues. Externalities arise when stakeholders who are not involved in an exchange are either positively or negatively affected by it (Dijkstra, Van Assen, & Stokman, 2008; Van Assen, Stokman, & Van Oosten, 2003). If, over all simulated exchanges between stakeholders, the positive externalities for each stakeholder are greater than the negative ones, we may expect overall agreement. 
If, however, important stakeholders experience substantively higher negative externalities of other stakeholders' exchanges than positive ones, including their own exchanges, this may result in opposition to the negotiated outcomes. In such cases, the final interests of the stakeholders are likely to be insufficiently complementary to reach overall agreement.

The Predictioneer's Game
The Predictioneer's Game is a model designed to address policy problems for which there is the possibility of a negotiated compromise but there is also the possibility of threat or actual use of costly, coercive pressure (Bueno de Mesquita, 2011). The model is not appropriate, however, for market-driven decisions since these do not involve either negotiation or coercion. The Predictioneer's Game assumes that people are rational in the sense that they do what they believe is in their best interest. They may learn later that the negotiations lead to different results. The model is both predictive and prescriptive. For instance, one feature of the model as a practical tool is that its output can also help decision makers better anticipate what would happen if they alter their pattern of action in specific ways designated through the model's logic. Based on hundreds of applications in peer-reviewed outlets (and many more in confidential settings), the evidence shows that this model and its predecessors accurately predict issue outcomes over ninety percent of the time (e.g., Ray & Russett, 1996). Hence, it is a reliable and practical tool for policy analysis.
The Predictioneer's Game solves N(N-1) two-player games for t periods of play, where N is the number of players, with third-party interests included in each player's calculations. The game assumes two dimensions of uncertainty for each player. Each player is uncertain regarding each other player's type on two dimensions. Specifically, is another player the type that, given the opportunity, prefers to coerce or negotiate and, if coerced, prefers to retaliate or give in? Players update beliefs about each other's types following Bayes' rule, and the game is solved for the Perfect Bayesian Equilibria of each stage game. The stage games are repeated t times, where t, the number of iterations, can be selected by the user. The model signals the period when the "super" game for all players is expected to end based on two conditions: (1) looking ahead one period, the average player expects her welfare to decline or, (2) if there are veto players, at least one of them believes it is better to stop the game than to continue to the next "round." The sequence of play for player pair i,j when i moves first is as follows: (1) Player i decides whether to make a proposal whose content is endogenously derived. A proposal requests a shift in j's position on the issue in dispute; (2) If a proposal was made, then the recipient chooses to accept or make a strategically chosen counter-proposal. If no proposal was made, then j has the opportunity to follow the sequence of moves initially available to i (following the sequence described for i); (3) Following a proposal and counter-proposal, player i can offer a compromise settlement with j or i can coerce j, imposing costs; (4) If a compromise offer was made, then j can negotiate, producing an expected agreement, or j can coerce i; (5) Following any coercive move, the target can retaliate or capitulate to the other player's demanded outcome.
The model relies on the mean voter theorem to generate estimated predicted outcomes in each round, using the average of the mean-voter prediction in the first round in which one of the game-ending conditions has been met plus the average of the mean predicted outcome in the round before (if there is one) and the round after. Unlike the Exchange Model, the Predictioneer's Game is a non-cooperative bargaining model and relies on the assumption of issue-by-issue decisions rather than concessions across issues.

Overview of Results
In the following, we report our results for the three approaches used to predict the outcomes of the climate negotiations and the accuracy of each of these approaches. The information on the point predictions derived from each of the three approaches to predicting the outcomes of the Paris negotiations is provided in Table 1. From the first approach to prediction, which is based on the 38 Ex Ante Experts, we take the average of these 38 predictions as the collective prediction of our group of experts. We are also, however, interested in the predictive accuracy of individual experts compared to predictions from the other approaches.
The information in Table 1 shows not only the average of the Ex Ante Experts' predictions, but also the range and standard deviations of the experts' predictions. This information clearly shows a great deal of variation among experts in their expectations about the outcomes of the negotiations. Note that it is entirely possible for individual experts' predictions to be far from the actual outcome, while the average of their predictions is close to the actual outcome: for example, if two experts predict 0 and 100 on a policy scale while the actual outcome is 50. This is a possibility we examine below. From the second approach to prediction, which is based on the Exchange Model, we derive two sets of predictions. These two sets of predictions differ with respect to the assumption about which issues are linked to each other in the process of negotiations, which in effect leads to two distinct variants of the Exchange Model. In the Inclusive Exchange Model, we assume that exchanges are possible across all 13 issues. The Restrictive Exchange Model by contrast assumes that exchanges can only be made across issues within substantively related subsets of issues. From Table 1, these groups are: (1) mitigation and adaptation issues: The reason for specifying these distinct variants of the Exchange Model was that both before and after the Paris negotiations we obtained evidence that the financial issues were negotiated relatively independently from the rest of the issues. For that reason, we published predictions from both the Inclusive and Restrictive Exchange Models before the Paris conference (Stokman & Thomson, 2015, Tables 2 and 3). By contrast, in the third approach to prediction, based on the Predictioneer's Game, issues are not linked with each other at all. We therefore present only one set of predictions from that approach. The various predictions shown in Table 1 should be interpreted in light of the issue-specific scales reported in Appendix 1. 
As noted earlier, the predictions we assess here differ marginally from those we published prior to the Paris conference because we revised the input data to ensure that the analyses are as comparable as possible. 8 8 As noted earlier, the predictions of the Exchange Model published prior to the conference were based on estimates of influence provided by negotiators, while the predictions of the Predictioneer's Game were based on similar estimates from a subgroup of authors. Here we use the estimates from our authors. The predictions from the Predictioneer's Game excluded the issues of mitigation goals for 2050 and 2100 due to concerns about missing data, while those presented here include these issues. The results are substantively the same if we exclude the two ambition issues. Using our own coding of the COP-21 texts as the benchmark, we obtain the following mean errors (and standard deviations) for the remaining 11 issues: Average Ex Ante Experts 11.64 (8.90); Individual Ex Ante Experts 21.45 (14.65); Inclusive Exchange Model 22.45 (9.43); Restrictive Exchange Model 17.10 (7.99); Predictioneer's Game 18.02 (7.45). Table 1 also contains our coding of the actual outcomes of the Paris negotiations. Initially, we asked 12 independent experts from around the world, across a broad range of disciplinary backgrounds, to individually score the outcomes of the Paris negotiations in an email survey. This ex post sample of experts did not overlap with the ex ante sample. Half of the invited experts scored the outcomes on the scales reprinted in Appendix 1. This Ex Post Expert Survey unexpectedly generated considerable variance across experts for a broad range of issues. Since the range of responses was very substantial, we ourselves undertook two complete codings of the outcomes of the issues. 
Our two codings produced nearly identical results, and we retained one of them as our ex post assessment of the negotiated outcomes (Table 1), substantiated for each issue by direct reference to the core UNFCCC COP-21 decision and the Paris Agreement (see Appendix 2).
To assess the accuracy of the predictions from the three approaches, Table 2 reports the mean absolute errors across the 13 issues for each set of predictions given in Table 1, using our coding of the outcomes as the benchmark.
To calculate the errors of the predictions of "Average Ex Ante Experts," we first calculated the average prediction made by the 38 Ex Ante Experts on each of the 13 issues. We then calculated the absolute difference between this average (collective) prediction and the actual outcome on each issue, and finally averaged these absolute differences across the 13 issues. By contrast, to calculate the error of the predictions of "Individual Ex Ante Experts," we first calculated the absolute difference between each of the 38 Ex Ante Experts' predictions and the actual outcome on each issue. We then computed the average error across the 38 experts on each issue, before calculating the average error across the 13 issues. A comparison of the errors from the Average and Individual Ex Ante Experts shows that the Average predictions are considerably more accurate than the Individual predictions: the Average Ex Ante Experts' prediction has a mean error of 14.92, compared with 20.75 for the Individual Ex Ante Experts.
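The two error measures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names and the toy numbers are hypothetical and are not the study's data or code. It also makes visible why the collective error can never exceed the average individual error (the absolute value of an average is at most the average of the absolute values).

```python
from statistics import mean


def error_of_average(predictions, outcomes):
    """Mean absolute error of the collective (average) prediction.

    predictions: one list per expert, holding one prediction per issue.
    outcomes: the coded actual outcome for each issue.
    """
    n_issues = len(outcomes)
    # Step 1: average the experts' predictions on each issue.
    collective = [mean(expert[i] for expert in predictions) for i in range(n_issues)]
    # Step 2: absolute difference per issue, then average across issues.
    return mean(abs(c - o) for c, o in zip(collective, outcomes))


def average_individual_error(predictions, outcomes):
    """Average of the individual experts' absolute errors."""
    n_issues = len(outcomes)
    # Step 1: on each issue, average the experts' absolute errors.
    per_issue = [
        mean(abs(expert[i] - outcomes[i]) for expert in predictions)
        for i in range(n_issues)
    ]
    # Step 2: average these per-issue errors across issues.
    return mean(per_issue)


# Hypothetical toy data: 3 experts, 2 issues on 0-100 scales.
toy_predictions = [[0, 40], [100, 60], [50, 80]]
toy_outcomes = [50, 50]

print(f"{error_of_average(toy_predictions, toy_outcomes):.2f}")          # 5.00
print(f"{average_individual_error(toy_predictions, toy_outcomes):.2f}")  # 25.00
```

In the toy data the experts' over- and underestimates cancel on the first issue, so the collective error there is zero even though two of the three experts are 50 points off.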
Table 2 also shows that the errors of the models' predictions are generally somewhat higher than the errors of the Average Ex Ante Experts' predictions, but not necessarily higher than the Individual Ex Ante Experts' predictions. The Inclusive Exchange Model makes the least accurate predictions. However, the average errors of the Restrictive Exchange Model are slightly lower than those of the Predictioneer's Game.
Table 2. Mean errors of each of the predictions (13 issues).
Note to Table 3: The four entries per cell reflect the distribution of absolute errors: ≤ 10, > 10-20, > 20-30, > 30.
Note to Table 4: Figures refer to the numbers of issues on which the row prediction is better, worse, or the same as the column prediction in terms of predictive accuracy. P-values are from the non-parametric Wilcoxon-Cox sign test; two-sided tests that the medians of the errors are equal.
Another perspective on the accuracy of predictions can be gained by focusing on the degree of accuracy, i.e., by grouping the magnitude of errors into absolute errors that are 10 points or less, more than 10 and up to 20 points, more than 20 and up to 30 points, and more than 30 points (see Table 3). Focusing on rather accurate predictions with an error of up to ten points, the Average Ex Ante Experts perform best (six issues), followed by the Predictioneer's Game (four issues), the Restrictive Exchange Model (two issues), and Individual Ex Ante Experts (one issue), while the Inclusive Exchange Model performed worst (zero issues). If we instead focus on major mispredictions exceeding 30 points, the Inclusive Exchange Model shows the most pronounced weakness (three issues), followed by the Predictioneer's Game (two issues), while all other approaches generate only one major misprediction each.
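The grouping of absolute errors into four bands can be sketched as follows. The function name, the band labels, and the example values are illustrative assumptions and not the study's data.

```python
def error_bands(abs_errors):
    """Count how many issues fall into each band of absolute error."""
    counts = {"<=10": 0, ">10-20": 0, ">20-30": 0, ">30": 0}
    for error in abs_errors:
        if error <= 10:
            counts["<=10"] += 1
        elif error <= 20:
            counts[">10-20"] += 1
        elif error <= 30:
            counts[">20-30"] += 1
        else:
            counts[">30"] += 1
    return counts


# Hypothetical absolute errors for one set of predictions on seven issues.
print(error_bands([0, 10, 10.5, 20, 25, 30, 31]))
```

Each band includes its upper bound, matching the "more than 10 and up to 20 points" wording in the text.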

In addition, Table 4 presents pairwise comparisons of the accuracy of each of the predictions with a simple non-parametric test (the sign test). A non-parametric test is arguably appropriate given both the small number of observations and the fact that the issues are interdependent. The sign test allows us to test the hypothesis that the difference between the medians of the two sets of prediction errors is zero. A small p-value (by convention, p ≤ .05) allows us to reject the null hypothesis, thereby inferring that one set of prediction errors is significantly lower than the other. The first inference to draw from Table 4 is that the predictions of the Average Ex Ante Experts are "better" (i.e., more accurate) than those of the Individual Ex Ante Experts on all 13 issues. This difference is highly significant (p = .00). Moreover, there are no significant differences between the accuracy of the Individual ex ante predictions and those of the three sets of predictions from the negotiation simulation models.
The predictions of the Inclusive Exchange Model are worse than those of the Restrictive Exchange Model, but this finding is not statistically significant at conventional levels. Thus, there is limited evidence in favor of exchanges within substantively related subsets of issues. The remaining pairwise comparison, between the Inclusive Exchange Model and the Predictioneer's Game, is not statistically significant. Finally, there is no substantive or statistically significant evidence of differences in the performance of the Restrictive Exchange Model and the Predictioneer's Game.
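The pairwise comparisons of the kind reported in Table 4 can be sketched as an exact two-sided sign test. This is a minimal sketch under assumptions: the function name is hypothetical, ties are dropped before testing (a common convention that the text does not specify), and the error vectors below are made up rather than taken from the study.

```python
from math import comb


def sign_test(errors_row, errors_col):
    """Compare two sets of absolute errors issue by issue.

    Returns (better, worse, same, p_value), where "better" counts the issues
    on which the row prediction has the smaller error. The p-value is the
    exact two-sided binomial probability under the null of equal medians.
    """
    pairs = list(zip(errors_row, errors_col))
    better = sum(r < c for r, c in pairs)
    worse = sum(r > c for r, c in pairs)
    same = sum(r == c for r, c in pairs)
    n = better + worse  # assumption: tied issues are excluded from the test
    if n == 0:
        return better, worse, same, 1.0
    k = min(better, worse)
    # Probability of a split at least as lopsided as k versus n - k.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return better, worse, same, min(1.0, 2 * tail)


# Hypothetical case: the row prediction is more accurate on all 13 issues.
better, worse, same, p = sign_test([1] * 13, [2] * 13)
print(better, worse, same, f"{p:.4f}")  # 13 0 0 0.0002
```

Under these assumptions, a 13 to 0 split gives p = 2/8192, roughly .0002, which rounds to the p = .00 reported for the comparison of Average and Individual Ex Ante Experts. A near-even split such as 7 to 6 gives p = 1.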

Concluding Remarks
We conclude with several noteworthy observations from our investigation. Although the Paris agreement has been widely lauded as a great success for the global governance of climate change, the evidence suggests that the contents of the agreement reached are highly ambiguous. For each of the 13 main controversial issues that formed the agenda in Paris, we went to considerable lengths to describe in detail the possible different outcomes that might be reached to resolve the differences among the stakeholders' positions. In early 2016, we held an online survey of a small group of highly expert observers to assess what had been agreed in Paris a few months earlier, and found substantial differences among their answers on eight of the 13 issues (their answers ranged over more than 20 points on the 100-point issue scales). This may partly reflect the limitations of an email survey. But it also points to the inherent ambiguity in the Paris texts that were agreed. One member of a large negotiating team stated that much of the subsequent conference held in Bonn in May 2016 focused on figuring out exactly what had been decided the previous year (personal interview, 28 June 2016). Introducing ambiguity in negotiation outcomes is one way of achieving the semblance of agreement and progress, which allows a broad range of participants to claim victory. However, in this policy area where countries need to make specific commitments to mitigate, adapt to or compensate for the effects of climate change, ambiguity is highly problematic. We decided to offer our substantiated assessment of the agreement reached at Paris (Appendix 2). Future efforts to conduct a large-scale survey on interpreting the outcomes agreed at Paris in late 2015 might be instructive.
The main finding from comparing the predictions with the actual outcomes is that the Average (collective) predictions of the Ex Ante Experts are significantly more accurate than the predictions of Individual experts. In other words, prior to the COP, individual experts tended to either under- or overestimate the ambitiousness of the outcomes that would be reached in Paris. However, their over-pessimistic and over-optimistic expectations cancelled each other out in the average predictions. This finding resonates with de Caricat's classic jury theorem (de Caricat, 1785/1994); loosely stated, the theorem proves that as the size of a jury increases from one to infinity, the likelihood that it will reach a correct verdict by collective majority vote approaches one, provided that each juror is independently more likely than not to be correct. Similarly, public opinion researchers have found that public opinion at large appears to be better informed than individual voters, because errors of judgement made by individual voters cancel each other out in the process of aggregation (e.g., Page & Shapiro, 1992). The average predictions of the Ex Ante Experts also performed well in comparison to the predictions of the negotiation simulation models, but not significantly better. While experts' predictions are a relevant benchmark for comparison, they offer no theoretical insights into the processes through which negotiations took place.
By contrast, the Exchange Model and Predictioneer's Game give detailed accounts of the negotiation process based on cooperative and possibly coercive negotiation processes, and our model comparisons provide some insight into the negotiations that took place in Paris. We found evidence that the Inclusive Exchange Model (which posits that all issues can be combined with each other in profitable exchanges) performed somewhat worse than the Restrictive Exchange Model (which posits that exchanges take place only within substantively related subsets of issues). This suggests that logrolling takes place, but that it is limited to subsets of substantively related issues. This challenges the idea that the COPs are forums in which "thinking is joined up" (Schroeder & Lovell, 2012, p. 26), by suggesting that there are constraints to making such linkages. One of these constraints may lie in the structure of the negotiating teams, which, given the complexity of the negotiations, typically involve subgroups of officials working on different topics. These officials are often located in different ministries at the national level, such as foreign affairs, environment, and finance departments. Future research might examine the effects of such institutional constraints on both the ways in which delegations formulate their negotiating positions and the process of negotiations at the global level.
The limitations of the present study highlight opportunities for future research. The evidence did not enable us to make statistically significant distinctions between the accuracy of most of the predictions we assessed. It is noteworthy that the evidence from the negotiated outcomes is consistent with predictions from two quite different negotiation models: the Restrictive Exchange Model and the Predictioneer's Game. The former model offers a cooperative account based on limited logrolling across issues, while the latter offers a non-cooperative account in which issues are dealt with separately and actors may attempt to coerce others. Developing research designs to test the micro-level predictions of these models is still largely open ground for future research. Unlike the Ex Ante Experts, these models predict not only the decision outcomes, but also actors' behavior and perceptions during the negotiation process, including changes in the negotiating positions of each actor over time. Given the largely closed negotiations at Paris, systematic outside observation of relevant processes was not practically feasible, yet we hope that future research will overcome such limitations.
Future research should also consider more refined designs that depart from our simplifying assumption that countries and groups of countries are unitary actors. This simplification was arguably justified by the fact that these actors take partially coherent positions in the UNFCCC negotiations. However, some authors would have preferred a more disaggregated approach that tried to identify factions within countries as the relevant actors. Further, we focused squarely on governmental actors, agreeing with participants who observed that COPs are primarily intergovernmental affairs. However, the lobbying efforts of environmental and business interests undoubtedly also merit explicit inclusion in the analyses. We recommend that future research in this area be explicitly comparative in design, which means that it makes comparisons involving different theoretical approaches, different COPs, and possibly also negotiations in other settings. A degree of quantification strengthens our ability to make such comparisons. This represents a radical departure from common research practice in this field, which, as noted in a recent review by Genovese (2014), is dominated by qualitative case studies, with some notable exceptions. The strength of qualitative case studies lies in the richness of the substantive knowledge they convey. By combining this strength with the comparative method and a degree of quantification, we will be able to generate cumulative knowledge about the conditions under which distinct negotiation processes are triggered and under which progress in international negotiations is achieved.

Acknowledgements
We greatly appreciate the comments received from two anonymous reviewers and Frederica Genovese on an earlier version of the manuscript as well as the guidance offered by the journal editors. We also appreciate comments received during presentations on occasion of the 16th Annual Policy Conference "Designing Effective Climate Policy in the EU and the U.S.," 3-  Norway, Project No. 209701). The Exchange Model employed in this article was developed in close cooperation with Reinier Van Oosten, who also developed the software, financed by the company Decide (now part of the dutch group). Moreover, we appreciate the advice and feedback we have received from scholars and political practitioners throughout this project. Finally, we acknowledge that institutional funding received by our respective home institutions allowed us to undertake this project.

Steffen Kallbekken is a Research Director at CICERO and Director of CICEP-Strategic Challenges in International Climate and Energy Policy. He holds a PhD in economics from the University of Oslo. His research has focused on international climate policy, public support for climate policy instruments, pro-environmental behavioral change, long-term climate targets, and the mitigation of short-lived climate forcers.
Frans Stokman is Professor of Sociology at the University of Groningen (1977-present) and specialized in social network analysis as well as models of collective decision making. He has served as Board Member of the Research School SOM of the Faculty of Economics and Business, University of Groningen (2003-present); Director of DECIDE (dutch) company (1994-present); Co-Founder and Board Member of Energy Cooperative Grunneger Power (2011-present) and of the Foundation "Samen Energy Neutraal" (Together Energy Neutral) (2013-present). Stokman also served as Director of the Interuniversity Center of Social Science Theory and Methodology (ICS, 1993-2003).
Håkon Saelen received his PhD in Political Science from the University of Oslo, Norway, in 2016, and holds a joint appointment as Senior Researcher with CICEP-Strategic Challenges in International Climate and Energy Policy across CICERO and the University of Oslo. He publishes on international climate cooperation, international negotiations, climate policy, agent-based modeling, and behavioral experiments.
Robert Thomson is Professor of Politics at the University of Strathclyde, Glasgow, UK. He previously held positions at Trinity College Dublin, and at the Universities of Groningen and Utrecht in the Netherlands. His research focuses on international comparisons of democratic representation, as well as negotiations and policymaking. He is author of Resolving Controversy in the European Union (Cambridge University Press), and dozens of peer-reviewed articles and book chapters on national, EU, and international politics.