Predicting, deciding, learning: can one evaluate the ‘success’ of national climate scenarios?

Scenarios may be understood as products and/or processes. Viewing scenario exercises as productive tends to emphasize their tangibility: scenario products may acquire value unrelated to the processes of their creation. Viewing scenario exercises as procedural tends to emphasize their modes of formation: the process of constructing scenarios may have benefits irrespective of the value of ensuing products. These two framings yield different expectations about how one might evaluate the ‘success’ or otherwise of scenario exercises. We illustrate three approaches to evaluating the success or otherwise of scenarios using the example of the series of national UK climate scenarios published between 1991 and 2002. These are: predictive success (has the future turned out as envisaged?), decision success (have ‘good’ decisions subsequently been made?) and learning success (have scenarios proved engaging and enabled learning?). We reflect on the different ways the ‘success’ of national climate scenarios might be evaluated and on the relationship between the productive and procedural dimensions of scenario exercises.


Introduction
The concept of scenarios was originally developed in the 1960s as a way of aiding strategic thinking and decision-making within the Shell oil corporation (Van der Heijden 1997). Rather than seeking to 'predict the future', scenario exercises were originally designed to sensitize an organization to a wide range of possible futures. Scenarios have subsequently become central in the framing, analysing and negotiating that surrounds the idea of climate change. Introduced into climate change studies by Flohn (1977) with his publication of the first climate scenario, scenario-construction exercises and scenario analysis now exist in many different areas of climate change debate, operating across many different scales and applied to many different sectors. Thus we have greenhouse gas emission scenarios, climate scenarios, land use scenarios, socio-economic scenarios, adaptation scenarios, policy scenarios, and so on. These scenarios may be portrayed and analysed at scales ranging from global and continental to national, regional and local. The ubiquity of scenarios is paralleled by a range of construction methodologies and of institutional ownership arrangements. In the case of climate scenarios, two methodological elements have become dominant in their design: one or more scenarios describing future emissions of greenhouse gases and other climate forcing agents, and one or more climate models used to quantify the climatic consequences of this (these) emission scenario(s). The procedural and institutional aspects of how climate scenarios are constructed-who authorizes the models? who designs the scenarios? who manages the process?-are less often emphasized (see Hulme and Dessai 2008).
One of the clarifying messages to emerge from a workshop held at Brown University in March 2007-Global environmental futures: interrogating the practice and politics of scenarios-was that scenarios can be understood either as products or as social processes (O'Neill et al 2008). Valuing scenarios primarily as useable products is more likely for those operating within the natural sciences or economics; valuing scenarios primarily as learning processes is more likely for those operating within the social sciences or within organizational settings. Recognizing these twin attributes of scenarios offers a number of different ways in which scenarios may be used, valued and evaluated as they circulate around scientific, social and policy worlds. It is our suggestion in this paper that, first, the production aspects of climate scenarios have received greater attention than have the procedural aspects of their creation and that, second, the procedural aspects have received much greater attention than has the evaluation of the 'success' of the ensuing scenarios. Despite the ubiquity of climate scenarios within climate change debates there remain remarkably few analyses which have reflected on how one might evaluate their success, and even fewer studies that have actually conducted such an evaluation.
Table 1. Summary characteristics of the four generations of UK climate scenarios (as published in Hulme and Dessai (2008)).
In this short contribution, we offer a perspective on how such evaluation might be approached, using the example of four generations of national UK climate scenarios, in the production and use of which the authors have been directly involved (CCIRG 1991, 1996, Hulme and Jenkins 1998, Hulme et al 2002). Table 1 offers a summary of the main characteristics of these scenarios, the construction and context of which have been described in an earlier paper. In addition to table 1 we refer readers to this work for further details. In the present paper we offer a threefold framework for thinking about scenario evaluation in the context of climate change. We ask, and seek to answer, three questions about these specific national climate scenarios: has the future turned out as envisaged? (what we call predictive success); have 'good' decisions subsequently been made? (decision success); have scenarios enabled participation and learning? (learning success). These three questions might broadly be related to the evaluation criteria of credibility, saliency and legitimacy offered by Cash et al (2003) in their assessment of science-policy interfaces. As we summarize elsewhere (Hulme and Dessai 2008, p 56), '. . . credibility is concerned with the scientific adequacy of the technical component of the scenarios, salience is concerned with the relevance of the scenarios to the needs of decision-makers and legitimacy is concerned with the process and transparency of the scenario design, construction and distribution'.
We conclude the paper with a discussion about whether this is a helpful framework for thinking about climate scenario evaluation and what is thereby implied about the future role of climate scenarios in climate change decision-making.

Predictive success
This criterion of success-has the future turned out as envisaged?-emerges from a restricted view of scenarios as products and from the still narrower view of regarding scenarios as quantitative or semi-quantitative predictions of the future (notwithstanding that the formal definition of a scenario-a plausible representation of the future-excludes it possessing such predictive properties). In this sense, a single scenario is only 'successful' if it can retrospectively be shown to have described reality with some adequate level of verisimilitude. Multiple scenarios may be 'successful' if reality subsequently can be shown to have fallen within the scenario range. How scenarios are communicated and explained will have a considerable bearing on the extent to which this success criterion is relevant. This predictive success criterion might be applied within some scientific and many policy circles, for example, and this criterion also likely has intuitive resonance with the general public.
One justification for such a criterion flows from a view of adaptation to climate change summarized by Füssel (2007, p 265): 'The effectiveness of pro-active adaptation to climate change often depends on the accuracy of regional climate and impact [scenarios].' In this view, good climate adaptation decisions can only be made if climate scenarios are 'accurate', a position that is consistent with optimization approaches to decision-making (see Dessai et al 2009 for a discussion of this position). A contrasting view of climate scenarios, however, emphasizing their innately weak predictive power is implied by the disclaimer which has accompanied successive versions of national climate scenarios for Australia. Thus their most recent report states on the front cover that '. . . no responsibility will be accepted by CSIRO or the Bureau of Meteorology for the accuracy of the projections in or inferred from this report, or for any person's reliance on, or interpretations, deductions, conclusions or actions in reliance on, this report or any information contained in it' (CSIRO 2007). This cautionary warning about the limits of predictive accuracy has not accompanied any of the UK climate scenarios reviewed here.
There have recently been attempts to evaluate IPCC scenarios using this criterion of predictive success-for example, Van Vuuren and O'Neill (2006) evaluated IPCC's global emissions scenarios against observed emissions trends, and Rahmstorf et al (2007) evaluated IPCC's global temperature and sea-level rise scenarios against recent observations. A recent study has similarly examined the original Limits to Growth scenarios commissioned by the Club of Rome in the early 1970s (Turner 2008).
Evaluating the predictive success of a climate scenario (or climate scenarios) is different from the evaluations that are made of numerical weather forecasts. Daily weather or seasonal climate forecasts are amenable to verification; once a forecast has been produced it is possible the following day or season to assess how well the model has performed. In the case of daily weather forecasts this can be repeated many hundreds of times, and robust indices of forecast accuracy can be constructed. Such verification is not feasible for climate scenarios because of the long time scales involved-in the order of decades up to a century and sometimes beyond.
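The kind of repeated verification that is possible for daily forecasts can be sketched as follows. This is a minimal illustration only: the choice of mean absolute error, the comparison against a fixed reference forecast and all of the numbers are our assumptions for the sake of example, not the indices used by any forecasting centre.

```python
def mean_absolute_error(forecasts, observations):
    """Average absolute difference between paired forecasts and observations."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(f - o) for f, o in zip(forecasts, observations))
    return total / len(forecasts)


def skill_score(forecasts, reference, observations):
    """1 - MAE(forecast) / MAE(reference).

    Positive values mean the forecast beat the reference forecast;
    zero or negative values mean it did no better.
    """
    mae_reference = mean_absolute_error(reference, observations)
    if mae_reference == 0:
        raise ValueError("reference forecast is perfect; skill is undefined")
    return 1.0 - mean_absolute_error(forecasts, observations) / mae_reference


# Hypothetical daily temperatures (deg C), invented for illustration.
forecast = [10.0, 12.0, 14.0]
observed = [11.0, 12.0, 13.0]
climatology = [13.0, 13.0, 13.0]  # hypothetical fixed reference forecast
score = skill_score(forecast, climatology, observed)
```

Such an index becomes robust only because the forecast-observation pairs accumulate quickly; a climate scenario for the 2050s yields, in effect, a single pair after several decades.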
Evaluating the predictive success of a climate scenario (against recent observations) is also a very different exercise from evaluating the performance of a specific climate model (against historical observations). A 'good' climate model with a defined level of predictive skill does not necessarily translate into 'good' climate scenarios; the assumptions and manipulations that take place in the process of climate scenario construction mean that the performance of a climate model cannot necessarily be equated with the predictive success of scenarios that derive from it.
We have demonstrated elsewhere how one might evaluate the predictive success of successive generations of UK climate scenarios. We compared various scenario projections-dating from 1991, 1996, 1998 and 2002-against observations for the period 1961-90 to 1978-2007 for national-scale indicators of temperature and precipitation. Our analysis showed that recent trends in observed UK climate have indeed fallen broadly within the range of published climate scenario projections, the greatest ambiguity occurring with summer precipitation.
Scenario evaluations such as these may be important to undertake, but they also raise as many questions as they answer. For example, the relatively poor fit of observed UK summer precipitation to the scenario trends might be for three different reasons: deficiencies in the underlying climate model(s) used-in this case mostly from the Hadley Centre; inadequacies in the way the scenarios were derived from the model(s); or simply due to high levels of natural multi-year variability in UK summer precipitation which cannot be easily represented in climate scenarios purporting to reveal long-term trends in anthropogenic climate change. As noted by Oreskes et al (1994), falsification of such model 'predictions' may have greater learning value than any number of confirmations of model veracity, a point we return to in the discussion below. Even though we have between 5 and 15 years of observations against which we can evaluate the scenarios, such periods may not be sufficiently long to provide robust answers to the question: were these scenarios a good description of the way future UK climate evolved?
A second question raised by this type of retrospective evaluation of scenarios' predictive success concerns the multiplicity and nature of climate scenarios that may be involved. Is one evaluating the predictive accuracy of an individual scenario, a family of scenarios or a set of probabilistic scenarios? For example, in the case of the UKCIP98 scenarios (published in 1998) four different national climate scenarios were portrayed, each of them based on a different combination of emissions scenarios and climate model parameters. One might conclude one decade later, as shown in the Dessai and Hulme (2008) study, that the actual UK climate has indeed fallen within the projected scenario range, but this implies that the scenario family was 'accurate', not necessarily any single scenario. Thinking of scenario success in terms of predictive skill forces us to conclusions in which we have to judge one scenario in a family of scenarios as 'better' than the others. And yet all four UKCIP98 scenarios were created through the same construction process and were claimed to be 'equally plausible'.
This problem of retrospectively evaluating the predictive success of multiple scenarios emerging from a single scenario exercise becomes even more acute when the family of scenarios is presented in probabilistic form. This is the approach taken in the forthcoming UKCIP09 national climate scenarios (UKCIP 2008) and increasingly with other new climate scenario products (e.g. CSIRO 2007). As long as observed climate reality subsequently falls within the stated scenario probability density function, the probabilistic scenarios can be claimed to have predictive success. This again suggests that falsification is the aim rather than confirmation: scientists learn more if reality falls outside a probability density function than if it falls within it.
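The two checks discussed above, whether an observation falls inside a scenario range and where it sits within a probabilistic set, can be sketched as follows. This is an illustration under stated assumptions: the function names are ours, the ensemble is represented as a plain list of equally weighted members, and the numbers are invented rather than taken from any UK scenario.

```python
def within_range(observed, scenario_values):
    """True if the observation lies between the lowest and highest scenario."""
    return min(scenario_values) <= observed <= max(scenario_values)


def empirical_percentile(observed, ensemble):
    """Percentage of ensemble members at or below the observed value."""
    at_or_below = sum(1 for member in ensemble if member <= observed)
    return 100.0 * at_or_below / len(ensemble)


def within_central_interval(observed, ensemble, coverage=0.9):
    """True if the observation lies in the central `coverage` share of the ensemble."""
    tail = (1.0 - coverage) / 2.0 * 100.0
    percentile = empirical_percentile(observed, ensemble)
    return tail <= percentile <= 100.0 - tail


# Hypothetical projected changes (deg C per decade), invented for illustration.
ensemble = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
observed_trend = 0.55
inside = within_range(observed_trend, ensemble)
position = empirical_percentile(observed_trend, ensemble)
```

The range test is easy to pass when the family of scenarios is wide, which is one reason why an observation falling outside the distribution is more informative than one falling within it.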

Decision success
This criterion of scenario success asks a different question: have decisions made on the basis of the scenario(s) subsequently turned out to be 'good' ones? This is an important question to ask because climate scenarios are increasingly being used to climate-proof multi-million pound infrastructural investments and to develop new risk and resource management strategies.
At one level, and as with predictive success, this criterion can only be addressed with the benefit of retrospection. For example, since 1990 the environment ministry in the UK has adopted a recommendation, derived from climate and sea-level scenarios, that a future rate of sea-level rise of 6 mm yr⁻¹ should be incorporated into decisions and designs about coastal defence infrastructure in the UK (MAFF 1999). Whether such a recommendation has contributed to effective decision-making in the coastal zone can only be assessed with the benefit of hindsight. For example, after one, two, or more decades one could attempt to evaluate whether new coastal infrastructure thus designed has reduced the economic or social damage caused by coastal flooding. The difficulty is that this question can only be answered on the basis of some counterfactual scenario-what damage would have occurred if the original infrastructure design or decision had not incorporated the scenario adjustment? This counterfactual question can only be answered quantitatively using models to simulate alternative realities-simulations of unfolding realities with and without the scenario-informed infrastructure.
These methodological difficulties in applying this success criterion as initially stated lead us to modify our framing of decision success. We instead evaluate climate scenarios by asking the question: do the scenarios contain a sufficient representation of knowable climatic uncertainties to offer the prospect that decisions taken in the light of the scenarios will prove to be robust? The justification for framing the criterion this way emerges from a view of decision-making summarized by Groves and Lempert (2007, p 76): 'Robust decision-making proceeds from the observation that decisionmakers often manage deep uncertainty by choosing strategies whose good performance is relatively insensitive to poorly characterized uncertainties.' The focus of success here is less on the (retrospective) accuracy of the climate scenario(s) or the (retrospective) efficiency of the decision, but more on establishing an enabling condition for 'good' (robust) decisions to be made; i.e., in which a wide range of relevant uncertainties have been considered.
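One way of making 'relatively insensitive' performance concrete is to compare strategies by their worst-case regret across a set of scenarios. The sketch below uses minimax regret, which is our choice of a simple robustness criterion and not the specific method of Groves and Lempert; the strategy names, scenario names and costs are all hypothetical.

```python
def max_regret(costs):
    """Worst-case regret of each strategy across all scenarios.

    `costs` maps strategy -> {scenario -> cost}. Regret in a scenario is the
    strategy's cost minus the lowest cost any strategy achieves there.
    """
    scenarios = list(next(iter(costs.values())))
    best = {s: min(costs[strategy][s] for strategy in costs) for s in scenarios}
    return {
        strategy: max(costs[strategy][s] - best[s] for s in scenarios)
        for strategy in costs
    }


def most_robust(costs):
    """Strategy whose worst-case regret is smallest (minimax regret)."""
    regrets = max_regret(costs)
    return min(regrets, key=regrets.get)


# Hypothetical costs (arbitrary units) of three strategies under three
# climate scenarios, invented for illustration.
costs = {
    "low_defence": {"dry": 10, "central": 30, "wet": 90},
    "high_defence": {"dry": 30, "central": 35, "wet": 45},
    "optimized_for_central": {"dry": 25, "central": 20, "wet": 70},
}
choice = most_robust(costs)
```

In this invented example the strategy that is cheapest under the central scenario is not the one with the smallest worst-case regret. The comparison is also only as good as the set of scenarios supplied: if the scenarios sample the uncertainties narrowly, a strategy can appear robust without being so.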
This appears a more tractable approach to evaluating decision success than dwelling solely on decision outcome. For example, the CCIRG1996 climate scenarios sampled the known uncertainties affecting future UK climate much more narrowly than did the UKCIP02 climate scenarios (see Hulme and Dessai 2008). This offers an a priori reason for arguing that subsequent decisions made on the basis of the CCIRG1996 scenarios would be less robust than later decisions made on the basis of the UKCIP02 scenarios. Yet we have also shown that in some circumstances the UKCIP02 scenarios may not score very highly on this success criterion (Dessai and Hulme 2007). Proposed investment decisions for managing future drought risk, made by a water company in eastern England informed solely by the UKCIP02 scenarios, only proved robust to the stated climate uncertainties because of fortuitous circumstances. The climate model underpinning the UKCIP02 scenarios simulated much drier conditions over the long term than all the other models that could have been used. Investment decisions that had similarly relied on the UKCIP02 scenarios, but related to managing flood risk rather than drought risk, would not have proved so robust.

Learning success
Both of the above two criteria take an instrumental view of evaluating the success of climate scenarios: did they turn out to be accurate and did they enable robust decisions? Our third suggested success criterion asks a rather different question: did the scenarios prove engaging and did they enable learning? This criterion sits more sympathetically with the view of scenario exercises as, primarily, processes of shared enquiry and mutual learning rather than an emphasis on the practical utility of any tangible scenario products. The justification for this criterion emerges from a view of scenarios summarized by Pulver and VanDeveer (2007, p 4): ' [Scenarios]. . . can serve to build networks of individuals linked by common concerns, generate shared understanding, or stabilize interaction between different social worlds.' Rather than using scenarios to optimize decisions, or even to facilitate robust decisions, this criterion of success emphasizes the heuristic, pedagogic and social roles that national climate scenarios can play.
Developing formal metrics of scenario success in this case becomes harder still, but again from the case of the UK climate scenarios we can illustrate some elements that might constitute such a metric. The UKCIP98 scenarios were published shortly after the UK government established the Climate Impacts Programme to enable a stakeholder-led assessment of nation-wide climate change impacts and adaptation options. As reported by Hedger et al (2006, p 210), these national climate scenarios helped to engage and consolidate a community of public and private sector organizations wanting to consider climate change in their decision-making: 'the climate change scenarios have been powerful tools for engagement purposes on their own'. By acting as a shared product, promoted and disseminated through UKCIP acting as a 'boundary organization' (see Guston 1999), these scenarios raised awareness, stimulated participation amongst diverse stakeholders and forged a community of learning. In contrast, the previous generation of national climate scenarios-CCIRG96-pre-dated the formation of UKCIP and were not nearly so successful in establishing wide user engagement. The existence of a boundary organization to exploit climate scenarios for facilitating social learning at a national scale across very diverse organizations may therefore be a necessary condition for this criterion of scenario success. This reasoning is part of the case made by Miles et al (2006) for a National Climate Service in the USA.
One metric for evaluating the learning success of national climate scenario(s) may therefore be exposure, uptake or usage. The more widely communicated or used a particular set of climate scenarios becomes, the greater the potential for those scenarios to promote social learning amongst strategists or decision-makers. A wide range of organizations have used the UKCIP02 climate scenarios for communication purposes or to contribute to strategic or design planning (Gawith et al 2009). A recent survey of UKCIP stakeholders undertaken by the UK government's Department for Environment, Food and Rural Affairs (Defra) showed that over 90% of respondents had made use of the UKCIP02 climate scenarios, the highest uptake of any of the eight surveyed tools and products (e.g. scenarios, costing methodologies, decision frameworks, adaptation wizards) developed by UKCIP (Defra 2008). Furthermore, in the judgement of users these climate scenarios exhibited the highest degree of coherence and the most appropriate level of detail of all eight tools and products.
Although usage statistics do perhaps reveal something about the saliency of scenarios amongst stakeholders, on their own they are admittedly a crude measure of learning success. There may remain many organizations that have not used national climate scenarios in any process of learning, and mere familiarity with scenarios does not guarantee that organizations assimilate climate change information into their strategic planning. Even less does it guarantee that 'good' or more robust decisions result (see the section above). The same Defra survey asked respondents to self-evaluate the benefits of the tools and products to their organization. Over 90% of respondents claimed there were organizational 'benefits' of using the UKCIP02 climate scenarios, and over 50% claimed 'significant benefits'. These responses were again the highest of all the UKCIP tools and products surveyed. Exactly what these self-evaluated benefits were would need more detailed investigation, but they seem unlikely to be related to our previous two evaluative criteria of predictive or decision success. There is some sense then that in the perception of these stakeholders these climate scenarios have acted to promote awareness-raising, networking and organizational learning. The UK water industry is another, more sectoral, example of how the UKCIP02 scenarios promoted organizational learning (Dessai and Hulme 2007).

Discussion
Whether one sees the primary value of climate scenarios as products to be used or as processes to be learnt from, there is no easy way of evaluating their 'success.' In this paper we have suggested three different criteria by which scenarios could be evaluated and illustrated their application through the specific example of the four generations of national UK climate scenarios published between 1991 and 2002. We offer three final observations on the value of these different approaches to evaluation.
If national climate scenarios are treated primarily as quantitative products seeking to predict the future, we must recognize the limitations of attempts to evaluate their success. Climate scenarios result from a process of design and construction which always occurs at a particular time and in a specific context. Scenarios as products are ephemeral and are always likely to be displaced by later scenarios. The original CCIRG91 national UK climate scenarios (published in 1991), although projecting climate futures out to 2010, 2030 and 2050, are thus perceived to have little value today. Similarly, the UKCIP02 scenarios (published in 2002) have been displaced by the UKCIP09 scenarios (published in spring 2009), and already a new set of scenarios (UKCIPnext) is being contemplated. Even if the CCIRG91 scenarios could somehow be shown to have contained greater predictive skill than UKCIP09, it seems implausible that organizations would want to continue using them given more recent scenario products. Users have disregarded impact and adaptation studies that used UKCIP98 scenarios simply because they are aware of the later UKCIP02 scenarios.
We should also recognize that when applying a criterion of predictive success to scenarios, more is likely to be gained by falsification than by confirmation. If the climatic future turns out differently to that envisaged by a set of scenarios, this offers the prospect of learning what factors in reality were either ignored or not well represented in the scenarios. In the case of national climate scenarios these factors may have three origins. There may be poorly represented physical climate processes in the underlying climate model(s)-for example poor representation of summer convective processes may help explain why UK climate scenarios seem not to have projected the change in summer precipitation that has been observed. Natural decadal variability in climate-not captured by the scenarios-may dominate the observed medium-term climate trends, thus leading to apparent discrepancy between scenario(s) and reality. And there may also be social, economic or technological drivers of greenhouse gas emissions that were poorly understood in the underlying emissions scenarios (this has been shown to be the case for the IPCC Special Report on Emissions Scenarios). Evaluating the predictive skill of climate scenarios therefore offers the prospect of learning, although we should beware that the intuitive expectation that learning will progress uniformly over time towards the 'true' answer is not always realized (O'Neill et al 2007). If we consider climate scenarios as products, we suggest that while there can never be a 'correct' scenario, 'incorrect' climate scenarios can help us better understand physical and social reality.
Our second point emerges from treating climate scenarios as social processes. Here, the retrospectively evaluated predictive success of the scenarios is largely irrelevant, even if their prospectively claimed predictive skill does have value (see below). Instead, we must recognize that social processes of learning are continuous and adaptive. As the relationship between climate change science, society and policy changes, so will the demands, expectations and roles of different social actors in scenario-generating processes. Just as there is no 'correct' climate scenario, there is no 'right' scenario process. Designing and managing the social processes of climate scenario negotiation and usage is as important and difficult as managing the technical aspects of climate scenario construction.
Finally, we note that there exists a constructive tension between the roles of climate scenarios as products and as processes. By emphasizing the status of national climate scenarios as products with scientific credibility, predictive authority and national consistency, it becomes possible to mobilize and entrain social actors and organizations into what subsequently becomes a learning process. This was clearly the case with the UKCIP98 and UKCIP02 scenarios; without the credibility and predictive authority carried by these scenario products, stakeholders would not have been so willing to commit time and resources to engaging with them (Gawith et al 2009). Yet there is a double irony here. The first irony is that once engaged in a scientific learning process about how climate scenarios are constructed and the uncertainties that they carry, stakeholders appreciate the limited predictive skill that the scenarios in fact contain, and also their transience. The second irony that comes out of the learning process is the appreciation that this limited predictive skill is nevertheless not a hindrance to their use. Rather than enabling optimized decisions about the future on the basis of predictive accuracy, scenarios can be used to facilitate robust decision-making on the basis of (represented) uncertainties. Central to securing the 'success' of scenarios in this more nuanced way is the development of appropriate guidance and interpretative material to accompany scenarios when circulating as products. Such guidance material never accompanied any of the four generations of UK scenarios reviewed here, but has been provided for the more recent UKCIP09 scenarios.
We therefore suggest that using climate scenarios in a social learning process may actually require a degree of illusion about their predictive skill before expectations about what the scenarios offer decision-makers can be more appropriately calibrated. The creative tension is between (interpreted) claims of some modellers that they can predict future climate, climate scenarios which reveal that in fact they cannot and robust decisions which are relatively insensitive to this discrepancy. This tension may also exist in other areas of science-policy interactions where model-based scenarios play a salient role in policy deliberations (see Evans 2008).
The ultimate purpose of scenarios-as originally recognized by Shell in the 1960s-is to bring conceptions of multiple possible futures into deliberations, strategies and decisions that are made today and/or in the near future. They are to do so in a structured and coherent way in which one learns as much about how we think we can 'know' the future, as one learns about what that future might be. Climate scenarios are not 'predictions' which describe what will happen, but are to be understood as 'predictive judgements' which describe what could happen (Shearer 2006, p 68). They are best understood as 'boundary objects' (Star and Griesemer 1989, Shackley and Wynne 1996) whose ultimate evaluation should be made against multiple criteria-for example, the three suggested here-which reflect the different purposes for which climate scenarios are constructed.