Experiment design of the International CLIVAR C20C+ Detection and Attribution project

There is a growing research interest in understanding extreme weather in the context of anthropogenic climate change, posing a requirement for new tailored climate data products. Here we introduce the Climate of the 20th Century Plus Detection and Attribution project (C20C + D&A), an international collaboration generating a product specifically intended for diagnosing causes of changes in extreme weather and for understanding uncertainties in that diagnosis. The project runs multiple dynamical models of the atmosphere-land system under observed historical conditions as well as under naturalised versions of those observed conditions, with the latter representing how the climate system might have evolved in the absence of anthropogenic interference. Each model generates large ensembles of simulations with different initial conditions for each historical scenario, providing a large sample size for understanding interannual variability, long-term trends, and the anthropogenic role in rare types of weather. This paper describes the C20C + D&A project design, implementation, strengths, and limitations, and also discusses various activities such as this special issue of Weather and Climate Extremes dedicated to “First results of the C20C + Detection and Attribution project”.

One motivation for event attribution research has been the realisation that such analysis leads to an improved scientific understanding of extreme weather itself by bridging daily forecasting, seasonal forecasting, and climate change research (Dole et al., 2011; Stott et al., 2013; Hoerling et al., 2013). However, the biggest motivation in recent years has been to provide information which helps the public at large to contextualise their experiences of current weather within the setting of anthropogenic climate change (Jézéquel et al., 2018), as exemplified by the 133 studies in the "Explaining Extreme Events from a Climate Perspective" supplements to the annual Bulletin of the American Meteorological Society "State of the Climate Report" since 2012 (Peterson et al., 2012; Herring et al., 2014, 2015, 2016, 2018). Despite this proliferation of event attribution research, there remains a dearth of publicly available data products tailored toward general event attribution analysis. Some progress has been made in terms of products designed for characterising recent variability and trends in extremes, such as the HadEX2 observational product (Donat et al., 2013) and the Twentieth Century Reanalysis Project (Compo et al., 2011), but these have limited ability to inform diagnosis of the underlying causes of long-term variations and trends. A more thorough understanding requires large collections of simulations of dynamical climate models. These provide large samples, allowing robust statistical characterisation of rare extremes, and the experiment design can be formulated specifically to diagnose causal factors external to the climate system.
The most well-known example of this type of experiment consists of the historical (run with observed changes in greenhouse gases and other changes in atmospheric composition, the land surface, and solar insolation for the past 150 years) and historicalNat (run with the anthropogenic drivers maintained at pre-industrial values) simulations submitted to the international Coupled Model Intercomparison Project Phase 5 (CMIP5, Taylor et al., 2012). However, the number of simulations for any single model in CMIP5 is moderate at best, with moving windows in time providing reasonably large sample sizes for only a few models. Furthermore, when considering atmospheric extremes, CMIP5 models have substantial regional biases in ocean temperatures that may have strong effects on the local gradients required to power extreme weather. Plans for the successor project to the detection and attribution component of CMIP5, namely the Detection and Attribution Model Intercomparison Project (DAMIP, Gillett et al., 2016), do not call for a larger number of simulations, and progress in reducing biases in ocean temperatures may only be moderate, if past progress is a guide (Flato et al., 2013). Event attribution studies thus far have therefore either made substantial assumptions to work around these issues, or have produced bespoke climate model output that is either not generally applicable to analysis of other extreme events or is not publicly accessible (e.g. Pall et al., 2011; Hoerling et al., 2013; Schaller et al., 2016).
Substantial further progress in event attribution thus demands a new climate model product tailored specifically for the problem. What should that product look like? There are many conceptual and methodological differences in what constitutes event attribution analysis (Shepherd, 2016; National Academies of Sciences, Engineering, and Medicine, 2016). Some approaches require a very specific experiment design (e.g. Hannart et al., 2016), but nevertheless there are enough commonalities in the data requirements for most approaches that it should be possible to have a product that can inform most methods. The CMIP5-style historical and historicalNat design does so, for instance. Methods that depend on analysis of long-term trends or the anomalous magnitude in relation to normal variability can be informed by historical-style simulations designed to simulate weather under boundary conditions that have been experienced, usually accompanied by observational data (e.g. Dole et al., 2011; Hoerling et al., 2013). Methods that use a factual-counterfactual comparison additionally require historicalNat-style simulations designed to simulate weather under boundary conditions that would have been expected in the absence of anthropogenic interference (e.g. Stott et al., 2004; Pall et al., 2011).
Here we introduce the C20C + Detection and Attribution (C20C + D&A) project, a new public international multi-model data product specifically designed to inform assessments of variability, long-term trends, and the anthropogenic role in extreme weather over terrestrial areas. It should also prove useful for understanding atmospheric variability generally. It follows the historical/historicalNat format, and thus can inform a large variety of methods for diagnosing mechanisms and causes. Unlike CMIP5 and DAMIP, the design uses models of the atmosphere-land system, using prescribed ocean surface and sea ice conditions (Pall et al., 2011). This should reduce ocean biases, and the greater computational efficiency permits large ensembles of simulations with models at higher spatial resolution than when using dynamical ocean models. The project is being undertaken through the Climate of the 20th Century Plus (C20C+) activity of the World Climate Research Programme's CLIVAR, which adopted the D&A project as a new focus in 2013 (as well as updating its name from C20C, Folland et al., 2014). C20C+'s purpose is to develop understanding of the nature of changes in atmospheric variability as well as their causes (Folland et al., 2002). The C20C + D&A experiment design is specifically intended to address questions concerning:
• the characterisation of historical trends and variability in the properties of extreme weather events, including uncertainties such as those encapsulated through differences across models;
• the estimation of the role of human interference in historical and current extreme weather, including understanding of the underlying uncertainties.
This paper is part of a special issue in this journal reporting on "First results of the C20C + Detection and Attribution project", and is intended as the general introductory paper for the project. Throughout this paper we will point the reader to other C20C + D&A papers, in this special issue and elsewhere, for further details of various topics as appropriate. We start this paper by describing the experiment design in Section 2. Current progress is reported in Section 3, including details of the implementation of the experiment by each model. The C20C + D&A project has intentionally left room for flexibility in a number of aspects of the design, so details described in that section may be crucial for understanding results from comparisons across models. Section 4 presents a brief summary of some major lessons from analyses reported in this special issue and elsewhere, with a particular focus on indications that aerosol chemistry may be a highly important, and uncertain, factor. Section 5 lists various activities being undertaken to facilitate usage of the C20C + D&A project data, with free widespread usage considered a vital component of the project.

Experiment design
The project generates large ensembles of simulations of atmosphere models run under two types of scenarios (Fig. 1), as initially tested by Pall et al. (2011) and since performed in a large and growing number of studies. There is no prescribed method within the project for generating different simulations within a given scenario. Most contributions so far have used macro- or micro-perturbations to a given initial state. For HadGEM3-A-N216, different realisations of stochastic physics are the primary distinction between simulations. While the use of atmosphere models, rather than coupled atmosphere-ocean models, should reduce ocean biases and permit greater computational efficiency (and hence more simulations with models at higher spatial resolution), the lack of a dynamically interacting ocean implies assumptions that anthropogenic climate change does not influence ocean variability, that short-term coupled atmosphere-ocean interactions are unimportant in production of extreme weather, and (depending to some degree on how the simulations are analysed) that the anthropogenic climate change influence is identical for all (relevant) forms of extreme weather (Risser et al., 2017; Dong et al., 2017; Fischer et al., 2018).
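The idea of generating ensemble members through micro-perturbations to a common initial state can be sketched as follows. This is only an illustration under stated assumptions: the function name, the flat list standing in for a model state, the perturbation amplitude, and the per-member seeding are all hypothetical, and the project itself prescribes no particular method.

```python
# Illustrative sketch: the state layout, amplitude, and seeding are assumptions.
import random


def perturbed_initial_states(base_state, n_members, amplitude=1e-10):
    """Return n_members copies of base_state, each with tiny random noise added.

    Each member uses its own seed so that any member can be regenerated
    independently of the others.
    """
    members = []
    for member_id in range(n_members):
        rng = random.Random(member_id)
        members.append(
            [x + rng.uniform(-amplitude, amplitude) for x in base_state]
        )
    return members


# Toy initial temperature field (kelvin) at four grid points.
base = [288.0, 290.5, 275.2, 300.1]
ensemble = perturbed_initial_states(base, n_members=5)
```

In a real atmosphere model such tiny differences grow through chaotic dynamics, so that members diverge into distinct weather sequences while sharing the same boundary conditions.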
The "All-Hist" factual scenario mimics the CMIP5 "historical" and DAMIP "historical" (Gillett et al., 2016) scenario, except that observed sea surface conditions are prescribed rather than being calculated by a dynamical ocean model. Radiative and surface conditions are varied in the same way as they have in the real world. These include greenhouse gas concentrations, tropospheric aerosol burdens, stratospheric ozone concentrations, stratospheric aerosol burdens, solar luminosity, land use/cover, sea surface temperatures, and sea ice concentrations. Some climate model simulations are run instead with aerosol precursor emissions, calculating the burden through atmospheric chemistry modules.
The "Nat-Hist" counterfactual scenario mimics the CMIP5 "historicalNat" and DAMIP "hist-nat" (Gillett et al., 2016) scenario, designed to represent how the world might have evolved in the absence of anthropogenic interference with the climate system. Anthropogenic radiative conditions (greenhouse gas concentrations, tropospheric aerosol burdens/emissions, ozone concentrations) are set to circa year 1850 values, but stratospheric aerosol burdens and solar luminosity remain unaltered from the All-Hist scenario. The nature of land use/cover change under the Nat-Hist scenario is ambiguous and the decision on how to treat land use/cover change has been left at the discretion of the participating modelling groups (Section 3).
There are many different possibilities for the Nat-Hist ocean surface conditions (Pall et al., 2011; Christidis and Stott, 2014; Bichet et al., 2016; Schaller et al., 2016; Stone and Pall, 2019; Sun et al., 2018). Given the nature of the experiment design, it seems most sensible to retain the variability in the observed All-Hist sea surface temperatures, to ensure that results do not depend on a different sampling of El Niño events, for instance (Pall et al., 2011). The project thus adopts the practice of estimating Nat-Hist sea surface temperatures by taking estimates of the amount of ocean warming attributable to anthropogenic interference and subtracting those estimates from the observed All-Hist sea surface temperatures (Fig. 1). Local sea ice concentration is nonlinearly related to anthropogenic interference, however, and so the observed All-Hist sea ice must be adjusted in a way that is consistent with the new sea surface temperatures (e.g. Stone and Pall, 2019). The project intends to explore numerous plausible estimates of the attributable ocean warming. To this end, in this special issue Stone et al. (2018) use year-to-year variations in event attribution results based on one attributable warming estimate to determine deviations from that scenario estimate which are most likely to yield informative further attributable warming patterns for use in the project.
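The construction of the Nat-Hist ocean surface described above can be sketched as follows. This is a minimal illustration rather than the project's actual procedure: the function names, the toy grid values, the assumed freezing point of −1.8 °C, and the linear temperature-to-ice relation are all hypothetical choices made for the example, whereas the project adjusts sea ice with methods such as that of Stone and Pall (2019).

```python
# Illustrative sketch only: names, values, and the sea ice relation are hypothetical.

FREEZING_POINT_C = -1.8  # assumed approximate freezing point of sea water


def nat_hist_sst(observed_sst, attributable_warming):
    """Subtract the attributable warming estimate from observed SSTs, cell by cell.

    The result is floored at the assumed freezing point so that cooled
    temperatures stay physically plausible.
    """
    return [
        max(obs - delta, FREEZING_POINT_C)
        for obs, delta in zip(observed_sst, attributable_warming)
    ]


def toy_sea_ice_fraction(sst, ice_free_above_c=0.0):
    """Hypothetical linear mapping from SST to sea ice concentration (0 to 1).

    Stands in for the SST-consistent sea ice adjustment; not the project's method.
    """
    span = ice_free_above_c - FREEZING_POINT_C
    return [min(1.0, max(0.0, (ice_free_above_c - t) / span)) for t in sst]


# Toy observed All-Hist SSTs (degrees C) for four grid cells, tropics to pole.
all_hist_sst = [28.0, 15.0, 0.5, -1.5]
# Toy estimate of warming attributable to anthropogenic interference.
delta_sst = [0.6, 0.8, 1.0, 0.9]

nat_sst = nat_hist_sst(all_hist_sst, delta_sst)
nat_ice = toy_sea_ice_fraction(nat_sst)
```

Because the same warming estimate is subtracted from every observed field, the observed year-to-year variability (for instance the sequence of El Niño events) carries over unchanged into the counterfactual scenario.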
The C20C + D&A project is flexible in further aspects. For instance the observationally based datasets defining the various radiative and surface conditions are not specified in the protocols, and thus are likely to differ from model to model (Section 3). The primary reasoning for this approach is scientific: it provides material for exploratory analyses which may identify important issues that were not known beforehand and thus could not explicitly be built into the experiment design. An example of how this has proved useful will be described in Section 4.

Current status
Climate models that have submitted All-Hist simulations and some form of Nat-Hist simulations (including NonGHG-Hist simulations, representing a world in which only historical anthropogenic greenhouse gas emissions had been averted) are listed in Table 1. These models range from what was average spatial resolution in CMIP5, to the highest resolution models that contributed historicalNat simulations to CMIP5 (approximately 9000 km²), to much finer resolution not yet feasible at this scale with atmosphere-land-ocean models. Descriptions of some of these contributions are included in this special issue (Stone et al., 2018; Sun et al., 2018).
Details of the historical scenarios which have currently been explored are listed in Table 2. In keeping with the flexible design of the project, specifics of these simulations vary from model to model (Table 3). Differences include whether prescribed aerosol burdens or emissions have been used, the data product used for radiative and surface forcing estimates, the year used in lieu of the "circa 1850" for the Nat-Hist settings, and how Nat-Hist land use/cover is treated. (Note that it has been found that the LBNL/CAM5.1 family of models did not in fact include variations in volcanic aerosols in their simulations, contrary to claims in some papers.) All of these climate models have been run under the standard All-Hist scenario (designated "All-Hist/est1"). Most of the models have also been run under the C20C + D&A benchmark Nat-Hist scenario, designated as Nat-Hist/CMIP5-est1 (Stone and Pall, 2019). The "CMIP5-est1" part of the label refers to the manner in which the observed ocean conditions have been cooled in relation to the All-Hist/est1 scenario. In this case, the estimate is based on the difference in skin temperature between the historical and historicalNat simulations from multiple models in the CMIP5 archive. Currently one other Nat-Hist estimate ("Nat-Hist/obs-trend-1880s-est1", based on extrapolation of observed trends to 1880s conditions, Sun et al. (2018), similar to the extrapolation of Christidis and Stott (2014)) has been explored with multiple models, as well as one estimate of a world in which only anthropogenic greenhouse gas emissions had never occurred ("NonGHG-Hist/HadCM3-p50-est1", estimated from simulations of the HadCM3 climate model, Wolski et al., 2014). The plan though is to explore many further estimates of the attributable anthropogenic warming, with sampling methods being developed. Various strengths and weaknesses of available attributable warming estimates are discussed in Stone and Pall (2019).
Fig. 1. A schematic of the experiment design of the C20C + D&A project. For the All-Hist scenario, an ensemble of simulations of an atmosphere/land model, each differing in the initial state, is run forward in time with historical observed radiative forcings and ocean surface conditions. For the Nat-Hist scenario, a similar ensemble of simulations is run, but with anthropogenic radiative forcings set to pre-industrial values and sea surface temperatures cooled by a space- and time-varying estimate of the warming attributable to anthropogenic emissions.
The simulations conducted as of March 2019 are summarised in Fig. 2. Many models have a small ensemble of approximately half-century-long All-Hist simulations for analysis of long-term trends, and a much larger ensemble of simulations over a shorter recent period for factual-counterfactual comparison. Some models also have long ensembles in a Nat-Hist scenario, allowing for comparison of trends in natural versus anthropogenic worlds, or providing scenario-consistent baselines for referencing factual-counterfactual comparisons. Continually updated lists of simulations, including available output, are provided at http://portal.nersc.gov/c20c/data.html.
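A factual-counterfactual comparison with two large ensembles is often summarised as a ratio of event probabilities. The sketch below shows one simple empirical way to compute such a ratio, together with the related fraction of attributable risk. It is illustrative only: the function names are hypothetical, the ensembles are synthetic Gaussian samples rather than C20C + D&A output, and the threshold and the 1 °C shift between scenarios are arbitrary choices.

```python
# Illustrative sketch: synthetic ensembles, not C20C + D&A output.
import random


def exceedance_probability(values, threshold):
    """Fraction of ensemble members at or above the threshold."""
    return sum(v >= threshold for v in values) / len(values)


def probability_ratio(factual, counterfactual, threshold):
    """Ratio of exceedance probabilities, factual over counterfactual.

    Returns float('inf') if the event never occurs in the counterfactual sample.
    """
    p1 = exceedance_probability(factual, threshold)
    p0 = exceedance_probability(counterfactual, threshold)
    return float("inf") if p0 == 0 else p1 / p0


def fraction_attributable_risk(factual, counterfactual, threshold):
    """1 - p0/p1, or NaN if the event never occurs in the factual sample."""
    p1 = exceedance_probability(factual, threshold)
    p0 = exceedance_probability(counterfactual, threshold)
    return float("nan") if p1 == 0 else 1.0 - p0 / p1


rng = random.Random(0)  # fixed seed for reproducibility
# Hypothetical seasonal-mean temperatures (degrees C), 400 members per scenario.
all_hist = [rng.gauss(25.0, 1.0) for _ in range(400)]
nat_hist = [rng.gauss(24.0, 1.0) for _ in range(400)]

threshold = 26.0  # arbitrary event definition
pr = probability_ratio(all_hist, nat_hist, threshold)
far = fraction_attributable_risk(all_hist, nat_hist, threshold)
```

With only a handful of members the counterfactual probability of a rare event is frequently estimated as zero, which is one reason the large ensemble sizes matter for this kind of comparison.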

Lessons so far
One of the biggest challenges in climate analysis is the evaluation of climate model quality (Flato et al., 2013). The C20C + D&A archive provides both a more urgent requirement for effective evaluation methods and a new data set for testing the effectiveness of those evaluation methods. For instance, Angélil et al. (2016) compare return value estimates from C20C + D&A models and various reanalysis products and find that over much of the world's land areas the reanalysis products are in more disagreement with each other than the C20C + D&A models are with each other, suggesting that current reanalysis products are inadequate to serve a simple role in model evaluation for the purposes of event attribution. In this special issue, Ciavarella et al. (2018) continue the development of model evaluation tools through separate examination of predictable and unpredictable components of interannual variability. However, Herger et al. (2018) note that the dominant contribution to uncertainty in risk-based event attribution analyses may in fact be from the long-term change attributable to anthropogenic emissions, which are poorly constrained by the available observational record (Lott and Stott, 2016), rather than from climatological statistics, as has hitherto been assumed (Bellprat and Doblas-Reyes, 2016). Evaluation of relevant aspects of model quality remains a challenge for event attribution study.
The C20C + D&A archive provides material for understanding the relative contributions of a number of sources of uncertainty in estimates of various aspects of extreme weather. For instance, in this special issue, Dittus et al. (2018) examine the role of ocean surface conditions in temperature and precipitation extremes, measured according to a number of different metrics, across C20C + D&A models. Wehner et al. (2018) compare the role of atmosphere model selection, aerosol forcing implementation, location, and event rarity in estimating the anthropogenic contribution to 3-day-average maximum daily temperature. Similarly, Mukherjee et al. (2018) use both CMIP5 and C20C + D&A simulations for an investigation of extreme precipitation over India as a function of climate model selection, location, and event rarity. Meanwhile, Kim et al. (2018) and Sun et al. (2018) examine the role of anthropogenic emissions in specific extreme weather events that were recently experienced. One property of event attribution estimates that has been highlighted by the C20C + D&A simulations is a potentially important role for a feedback involving aerosol forcing. Some areas can exhibit anthropogenically driven attributable increases in the frequency of cold events or decreases in the frequency of hot events in Nat-Hist simulations relative to All-Hist simulations (Angélil et al., 2016; Wehner et al., 2018). These areas are also notable for high anthropogenic aerosol burdens, such as eastern Asia (Ma et al., 2017; Kim et al., 2018), and so far have only been found in a model driven by emissions of aerosol precursors (rather than directly through time-averaged burdens) which can interact with the meteorology. Fig. 3 shows a particular example for the middle of the southern dry season over the Democratic Republic of the Congo.
In the MIROC5 model, the long tail of 5-day cold events in July 2015 in the All-Hist/est1 simulations shrinks in the Nat-Hist/CMIP5-est1 simulations, and in fact shrinks so much that it overwhelms the effect of the mean cooling: the simulations suggest that anthropogenic emissions made cold events more likely. This property holds for other years as well. In contrast, the frequency distributions from the CAM5.1-1degree simulations lack a long cold tail, and the difference between the two scenarios is a simple 1.5 °C mean cooling. Why the difference? It may be due to interactions between aerosol processes and the meteorological state. Emissions of organic aerosol precursors (and, at much lower magnitude, black carbon aerosol precursors) are especially strong in the areas of southern D.R. Congo and northern Angola experiencing their dry season, and these are advected north over the D.R. Congo (Fig. 4). The aerosol burdens, and their anthropogenic change, are similar in the CAM5.1-1degree and MIROC5 simulations, but the CAM5.1-1degree burdens are prescribed and unable to interact with the meteorology; in contrast, the MIROC5 simulations simulate aerosol processes based on precursor emissions, and thus the aerosols can interact with the meteorology. This aerosol hypothesis is currently based mostly on the match between areas of high aerosol burdens and areas with unusual attributable extreme temperature changes in the MIROC5 simulations. Even if the aerosol hypothesis is demonstrated in a detailed model experiment and analysis, we should caution that aerosol modelling is still in an early stage of development and the robustness of any aerosol feedback is uncertain; indeed, the frequency distributions in the HadGEM3-A-N216 simulations, which are also based on aerosol precursor emissions, do not show the same effect (Fig. 3).

Community engagement
The decision to perform simulations under the C20C + D&A project is predicated on an expectation that the data will be rich in information for a variety of purposes, many anticipated by the contributing groups as outlined in this paper and, hopefully, many that are as yet unanticipated. However, the volume of data produced by the C20C + D&A project currently exceeds 3 PB and is continually growing. In order to justify its purpose, therefore, the project needs to leverage the analysis personnel, skills, tools, and other resources of the weather and climate research community at large. Consequently, a major emphasis of the project involves facilitating access to and analysis of the data. This is being accomplished through a number of efforts.
First, all output of the simulations (and a number of the inputs too) has been placed on a public data portal accessible through http://portal.nersc.gov/c20c/data.html. All models have recorded a large set of monthly two-dimensional and three-dimensional output for the atmosphere, while many have done so for the land surface too. Many models have also recorded a large set of daily two-dimensional and three-dimensional variables, as well as a small set of 3-hourly two-dimensional variables, while a small subset have included 3-hourly three-dimensional variables for at least some simulations. Because of the large data volume, larger and less-used files are stored on a tape system while smaller, more frequently accessed files are stored on a disk system. Information on how to access these files, and the status of data publication, is given at http://portal.nersc.gov/c20c/data.html. Data is made freely available, with no registration required, and is subject to the Creative Commons License v2.0 (http://creativecommons.org/licenses/by-nc-sa/2.0/) unless otherwise noted.
Data from some simulations are available through other data archives around the world as well. A subset of monthly data is also available through the NOAA Earth System Research Laboratory FACTS site (http://www.esrl.noaa.gov/psd/repository/facts). In particular, this facility allows online visualisation, visual comparison, and limited analysis.
An additional facilitation effort has been a pair of "hackathons". These have been week-long meetings of researchers who conducted the project and researchers interested in data analysis, hosted at the site of the data portal and with access to the computational facilities of the National Energy Research Scientific Computing Centre (NERSC). This special journal issue, which was proposed in the first hackathon, is another element in facilitating research with C20C + D&A data.
C20C + D&A is also engaging with other international efforts in order to develop further understanding of the climate system. For instance, recognising that DAMIP is expected to provide limited material for analysis of extreme weather, C20C + D&A will serve as a kind of "global probability downscaling" tool, using estimates of attributable ocean warming obtained from DAMIP models to produce alternate estimates of the Nat-Hist scenario of what the world might have been like in the absence of human interference (Gillett et al., 2016). Overlap of the All-Hist/est1 reference scenario with other projects, such as the AMIP experiment of CMIP6 DECK (Eyring et al., 2016) and the AMIP20C experiment of the Global Monsoons Modelling Intercomparison Project (GMMIP, Zhou et al., 2016), will hopefully facilitate cross-project investigations.
More relevant for understanding climate change risk, the "Half a degree Additional warming, Prognosis and Projected Impacts" project (HAPPI, Mitchell et al., 2017) is performing a similar experiment to C20C + D&A except examining potential worlds that are 1.5 °C and 2.0 °C warmer than pre-industrial, with the intention of informing negotiations following from the 2015 conference of the parties to the United Nations Framework Convention on Climate Change. There is a large overlap between contributing groups and members, the experiment design of the factual "All-Hist/est1" reference scenario is shared, and HAPPI also uses the C20C + D&A portal for dissemination of model output. Together the two projects provide estimates of weather hazards for natural (similar to pre-industrial), recent/current, 1.5 °C, and 2.0 °C worlds, with warmer worlds also planned, thus providing material for quantification of the weather hazard component of the "Reasons for Concern for Risks Associated with Extreme Weather Events", a summary measure used in the past few assessment reports of the Intergovernmental Panel on Climate Change (Oppenheimer et al., 2014).
Finally, C20C + D&A overlaps with various initiatives to develop operational event attribution systems. For instance, the HadGEM3-A-N216 simulations were performed as part of the EUCLEIA project (https://eucleia.eu), in which the Hadley Centre's HadGEM3-A-N216-based attribution system was set up to run on a seasonal cycle in a manner similar to a seasonal forecasting system. The follow-on project EUPHEME (https://eupheme.eu) is now taking a step further and moving towards a prototype service, using scientific information from the attribution system to develop attribution "products" for a range of stakeholders. The HadAM3P-N96, CAM5.1-2degree, and CAM5.1-1degree simulations were performed as part of the Weather Risk Attribution Forecast effort, testing workflows for systematic proactive event attribution forecast services (Lawal et al., 2015). The CAM4, CAM5.1-1degree, and ECHAM5.4 simulations were performed as part of NOAA's Facility for Climate Assessments (FACTS).

Conclusion
The C20C + D&A project represents a novel tool for understanding changing risks under past and current (and, through overlap with the HAPPI project, future) anthropogenic climate change, by providing large samples of atmosphere/land-surface climate model data at high frequency resolution. This special issue of Weather and Climate Extremes lays out details of the C20C + D&A project, its implementation using various climate models, and a collection of analyses that take advantage of its unique properties. The broader research community is invited to make use of the data resource and to advise further on future directions for the project.

Conflict of interest
I declare that my co-authors and I have no competing interests.