A comparison of model ensembles for attributing 2012 West African rainfall

In 2012, heavy rainfall resulted in flooding and devastating impacts across West Africa. With many people highly vulnerable to such events in this region, this study investigates whether anthropogenic climate change has influenced such heavy precipitation events. We use a probabilistic event attribution approach to assess the contribution of anthropogenic greenhouse gas emissions, by comparing the probability of such an event occurring in climate model simulations with all known climate forcings to those where natural forcings only are simulated. An ensemble of simulations from 10 models from the Coupled Model Intercomparison Project Phase 5 (CMIP5) is compared to two much larger ensembles of atmosphere-only simulations, from the Met Office model HadGEM3-A and from weather@home with a regional version of HadAM3P. These are used to assess whether the choice of model ensemble influences the attribution statement that can be made. Results show that anthropogenic greenhouse gas emissions have decreased the probability of high precipitation across most of the model ensembles. However, the magnitude and confidence intervals of the decrease depend on the ensemble used, with more certainty in the magnitude in the atmosphere-only model ensembles due to larger ensemble sizes from single models with more constrained simulations. Certainty is greatly reduced when the CMIP5 ensemble is restricted to members that represent the relevant teleconnections, because this greatly reduces the number of ensemble members. An increase in probability of high precipitation in HadGEM3-A using the observed trend in sea surface temperatures (SSTs) for natural simulations highlights the need to ensure that estimates of natural SSTs are consistent with observed trends in order for results to be robust. Further work is needed to establish how anthropogenic forcings are affecting the rainfall processes in these simulations in order to better understand the differences in the overall effect.


Introduction
In 2012, rainfall over 150% above normal for the period from late July to late August was reported across many countries in West Africa (ACMAD 2012). This year was characterised by an anomalously wet monsoon, with an earlier than normal onset and possible links to the Madden-Julian Oscillation, El Niño Southern Oscillation (ENSO) and strong African Easterly Wave activity (Cornforth 2013). This led to more than 1.5 million people being affected by floods in countries across West and Central Africa, with deaths in some countries (OCHA 2012) and hundreds of thousands of people made homeless (IRIN 2012). When events such as this occur, they can raise questions about how they have been affected by climate change.
Following its proposal by Allen (2003), the science of extreme event attribution assesses the impact of anthropogenic climate change on the probabilities of individual events. This could be relevant for informing adaptation strategies and addressing impacts under international climate policy (James et al 2014), which may be particularly important in regions such as Africa. Since the first event attribution study, over a decade ago, of the 2003 European heatwave (Stott et al 2004), there have been many more studies (as reported, for example, in the annual Bulletin of the American Meteorological Society reports, e.g. Herring et al 2015). However, relatively few have been on events in Africa.
Techniques for assessing how probabilities of extremes have changed due to anthropogenic climate change can be based on observational trends or climate model simulations. Climate model studies compare the probability of a particular extreme in the actual world, simulated with all known external climate forcings, to that in a world with a particular climate forcing removed, such as anthropogenic emissions. These can either use coupled climate models (e.g. Bellprat et al 2015) or atmosphere-only simulations (e.g. Lott et al 2013). Coupled simulations assess the change in probability of an event under general climate conditions with all natural variability included, whereas atmosphere-only simulations assess the change in probability of an event given the actual sea surface temperatures (SSTs). When using atmosphere-only models to produce estimates of the world without anthropogenic emissions, the influence of emissions has to be removed from the SSTs as well as the atmosphere. This introduces an additional challenge and source of uncertainty (Christidis and Stott 2014).
Very few studies have compared the information that coupled and atmosphere-only models can provide about the influence of climate change on an event. Lewis and Karoly (2015) compared a coupled multi-model ensemble of Coupled Model Intercomparison Project Phase 5 (CMIP5) simulations to two different atmosphere-only model ensembles in their study of extreme rainfall in Australia in 2010-2012. They found that anthropogenic contributions to the event depended on the model used, with more robust results using atmosphere-only models than the CMIP5 ensemble. This highlights the need for comparisons of different model ensembles in order to produce robust event attribution results. Here we carry out a similar analysis of the anthropogenic influence on 2012 precipitation in the West Sahel, using a multi-model ensemble of coupled CMIP5 simulations and two ensembles of atmosphere-only simulations from the Hadley Centre Global Environment Model version 3-A (HadGEM3-A) and a regional version of the Hadley Centre Atmospheric general circulation Model version 3P (HadAM3P), to establish if attribution statements about the event are consistent across the ensembles.
The paper is structured as follows: section 1 describes the observational and model datasets, model validation methods and event attribution analysis used. In section 2 we present the results of the model evaluation and anthropogenic influences on 2012 precipitation, and these are compared and discussed in section 3. Conclusions can be found in section 4.

Region and observations
The region considered is the West Sahel, as defined by Rowell et al (2016), as this encompasses many of the areas that were affected by the rainfall of 2012. This region is shown in figure 1 and defined by 16°W to 5°W and 12°N to 18°N. Throughout this study, monthly mean precipitation is averaged over June-July-August (JJA) for each year. Although this period misses the end of the rainy season in the region in September, this standard season definition allows for comparison
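The regional JJA index described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a regular latitude-longitude grid with longitudes in the range −180° to 180°, a (time, lat, lon) array of monthly means for one year, and cos(latitude) area weighting, which the text does not specify. The function name `west_sahel_jja_mean` is hypothetical.

```python
import numpy as np

# West Sahel box as defined in the text (Rowell et al 2016): 16°W-5°W, 12°N-18°N.
LON_MIN, LON_MAX = -16.0, -5.0
LAT_MIN, LAT_MAX = 12.0, 18.0


def west_sahel_jja_mean(precip, lats, lons, months):
    """Return the West Sahel, JJA-mean precipitation for one year.

    precip: array (time, lat, lon) of monthly mean precipitation.
    lats, lons: 1-D coordinate arrays (longitudes assumed in -180..180).
    months: calendar month (1-12) of each time step.
    Cos(latitude) area weighting is an assumption of this sketch.
    """
    lat_mask = (lats >= LAT_MIN) & (lats <= LAT_MAX)
    lon_mask = (lons >= LON_MIN) & (lons <= LON_MAX)
    # Subset to the box: shape (time, n_lat_in_box, n_lon_in_box).
    box = precip[:, lat_mask][:, :, lon_mask]
    # Area weights that depend only on latitude, broadcast across longitude.
    weights = np.cos(np.deg2rad(lats[lat_mask]))[:, None] * np.ones(lon_mask.sum())
    regional = (box * weights).sum(axis=(1, 2)) / weights.sum()
    # Average the regional monthly series over June, July and August.
    jja = np.isin(months, [6, 7, 8])
    return regional[jja].mean()
```

On the 1.0° × 1.0° observational grid, the box contains 6 latitude rows and 11 longitude columns of grid-cell centres.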

Model ensembles
For each model, or multi-model ensemble, two ensembles are required. One comprises simulations with all known external climate forcings included (ALL) and the other with only natural forcings (NAT).

CMIP5
From the CMIP5 simulations (Taylor et al 2012) an ensemble comprising ten models is used, selected on the basis of having monthly precipitation data available for both all forcings and natural forcings simulations including the year 2012. The models and numbers of simulations used are detailed in table 1. For the ALL ensemble, historical simulations were extended to include 2012 using extension simulations (historicalExt) where possible, else using RCP8.5 simulations. This emissions scenario was chosen as it is the closest to recently observed emissions (Sanford et al 2014). The NAT ensemble comprises historicalNat simulations. All the models used include greenhouse gas, aerosol, solar and volcanic forcings (for more detail about the forcings see references in table 1).

HadGEM3-A
An ensemble of simulations from HadGEM3-A is used. These are the September 2011-August 2012 experiments from Christidis and Stott (2014). This atmosphere-only model is forced with observed HadISST SSTs and sea ice coverage and run at N96 horizontal resolution. Well-mixed greenhouse gases, aerosols, ozone, land-use changes, volcanic and solar forcings are used. Five all forcings runs covering 1960-2010 are used for validation. For the year 2012 there are 600 ALL ensemble members and four NAT ensembles with 600 members each. Each NAT ensemble uses a different estimate of the anthropogenic influence on SSTs: from HadGEM2-ES, CanESM2 and CSIRO Mk3.6.0, and from the observed trend. For each model, the SST change is estimated as the difference, averaged over 2003-2012, between the means of the ALL and NAT simulations. The observed trend is calculated using a linear fit to a time series of HadISST data since 1870, and this trend is assumed to be caused by anthropogenic forcings (Christidis and Stott 2014). These model and observed changes are calculated for each month and gridpoint and subtracted from the HadISST data to approximate the natural boundary conditions to the atmosphere.
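A minimal sketch of how the natural-world SST boundary conditions described above could be constructed, assuming NumPy arrays with the dimension orders given in the docstrings. The function names are hypothetical, missing values (land, sea ice) are assumed to have been masked or filled beforehand, and converting the fitted observed trend into an SST change (slope multiplied by the number of years since 1870) is an assumption, since the text does not state how this is done. The sea ice adjustment is not covered.

```python
import numpy as np


def model_delta_sst(sst_all, sst_nat, years, start=2003, end=2012):
    """Model estimate of the anthropogenic SST change.

    sst_all, sst_nat: arrays (member, year, month, lat, lon) from ALL and NAT runs.
    years: 1-D array of the years along axis 1 (assumed identical for both).
    Returns (month, lat, lon): ALL ensemble mean minus NAT ensemble mean,
    averaged over start..end inclusive.
    """
    sel = (years >= start) & (years <= end)
    all_mean = sst_all.mean(axis=0)[sel].mean(axis=0)
    nat_mean = sst_nat.mean(axis=0)[sel].mean(axis=0)
    return all_mean - nat_mean


def obs_trend_delta_sst(hadisst, years, target_year=2012, base_year=1870):
    """Observed-trend estimate of the SST change.

    hadisst: array (year, month, lat, lon) starting in base_year, with no NaNs.
    A linear least-squares fit is made independently for each month and
    gridpoint; turning it into a change as slope * (target_year - base_year)
    is an assumption of this sketch.
    """
    n_year = hadisst.shape[0]
    flat = hadisst.reshape(n_year, -1)  # one column per (month, lat, lon)
    slope, _intercept = np.polyfit(years, flat, 1)
    return (slope * (target_year - base_year)).reshape(hadisst.shape[1:])


def natural_sst(obs_sst_event, delta):
    """Subtract the anthropogenic change (month, lat, lon) from observed SSTs."""
    return obs_sst_event - delta
```

`np.polyfit` accepts a 2-D `y`, fitting each column separately, which gives one trend per month and gridpoint in a single call.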

HadAM3P
The final ensemble used is from the weather@home modelling system (Massey et al 2015), which allows a very large ensemble of simulations to be produced. The model used for these simulations is a regional version of HadAM3P over Africa, at N96 horizontal resolution. This atmosphere-only model is forced by Operational Sea Surface Temperature and Sea Ice Analysis (OSTIA) SSTs and sea ice coverage (Donlon …).
All model data are regridded to the observational grid (1.0° × 1.0°). The mean of each ALL ensemble is then bias corrected to that of the observations, over the longest time period shared by the two datasets. This bias correction is also applied to the NAT ensemble.
For the CMIP5 ensemble, this is done for each individual model ensemble. The variance is not bias corrected because the available time series are relatively short. The ALL ensembles are evaluated with respect to the observations by analysing the long-term trends, interannual variability and power spectra. Before analysing the interannual variability and power spectra, time series are detrended using a linear least-squares fit, except for the HadAM3P simulations, as these are each run for a single year.
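The mean-only bias correction and the evaluation steps (linear detrending followed by a power spectrum) might be sketched as below. Two assumptions are made here because the text leaves them open: the correction is treated as an additive offset (a multiplicative scaling would be a plausible alternative for precipitation), and the spectrum is a simple FFT periodogram. The function names are hypothetical.

```python
import numpy as np


def mean_bias_offset(all_overlap, obs_overlap):
    """Additive offset that matches the ALL ensemble mean to the observed mean.

    all_overlap: (member, year) ALL JJA means over the period shared with obs.
    obs_overlap: (year,) observed JJA means over the same period.
    The same offset is then added to both the ALL and NAT ensembles.
    """
    return obs_overlap.mean() - all_overlap.mean()


def detrend(series):
    """Remove a linear least-squares fit from a 1-D annual series."""
    t = np.arange(series.size)
    slope, intercept = np.polyfit(t, series, 1)
    return series - (slope * t + intercept)


def power_spectrum(series):
    """Simple periodogram of the detrended series.

    Returns (periods in years, power), excluding the zero frequency.
    """
    x = detrend(series)
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    freqs = np.fft.rfftfreq(x.size, d=1.0)  # cycles per year
    return 1.0 / freqs[1:], power[1:]
```

Because only the mean is shifted, the variance of each ensemble is left untouched, matching the decision in the text not to bias correct the variance. For CMIP5 the offset would be computed separately for each model.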

Teleconnection analysis
The ability of each ALL ensemble to reproduce significant observed teleconnections is assessed, as these are a key driver of rainfall variability. The SST regions considered are from Rowell (2013) and defined in table 2. Teleconnections are first assessed between GPCC precipitation and HadISST SST observations in the six regions, using JJA means of the same year for both. The Pearson correlation coefficient is calculated for each teleconnection and its significance assessed at the 95% confidence level. For each model ensemble, teleconnections are assessed over the longest available time period. The HadGEM3-A and HadAM3P ensembles are analysed in the same way as the observations, using the HadISST and OSTIA SST observations respectively. For the CMIP5 ensemble, the teleconnections are analysed for each model ensemble using the corresponding SST data, and the results are used to create a reduced CMIP5 ensemble (CMIP5_TC). Where a model has a significant teleconnection of the opposite sign to a significant correlation in the observations, for any of the regions, this model is removed from the analysis. For each remaining model, ensemble members in the incorrect phase for any significant teleconnection in the model years used in the event attribution analysis are removed. In this way, smaller ensembles that represent the relevant teleconnections, ALL_TC and NAT_TC, are produced, similarly to Bellprat et al (2015).
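The teleconnection screening could be sketched as follows, using SciPy's `scipy.stats.pearsonr` for the correlation and its two-sided p-value. The model screening rule follows the text: a model is dropped if any region has significant correlations of opposite sign in the model and the observations. The member phase test, which keeps a member only if its regional SST anomaly has the same sign as the observed 2012 anomaly, is an interpretation of "correct phase", and all function names are hypothetical.

```python
import numpy as np
from scipy import stats


def teleconnection(precip, sst, alpha=0.05):
    """Pearson correlation of same-year JJA precipitation and regional JJA SST.

    Returns (r, significant), with significance at the 95% level
    (two-sided p-value below alpha).
    """
    r, p = stats.pearsonr(precip, sst)
    return r, p < alpha


def keep_model(model_corrs, obs_corrs):
    """False if any region has significant model and observed correlations
    of opposite sign. Both arguments map region name -> (r, significant)."""
    for region, (r_obs, sig_obs) in obs_corrs.items():
        r_mod, sig_mod = model_corrs[region]
        if sig_obs and sig_mod and np.sign(r_mod) != np.sign(r_obs):
            return False
    return True


def in_phase(member_anom, obs_anom, regions):
    """Keep a member only if, in every listed region (those with significant
    teleconnections), its SST anomaly has the same sign as the observed one.
    This reading of 'correct phase' is an assumption."""
    return all(np.sign(member_anom[k]) == np.sign(obs_anom[k]) for k in regions)
```

Applied to the CMIP5 models, `keep_model` removes a model with a significant positive IOD correlation when the observed IOD correlation is significantly negative.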

Event attribution
The ALL and NAT ensembles for 2012 are bootstrapped 1000 times. For the CMIP5 ensemble, which has fewer members, a period of 5 years (2008-2012) is used instead of the single year in order to reduce uncertainty. This is a reasonable compromise, as the climate is approximately stationary over this short period and these are coupled simulations, so they do not represent actual calendar years. A gamma distribution is fitted to each bootstrapped ALL and NAT ensemble, as this gives a reasonable fit for monthly mean precipitation (Husak et al 2007). The probability of exceeding the observed 2012 value is then calculated for the ALL distribution (P_ALL) and the NAT distribution (P_NAT).
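The bootstrap and gamma-fit step could look like the sketch below. It assumes a one-dimensional array of bias-corrected JJA mean precipitation values (for CMIP5, members pooled over 2008-2012), fixes the gamma location parameter at zero (an assumption; the text does not say how the distribution is parameterised) and uses `scipy.stats.gamma.fit` and `scipy.stats.gamma.sf`. The function name `bootstrap_exceedance` is hypothetical.

```python
import numpy as np
from scipy import stats


def bootstrap_exceedance(ensemble, threshold, n_boot=1000, seed=0):
    """Bootstrap estimate of P(X > threshold) under a fitted gamma distribution.

    ensemble: 1-D array of positive JJA mean precipitation values.
    Each resample (with replacement, same size as the ensemble) gets its own
    gamma fit, with the location fixed at zero (an assumption of this sketch).
    Returns an array of n_boot exceedance probabilities.
    """
    rng = np.random.default_rng(seed)
    probs = np.empty(n_boot)
    for i in range(n_boot):
        sample = rng.choice(ensemble, size=ensemble.size, replace=True)
        shape, loc, scale = stats.gamma.fit(sample, floc=0)
        # Survival function = probability of exceeding the threshold.
        probs[i] = stats.gamma.sf(threshold, shape, loc=loc, scale=scale)
    return probs
```

Calling it once for the ALL ensemble and once for the NAT ensemble, with the observed 2012 value as the threshold, gives paired arrays of P_ALL and P_NAT samples from which the uncertainty in their difference can be estimated.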
The Difference of Binary Logarithms of Probability (DBLP, Lott and Stott 2016) is calculated to analyse the difference in the probabilities. This is defined as $\mathrm{DBLP} = \log_2 P_{\mathrm{ALL}} - \log_2 P_{\mathrm{NAT}}$. It is used instead of the fraction of attributable risk, $\mathrm{FAR} = 1 - P_{\mathrm{NAT}}/P_{\mathrm{ALL}}$, which, while useful when P_ALL > P_NAT so that FAR is between 0 and 1, is not well-defined when P_ALL < P_NAT and FAR is negative (Hansen et al 2014). DBLP has the benefit of being well-defined whether positive or negative. It is a symmetrical index, tending to positive or negative infinity if either probability is zero, while remaining easy to interpret. For example, DBLP = 1 corresponds to a doubling of probability due to climate change and DBLP = 2 to a fourfold increase; DBLP = −1 corresponds to a halving of probability due to climate change and DBLP = −2 to a quartering, and so on. The bootstrapping enables an estimate of uncertainty in the DBLP distribution to be generated.
Figure 1 shows the observed 2012 precipitation anomaly, which is positive across most of the West Sahel region. The observational time series (figure 2(a)) shows that JJA precipitation in 2012, at 156 mm month⁻¹, was the highest since 1964, which lies at the beginning of the well-documented decreasing precipitation trend and subsequent recovery (e.g. Dai et al 2004).

Model evaluation
Variability
Figure 2 compares the model ALL ensembles and observations. The CMIP5 ensemble reasonably simulates the long-term precipitation trend, but fails to capture the drought period and recovery since the 1960s (figure 2(a)). However, the ensemble interannual variability captures the spread of the observations well (figure 2(b)), and the power spectrum of the observations lies within the ensemble spread, except at very short periods (figure 2(c)). The HadGEM3-A and HadAM3P simulations capture the trends in precipitation over recent decades much more reliably (figures 2(d) and (g)) and represent the interannual variability well (figures 2(e) and (h)). This is to be expected from the more constrained atmosphere-only simulations compared with the coupled simulations. The power spectrum of the observations also lies within the spread of the ensemble of HadGEM3-A spectra (figure 2(f)).
This qualitative comparison of the interannual variability and power spectra suggests that the variability in the observations is reasonably well captured by all the model ensembles analysed, which supports bias correcting only the mean of the model data.

Teleconnections
Table 3 summarises the correlations between West Sahel precipitation and SSTs in the six teleconnection regions. In the observations, the EqEAtl, CIndO, Niño3.4 and IOD regions all exhibit negative correlations with precipitation. The TAD has a positive correlation and the Med correlation is not significant. HadGEM3-A has only three significant teleconnections: the Niño3.4 and IOD correlations have the same sign as in the observations, but the EqEAtl correlation is positive whereas it is negative in the observations. HadAM3P has significant correlations for all regions, but these are of opposite sign to the observations for both the EqEAtl and the TAD. Similar analysis for each of the CMIP5 models shows that only one model has a teleconnection that is significant and opposite in sign to the observations: CSIRO Mk3.6.0, with a positive IOD correlation. This model was removed from the ensemble, and data from the remaining models were removed if in the wrong teleconnection phases for the years 2008-2012, as per section 1.3.2, to create the ALL_TC and NAT_TC ensembles. In general, most of the CMIP5 models represent the signs of the EqEAtl and CIndO teleconnections correctly, and all models correctly simulate the positive TAD correlation, most also with very similar magnitudes. However, the Niño3.4 and IOD correlations are less well captured and are not significant in most of the models.

Distributions of 2012 precipitation and DBLP
The CMIP5 ensemble NAT distribution for 2008-2012 is shifted slightly higher than the ALL distribution, with the observed value in the upper part of both distributions (figure 3(a)). With the teleconnection analysis these distributions become narrower (figure 3(b)), as expected since some of the SST variability has been removed. The distributions also appear much closer together, with the NAT still slightly higher than the ALL, and the observed value lies further towards their tails. The DBLP distributions are negative in both cases, with the CMIP5 median DBLP at −1.5 (figure 4(a)), corresponding to a probability of high precipitation in the all forcings world (P_ALL) around one third of that in the natural forcings world (P_NAT). The CMIP5_TC DBLP distribution has a much greater spread (figure 4(b)). Its median is lower, at −1.8, but the distribution ranges from approximately 0 to −4, so the change in probability of high precipitation due to climate change is uncertain, ranging from no change to a reduction to 1/16.
For the HadAM3P ensemble, the 2012 distributions show a higher NAT distribution than ALL, with the observed value in the upper tail of the ALL distribution (figure 3(c)). The DBLP distribution is again negative but is much narrower than the CMIP5 distribution (figure 4(c)). The median is −2.4, corresponding to a P_ALL around 1/5 of P_NAT.
In the HadGEM3-A case there are four different NAT ensembles compared with the ALL ensemble (figures 3(d)-(g)). In the three cases where natural SSTs are estimated from models, the NAT ensemble is shifted higher than the ALL ensemble. However, using natural SSTs estimated from the observed trend (Obstrend, figure 3(g)) produces the only case where the NAT distribution is lower than the ALL distribution. The DBLP distributions are all relatively narrow (figures 4(d)-(g)). The three model-based NAT ensembles all give negative DBLPs, with medians between −2.7 and −1.7. The Obstrend DBLP distribution is positive with a median of 1, corresponding to a doubling of the probability of high precipitation relative to the natural world.
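Because DBLP is a base-2 logarithm of the probability ratio, the ratios quoted above follow directly from 2^DBLP: a median of −1.5 gives P_ALL/P_NAT ≈ 0.35 ("around one third"), −2.4 gives ≈ 0.19 ("around 1/5"), −4 gives exactly 1/16, and +1 gives a doubling. The sketch below, with hypothetical function names, computes DBLP elementwise from bootstrap samples of P_ALL and P_NAT and converts back to a ratio; summarising by the median and a 5-95% range is an assumption about how the spread might be reported.

```python
import numpy as np


def dblp(p_all, p_nat):
    """DBLP = log2(P_ALL) - log2(P_NAT), elementwise.

    A zero probability gives -inf or +inf; if both are zero the result is nan.
    """
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.log2(p_all) - np.log2(p_nat)


def probability_ratio(d):
    """Inverse mapping: P_ALL / P_NAT = 2 ** DBLP."""
    return 2.0 ** np.asarray(d)


def summarise(dblp_samples, lower=5, upper=95):
    """Median and a percentile range of bootstrapped DBLP values."""
    lo, med, hi = np.percentile(dblp_samples, [lower, 50, upper])
    return med, (lo, hi)
```

The symmetry of the index is visible here: `dblp(0.05, 0.1)` and `dblp(0.1, 0.05)` have equal magnitude and opposite sign, which is not true of FAR.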

Discussion
Across most of the model ensembles, climate change decreased the probability of high precipitation in the West Sahel in JJA 2012. However, results from the coupled and atmosphere-only models cannot be directly compared, as they ask different questions. While the CMIP5 ensemble assesses the change in probability given all SST variability (with some limitation to this variability from filtering by teleconnections), the HadAM3P and HadGEM3-A ensembles assess the change in probability given the actual SSTs at the time of the event (with estimated SSTs for the natural world). This partly explains why the Difference of Binary Logarithms of Probability (DBLP) uncertainty distributions are much narrower in the atmosphere-only model cases, as the simulations are much more constrained. The narrower distributions are also due to the greater numbers of ensemble members and the use of only one model. The CMIP5 ensemble with teleconnection analysis lies between these two cases, excluding members that incorrectly simulate relevant teleconnections and also those in incorrect SST phases for the event of 2012 (estimated from the years 2008-2012). By constraining the distributions we would expect a narrower DBLP distribution than with all the CMIP5 simulations included. However, this appears to have been counteracted by the substantial decrease in the number of data points, leading to much greater uncertainty.
Across West Africa and the wider Sahel region there is much uncertainty in climate model projections of precipitation (e.g. Biasutti et al 2008, Druyan 2011) and disagreement about the role of anthropogenic forcings in altering the climate. The direct effect of carbon dioxide in the atmosphere could increase precipitation by enhancing monsoon flow (Skinner et al 2012, Biasutti 2013). Dong and Sutton (2015) found greenhouse gases to be the main cause of the recovery of Sahel July-August-September rainfall using HadGEM3-A, explained by the increase in land-sea temperature contrast.
This would be consistent with a positive DBLP. However, warming SSTs in different regions have been shown to decrease precipitation in the Sahel (e.g. Bader and Latif 2003, Rodríguez-Fonseca et al 2015, Biasutti 2013) by weakening monsoon flow (Giannini et al 2003), which would be consistent with a negative DBLP. Tropical ocean warming could also lead to increased precipitation if sufficient moisture is available to reach an increased convection threshold (Giannini et al 2013). Other anthropogenic emissions may also affect precipitation: for example, sulphur dioxide emissions may cause a decrease in precipitation in the Sahel (Dong et al 2014), and aerosols have been shown to produce drying over roughly 1940-1980 (Ackerley et al 2011), which would act to decrease the DBLP. When interpreting these attribution results it must be noted that the HadGEM3-A ensemble does not represent the EqEAtl teleconnection correctly, and misses the CIndO and TAD correlations (table 3). HadAM3P also has the EqEAtl and TAD correlations in the opposite direction, which will affect the precipitation processes in the model. The CMIP5 ensemble fails to capture the long-term trends in precipitation in the region. Further analysis could consider how well the models represent the relevant dynamical phenomena associated with extreme rainfall in the region, to ensure they are suitable for an attribution study (Mitchell et al 2016). Future work is also needed to establish how anthropogenic forcings are affecting the rainfall processes in the simulations. This will help us to further understand the differences in the overall effect on precipitation, so we can be confident that events such as those of 2012 are genuinely less likely to happen in the future, as the majority of the models show.
Environ. Res. Lett. 12 (2017) 014019
The HadGEM3-A ensemble with natural SSTs from the observed trend was the only ensemble to produce a positive DBLP distribution. Figure 5 shows JJA mean SST time series for each model and for the observations used to calculate the SST changes applied in the HadGEM3-A simulations, averaged over each teleconnection region that is significant in HadGEM3-A (table 2). Considering the trend in HadISST and the difference between the ALL and NAT model simulations over recent years, the EqEAtl region shows a positive difference between the ALL and NAT simulations in all the models, which is consistent with the increasing trend in HadISST. In the Niño3.4 region, ALL simulations are warmer than NAT simulations over recent years in all the models, but by a greater magnitude than the observed trend. In the HadGEM3-A simulations, a greater anthropogenic SST contribution would therefore be subtracted from the HadISST observations in the model cases than in the Obstrend case, giving cooler natural Niño3.4 SSTs. Because of the negative Niño3.4 correlation, this would lead to higher NAT precipitation. This is consistent with the 2012 distributions (figures 3(d)-(g)), with the highest NAT distributions corresponding to the models with the greatest differences between ALL and NAT Niño3.4 SSTs. The IOD region has a positive HadISST trend, with a similar SST difference magnitude in CanESM2. However, CSIRO Mk3.6.0 and HadGEM2-ES both show the NAT SSTs to be slightly higher than the ALL SSTs. We would therefore expect higher precipitation in the Obstrend NAT distribution than in these two models. This is not the case (figures 3(e)-(g)), but this effect may be counteracted by the Niño3.4 influence, where the teleconnection is of greater magnitude (table 3). The NAT SSTs in the HadAM3P simulations were also estimated using CMIP5 simulations and produce a similar DBLP distribution (figure 4(c)) to the HadGEM3-A distributions using model estimates of NAT SSTs.
This shows the importance of model SST changes being consistent with observations if results are to be robust. Assessing whether model simulations of SST changes due to anthropogenic climate change are consistent with observed SST trends is one way of validating natural forcings simulations. However it also needs to be considered that long-term trends in observed SSTs may not only be due to anthropogenic forcings. Being able to evaluate natural simulations is obviously a difficulty, as observations do not exist for a world without anthropogenic climate change, and also often do not exist for the world prior to when anthropogenic forcings began to have an influence.
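The sign argument used for the Niño3.4 region can be made explicit with a first-order linear approximation. This is a simplification for illustration and not an analysis performed in the study: if regional JJA precipitation responds linearly to regional SST with slope β, subtracting an anthropogenic SST change ΔSST shifts NAT precipitation by roughly −β·ΔSST. With a negative slope, as for the Niño3.4 teleconnection, a larger subtracted warming gives a larger increase in NAT precipitation. The sketch ignores interactions between regions (such as the competing IOD influence), and the function names are hypothetical.

```python
import numpy as np


def regression_slope(precip, sst):
    """Least-squares slope of regional precipitation on regional SST
    (e.g. mm month^-1 per K)."""
    slope, _intercept = np.polyfit(sst, precip, 1)
    return slope


def nat_precip_shift(slope, delta_sst):
    """First-order change in NAT precipitation when an anthropogenic SST
    change delta_sst (K) is subtracted from the observed SSTs."""
    return slope * (-delta_sst)
```

Under this approximation, comparing `nat_precip_shift` for the model-based and observed-trend values of ΔSST in one region shows which NAT ensemble would be expected to be wetter, which is the comparison made qualitatively in the text.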

Conclusions
There is much disagreement between climate model projections about the magnitude and sign of future changes in precipitation in the West Sahel. This study contributes to understanding of climate change in this region by analysing the change, due to anthropogenic emissions, in the probability of high precipitation in June-July-August 2012 using three model ensembles. This is one of only a few studies to have analysed results from both coupled and atmosphere-only model simulations, an approach that is important for understanding how anthropogenic forcings have changed the probability of the event. Results show a decrease in the probability of high precipitation across the majority of the model ensembles: the CMIP5 coupled multi-model ensemble, the weather@home HadAM3P atmosphere-only ensemble, and the HadGEM3-A ensembles when natural SSTs are estimated using models. The decreases range from no change (a factor of 1) to a factor of 16, and signify a decrease in probability under both general climate conditions and those specific to 2012. The uncertainty in the effect of climate change clearly depends on the model ensemble used, with greater certainty in the atmosphere-only models. These models, however, do not completely represent all the observed SST-precipitation teleconnections, and so results must be caveated accordingly. Creating reduced ensembles of CMIP5 simulations in which teleconnections were well represented greatly reduced the number of ensemble members and therefore the certainty in the result.
However, when the observed SST trend is used to estimate the natural world SSTs to force the HadGEM3-A model, climate change increases the probability of high precipitation by a factor of 2. This appears to be due to differing trends in SSTs between the models and observations in the Niño 3.4 region. It is difficult to determine whether this divergence is a product of model errors or natural variability in the observations. This emphasises the need to ensure that modelled climate processes are consistent with observed changes.
Further work is needed to understand how anthropogenic forcings are affecting the rainfall processes in the different models, in order to understand why and how the different model ensembles produce different estimates of the change in probability of high precipitation. This study demonstrates the need for comparisons of model ensembles in event attribution studies, in order to gain understanding of the robustness of results, and to make use of evaluation techniques to ensure that natural simulations are consistent with observations.