EVALUATING CLIMATE MODELS WITH AN AFRICAN LENS

In recent decades, there has been remarkable progress made in climate modeling (Gates et al. 1990; Gates et al. 1995; McAvaney et al. 2001; Randall et al. 2007), but with limited discernible improvement over Africa (Flato et al. 2013; Rowell 2013; Watterson et al. 2014). This is most frequently highlighted with reference to the Sahel, where many models fail to capture the magnitude of the 1970s–1980s drought (Biasutti 2013; Roehrig et al. 2013; Vellinga et al. 2016). Other African regions also present demanding tests for climate models. Organized convection (Jackson et al. 2009; Marsham et al. 2013; Birch and Parker 2014) and sharp gradients in temperature, soil moisture, and potential vorticity (Cook 1999; Thorncroft and Blackburn 1999) are problematic given large grid spacing. The presence of strong land–atmosphere interactions (Koster et al. 2004; Taylor et al. 2013), large aerosol emissions from arid regions (Engelstaedter et al. 2006; Allen et al. 2013), influences from global ocean basins (Folland 1986; Rowell 2013), and prominent modes of interannual and interdecadal rainfall variability (Giannini et al. 2008) exacerbates the challenge. Furthermore, some of these features and systems are poorly understood as a result of limited access to readily available observations (Fig. 1) and research attention. None of the current generation of general circulation models (GCMs) was built in Africa (Watterson et al. 2014), and the relevant processes operating there have not always been the first priority for model development. 
AFFILIATIONS: James (Climate Research Laboratory, Centre for the Environment, and Environmental Change Institute, Oxford University, Oxford, United Kingdom, and Department of Oceanography, University of Cape Town, Cape Town, South Africa); Washington and Hart (Climate Research Laboratory, Centre for the Environment, Oxford University, Oxford, United Kingdom); Abiodun (Climate Systems Analysis Group, University of Cape Town, Cape Town, South Africa); Kay and Senior (Met Office Hadley Centre, Exeter, United Kingdom); Mutemi (University of Nairobi, and IGAD Climate Prediction and Applications Centre, Nairobi, Kenya); Pokam (University of Yaoundé I, Yaoundé, Cameroon); Artan (IGAD Climate Prediction and Applications Centre, Nairobi, Kenya).
CORRESPONDING AUTHOR: Rachel James, rachel.james@eci.ox.ac.uk

Now, there are growing efforts to bolster African climate science (Shongwe 2014), to run and evaluate regional and variable-resolution models over Africa (e.g., Endris et al. 2013; Engelbrecht et al. 2009, 2015; Gbobaniyi et al. 2014; Kalognomou et al. 2013), to develop the first global models in African research institutions (Engelbrecht et al. 2016), and to improve models from international modeling centers over Africa (Graham 2014; Senior et al. 2016; R. A. Stratton et al. 2017, unpublished manuscript). There is also a wealth of relevant expertise in African meteorological services and universities, with many scientists focusing on observations or weather time scales who have the potential to contribute to climate model development, particularly through evaluation.
Model development most commonly progresses through hypothesis development and sensitivity testing, including running climate models on weather and seasonal time scales (e.g., Rodwell and Palmer 2007) and adding known missing physics. Dedicated field campaigns are also important in data-sparse regions and for processes that are not well monitored, providing boundary conditions and observations for parameter development (e.g., Redelsperger et al. 2006; Washington et al. 2012; Stevens et al. 2016). Another strategy is top-down model evaluation and intercomparison. Based on the contention that model comparison will lead to improvement, there has been an impressive effort to make data from different modeling groups publicly available through the Coupled Model Intercomparison Project (CMIP; Meehl et al. 2000; Eyring et al. 2016a).
As well as informing model development, model intercomparison is also seen as a route toward better understanding model information for decision-making. Yet, so far CMIP has not resulted in improved performance for Africa (Flato et al. 2013; Rowell 2013; Whittleston et al. 2017), and it is still very difficult to draw conclusions about which of the models, if any, might generate more credible projections (e.g., Druyan 2011). The pace at which new experiments are generated can exceed the resources available to analyze and understand them, particularly in Africa, where capacity remains limited. As part of phase 6 of CMIP (CMIP6), there is a drive to advance evaluation through the routine deployment of community-based analysis tools to document and benchmark model behavior (Eyring et al. 2016b). This will initially build on existing repositories of diagnostics (Bodas-Salcedo et al. 2011; Luo et al. 2012; Phillips et al. 2014; Eyring et al. 2015), but, recognizing that there are important processes that require innovation in evaluation tools, the relevant World Climate Research Programme working groups are encouraging experts to develop and contribute additional analysis codes, and are working on developing the infrastructure necessary to incorporate these tools (Eyring et al. 2016b). This represents an excellent opportunity to reconsider how models should best be evaluated, particularly for Africa.
Model evaluation might conceptually be divided into (i) analysis of physical processes and (ii) quantification of performance. On a global scale, important work has been done to investigate model representation of clouds and water vapor (e.g., Jiang et al. 2012; Klein et al. 2013), tropical circulation (e.g., Niznik and Lintner 2013; Oueslati and Bellon 2015), and modes of variability (e.g., Guilyardi et al. 2009; Kim et al. 2009). This process-oriented evaluation is fundamental for informing model development. On a regional scale, particularly for understudied regions in Africa, existing evaluation work is largely restricted to the quantification of models' similarity to observations. The ability to reproduce the historical climatology is a fundamental "validation" check, and statistics have been developed that can impressively summarize comparisons of large multivariate datasets into single plots (Taylor 2001) and scalars (Watterson 1996). These "skill scores" have important applications, for tracking model development over time (Reichler and Kim 2008) and for comparing or ranking models (Gleckler et al. 2008; Schaller et al. 2011; Watterson et al. 2014). However, while performance evaluations can reveal symptoms of model problems, and comparison with observations has demonstrated some large biases over Africa (e.g., Roehrig et al. 2013), performance metrics are less informative for illuminating causes and potential fixes (Gleckler et al. 2008). Identifying metrics to rank models, or constrain future projections, is also very challenging for regions and processes that are poorly understood (Collins 2017; Knutti et al. 2017) and poorly observed (Fig. 1), and culling ensembles based on existing metrics for Africa fails to reduce the range of uncertainty in precipitation projections (Rowell et al. 2016).
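As an illustration of the kind of summary statistics referred to above, a Taylor (2001) diagram condenses a model-reference comparison into three linked quantities: the two standard deviations, their pattern correlation R, and the centred root-mean-square difference E', which satisfy E'^2 = σ_m^2 + σ_r^2 - 2σ_mσ_rR. The following is a minimal sketch of how these might be computed for two gridded fields; the function name and the optional weights argument are illustrative assumptions, not the API of any existing evaluation package.

```python
import numpy as np


def taylor_stats(model, ref, weights=None):
    """Statistics shown on a Taylor (2001) diagram for two fields of equal shape.

    Returns (std_model, std_ref, pattern_correlation, centred_rmsd).
    `weights` (e.g. grid-cell areas) is an illustrative optional argument.
    """
    model = np.asarray(model, dtype=float).ravel()
    ref = np.asarray(ref, dtype=float).ravel()
    w = np.ones_like(ref) if weights is None else np.asarray(weights, dtype=float).ravel()
    w = w / w.sum()

    # Remove the weighted means: the diagram deliberately ignores the overall bias.
    m_anom = model - np.sum(w * model)
    r_anom = ref - np.sum(w * ref)

    sd_model = np.sqrt(np.sum(w * m_anom**2))
    sd_ref = np.sqrt(np.sum(w * r_anom**2))
    corr = np.sum(w * m_anom * r_anom) / (sd_model * sd_ref)
    crmsd = np.sqrt(np.sum(w * (m_anom - r_anom) ** 2))
    return sd_model, sd_ref, corr, crmsd
```

Because the means are removed first, a model with a large uniform bias can still sit close to the reference point on the diagram, which is one reason such scores need to be read alongside bias maps.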
Here, we argue that the evaluation of climate models over Africa needs to move beyond scalar metrics, validation, and checks on performance toward investigating how models simulate processes on a regional scale. A better understanding of how the models behave is fundamental to help determine how to improve them, and it is also an important way to assess their adequacy for future projection (James et al. 2015; Rowell et al. 2015; Baumberger et al. 2017). Engagement with African experts is key to identifying and analyzing the processes that matter regionally. In this paper, we draw on expertise from across the continent to explore the potential for progress through process-based evaluation for Central, East, southern, and West Africa, as well as at a pan-African scale. For each region, we review existing model evaluation efforts, identify important processes, and present an example of process-based evaluation.
The analysis is applied to the Met Office Unified Model (MetUM), at the beginning of a four-year effort to improve its ability to simulate African climate [the Improving Model Processes for African Climate (IMPALA) project, part of the Future Climate for Africa program; www.futureclimateafrica.org]. The MetUM is a fitting example, since it is already subject to well-established evaluation procedures, and there is a good baseline understanding of the model's performance (see the sidebar on "Baseline understanding of model performance over Africa"), yet important gaps exist in the analysis of the processes that matter for Africa. The model has been developed in the United Kingdom, and this paper illustrates how deliberate and explicit inclusion of a team of experts in Africa can advance region-specific evaluation.
We hope the examples presented here will provoke discussion about what other processes and diagnostics should be examined, and promote the development of a model evaluation "hub" for Africa. One successful example of this approach is the Working Group on Numerical Experimentation's (WGNE) Madden–Julian oscillation (MJO) task force, which aims to facilitate improvements in the simulation of the MJO in weather and climate models (Wheeler et al. 2013). By collectively identifying priorities for evaluation, and sharing research insights about model behavior and analysis methods, a model evaluation hub on African climate could fast-track research, and move toward identifying and developing diagnostics to be incorporated into the CMIP evaluation toolkit (Eyring et al. 2016b) and routinely applied across models, potentially delivering a step change in our understanding of climate models over Africa.
Unless otherwise stated, data are presented for the 35-yr period 1979–2013. Data from the atmosphere-only simulation, forced with observed sea surface temperatures (SSTs) for , were also analyzed, but the main focus here is on the coupled run.
Observations. Unfortunately, there is a dearth of readily accessible observational data for Africa. As a result, many of the commonly used archives have large gaps in space and time (e.g., Washington et al. 2006; Rowell 2013; Fig. 1), inhibiting both our understanding of historical climate and our evaluation of climate model simulations. Observational uncertainty cannot be eliminated, but it can be partly addressed through careful selection of datasets for specific applications; for example, precipitation might be better evaluated using mid- to late twentieth-century climatological estimates, for which gauge data (e.g., Nicholson 1986) are more readily available. It might also be advisable to compare multiple sources of data, including ground-based and satellite records, as well as proxies; for example, river flow data could be used as a proxy for rainfall (e.g., Todd and Washington 2004). Reanalysis data are another important resource to tap when investigating climate where there is a lack of, or gaps in, the tropospheric circulation records, although in data-sparse regions the output may be dominated by modeled processes.
In this paper, we have prioritized just one reference dataset for each variable in order to obtain consistency across regions, drawing on previous analyses to select the datasets deemed most reliable (e.g., Parker et al. 2011), and repeating the analysis with additional reference datasets in regions with high observational uncertainty. The datasets are summarized in Table 1.

BASELINE UNDERSTANDING OF MODEL PERFORMANCE OVER AFRICA
Existing assessment of  has highlighted some large-scale biases of potential relevance to Africa, providing a useful basis for the region-specific analysis. The Southern Ocean absorbs too much incoming solar radiation and has associated biases in SSTs, winds, and precipitation ). The ITCZ is too far south, which may be linked to this Southern Hemisphere albedo error, although targeted albedo corrections do not provide a simple fix (Hawcroft et al. 2016). In the Indian Ocean, the convection is too strong, which has been connected with the long-standing dry bias in the Indian summer monsoon, and may be linked to a similar dry bias in the WAM.
Figure SB1 displays precipitation biases for . Similar plots are routinely output as part of a MetUM assessment, and they are typical of many evaluation packages used for other models. Figure SB1 therefore gives an indication of the kind of inferences typically available without further process-based assessment. A dry bias occurs in the Sahel during the WAM, and this region is also too dry in MAM and SON. The bias in SON extends to the west of Central Africa, and almost the entire Congo basin is drier than GPCP during DJF, although there is large observational uncertainty for this region ). Figure SB1 also shows that southern Africa is too wet for much of the year, as is common among CMIP models (Christensen et al. 2007). There are large biases in the Indian Ocean, and East Africa is too wet during the short rains season (October–December) and too dry during the long rains season (March–May), which has also been found in other CMIP5 models (Yang et al. 2014; Tierney et al. 2015).

APPROACHES TO PROCESS-BASED EVALUATION.
Approaches to process-based evaluation are considered for four African regions (Central, East, southern, and West Africa), moving from the least to the most studied domain. First, though, we take a pan-African approach: perhaps the newest frontier for climate science in Africa. While many studies do consider Africa as a whole, there has been limited consideration of the processes that act across the continent and thus connect its distinct regional climates.
For each domain we review existing work to outline what is already known about climate model performance, identify important processes for evaluation, demonstrate an example of a process-based approach applied to HadGEM3-GC2, and discuss the lessons learned: for this particular model and for methods to evaluate other models.

Pan-African.
Climate models with large grid spacing have difficulty reproducing the exact spatial distribution of variables such as precipitation (Dai 2006; Levy et al. 2012), even if they show some skill in simulating thermodynamic responses (Allen and Ingram 2002; Shepherd 2014) and modes of variability (Guilyardi et al. 2009), and some consistency in large-scale circulation changes (Held and Soden 2006; Vecchi and Soden 2007). Location-based assessment such as the calculation of bias and root-mean-square error is therefore quite a rigid test of model ability. A larger-scale analysis might extract more meaningful information about this type of behavior, and thus there is logic in beginning at the pan-African scale.
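For reference, the location-based scores mentioned above are typically computed point by point and then averaged over the domain. A minimal sketch follows, assuming a regular latitude-longitude grid on which cos(latitude) approximates relative cell area and on which missing observations are stored as NaN; these conventions and the function name are assumptions for illustration, not part of the MetUM assessment.

```python
import numpy as np


def area_weighted_bias_rmse(model, obs, lats):
    """Domain-mean bias and RMSE of a (lat, lon) field against a reference.

    Each cell is weighted by cos(latitude), which approximates relative cell
    area on a regular lat-lon grid. Cells that are NaN in either field
    (e.g. missing gauge data) are excluded from both scores.
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    weights = np.broadcast_to(np.cos(np.deg2rad(lats))[:, None], model.shape)

    valid = np.isfinite(model) & np.isfinite(obs)
    w = np.where(valid, weights, 0.0)
    diff = np.where(valid, model - obs, 0.0)

    bias = np.sum(w * diff) / np.sum(w)
    rmse = np.sqrt(np.sum(w * diff**2) / np.sum(w))
    return bias, rmse
```

Because every grid cell is compared with the same cell in the reference, a rain band that is realistic in shape but displaced by a grid box or two is penalized twice (too wet where it is, too dry where it should be), which is the rigidity the paragraph above refers to.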
Relevant pan-African features include the intertropical convergence zone (ITCZ), African easterly jet (AEJ), tropical easterly jet (TEJ), and certain teleconnections. While the ITCZ is not as coherent and uniform as theory might suggest (Nicholson 2009), the meridional migration of tropical convection is nevertheless an underpinning driver of African climate (Waliser and Gautier 1993). The AEJ, often noted as an important influence on West African climate (Nicholson 2009; e.g., Thorncroft et al. 2011), including its role in modulating African easterly waves (AEWs) and mesoscale convective systems during the West African monsoon (WAM; June–September) (Mekonnen et al. 2006; Leroux and Hall 2009), is also important for Central Africa and has a southern component during September–November (SON; Nicholson and Grist 2003; Jackson et al. 2009; Adebiyi and Zuidema 2016). The TEJ, which is well known to play an important role in African climate during boreal summer (Koteswaram 1958; Rowell 2001; Caminade et al. 2006), may also manifest south of the equator in January and February (Nicholson and Grist 2003). Teleconnections to global ocean basins also often affect more than one region (Giannini et al. 2008), the most well documented being the dipole between East and southern Africa during El Niño–Southern Oscillation (ENSO) events (Goddard and Graham 1999; Indeje et al. 2000).
In the case of HadGEM3-GC2, existing analysis has already shown which areas of the continent have too much or too little precipitation (Fig. SB1), but an analysis of pan-African processes might help explain and contextualize these precipitation biases and point toward strategies for improvement. One fundamental question is where and when the model is placing its peak ascent, and whether that ascent induces dry biases poleward of the locus of convection. Here, we analyze the seasonal cycle of vertical velocity, based on omega at 500 hPa (ω500), a well-established measure of the large-scale vertical motion (Bony et al. 2004; Oueslati and Bellon 2015), which has been used to diagnose tropical circulation in previous work (e.g., Schwendike et al. 2014) and can be compared between models and reanalysis.
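One way to summarize "where and when the model is placing its peak ascent" is to reduce a monthly ω500 series to a 12-month climatology and pick, at each grid point, the month with the most negative (most strongly upward) value. The sketch below assumes the series starts in January, covers whole years, and is stored as a (time, lat, lon) array; the function names are hypothetical and this is not necessarily how Fig. 2 was produced.

```python
import numpy as np


def monthly_climatology(field):
    """Average a (time, lat, lon) monthly series into a (12, lat, lon) climatology.

    Assumes the series starts in January and covers a whole number of years.
    """
    field = np.asarray(field, dtype=float)
    n_years = field.shape[0] // 12
    if n_years * 12 != field.shape[0]:
        raise ValueError("expected a whole number of years of monthly data")
    return field.reshape(n_years, 12, *field.shape[1:]).mean(axis=0)


def peak_ascent_month(omega500_clim):
    """Calendar month (1-12) of strongest mean ascent at each grid point.

    In pressure coordinates omega is negative for upward motion, so the
    strongest ascent is the most negative monthly value.
    """
    return np.argmin(np.asarray(omega500_clim), axis=0) + 1
```

With model and reanalysis fields regridded to a common grid, the monthly bias maps are then simply monthly_climatology(model) minus monthly_climatology(reanalysis), where positive values indicate anomalous subsidence.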
As would be expected, many of the regions that are too wet have more ascent than in the reanalysis, such as across southeast Africa in November. Regions that are too dry are associated with anomalous subsidence, most notably the Congo basin, which has a large ω500 bias relative to the European Centre for Medium-Range Weather Forecasts (ECMWF) interim reanalysis (ERA-I; Fig. 2c) and, to a lesser extent, relative to the National Centers for Environmental Prediction–Department of Energy (NCEP–DOE) second Atmospheric Model Intercomparison Project (AMIP-II) reanalysis [NCEP-2; not shown (Kanamitsu et al. 2002)]. In other regions the ω500 bias does not necessarily map exactly onto the precipitation bias, but it might nevertheless help to explain the results. For example, Fig. 2e reveals that during August, when the model has too little precipitation in the Sahel, there are downward anomalies across most of North Africa; and in parts of the northern Sahel, peak uplift occurs in boreal winter rather than in summer (Fig. 2b).
As well as helping to explain precipitation biases, Fig. 2 also allows for inferences about linkages between regions. The model is broadly able to simulate the migration of the tropical convection, with maximum uplift occurring in much of the Sahel zone during the boreal summer months and in southern Africa during the austral summer (Figs. 2a,b). Existing work has already suggested that the ITCZ is shifted too far south in the MetUM, and many of the omega biases in Fig. 2 imply that this southward shift is manifest over Africa, with upward anomalies in southern Africa and downward anomalies in the Sahel during boreal summer and in the Congo basin during austral summer. The ω500 plots may also give an indication of how the model represents the zonal overturning circulation. The MetUM (like many other models) generates overly strong convection in the tropical Indian Ocean (e.g., Johnson et al. 2017), which is visible in Figs. 2c,d,f. In January and November, this is located to the east of a strong downward bias over the Congo basin and Atlantic Ocean, suggesting a possible modification of the Walker circulation over Africa. These results point to the benefits of considering these regions together: further work to explore drivers of the migration of tropical convection (for example, the seasonal cycle of SSTs, and further analysis of Indian Ocean biases and Walker circulation patterns) could generate inferences about multiple regions and seasons.
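A compact way to track the migration of tropical convection, and any systematic southward displacement of it, is to average the ω500 climatology over a longitude band spanning the continent and record, for each month, the latitude of strongest band-mean ascent. The sketch below is illustrative only: the 10°W–40°E default band, the assumption that longitudes run from -180 to 180, and the function name are our choices, not taken from the paper.

```python
import numpy as np


def ascent_latitude(omega500_clim, lats, lons, lon_min=-10.0, lon_max=40.0):
    """Latitude of strongest band-mean ascent for each calendar month.

    omega500_clim: (12, lat, lon) climatology in Pa s-1 (negative = ascent).
    lons are assumed to run from -180 to 180; the default 10W-40E band is an
    illustrative choice covering much of the African continent.
    """
    lons = np.asarray(lons, dtype=float)
    band = (lons >= lon_min) & (lons <= lon_max)
    # Average over the selected longitudes -> (12, lat); NaNs are ignored.
    band_mean = np.nanmean(np.asarray(omega500_clim, dtype=float)[:, :, band], axis=2)
    # Most negative omega = strongest ascent.
    return np.asarray(lats)[np.nanargmin(band_mean, axis=1)]
```

Differencing the result for the model and the reanalysis (on a common grid) gives a month-by-month latitude offset, where negative values would indicate ascent placed south of the reanalysis.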
It would be useful to investigate how ω500 compares across CMIP models, and whether common precipitation biases (such as the wet bias in southern Africa) are associated with common biases in vertical velocity. The CMIP evaluation toolkit might therefore benefit from a measure of ω500 across Africa similar to Fig. 2. Given the limited previous work on pan-African evaluation, other diagnostics warrant further discussion and investigation, but potentially useful figures might include maps of moisture flux (following Suzuki 2011), latitude-by-month plots of the AEJ and ITCZ (following Nicholson and Grist 2003), and rainfall–SST correlations to assess teleconnections (following Rowell 2013).
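As one example of the teleconnection diagnostics suggested above, a rainfall–SST correlation map correlates a regional rainfall index with SST at every grid point across years. The sketch below is a generic Pearson correlation, not a reproduction of the method of Rowell (2013); the array layout (year first) and the function name are assumptions.

```python
import numpy as np


def correlation_map(index, field):
    """Pearson correlation between a 1-D index and each grid point of a field.

    index: (year,), e.g. seasonal-mean rainfall averaged over a region.
    field: (year, lat, lon), e.g. seasonal-mean SST; NaN points (land) give NaN.
    """
    x = np.asarray(index, dtype=float)
    y = np.asarray(field, dtype=float)
    xa = x - x.mean()
    ya = y - y.mean(axis=0)
    # Covariance with every grid point at once: contract over the year axis.
    cov = np.tensordot(xa, ya, axes=(0, 0)) / x.size
    return cov / (xa.std() * ya.std(axis=0))
```

Applying the same function to model output and to observations gives two maps that can be compared directly, for instance to check whether the modeled ENSO dipole between East and southern Africa has the observed sign.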
Central Africa. Central Africa is here defined as western equatorial Africa, extending from the Atlantic coast to the Rift Valley and between 10°S and 10°N. The region is critically understudied, in part because of limited data availability (McCollum et al. 2000). Several studies have assessed climate model precipitation based on observations and reanalysis (Haensler et al. 2013a,b; Aloysius et al. 2016). However, given the lack of gauge data included in precipitation datasets for this region, particularly for recent decades (Fig. 1), and the large variation in precipitation estimates from satellite and reanalysis datasets (e.g., Creese and Washington 2016), validation of modeled precipitation is challenging. Process-based evaluation is beneficial in this context, because it allows features and variables that are better observed or understood to become the focus.
As the third-largest convective zone worldwide (Webster 1983), the Congo basin is dominated by convective processes, with peak rainfall in the transition seasons of March–May (MAM) and SON, governed by the migration of solar insolation, but with important intraseasonal variability and approximately 70% of precipitation delivered by mesoscale convective systems (Nicholson and Grist 2003; Jackson et al. 2009). Recent studies, mainly using reanalysis datasets, have identified prominent drivers of regional circulation (Suzuki 2011; Pokam et al. 2014; Neupane 2016) and the water cycle (McCollum et al. 2000; Pokam et al. 2012), demonstrating an important interaction with the Atlantic Ocean (Hirst and Hastenrath 1983a; Dezfuli et al. 2015), and possible remote drivers including Indo-Pacific SSTs (Hua et al. 2016). During the main rainy season, SON, low-level westerlies (LLWs) from the eastern equatorial Atlantic play a key role in moisture provision ).
Here, we analyze low-level moisture transport and circulation during SON. Moist circulation patterns from reanalysis, influenced by observations of large-scale winds, might be expected to be better constrained than spatially heterogeneous variables such as precipitation. Nevertheless, given observational uncertainty in this region, we use two reanalysis datasets [including the NCEP–National Center for Atmospheric Research reanalysis (NCEP-1; Kalnay et al. 1996; NOAA 2011) as well as ERA-I (Table 1)].
The basic structures of moisture transport are similar between the model and both reanalyses, although there are important distinctions, including between NCEP-1 and ERA-I (Fig. 3a). Differences are still evident if the datasets are replotted at the resolution of NCEP-1 (not shown); in particular, ERA-I shows more intense moisture divergence along the Atlantic coast south of 5°N than NCEP-1. This moisture divergence is even more pronounced in HadGEM3-GC2, implying an overestimation of the LLWs in the model. In HadGEM3-GC2 the strong low-level winds feed a cyclonic flow into the Angola low, potentially strengthening the southeastward transport of moisture. This might help explain the dry bias at the Atlantic coast and the wet bias in eastern and southern Africa (Fig. SB1), associated with a southeastward shift in ω500 relative to the reanalysis (Fig. 2f). Analysis of circulation in other seasons (not shown) suggests that, while the LLWs are a seasonal feature in reanalysis, the model produces them throughout the year, and enhanced northwesterly flow may also help explain the dry bias and downward anomalies in the Congo basin during December–February (DJF; see Fig. SB1 and Fig. 2c).
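The moisture divergence discussed here follows from the horizontal moisture flux (qu, qv) and the divergence operator in spherical coordinates, div F = [∂F_λ/∂λ + ∂(F_φ cos φ)/∂φ] / (a cos φ). A minimal single-level sketch is given below, assuming specific humidity and winds on one pressure level (e.g., 850 hPa) on a regular latitude-longitude grid away from the poles; column-integrated versions, which the paper's figures may use, would additionally integrate over pressure and divide by g. The function name is hypothetical.

```python
import numpy as np

EARTH_RADIUS = 6.371e6  # mean Earth radius, m


def moisture_flux_divergence(q, u, v, lats, lons):
    """Horizontal moisture flux and its divergence on one level of a lat-lon grid.

    q: specific humidity (kg kg-1); u, v: wind (m s-1); all shaped (lat, lon).
    lats/lons in degrees, strictly monotonic, excluding the poles.
    Negative divergence indicates moisture convergence.
    """
    phi = np.deg2rad(np.asarray(lats, dtype=float))
    lam = np.deg2rad(np.asarray(lons, dtype=float))
    qu = np.asarray(q, dtype=float) * np.asarray(u, dtype=float)
    qv = np.asarray(q, dtype=float) * np.asarray(v, dtype=float)
    cosphi = np.cos(phi)[:, None]

    # Centred differences in the interior, one-sided at the domain edges.
    dqu_dlam = np.gradient(qu, lam, axis=1)
    dqvcos_dphi = np.gradient(qv * cosphi, phi, axis=0)
    div = (dqu_dlam + dqvcos_dphi) / (EARTH_RADIUS * cosphi)
    return qu, qv, div
```

Computing the same quantity on the native grids of the model, ERA-I, and NCEP-1, and then on a common coarse grid, is one way to check whether differences such as the coastal divergence south of 5°N survive regridding.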
To explore the drivers of the LLWs, the divergent and rotational components of the horizontal wind at 850 hPa are shown in Fig. 3b, following Pokam et al. (2014). The westerly flow over the Atlantic coast appears to be dominated by the divergent circulation, and the spatial structure of the winds in HadGEM3-GC2 is in broad agreement with the reanalyses, but the model overestimates the core speed of the divergent flow into the Congo basin by 1–2 m s−1. The rotational flow in the center of the Congo basin is also more strongly westerly in HadGEM3-GC2 than in the reanalysis, suggesting that the differences in Fig. 3a are influenced by both divergent and rotational wind.
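To illustrate what splitting the wind into divergent and rotational parts involves, the sketch below uses a deliberately simplified planar analogue rather than the method of Pokam et al. (2014): on a doubly periodic Cartesian grid, each Fourier mode of the wind can be projected onto its wavevector, giving a curl-free (divergent) part, with the remainder being divergence-free (rotational). Real regional latitude-longitude data are neither planar nor periodic, so in practice a spherical-harmonic tool applied to global fields (for example, the windspharm package) would be more appropriate. The function name and grid assumptions are hypothetical.

```python
import numpy as np


def helmholtz_periodic(u, v, dx, dy):
    """Split (u, v) into divergent and rotational parts on a doubly periodic grid.

    Planar simplification: u, v are (ny, nx) arrays with uniform spacing dx, dy
    (m) and periodic boundaries. The domain-mean wind belongs to neither part
    here and is returned separately. Odd nx, ny avoid Nyquist-mode ambiguity.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    ny, nx = u.shape
    kx = 2 * np.pi * np.fft.fftfreq(nx, d=dx)[None, :]
    ky = 2 * np.pi * np.fft.fftfreq(ny, d=dy)[:, None]
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0  # avoid 0/0; the k = 0 (mean) mode is handled separately

    uh, vh = np.fft.fft2(u), np.fft.fft2(v)
    # Component of each mode parallel to its wavevector is curl-free.
    proj = (kx * uh + ky * vh) / k2
    ud_h, vd_h = kx * proj, ky * proj
    # The remainder is divergence-free; drop the mean from both parts.
    ur_h, vr_h = uh - ud_h, vh - vd_h
    ur_h[0, 0] = vr_h[0, 0] = 0.0

    divergent = (np.fft.ifft2(ud_h).real, np.fft.ifft2(vd_h).real)
    rotational = (np.fft.ifft2(ur_h).real, np.fft.ifft2(vr_h).real)
    return divergent, rotational, (u.mean(), v.mean())
```

The design choice here (spectral projection) makes the split exact for resolved periodic modes, which is convenient for a sketch; on a bounded regional domain the split is not unique without boundary conditions, which is one reason global spherical methods are normally preferred.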
However, Fig. 3 also shows large differences between NCEP-1 and ERA-I (in keeping with Creese and Washington 2016). Even using moisture flux, which serves as a better constrained variable than parameterized rainfall, it is difficult to find a credible reference for evaluating the models.
Further work to evaluate models over the Congo basin might not always lead to conclusive statements about how similar the models are to reality. Tracking the representation of key processes in the models and reanalysis is nonetheless important in order to investigate which features exhibit more or less variability between models, and to identify suitable targets for field campaigns. For example, Creese and Washington (2016) find model variability in Congo basin precipitation to be strongly correlated with moisture inflow from the LLWs, suggesting that radiosonde data for the Atlantic coast, from new targeted field experiments or potentially mid-twentieth-century archives (as used in, e.g., Hirst and Hastenrath 1983b), could begin to constrain the ensemble. Diagnostics of moisture flux [following Fig. 3; Creese and Washington (2016); Pokam et al. (2012)] might therefore make a useful contribution to the CMIP evaluation toolkit, to track model variability in important regional circulation features in the next generation of models.
East Africa. Equatorial East Africa, here defined as the bimodal rainfall region east of the Rift Valley and from approximately 6°S to 11°N, is one of the most attractive and yet challenging regions of Africa for climate scientists. With a strong influence from ENSO (Mason and Goddard 2001; Mistry and Conway 2003; Mutemi 2003) and the Indian Ocean zonal mode (IOZM) or Indian Ocean dipole (IOD; Clark et al. 2003; Hastenrath et al. 2011), particularly during the short rains season (Lyon and DeWitt 2012), East African rainfall is among the best candidates for seasonal prediction anywhere in the world (Black et al. 2003; Hastenrath et al. 1993, 2004; Nicholson 2014). However, representing these teleconnection mechanisms in a coupled ocean–atmosphere modeling system, and on long time scales, is very challenging. The influence of the complex topography and the rapid evolution of the circulation regimes during the monsoonal transition seasons present further challenges (Hastenrath 1985), as do aspects of East African climate that are more poorly understood, such as the interplay of moist wind regimes from the Indian and the Atlantic Oceans (e.g., Williams et al. 2012).
Compared to Central Africa there has been more work to evaluate climate models over this region, on a range of time scales, including some more process-oriented studies. Models show a spread in capability (Indeje et al. 2000; Mutemi et al. 2007; Kipkogei et al. 2016), with the short rains generally being better represented (Shongwe et al. 2011). Like HadGEM3-GC2 (see Fig. SB1), most models overestimate the short rains and underestimate the long rains (Yang et al. 2014; Tierney et al. 2015). Analysis of future projections for East Africa also raises questions about model ability. Most GCMs, including HadGEM3-GC2, show wetter conditions in response to higher levels of greenhouse gases (James and Washington 2013; Chadwick et al. 2015), but observations demonstrate a recent increase in droughts in East Africa (Copsey et al. 2006; Williams and Funk 2011). While there are plausible reasons for precipitation to decline and then increase again, this "paradox" between past and future trends could also be due to deficiencies in the modeled response to anthropogenic forcing (Rowell et al. 2015).
Further process-based evaluation for East Africa is important to investigate whether models adequately represent mechanisms associated with past and future precipitation changes. Existing studies indicate an important role for the Indian Ocean (Shongwe et al. 2011; Williams et al. 2012; Williams and Funk 2011; Copsey et al. 2006). Many models provide poor simulations of the Indian monsoon (Kang et al. 2002; Moss et al. 2012), and show persistent wet biases in the Indian Ocean during boreal spring and summer (Bollasina and Nigam 2009; Bollasina and Ming 2013; Williams et al. 2014). Here, we analyze the circulation over the Indian Ocean and East Africa during the short rains season, including teleconnections associated with interannual variability.
There are important differences between ERA-I and HadGEM3-GC2 in the mean winds in the equatorial Indian Ocean during October–December (OND) (Fig. 4a), with the model simulating easterly (green) rather than westerly (orange) flow. Previous work has shown easterly biases in many climate models over the Indian Ocean (Yang et al. 2014). Figure 4 shows that for HadGEM3-GC2 during OND the mean zonal winds are of the wrong sign. The wind bias is accompanied by biases in other variables: modeled SSTs are too warm in the western Indian Ocean and too cool in the east (not shown). The model also has a wet bias and enhanced ascent over East Africa (Fig. SB1 and Fig. 2) and a dry bias over the Maritime Continent (not shown). Therefore, the observed structure of the Walker circulation in the Indian Ocean (see Hastenrath et al. 2011), with maximum convection and precipitation in the warm pool region, and sinking air and drier conditions over East Africa, appears to be disrupted or even reversed. This is probably at least partly due to SST biases and the location of convection in the coupled experiments, as the atmosphere-only run shows conditions that are more similar to the observations (not shown).
Despite the errors in the mean circulation, the model appears to offer a good representation of the interannual variability in the moisture flux. Figure 4b shows composites of the five wettest minus the five driest years over East Africa, based on data from the Global Precipitation Climatology Project (GPCP) and the modeled precipitation. The pattern is relatively similar for the model and the reanalysis. During wet (relative to dry) years there is more moisture convergence over East Africa and the western Indian Ocean, and less over the Maritime Continent, with easterly moisture flux anomalies over much of the tropical Indian Ocean. The extent of the wet and dry regions differs between the model and reanalysis but the characteristics of the responses are similar.
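The wet-minus-dry composite described above involves two steps: ranking years by a regional rainfall index (GPCP for the reanalysis comparison, modeled precipitation for the model) and differencing the mean of another field over the wettest and driest years. A minimal sketch of that selection step follows, assuming seasonal (e.g., OND) means have already been computed for each year; each moisture flux component would be composited separately. The function name is hypothetical, and ties in the index are broken arbitrarily by the sort.

```python
import numpy as np


def wet_minus_dry_composite(rain_index, field, n=5):
    """Mean of `field` in the n wettest years minus its mean in the n driest.

    rain_index: (year,) seasonal-mean regional rainfall.
    field: (year, ...) seasonal means of the variable to composite,
           e.g. one component of the moisture flux on a lat-lon grid.
    Returns the composite difference and the indices of wet and dry years.
    """
    rain_index = np.asarray(rain_index, dtype=float)
    field = np.asarray(field, dtype=float)
    order = np.argsort(rain_index)  # ascending, so driest years come first
    dry, wet = order[:n], order[-n:]
    composite = field[wet].mean(axis=0) - field[dry].mean(axis=0)
    return composite, wet, dry
```

Selecting years separately in each dataset, as done here, compares the circulation associated with each dataset's own wet and dry seasons rather than requiring the model to reproduce the observed sequence of years, which a free-running coupled simulation would not be expected to do.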
These results highlight the importance of considering both the mean state and variability, and their interaction, when evaluating a model's predictive skill, an approach applied, for example, by Johnson et al. (2017) to the Indian monsoon. While HadGEM3-GC2 has biases in the mean circulation, it appears to show some skill in representing variability, and through comparison with mechanisms for future wetting (following, e.g., James et al. 2015), it might be possible to better understand the "East African paradox" and the plausibility of the models' projections.
The CMIP evaluation infrastructure might therefore usefully include diagnostics that track the Indian Ocean circulation in the mean, during wet and dry years, and in future simulations, perhaps featuring maps like Fig. 4 alongside longitude-height cross sections to display the modeled Walker circulation over the Atlantic, the African continent, and the Indian Ocean (following Shongwe et al. 2011).
Southern Africa. Southern Africa is broadly defined here as the land areas poleward of 10°S. Unlike the other regions considered, southern Africa experiences a strong influence from the midlatitudes and important tropical-extratropical interactions (Harrison 1984, 1986), which represent an interesting challenge for climate models (e.g., Niznik et al. 2015). Modeling southern African climate is also complicated by remote influences, notably from ENSO (Lindesay 1988; Reason et al. 2000), the IOD (Goddard and Graham 1999; Reason 2002), the southwest Indian Ocean dipole (Washington and Preston 2006; Kay and Washington 2008), the southeast Atlantic (Reason et al. 2006; Hansingo and Reason 2009), and Antarctica (Reason and Rouault 2005; Pohl et al. 2010; Manatsa et al. 2013), as well as by local land surface interactions, complex orography, and aerosols (Anderson et al. 1996; Mason and Joubert 1997; Tyson and Preston-Whyte 2000; Ramanathan et al. 2007).
Previous model evaluation for southern Africa suggests that, like HadGEM3-GC2 (Fig. SB1), most GCMs overestimate precipitation (Christensen et al. 2007; Cook and Vizy 2012), with a large range in magnitude (Lazenby et al. 2016). Relative to Central and East Africa, there has been more research into understanding the modeled circulation (e.g., Shongwe et al. 2009; Tozuka et al. 2014; Lazenby et al. 2016), modes of variability (e.g., Kataoka et al. 2012; Boulard et al. 2013; Dieppois et al. 2015), and mechanisms of future change (Engelbrecht et al. 2009), as well as efforts to characterize atmospheric states based on winds, moisture, and temperature, which suggest that models are more similar to one another in their atmospheric circulation than in their precipitation (Hewitson and Crane 2006). There has also been an emphasis on regional climate models (RCMs; e.g., Crétat et al. 2012; Kalognomou et al. 2013; Meque and Abiodun 2015; Shongwe et al. 2015; Favre et al. 2016; Pinto et al. 2016).
Here, we focus on an important mode of tropical-extratropical interaction and a precipitation-generating mechanism: tropical-extratropical cloud bands, known locally as tropical-temperate cloud bands (TTCBs) or tropical-temperate troughs (TTTs). TTCBs extend from northwest (NW) to southeast (SE), from the subcontinent into the southwest Indian Ocean (Harrison 1984). A large proportion of southern African rainfall is associated with TTT systems (Harrison 1984), particularly heavy rainfall (Hart et al. 2013). TTTs are therefore a good target for evaluation in terms of impact relevance and might also indicate the credibility of modeled regional dynamics: the location and character of tropical convection, storm tracks, subtropical highs, and the Angola low might all influence the formation of TTCBs (Todd et al. 2004; Fauchereau et al. 2009; Hart et al. 2010; Ratna et al. 2013; Macron et al. 2014).
Here, TTCBs are identified in satellite-derived [National Oceanic and Atmospheric Administration (NOAA); Table 1] and HadGEM3-GC2 daily outgoing longwave radiation (OLR) data using an automated cloud band identification algorithm developed by Hart et al. (2012) to flag and track TTT systems. Contiguous regions of low OLR with sufficient latitudinal extent and positive tilt (NW-SE orientation) are flagged as TTCBs, and data from the preceding and subsequent days are then analyzed to characterize the life cycle of each TTT event. Previous research using cluster analysis (Vigaud et al. 2012) or self-organizing maps (Tozuka et al. 2014) has suggested that climate models simulate TTT-like variability, but this tool extracts TTT events explicitly, enabling analysis of their frequency, spatial distribution, annual cycle, and associated weather phenomena, such as the precipitation examined here.
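The flagging step described above can be sketched as follows. This is a simplified stand-in, not the Hart et al. (2012) implementation: it thresholds a single day's OLR grid, groups low-OLR grid cells into 4-connected regions, keeps regions that span enough latitude, and tests for NW-SE orientation using the sign of the latitude-longitude covariance (along a Southern Hemisphere NW-SE band, longitude increases as latitude decreases). The 230 W m⁻² threshold, the 15° minimum extent, and the function names are hypothetical placeholders, and the tracking across preceding and subsequent days is omitted.

```python
from collections import deque


def label_regions(mask):
    """Group True cells of a 2D boolean grid into 4-connected regions."""
    nrow, ncol = len(mask), len(mask[0])
    seen = [[False] * ncol for _ in range(nrow)]
    regions = []
    for r in range(nrow):
        for c in range(ncol):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, cells = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                i, j = queue.popleft()
                cells.append((i, j))
                for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if 0 <= a < nrow and 0 <= b < ncol and mask[a][b] and not seen[a][b]:
                        seen[a][b] = True
                        queue.append((a, b))
            regions.append(cells)
    return regions


def flag_cloud_bands(olr, lats, lons, olr_max=230.0, min_lat_extent=15.0):
    """Return low-OLR regions that are long enough and oriented NW-SE.

    olr[i][j]: one day's OLR (W m^-2) at latitude lats[i], longitude lons[j].
    olr_max and min_lat_extent (degrees) are hypothetical placeholder values.
    """
    mask = [[value < olr_max for value in row] for row in olr]
    bands = []
    for cells in label_regions(mask):
        ys = [lats[i] for i, _ in cells]
        xs = [lons[j] for _, j in cells]
        if max(ys) - min(ys) < min_lat_extent:
            continue  # not enough latitudinal extent
        y_bar, x_bar = sum(ys) / len(ys), sum(xs) / len(xs)
        # Sign of the (unnormalized) covariance is all that matters here.
        cov = sum((y - y_bar) * (x - x_bar) for y, x in zip(ys, xs))
        # Southern Hemisphere NW-SE band: longitude rises as latitude falls.
        if cov < 0:
            bands.append(cells)
    return bands


# Synthetic example: a two-cell-wide diagonal band from 10S, 10E to 35S, 35E.
lats = [-10.0, -15.0, -20.0, -25.0, -30.0, -35.0]
lons = [10.0, 15.0, 20.0, 25.0, 30.0, 35.0]
olr = [[280.0] * 6 for _ in range(6)]
for i in range(6):
    olr[i][i] = 200.0
    if i < 5:
        olr[i][i + 1] = 200.0
print(len(flag_cloud_bands(olr, lats, lons)))  # 1
```

Mirroring the sample band so that it runs NE-SW makes the covariance positive, and the region is then rejected; a short blob fails the latitudinal-extent test.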
HadGEM3-GC2 simulates TTCBs with a spatial signature similar to that in the satellite data. The gridpoint frequency during December, when TTTs peak in the satellite dataset, shows two diagonal bands of concentrated activity in both the HadGEM3-GC2 and NOAA data, extending southeastward from southern Africa and Madagascar (Fig. 5a). However, many more TTT events are detected in HadGEM3-GC2 than in the satellite data: 90 compared with 48 per year (15°-40°S, 7.5°-100°E). The overestimate is particularly large over the Indian Ocean, where the model is known to produce overly strong convection in the tropics. The spatial pattern of precipitation contributed by TTTs approximately follows the distribution of TTT events in both datasets, but the model produces much more TTT-related precipitation than the satellite data indicate (Fig. 5b). In some regions more than 70% of modeled rainfall is contributed by TTTs, and this accounts for a large proportion of the model's wet bias (not shown).
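Both quantities mapped in Fig. 5 reduce to simple ratios at each grid point: the fraction of days on which a cloud band is present, and the share of total rainfall that falls on those days. A minimal sketch for one grid point, assuming a daily rainfall series and a day-by-day boolean TTT mask from some cloud band detection step; the function name and the sample series are hypothetical.

```python
def ttt_statistics(daily_rain, ttt_flag):
    """Per-grid-point TTT statistics from aligned daily series.

    daily_rain: daily precipitation at one grid point (mm/day)
    ttt_flag: True on days when a flagged cloud band covers the point
    Returns (frequency of TTT days, fraction of total rainfall on TTT days).
    """
    if len(daily_rain) != len(ttt_flag):
        raise ValueError("series must be the same length")
    if not daily_rain:
        raise ValueError("empty series")
    n_ttt = sum(1 for flag in ttt_flag if flag)
    total = sum(daily_rain)
    ttt_rain = sum(r for r, flag in zip(daily_rain, ttt_flag) if flag)
    fraction = ttt_rain / total if total > 0 else 0.0  # dry point: no contribution
    return n_ttt / len(ttt_flag), fraction


# Synthetic five-day example: TTTs on days 2 and 3 deliver 15 of 20 mm.
rain = [0.0, 5.0, 10.0, 0.0, 5.0]
flags = [False, True, True, False, False]
print(ttt_statistics(rain, flags))  # (0.4, 0.75)
```

For scale, the event counts quoted above imply that the model detects 90/48 ≈ 1.9 times as many events per year as the satellite data, before any difference in rainfall per event is considered.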
In December, therefore, HadGEM3-GC2 appears to generate too many TTTs and too much rainfall from TTTs. The difference in the seasonal cycle is more fundamental (Fig. 5c). In the satellite dataset, TTT events over southern Africa are a summer phenomenon, peaking in November (Hart et al. 2013), which coincides with the onset of the rainfall season. In HadGEM3-GC2, however, TTT events are detected throughout the year and appear to contribute convective precipitation even during the dry winter months. This may indicate that the character of the modeled events, and the mechanisms generating the cloud bands, differ from those found in nature. For example, TTTs are known to be associated with large meridional perturbations in the upper-tropospheric flow in the region (Hart et al. 2010; Manhique et al. 2011). Hart et al.'s (2012) tool could provide the large sample of events needed to analyze these baroclinic waves and other associated circulation features, to better understand the TTT events identified in the model data, and to assess the plausibility of the modeled mechanisms for cloud band formation. This could also be a step toward investigating whether the model can simulate the interannual variability of TTT events. The number and location of TTCBs vary with ENSO (Manhique et al. 2011), and testing whether the model captures this relationship might indicate whether it can realistically simulate potential changes in TTCBs under anthropogenic forcing.
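The seasonal-cycle comparison in Fig. 5c amounts to counting detected events per calendar month. A minimal sketch, assuming each event is represented by a single (hypothetical) start date; it also reports the share of events in austral winter, taken here as June-August purely for illustration, which should be close to zero if TTTs are a summer phenomenon as in the satellite record. The function names and sample dates are hypothetical.

```python
from collections import Counter
from datetime import date


def monthly_event_counts(event_dates):
    """Number of events starting in each calendar month, January first."""
    counts = Counter(d.month for d in event_dates)
    return [counts[m] for m in range(1, 13)]  # Counter returns 0 for absent months


def peak_month(counts):
    """Calendar month (1-12) with the most events; the earliest month wins ties."""
    return counts.index(max(counts)) + 1


def winter_share(counts, winter_months=(6, 7, 8)):
    """Fraction of events in the given months (austral winter by default)."""
    total = sum(counts)
    return sum(counts[m - 1] for m in winter_months) / total if total else 0.0


# Synthetic event dates, illustrative only.
events = [date(2001, 11, 3), date(2001, 11, 20), date(2002, 11, 8),
          date(2002, 12, 1), date(2003, 12, 15), date(2003, 7, 4)]
counts = monthly_event_counts(events)
print(peak_month(counts))  # 11
```

Applied to model and satellite event lists over the same period, a nonzero winter share in the model alone would flag the kind of year-round activity described above.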
Comparisons across CMIP would aid this analysis, and therefore diagnostics of TTCBs might make a useful contribution to the CMIP evaluation infrastructure, perhaps alongside indicators of the south Indian convergence zone (following Lazenby et al. 2016): a large-scale feature prominent in the seasonal mean for austral summer and within which synoptic-scale TTCBs are favored. Other important regional features for evaluation include the Angola low and the Kalahari heat low (following Munday and Washington 2017).
West Africa. West Africa, here defined as the region south of 20°N and west of approximately 20°E, has received more research attention than anywhere else on the continent, particularly the semiarid Sahel (approximately 10°-20°N). The large-scale drought and devastating famine of the 1970s-1980s (Hulme 1992; Tarhule and Lamb 2003) have led to concern about how the Sahel might be affected by climate change (e.g., Held et al. 2005; Hoerling et al. 2006; Dong and Sutton 2015). Model projections of future climate range from much wetter to much drier conditions (Druyan 2011), which has heightened the ambition to identify which models are more or less credible. There are already good examples of process-based model evaluation for West Africa, including investigation of the meridional circulation (Cook and Vizy 2006; James et al. 2015), the AEJ (Caminade et al. 2006; Abiodun et al. 2011), AEWs, the Saharan heat low (Biasutti et al. 2009), jet-rainfall coupling (Whittleston et al. 2017), and analysis across a range of time scales (Cook and Vizy 2006; Ndiaye et al. 2009; Birch and Parker 2014; Diasso and Abiodun 2017; Vellinga et al. 2016). This research has shown that many models do not produce a monsoon at all: the rainfall maximum does not move onto the continent during boreal summer (Cook and Vizy 2006), at least in part because of a warm SST bias in the Gulf of Guinea (Roehrig et al. 2013).
HadGEM3-GC2 has a dry bias over West Africa. Some circulation features, for example, the AEWs, are represented reasonably well, but their relationship with precipitation is weak. Comparison of different model integrations [including regional model runs; Diallo et al. (2014)] suggests that performance over West Africa is quite sensitive to changes in model configuration. Several recent versions of the MetUM have succeeded in reproducing the "jump" associated with monsoon onset: a distinct and rapid shift of the rainfall maximum from the Guinea Coast to approximately 11°N in early July (Graham 2014). However, it has not yet been possible to identify the specific parameter combinations needed to sustain this feature, highlighting the need for continued attention to these processes. Improvement in the WAM may be limited partly by large-scale biases, including the southward displacement of the ITCZ and the wet bias over the Indian Ocean. Resolution is also a limitation, perhaps unsurprisingly, given that more than 70% of rainfall is estimated to be contributed by mesoscale convective systems (Nesbitt et al. 2006). Vellinga et al. (2016) found that only the 25-km simulations could generate the organization of convection necessary to simulate decadal variability, and analysis of convection-permitting simulations (≤12 km) has demonstrated the importance of representing local convection for regional circulation patterns (Marsham et al. 2013; Birch and Parker 2014). Initial results from a short regional climate simulation at 4.5 km also suggest that explicit convection can substantially reduce the MetUM's dry bias over the Sahel (R. A. Stratton et al. 2017, unpublished manuscript).
Advances in computing power and model resolution may therefore be needed to improve the representation of the WAM in global models, but in the meantime continued evaluation of model processes is important to assess the extent to which the current configurations used for climate projections are useful. In particular, analysis of interannual variability might give an indication as to whether the global model is capable of generating the circulation mechanisms needed to bring rainfall into the Sahel. The model may be too dry on average, but are there years when it does produce suitable conditions for the monsoon? Here, we compare the meridional circulation in HadGEM3-GC2 with ERA-I (Fig. 6), examining the climatological mean (following Nicholson 2009; Abiodun et al. 2011) and composites of the five wettest and five driest years in the Sahel (10°-20°N, 8°W-8°E) during August, the core of the monsoon season.
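Ranking August seasons requires a Sahel rainfall index. A minimal sketch of one common way to build it, a cosine-of-latitude-weighted mean over the box given in the text (10°-20°N, 8°W-8°E); the weighting accounts for grid cells shrinking toward the poles. The function name, the list-of-lists grid layout, and the -180 to 180 longitude convention are assumptions, and grids stored as 0-360 longitudes would need converting first.

```python
import math


def box_mean(field, lats, lons, lat_bounds=(10.0, 20.0), lon_bounds=(-8.0, 8.0)):
    """Area-weighted mean of field[i][j] (latitude i, longitude j) over a box.

    Defaults are the Sahel box used in the text (10-20N, 8W-8E).
    Longitudes are assumed to run from -180 to 180, with west negative.
    """
    num = den = 0.0
    for i, lat in enumerate(lats):
        if not lat_bounds[0] <= lat <= lat_bounds[1]:
            continue
        weight = math.cos(math.radians(lat))  # grid-cell area shrinks poleward
        for j, lon in enumerate(lons):
            if lon_bounds[0] <= lon <= lon_bounds[1]:
                num += weight * field[i][j]
                den += weight
    if den == 0.0:
        raise ValueError("no grid points fall inside the box")
    return num / den


# Synthetic example: only the (15N, 0E) point lies inside the Sahel box.
lats = [0.0, 15.0, 30.0]
lons = [-20.0, 0.0, 20.0]
field = [[100.0] * 3 for _ in range(3)]
field[1][1] = 2.0
print(box_mean(field, lats, lons))  # 2.0
```

Computing this index for each August, and sorting years by it, gives the five wettest and five driest years used in the composites.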
The climatology shows important differences between the model and reanalysis. HadGEM3-GC2 underestimates precipitation in the Sahel, placing the precipitation maximum over the Guinea Coast (shown in green in Fig. 6a). This is also associated with differences in vertical velocity and zonal wind relative to the reanalysis. In ERA-I, there is a large region of upward motion (blue) through most of the troposphere between 5° and 15°N, situated between the core of the AEJ (at 600 hPa and 14°N) and the TEJ (at 200 hPa and 7°N). In contrast, HadGEM3-GC2 has several regions of upward motion, with a maximum at the Guinea Coast and a weaker zone of ascent at 10°N. There is strong upward motion in the Sahel, but only in the lower troposphere, and this is capped by subsidence aloft (red shading), which is perhaps indicative of a Saharan heat low, but one displaced south relative to the reanalysis and disconnected from another zone of ascent at approximately 23°N. The AEJ is too far south, and the TEJ is not clearly defined, indicating a different upper-atmosphere flow pattern from that in ERA-I. These jet differences could be closely related to the differences in vertical velocity, particularly in the jet-exit regions, which are associated with descent.
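The jet positions quoted for ERA-I (AEJ near 600 hPa and 14°N, TEJ near 200 hPa and 7°N) can be located objectively as the strongest easterly wind (most negative u) within a pressure layer of a latitude-pressure cross section averaged over a longitude band. A minimal sketch; the function name, the 500-800 hPa search layer for the AEJ, and the 100-300 hPa layer for the TEJ are illustrative assumptions rather than values from the text, and the sample cross section is synthetic. Applied to both datasets, the latitude difference would give a simple measure of the southward displacement of the modeled AEJ.

```python
def easterly_jet_core(u, lats, levels, p_range):
    """Locate the strongest easterly (most negative u) in a latitude-pressure section.

    u[k][i]: zonal wind (m/s) at pressure levels[k] (hPa) and latitude lats[i]
    p_range: (p_min, p_max) in hPa, the layer searched for the jet
    Returns (u_min, latitude, pressure), or None if no level lies in the layer.
    """
    core = None
    for k, p in enumerate(levels):
        if not p_range[0] <= p <= p_range[1]:
            continue
        for i, lat in enumerate(lats):
            if core is None or u[k][i] < core[0]:
                core = (u[k][i], lat, p)
    return core


# Synthetic cross section, illustrative only.
levels = [200.0, 600.0, 850.0]
lats = [0.0, 7.0, 14.0, 21.0]
u = [
    [-5.0, -20.0, -8.0, -3.0],  # 200 hPa
    [-2.0, -6.0, -12.0, -4.0],  # 600 hPa
    [1.0, 2.0, 0.0, -1.0],      # 850 hPa
]
print(easterly_jet_core(u, lats, levels, (500.0, 800.0)))  # (-12.0, 14.0, 600.0)
print(easterly_jet_core(u, lats, levels, (100.0, 300.0)))  # (-20.0, 7.0, 200.0)
```

Restricting the search to separate layers keeps the stronger upper-level TEJ from being mistaken for the mid-level AEJ.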
During wet years, HadGEM3-GC2 shows more precipitation and more ascent, especially at 10°N, and Hovmöller plots of precipitation suggest a muted monsoon jump (not shown). This pattern is still weaker than in the reanalysis, but in August the model does show some of the same distinctions between dry and wet years as ERA-I (and NCEP-1; not shown). Dry (relative to wet) years are associated with downward anomalies around 8°-15°N and a southward shift and strengthening of the AEJ. There is also anomalous ascent near the surface of the Sahel, which could indicate a southward shift of the Saharan heat low. This anomalous ascent lies much farther south in HadGEM3-GC2 than in ERA-I, suggesting some differences in the mechanisms for drying.
While there are important distinctions between the model and reanalysis in terms of the circulation features and the anomalies associated with interannual variability, there are also similarities: in wet years, the model does show some of the features associated with the monsoon during August, even if the pattern is slightly weaker than in the reanalysis. The model may therefore be able to represent variability to some extent, and recent analysis has demonstrated reasonable skill in decadal forecasting for the Sahel using the MetUM (Sheen et al. 2017). The dynamics evident in Fig. 6 could be useful for tracking the performance of new model versions and identifying configurations that allow more convection in the Sahel during wetter years. Such diagnostics might also make a useful contribution to the CMIP toolkit for comparing the ability of models across the ensemble to simulate wet- and dry-year dynamics, perhaps in conjunction with latitude-by-month plots of precipitation, zonal wind, and the intertropical discontinuity (following Abiodun et al. 2011).
OUTLOOK. Adaptation planners are faced with an increasing number of future climate model experiments (Giorgi et al. 2009; Taylor et al. 2012; Eyring et al. 2016a), but with limited information about the credibility of the data used. For Africa, producing reliable simulations is a huge challenge, given the complexity of the climate system, lack of observations, and limited previous research. Yet, there is also growing interest and expertise in forecasting African climate (e.g., Shongwe 2014), across the continent and globally. Moreover, there are new initiatives to strengthen investigations into model behavior, such as the planned CMIP evaluation infrastructure (Eyring et al. 2016b). In this paper we explore the potential to harness burgeoning attention and expertise to improve understanding of model ability and guide model development efforts for Africa, through region-specific, process-based evaluation that builds on local expertise.
In the previous section, model ability was examined for five domains in turn, demonstrating value in an approach that is both process based and region specific. Such analysis has the potential to improve our understanding of the distinct features that matter in each region. In East Africa, teleconnections with the Indian Ocean appear to play an important role in recent droughts and may have an influence on the response to global warming. In southern Africa, a large proportion of heavy rainfall events is contributed by tropical-extratropical systems that require specific analysis tools. The appropriate level of investigation may also vary regionally, depending on existing understanding. For Central Africa, where station data are lacking, an analysis of moist circulation can make a substantial contribution to our understanding of model behavior. In West Africa, where there has been considerable previous research but also enduring problems with simulating the monsoon with parameterized convection, it is important to investigate the extent to which coarse-resolution models can be useful, for example, in representing interannual variability.
The benefit of regional process evaluation is further supported by the analysis of HadGEM3-GC2, which has generated new information about the model's behavior across Africa, supplementing our existing understanding (as summarized in the sidebar). Analysis of the timing and location of tropical ascent (Fig. 2) suggests that many of the precipitation biases could be associated with a southward shift, or delay in the migration of, tropical convection, which may be linked to larger-scale problems with hemispheric asymmetry. One example is the dry (wet) bias over Central (southern) Africa (see Fig. SB1), and Fig. 3 reveals some of the regional circulation patterns associated with this apparent southward shift in precipitation and convection, including overly strong LLWs along the Atlantic coast of the Congo basin, possibly forced by rotational flow from the Guinea Coast. Known problems with the Indian Ocean in the MetUM also appear to play an important role over Africa: during OND equatorial winds flow in the wrong direction (Fig. 4), constituting an important bias in the mean Indian Ocean Walker circulation with implications for the short rains in East Africa. Enhanced convection in the Indian Ocean is also found to coincide with an amplification in the number of oceanic TTT events (Fig. 5). Over southern Africa, the model also appears to generate too many TTTs, which may partly explain the wet bias. A more fundamental issue is that TTTs and convective precipitation are also detected during the winter, when southern Africa should be dry, and this could indicate differences in the regional dynamics that generate TTTs. In West Africa, there has already been considerable evaluation of HadGEM3-GC2, with the results demonstrating a dry bias: here, it is shown that during the wettest years the model is able to simulate some of the features associated with monsoon rains.
New evaluation approaches can thus incrementally improve our understanding of the models. But can they foster improvements in model simulation and/or inform confidence assessments? Gleaning messages for model development is not straightforward: the analysis here is not sufficient to identify specific parameter adjustments or structural developments. Nor is this work sufficient to judge whether a specific future projection is trustworthy. However, the results and, crucially, the process of engaging African experts in model development can contribute to statements about the applications for which the models may be more or less suitable. For example, HadGEM3-GC2 may have some skill in simulating teleconnections associated with interannual variability in East Africa, despite mean biases. The findings can also highlight issues that should be prioritized in model development. By incorporating our analysis tools into the regular assessment of the MetUM, the issues identified can be tracked in ongoing efforts to improve the model. For example, the Southern Ocean and Indian Ocean biases are already targets for MetUM model development; by including more diagnostics of African processes, it will be possible to ensure that any adjustments are also measured in terms of their influence on African regions. The analysis tools might also provide a fundamental check on the regional circulation in new convection-permitting simulations (R. A. Stratton et al. 2017, unpublished manuscript).
The greatest potential of the new evaluation approaches, though, lies in fostering a collaborative environment for analyzing regional dynamics across the CMIP archive. Automating the assessment process could deliver a step change in our understanding of model behavior. Here, we present just five examples of process-based evaluation. If, as a community, we can jointly identify priority processes for evaluation and develop five useful diagnostics for each region that can be applied to all models, this could significantly fast-track our understanding of CMIP6 and future model generations. Deciding which diagnostics to use is not simple and is a research exercise in itself, but one that should be prioritized if modeling of African climate is to catch up with that of other regions.