Evaluation of mechanisms of hot and cold days in climate models over Central Europe

Changes in the intensity, frequency, and location of temperature extreme events are the focus of many studies, which often rely on climate model simulations to assess these changes. Given the use of climate models for attributing such events to human and natural influences and for projecting future changes, an assessment of the capability of climate models to properly simulate the mechanisms associated with temperature extreme events is necessary. In this study, known mechanisms and relevant meteorological variables are explored in a composite analysis to identify and quantify a climatology of synoptic weather patterns related to hot and cold seasonal temperature extreme events over Central Europe. The analysis is based on extremes that recur once or several times per season for better sampling. Weather patterns from a selection of CMIP5 models are compared with patterns derived from the ERA interim reanalysis. The results indicate that climate models simulate mechanisms associated with temperature extreme events reasonably well, in particular circulation-based mechanisms. The amplitude and average length of events are assessed, and in some cases significant deviations from ERA interim are found. In three cases, the models have on average significantly more days per season with extreme events than ERA interim. Quantitative analyses of physical links between extreme temperature and circulation, relative humidity, and radiation reveal that the strength of the link between temperature and these variables varies little among the models and ERA interim.


Introduction
Temperature extreme events, both warm and cold, and related incidences, such as droughts and icy conditions, affect the environment, governments, and people. Within the last decade, Europe has experienced several such events, such as the 2003 heatwave, or the cold spells in early 2012 and 2013. Both heatwaves and cold spells are also responsible to a great extent for increased mortality. In 2003, for instance, about 70 000 people died in Europe due to heat stress (Robine et al 2008). Cold spells are often also associated with heavy snowfall, and put strain on national infrastructure. Before Christmas 2010, for instance, most of Europe's airports were shut down due to heavy snowfall (Prior and Kendon 2011).
Numerous efforts have been put into studying both changes in the frequency and intensity of events under climate change projections (see Donat et al 2013, Sillmann et al 2013a, 2013b, Hartmann et al 2013, Bindoff et al 2013) and the mechanisms that lead to them. Literature suggests that one of the most important factors driving an extreme event, either warm or cold, is the large-scale circulation of the atmosphere (e.g. Cattiaux and Yiou 2012 in Peterson et al 2012). One extensively studied extreme event is the European heatwave in 2003, for which a combination of several factors led to its development (Black et al 2004, Meehl and Tebaldi 2004, Sutton and Hodson 2005, Black and Sutton 2007, Fischer et al 2007, Vautard et al 2007). Studies agree that this heatwave was likely caused by an anticyclonic blocking circulation pattern, though the initial cause for this pattern remains unknown. A lack of moisture and intensified radiative heating likely amplified the high temperatures (Weisheimer et al 2011).
For cold extreme events during the winter season, a circulation relatively unusual for Europe brings cold Arctic air to Europe (Cattiaux and Yiou 2012 in Peterson et al 2012). The circulation pattern in that case resembles negative NAO index conditions and favours persistent anticyclonic regimes (Cattiaux et al 2010, Masato et al 2012), which are thus important for cold extremes. It is not clear to what extent the mechanisms that lead to extreme events are simulated realistically in climate models. In the present paper we analyse the temperature, geopotential height, relative humidity, and radiation anomalies that coincide with moderately hot or cold temperature extremes from a selection of climate models. We focus on temperatures that recur on average almost once (99th/1st percentile) or several times (95th/5th percentiles) per season. This choice limits the analysis to unusually hot or cold events, while ensuring reasonable sampling over the past few decades. We also focus on mechanisms contributing to extremes that are discussed in the literature and investigate to what extent these mechanisms are simulated realistically in climate models. In section 2, we describe our analysis method adapted from Loikith and Broccoli (2012). Next, we present our results from comparing the magnitude and duration of hot JJA and cold DJF events, which we define as exceedances of temperature thresholds individually per model. We further assess the skill of models to generate patterns related to the associated mechanisms in the climate models (section 3). Then, in section 4, we briefly analyse known mechanisms quantitatively by projecting the patterns onto the time series and measure to what extent known mechanisms relate monotonically to temperature extremes.

Data and methodology
In order to assess the ability of climate models to simulate the mechanisms that lead to warm and cold extreme events, we compare the climatology of synoptic conditions associated with such events in the ERA interim reanalysis (Dee et al 2011) with similar climatologies in a selection of AMIP simulations (Gates 1992) from the CMIP5 ensemble (Taylor et al 2012). Our assessment concentrates on eight different AMIP simulations that had upper level variables on daily timescales available (table 1) in the CMIP5 database. Some modeling teams provide an ensemble of simulations, from which we use the first member only.
Prior to our analyses, we interpolate all model and ERA interim data to a 2° × 2° grid and select the study domain as a box from 50°W to 70°E, and from 25°N to 85°N (figure 1).
Our analysis is similar to the composite analysis of Loikith and Broccoli (2012). We start with near-surface temperature anomalies relative to a daily 1981-2005 climatology of detrended and lowpass-filtered (10-day moving-average filter) near-surface temperatures. From these temperature anomalies, we then compute the seasonal 5th and 95th percentiles of the area average between 2°W, 20°E, 42°N, and 55°N (Central Europe). These percentiles are used as thresholds to identify cold (5th percentile) and warm events (95th percentile) in the datasets. In this manuscript, we concentrate on cold winter and warm summer events for 1981-2005, which gives a sample size of 112 at least moderately extreme events.
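The thresholding step can be illustrated with a minimal sketch. It assumes that the area-averaged daily anomalies are already available as a plain list. The helper names `percentile` and `find_events`, the linear-interpolation percentile, and the synthetic Gaussian data are illustrative assumptions and are not taken from the study.

```python
import math
import random


def percentile(values, q):
    """Percentile q (0-100) of a list, interpolating linearly between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def find_events(anomalies, q, kind):
    """Return the percentile threshold and the indices of days beyond it.

    kind="hot" selects days above the threshold, kind="cold" days below it.
    """
    threshold = percentile(anomalies, q)
    if kind == "hot":
        days = [i for i, a in enumerate(anomalies) if a > threshold]
    elif kind == "cold":
        days = [i for i, a in enumerate(anomalies) if a < threshold]
    else:
        raise ValueError("kind must be 'hot' or 'cold'")
    return threshold, days


# Synthetic stand-in (assumption): 25 seasons of 90 daily area-mean anomalies in K.
rng = random.Random(42)
anomalies = [rng.gauss(0.0, 3.0) for _ in range(25 * 90)]

cold_threshold, cold_days = find_events(anomalies, 5, "cold")
hot_threshold, hot_days = find_events(anomalies, 95, "hot")
print(len(cold_days), len(hot_days))
```

With 25 seasons of roughly 90 days each, a 5% tail contains about 112 days, which is the order of the sample size quoted above.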
Similarly, we derive anomalies of simultaneously occurring 500 hPa geopotential height (z500), near-surface temperature (t2m), mean sea level pressure (MSLP), near-surface relative humidity (rhs), and, where available, surface shortwave (rad_short), longwave (rad_long) and total radiation budgets (rad_total). The anomalies are standardized seasonally by dividing by their individual model standard deviations. From these anomalies we obtain composites by averaging the individual cases. We assess the skill of the models to generate weather patterns related to extreme events by comparing the composites derived from ERA interim with those derived from the climate models. The skill of models in simulating extremes is quantified by calculating the pattern correlation and root mean squared error (RMSE) and presented in 'Taylor' diagrams (Taylor 2001).
We chose to analyse composites because an initial 'k-means' cluster analysis (Philipp et al 2014) pointed to only one cluster. This suggests that the cold or hot events are not caused by several, distinct types of circulation states, but that the circulation states involved are variations around a single type of anomaly, justifying use of a composite analysis.
In order to quantify the potential mechanisms related to hot summer and cold winter events, we determine the amplitude of the circulation patterns involved by projecting the composite patterns back onto the individual cases. Here, one variable's projection x_tp consists of a scaling factor times the composite pattern and is:
$$x_{tp} = \frac{x_t \cdot x_p}{x_p \cdot x_p}\, x_p \qquad (1)$$
where x_t denotes the variable at one day index t and x_p is the variable's composite pattern. The scaling factor in equation (1) reduces the spatial dimension of individual patterns x_t to a univariate variable.
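The projection step can be sketched as follows. The sketch assumes that the scaling factor is the usual least-squares coefficient, i.e. the inner product of the daily field with the composite divided by the inner product of the composite with itself, and that fields are flattened to lists of grid-point values without area weighting. Both are assumptions made for illustration, and the function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two flattened fields of equal length."""
    return sum(x * y for x, y in zip(a, b))


def scaling_factor(field, composite):
    """Amplitude of the composite pattern contained in one daily field."""
    return dot(field, composite) / dot(composite, composite)


def project(field, composite):
    """Projection of the daily field onto the composite pattern."""
    s = scaling_factor(field, composite)
    return [s * c for c in composite]


# Toy fields with three grid points (assumption: unweighted grid points).
composite = [1.0, -1.0, 0.5]
strong_day = [2.0, -2.0, 1.0]   # same shape as the composite, twice the amplitude
unrelated_day = [1.0, 1.0, 0.0]  # orthogonal to the composite

print(scaling_factor(strong_day, composite))
print(scaling_factor(unrelated_day, composite))
```

Each day is thereby reduced to a single number, so a season of fields becomes a univariate time series of pattern amplitudes.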
The strength of possible links between cases of near-surface temperature and the other variables is assessed through Spearman's rank correlation ρ, which is a measure of any monotonic relationship. Linking large-scale patterns to temperatures is achieved by calculating ρ between the scaling factors of the temperature pattern and the scaling factors of the other variables. We calculate ρ for z500, the total radiation budget, and relative humidity.
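As an illustration of why a rank correlation captures any monotonic relationship, the following sketch computes Spearman's ρ as the Pearson correlation of average ranks. It is a generic textbook implementation written for this example, not the code used in the study, and the toy scaling factors are invented.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented scaling factors: a monotonic but strongly nonlinear relationship.
t2m_factors = [0.2, 0.5, 0.9, 1.4, 2.0]
z500_factors = [math.exp(v) for v in t2m_factors]
print(spearman_rho(t2m_factors, z500_factors))
```

Because only the ordering of the values enters, the nonlinear relationship in the toy data still yields a perfect rank correlation.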
The selection of variables reflects the variables identified to play a major role in the processes and focusses mainly on circulation, humidity, and radiation. We would have liked to include a broader range of humidity-related and other variables. However, we were limited by the availability of such data in the CMIP5 archives.
Current studies put emphasis on the role of soil moisture in generating temperature extremes (Weisheimer et al 2011, Seneviratne et al 2012). We, however, refrain from including the soil moisture for several reasons. First, we have not been able to obtain soil moisture on a daily basis. However, as the total soil moisture changes on monthly or longer time scales only (Orth and Seneviratne 2012), using daily soil moisture might not be necessary. Second, preliminary analyses of monthly fields revealed that throughout the selected models and the reanalysis, the monthly soil moisture fields do not demonstrate consistent behavior for the moderate extremes analysed (not shown). We concluded that soil moisture fields are not easily assessable in a domain-wide comparison with reanalysed soil moisture, even if much effort went into mimicking observed soil moisture (Dee et al 2011). Soil moisture fields within a model reflect the model internal representation of soil moisture and cannot be directly compared between models and reanalyses.
Our approach enables us to focus on synoptic-scale variability. We obtain and evaluate climatologies of the atmospheric conditions important in the development of events, independent from secular trends and their magnitudes. Here, the total number of cases of hot and cold events is limited by the criterion to determine extreme events, i.e. the 95th (5th) summer (winter) percentile of the surface temperature averaged over Central Europe. Our approach, however, does not allow us to assess whether modeled extreme events take place at the same time as in ERA interim. Events in the AMIP simulations occur at different times and for different periods unrelated to the observed state. Except for the boundary conditions (such as sea surface temperatures and sea ice, and external forcings), the individual AMIP simulations are not tied to observed states of the atmosphere. Consequently, synoptic-scale activity within AMIP-style climate models is independent from that of observations.
We determine significant differences in section 3.1 (regarding amplitude and duration of events) and section 4 (regarding the rank correlations) between the models and ERA interim at the 5% level through block-bootstrapping with a block length of 10 days. With regard to the null hypothesis that there is no difference between the test statistics from ERA interim and the model (with the alternative hypothesis of a difference unequal to 0), we build bootstrapped distributions of the test statistics. We consider a test statistic significant if it falls outside the critical values identified as the 2.5th and 97.5th percentiles of the bootstrapped distributions. The bootstrapped distributions are built by dividing the ERA interim time series into chunks of 10 days length. These chunks are reassembled randomly, where the reassembled time series are shorter than the original time series (to make a large number of slightly different combinations possible). By repeating this procedure 10 000 times, we create distributions of the test statistics calculated from the pool of resampled time series.
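The resampling procedure can be sketched roughly as follows. Several details are assumptions made for illustration only: blocks are drawn with replacement, the resampled series has 80% of the original number of blocks (the `shrink` parameter is hypothetical), the statistic is a simple mean, and 1000 instead of 10 000 repetitions are used to keep the example fast.

```python
import random


def block_bootstrap(series, block_length, n_blocks, rng):
    """Reassemble a series from randomly chosen blocks (drawn with replacement)."""
    starts = range(0, len(series) - block_length + 1, block_length)
    blocks = [series[s:s + block_length] for s in starts]
    resampled = []
    for _ in range(n_blocks):
        resampled.extend(rng.choice(blocks))
    return resampled


def critical_values(series, statistic, block_length=10,
                    n_resamples=1000, shrink=0.8, seed=0):
    """2.5th and 97.5th percentiles of a statistic over block-bootstrapped series."""
    rng = random.Random(seed)
    n_blocks = max(1, int(shrink * (len(series) // block_length)))
    stats = sorted(
        statistic(block_bootstrap(series, block_length, n_blocks, rng))
        for _ in range(n_resamples)
    )
    lower = stats[int(0.025 * (n_resamples - 1))]
    upper = stats[int(0.975 * (n_resamples - 1))]
    return lower, upper


def mean(values):
    return sum(values) / len(values)


# Synthetic daily series standing in for a reanalysis time series (assumption).
data_rng = random.Random(7)
series = [data_rng.gauss(0.0, 1.0) for _ in range(500)]
lower, upper = critical_values(series, mean)
print(lower, upper)
```

Resampling whole 10-day blocks rather than single days keeps the day-to-day persistence within each block, which is the reason for using a block bootstrap on autocorrelated daily data.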

Amplitude and duration of events
Before we assess how the models perform in simulating the mechanisms associated with heatwaves and cold spells within a model's climate, we first compare the amplitude and duration of events in ERA interim and the models.
First, we compare the unstandardized 5th and 95th seasonal percentile of the area-average of temperature anomalies to see how the models perform in terms of the amplitude of extreme events (figure 2). The JJA-95th percentile of temperature anomalies in ERA interim is 3.4 K, which is captured by the models quite well with three models showing significantly stronger extremes (4.1-4.6 K) than ERA interim. The other models are consistent with ERA interim during JJA. Considering even warmer events (99th JJA-percentile respectively), for which ERA interim has a value of 4.6 K, bcc-csm1-1 and bcc-csm1-1-m both show significantly warmer 99th JJA-percentiles (7.6 K and 6.4 K). During winter, the 5th (1st) percentile of DJF temperature anomalies is −5.1 K (−8.3 K) in ERA interim. Except for CanAM4 and MIROC5, all the models have a colder 5th percentile with MPI-ESM-LR being significantly colder than ERA interim with a temperature anomaly of −6.1 K. For very cold DJF events, none of the models show a significantly different temperature amplitude from ERA interim due to the high variability of the 1st percentile of DJF temperatures.
Given how we define events, the average number of hot or cold days per year does not differ significantly between any of the models and ERA interim with all models having about 4-6 warm summer or cold winter days each year.
However, if only years in which hot or cold events occur are used (figure 2), we see that a few models tend to simulate events with too long a period on average. With 8.5 and 9.1 days, bcc-csm1-1 and HadGEM2-A have on average significantly more warm summer days in a year with an extreme event than ERA interim (5.1 days on average), which suggests that events last longer in those models than in ERA interim. In years where a cold event takes place, ERA interim has on average 5.9 cold days with HadGEM2 having significantly more cold days (9.2 days). The number of cold or warm days per season with an event in the other models does not differ significantly from ERA interim.
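The difference between the two averages discussed above (event days per year over all years versus over only those years that contain an event) can be made concrete with a small sketch. The function names and the list of event years are invented for illustration.

```python
from collections import Counter


def mean_days_all_years(event_years, n_years):
    """Average number of event days per year, counting years without events."""
    return len(event_years) / n_years


def mean_days_event_years(event_years):
    """Average number of event days per year, counting only years with an event."""
    counts = Counter(event_years)
    if not counts:
        return 0.0
    return len(event_years) / len(counts)


# Invented example: one list entry (the year) per event day in a 25-year period.
event_years = [1983] * 2 + [1994] * 4 + [2003] * 9
print(mean_days_all_years(event_years, 25))
print(mean_days_event_years(event_years))
```

The first average is fixed by the percentile definition of events, whereas the second grows when the same number of event days is concentrated in fewer years, which is why it serves as an indicator of event length.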

Patterns in ERA interim
The patterns of warm summer and cold winter events derived from the ERA interim reanalysis (figures 3 and 4) show that in winter, cold extreme events are associated with a reversed NAO-circulation that brings cold Arctic air to Central Europe (figure 3). This circulation type is composed of higher than usual geopotential height (and surface pressure) anomalies to the north, and lower than usual values to the south. In between these high-and low-pressure systems, cold air is advected to Central Europe giving colder than average temperatures as expected ('Contribution of atmospheric circulation to remarkable European temperatures of 2011' by Cattiaux and Yiou in Peterson et al 2012). There are also positive anomalies over the Mediterranean sea.
During summer warm extreme events we find higher than usual values of geopotential height over Central Europe coinciding with higher than normal temperature anomalies (figure 4). Higher geopotential height anomalies are surrounded by lower than average geopotential height anomalies. This pattern resembles the classical omega blocking pattern with a low-high-low pattern (arranged from west to east), which is moving slowly and brings steady anticyclonic weather conditions for a relatively long period (Glickman 2000). Over Central Europe, this anticyclonic circulation is associated with clear skies, which then lead to radiative heating (figure 5) as the soil and air dries (leading to reduced evapotranspiration due to reduced available soil moisture over prolonged heat periods).
This pattern is also visible in the MSLP, which is shifted slightly further to the east compared to geopotential height. In addition to radiative heating, the area that is the center of the high temperatures experiences further warming due to warm advection from the south. The MSLP also shows minor features introduced by topography. South of the Alps, there is anticyclonic curvature, while just north-west of the Alps, the curvature is cyclonic (figure 5). We also examined lagged relationships for JJA hot days and found weaker but similar patterns (figure 1 in supplementary online material). This suggests that seasonally warm events form part of longer lasting events.
Figure 3. Composites of the standardized near-surface temperature (left) and 500 hPa geopotential height (right) for cold winter events over Central Europe in ERA interim (upper row) and HadGEM2 (lower row). The composites have been derived from all cases where the area-averaged temperature over Central Europe is smaller than its 5th seasonal percentile in DJF. Note that values outside of ±0.18 are significantly different from 0 at 0.05 significance (determined through a Student's t-test).

CMIP5 models
Our results show that the models perform almost equally well, with skill depending on the variable and season (table 2 and figure 6). Circulation-related variables are simulated best while variables influenced by near-surface or surface processes are simulated less well. We concentrate on results from HadGEM2, but much of what we find applies to the other models as well (see supplementary online material).
Winter temperature and geopotential height patterns in HadGEM2 (figure 3) look very similar to the patterns derived from ERA interim. The patterns associated with cold winter extremes are simulated very well with a pattern correlation between 0.91 and 0.94 for the temperature, geopotential height, and MSLP fields. For longwave and total radiation budgets, the pattern correlations are about 0.76. However, HadGEM2 does not generate the reanalysis pattern of relative humidity well (table 2). Nevertheless, HadGEM2 clearly shows very good skill in simulating the patterns that lead to cold extremes in winter, even though the winter composite of relative humidity only shows a pattern correlation of about 0.2.
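For readers unfamiliar with the two skill scores, the following sketch shows one common way to compute a pattern correlation and an RMSE between two composites. It assumes flattened fields on a common grid, a centred (mean-removed) correlation, and no area weighting. These choices are assumptions for illustration, and the toy values are invented.

```python
import math


def pattern_correlation(a, b):
    """Centred Pearson correlation between two flattened fields."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def rmse(a, b):
    """Root mean squared error between two flattened fields."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Invented standardized composites on a four-point grid.
era_composite = [0.5, 1.0, -0.5, -1.0]
model_composite = [0.4, 1.2, -0.3, -0.9]
print(pattern_correlation(era_composite, model_composite))
print(rmse(era_composite, model_composite))
```

The correlation measures agreement in the shape and location of the anomalies, while the RMSE also penalizes differences in magnitude, so a model can score well on one and less well on the other.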
For warm events in summer, the pattern correlations are not as high as for cold winter events, likely because they are smaller scale than the winter events. For instance, the warmer air spreads out further over the sea in ERA interim than it does in HadGEM2 (figure 4). Clear differences can be seen in the MSLP fields, where the MSLP in HadGEM2 is of a different magnitude compared to that in ERA interim (figure 5). Also, the centers of high and low pressure are shifted. Another reason for slight discrepancies can be found in the radiation budget. While the location of the radiation budget in summer agrees to a good extent, the magnitude of the total radiation budget is smaller in HadGEM2. Similar patterns were found for the other models we considered (see supplementary online material).
Figure 4. Composites of the standardized near-surface temperature (left) and 500 hPa geopotential height (right) for hot summer events over Central Europe in ERA interim (upper row) and HadGEM2 (lower row). The composites have been derived from all cases where the area-averaged temperature over Central Europe is larger than its 95th seasonal percentile in JJA. Note that values outside of ±0.18 are significantly different from 0 at 0.05 significance (determined through a Student's t-test).
When we consider all of the selected models, we see that the models in general simulate the mechanisms of cold winter events over Central Europe a little better than those of warm summer events (figure 6). However, it is difficult to single out a best or worst model among our selection.
Only in some instances are the RMSE and pattern correlations between models and the reanalysis significantly different from the HadGEM2 values (table 2). Significance is determined via block bootstrapping (10 days block length), where critical values of differences are ±0.08 and ±0.09 for the pattern correlation and the RMSE respectively. Another reason not to pinpoint a single model is the uncertainty of the skill scores, which arises either from sampling due to chaos, regridding, or from discrepancies between the reanalysis and the real observed state of the atmosphere. By subsampling from the pool of patterns in a model associated with its hot or cold extreme events we bootstrapped a distribution of skill scores (figure 7 shows the Taylor diagram for DJF-geopotential height in HadGEM2 as an example). Some variables, such as the 2m temperature show quite a narrow band of skill score uncertainty. Other variables, for instance the MSLP in both summer and winter, show a wider uncertainty band. Little uncertainty in the skill scores means that simulated patterns vary little between extreme events, and vice versa.

Quantification of mechanisms
Next, we quantify the potential mechanisms related to hot summer and cold winter events as described in section 2 by calculating the rank correlation ρ between the amplitude of the surface temperature composite pattern and that of patterns of geopotential height at 500 hPa (as a measure for the circulation), total radiation budget, and relative humidity (where available). The results are shown in figure 8.
Table 2. Pattern correlation and RMSE between composites of several variables (near-surface temperature (t2m), geopotential height at 500 hPa, mean sea level pressure (MSLP), long- and shortwave radiation budget (rad_long and rad_short), total radiation budget (rad_total) and relative humidity (rhs)) in ERA interim and the models used in this study. The composites have been derived from selected cases, where the area-averaged temperature over Central Europe is larger (smaller) than its 95th (1st) seasonal percentile in JJA (DJF). NA denotes variables that have not been available for analyses.
ρ measures to what extent stronger anomalies in one variable are directly correlated with stronger anomalies in the other. Here, it quantifies the extent to which the pattern of surface temperature is affected by the patterns of the other variables for hot JJA-and cold DJF-events. If the surface temperature was affected by the circulation, radiation, and humidity, we would expect ρ to be significantly different from 0. By bootstrapping ρ under the null hypothesis ρ = 0 (with alternative hypothesis ρ ≠ 0), critical values at the 0.05 significance level are determined to be approximately ±0.20.
As can be seen from figure 8, circulation, radiation, and relative humidity are important for hot and cold events in summer and winter. In JJA (DJF), ρ for the circulation is about 0.52 (0.73), for the radiation 0.23 (0.42), and 0.51 (0.58) for relative humidity in ERA interim. Though ρ varies among the models, we detect significantly different values at the 0.05 level from ERA interim only in some cases, where differences are greater than 0.28 in magnitude (determined via bootstrapping). In JJA, ρ for the radiation in the MPI model is significantly larger than in ERA interim. In DJF, the models for which relative humidity is available (except for HadGEM2) have significantly lower rank correlations for this variable than ERA interim.
Our results from analysing ρ suggest that physical mechanisms related to hot JJA-and cold DJF temperature events agree with the mechanisms in ERA interim. With regard to the relative humidity, we would like to emphasize that for JJA most models show a similar strength of relationship between hot and dry anomalies as ERA interim, while this is not the case for winter-relative humidity. Here, ρ is smaller than in ERA interim, even though the models reproduce cold winter temperature events fairly well (section 3). We speculate that the models somewhat underestimate the role of relative humidity as a mechanism for cold extremes in winter.

Conclusions
In our study we assess the amplitude and mechanisms of temperature extreme events over Central Europe and their circulation patterns. We compare data from several AMIP CMIP5 simulations with ERA interim. We have shown that within the simulations the warm and cold temperature events over Europe occur for the correct reasons. The models perform reasonably well, appearing slightly better in DJF than in JJA. In particular, mechanisms related to the circulation are well simulated. Other mechanisms related to radiation and humidity are simulated less well. We expect that future improvement in simulating extreme events will originate from a better description of non-circulation-based processes. Differences in pattern skill scores between the models are often marginal, thus making it difficult to single out a best model.
The absolute temperature amplitude of events is simulated realistically with some exceptions, where models over- or underestimate temperatures of warm and cold events. Regarding the average length of events, ERA interim and the models agree with each other, apart from three cases where models simulate events with too long a period on average. These analyses are complemented by the quantification of mechanisms by calculating the rank correlation between temperature, geopotential height, radiation, and relative humidity patterns. The analysis of the rank correlation reveals that the temperature is affected by circulation, radiation, and humidity. The influence of these factors varies only a little throughout the selection of models, except for relative humidity in winter.
Figure 8. Potential contributors to temperature extremes: rank correlation ρ between the amplitude of the projected near-surface temperature composite pattern and that of projected composite patterns of geopotential height at 500 hPa, total radiation budget and relative humidity (where available) for hot summer (left) and cold winter events (right). The line at approximately 0.2 denotes values different from 0 at 0.05 significance, suggesting a significant relationship between surface temperature and the other variables.