The record-breaking compound hot and dry 2018 growing season in Germany

Record-breaking hot temperatures were observed in many places around the world in 2018, causing heat-related deaths, crop failure, wildfires and infrastructural damages. In Germany, extremely hot temperatures were accompanied by extremely low precipitation, compounding the impacts. Here we investigate spring to autumn temperature and precipitation in Germany over the historical period. We show that since measurements started in 1881, Germany has never experienced as hot and dry conditions during March to November as in 2018. We analyse the rarity of the event and illustrate that estimates of return periods for such compound extreme events are extremely high but very uncertain and strongly depend on the way they are estimated. We further investigate output from an ensemble of climate model simulations (CMIP5). Most climate models represent the distributions of temperature and precipitation in Germany and their dependence relatively well. Statistical projections of the bivariate temperature-precipitation distribution suggests that a growing season such as 2018 will become less likely at warmer global mean temperatures due to slight increases in precipitation. In contrast, climate models project an increasing likelihood of a 2018-like event and much larger uncertainties both for temperature and precipitation at different warming levels. Both observation-based scaling and climate model estimates consistently project that the compound hot and dry conditions in peak summer June – August become more likely. Overall, our results highlight the challenges associated with estimating the rarity of very extreme multivariate events and illustrate how consistent future changes in multivariate extremes can be estimated from observations only.


Introduction
The year 2018 was extremely dry and hot in Germany with impacts particularly on crops and forests (Buras et al., 2020;Toreti et al., 2019). The regionally extremely hot and dry conditions formed part of a larger set of concurrent climate extremes across the northern hemisphere, which was caused by a circumpolar wave pattern (Kornhuber et al., 2019) and led to a record breaking number of hot days in highly populated and agricultural regions in the northern mid-latitudes between May and July (Vogel et al., 2019).
Quantifying the probability of such extremely hot and dry conditions today and in the future is important for adaptation planning, for instance in the agricultural sector, but also for fisheries, river transport and for energy supply. In general, the likelihood of compound hot and dry conditions at the seasonal time scale is strongly governed by the co-variability of temperature and precipitation . In central Europe, temperature and precipitation are negatively correlated at seasonal to half-yearly time scales in the warm season (Madden and Williams, 1978;Trenberth and Shea, 2005), favouring the occurrence of summers that are either warm and dry or cold and wet at the same time. High temperatures and lack of precipitation share common drivers such as anticyclonic conditions leading to subsidence inhibiting convection and leading to clear-sky conditions. In the hot and dry tail, land-atmosphere interactions can become important in shaping the dependence between temperature and precipitation. When soil dryness is below a certain threshold, more of the available energy from the sun is transformed into sensible heat, increasing air temperature and consequently evaporative demand and thus further drying out the soil (Seneviratne et al., 2010;Vidale et al., 2007). As a consequence, precipitation may be reduced because of lower evapotranspiration (Vogel et al., 2018). The present-day climate in Germany is typically not moisture limited, but under very dry conditions the described land-atmosphere feedback can enhance the magnitude of concurrent hot and dry conditions . No similar feedback processes are known for the cold and wet tail of the bivariate temperature-precipitation distribution.
Arguably, natural ecosystems and crops are most sensitive to weather conditions during the growing period (Frank et al., 2015;Xie et al., 2018), which roughly extends from spring to autumn in Germany. Therefore, here we focus on temperature and precipitation averaged over March to November (climatological spring to autumn in the northern extratropics) to capture the most impact-relevant period. We also compare results with the summer only (June-August).
Near-surface air temperature has increased by 1.2 � C in Germany since the beginning of wide-spread observations in 1881 until 2012 (Kaspar et al., 2013) and is projected to further increase if no radical reductions in fossil fuel emissions are achieved. In contrast, long-term trends in annual and summer mean precipitation in central Europe show no trend in the historical period (Gudmundsson and Seneviratne, 2016;Orth et al., 2016) and are highly uncertain for the future (Orth et al., 2016;Greve et al., 2018) due to model uncertainties in changes in atmospheric dynamics (Zappa and Shepherd, 2017;Li et al., 2018) and tropospheric stability (Kr€ oner et al., 2017), and the representation of land-atmosphere coupling (Vidale et al., 2003;Vogel et al., 2018). However, even without robust changes in precipitation, an increases in temperature will lead to more frequent compound hot and dry conditions (Sarhadi et al., 2018).
Regionally averaged temperature and extreme precipitation often scale surprisingly linearly with global mean temperature (GMT) (Mitchell, 2003;Seneviratne et al., 2016). Deriving a pattern (e.g., seasonal temperature change per degree global warming) and then scaling its magnitude is commonly called ''pattern scaling'' (Santer et al., 1990). Although pattern scaling is common for temperature and precipitation (e.g., Osborn et al., 2016), we are not aware of studies that in addition to the univariate patterns also consider the dependence between the variables in the scaling. Based on the assumption that the observational record since 1881 is long enough to provide a reliable estimate of the forced response that is not strongly affected by internal variability, which we can test here, the regional scaling per degree GMT warming can be estimated based on the observations along with an estimate of uncertainty due to variability. Given the evidence from climate models that the forced response pattern scales well with GMT at the regional scale , one can extrapolate regional climate distributions of temperature and precipitation conditioned on GMT derived from observations as alternative to climate model-based projections.
The goal of this paper is twofold. First, we examine the rarity of the 2018 event in Germany. Second, we discuss the value of statistical projections of the bivariate temperature-precipitation distribution based on a pattern-scaling approach. With respect to the first goal, rarity or extremeness in the univariate case can be estimated by transforming the data into quantiles, even though the result will always depend somewhat on the definition of the event (Brunner et al., 2019). In the two-dimensional case, a common approach to estimate rarity is to compute bivariate return periods. These are, however, not uniquely defined in two dimensions. Moreover, uncertainties are very large for very rare events (Serinaldi, 2013(Serinaldi, , 2016. Here we discuss the benefits and limitations of those approaches.
Regarding the second goal of the paper, we will conduct a bivariate pattern scaling approach and compare the results against climate model projections. Climate impacts often depend on multiple climate variables and their co-variability. Consequently, if climate projections are used for impact assessments, it is important to correctly represent the dependence between climate drivers . Here we project the full bivariate temperature-precipitation distribution solely based on observed scaling relationships and conditioned on GMT targets, thereby retaining the observed dependence between the two variables.

Data
We use temperature and precipitation averaged over Germany based on station observations for the period 1881-2019 as provided by the German Weather service (DWD) from their website (accessible through their ftp server ftp://ftp-cdc.dwd.de/pub/CDC/). The data are derived from station data, which varied from approximately 100 stations in the late 19th century to more than 400 stations today (Kaspar et al., 2013). Spatial grids are interpolated from station data following rigorous quality checks and are available for the time period 1881 until present. DWD provides seasonal averages spatially average over the German states and the entire country, and here we use the average of spring to autumn (i.e., March-November) and summer averages (i.e., June-August) over Germany. Changes in the measurement network density may affect the homogeneity of the time series, though it can be assumed that for seasonal spatial means (as used in this paper) this should not be a major issue (Kaspar et al., 2013). To estimate GMT, we use observation-based estimates of annual global mean temperature anomalies from the Goddard Institute for Space Studies (GISS) (Hansen et al., 2010;GISTEMP Team, 2019) covering the time period 1881-2019.
We further use temperature and precipitation from the simulations conducted in the Coupled Model Intercomparison Project phase 5 (CMIP5, Taylor et al., 2012) from 1881 to 2100 averaged over Germany. From 2006 onwards we use the representative concentration pathway 8.5 (RCP8.5). Monthly values are averaged to March-November and June-August means, respectively. Annually averaged global mean temperatures are used as GMT. For projections at different GMT targets, we extract temperature and precipitation in Germany from 30-year time slices for which GMT first crosses the given temperature target, similar to Vogel et al. (2018). Climate model data are then pooled for plotting purposes. In total, we use the 86 individual climate model simulations listed in Table 1.
Throughout the manuscript we will focus on the time period March-November but occasionally also report results for the summer only (June-August).

Bivariate return periods based on copulas
We will first introduce some notation following Salvadori et al. (2016). Let I denote the unite interval ½0; 1�. A bivariate copula C is a joint distribution on I 2 ¼ ½0; 1� 2 with uniform margins. A bivariate cumulative distribution function F is linked to C by the functional identity formulated by Sklar's Theorem (Sklar, 1959) for all x 2 R 2 , where F 1 and F 2 are the univariate margins of F . If F 1 and F 2 are continuous, then C is unique. In particular, the multivariate probability distribution function f can be written as where c is the copula density. The (joint) survival function is defined by where c C is the survival copula of the X i , and F i ¼ 1 À F i is the survival function of X i for i ¼ 1; 2. We further introduce the Kendall's function K associated with the copula C of X: with t 2 I. The upper-orthant Kendall distribution function c K can be defined as and we can call 1 À c K the survival Kendall function. We can estimate the likelihood p that a value exceeds a certain critical multivariate threshold and falls into a critical region (or hazard region). Since we are dealing with yearly data, the return period (in years) is then simply defined as 1=p. Ideally, the critical region should be related to the region in the driver space which causes the largest impact. If the impact function is known or can be estimated (Gouldby et al., 2017;Idier et al., 2013), the impact for different combinations of drivers can be directly calculated and the effect of driver dependencies on the impact can be investigated. In practice, however, consistent data on climate-related impacts are often not available (Tschumi and Zscheischler, 2020) and the affected systems can have a strong influence on the type and magnitude of the impact . Therefore, independently defined hazard scenarios are typically used to study the extremeness of a multivariate event. There are four intuitive hazard scenarios that can be directly derived from the copula that is fitted to the data: the ''AND'' scenario (AND), (ii) the ''OR'' scenario (OR), (iii) the ''Kendall'' scenario (K) and (iv) the ''Survival Kendall'' scenario (SK), which are defined as follows (see Fig. 1 in Salvadori et al. (2016) for a graphical illustration): Interpreting the AND and OR scenarios are straightforward: A value is considered extreme if x 1 and x 2 is large (AND), and if x 1 or x 2 is large (OR), respectively. These approaches have been used widely in the literature to study impact-relevant compound extremes. For instance,  analyzed the present-day likelihood and future changes of very hot and very dry summers using the AND hazard scenario. These are important conditions for the terrestrial carbon cycle, as extreme reductions in carbon uptake occur in particular during concurrent hot and dry conditions (Zscheischler et al., 2014;Chen et al., 2019). Moftakhari et al. (2017) investigated coastal flooding risk related to fluvial discharge and coastal water level using the OR hazard scenarios. Arguably, it is sufficient that either the fluvial discharge, the coastal water level, or both be large to produce a potentially hazardous flood. The Kendall and Survival Kendall scenario have a slightly different interpretation. They can be characterized by the ''critical layers'' L t and L t , respectively, which separates the bivariate space into a critical region and a non-critical region. While all points x on L t share the same probability t The Kendall return periods have also been widely used in applications. For instance, Corbella and Stretch (2012) showed that return periods of erosion events correlated best with Kendall return periods of wave height and duration at the east coast of South Africa. Arguing that crops are sensitive variations both in temperature and precipitation,  have used Survival Kendall return periods of temperature and precipitation to explain crop yield variability in Europe. For an overview about recent studies that have used the four hazard scenarios introduced above in different applications, the reader is referred to Salvadori et al. (2016). Care need to be taken when translating the different hazard probabilities p that characterize different critical regions into return periods since this bears the danger of forgetting to which hazard scenario they belong and may therefore lead to illogical comparisons (Serinaldi, 2015).
In this study we discuss the challenges associated with determining the rarity of an individual multivariate event. We highlight that the notion of extremeness strongly depends on the underlying hazard scenario and the way copulas are fitted to the data, and does not necessary imply and extreme impact. In particular, the transformation into uniform margins to fit appropriate copulas can be done in two ways: (a) by transforming the margins into empirical quantiles (EMP) or (b) by fitting an appropriate marginal distribution and computing quantiles based on this distribution (FIT). Both approaches have been used in the literature.
We model copulas using the R package VineCopula (Schepsmeier et al., 2018), which selects the best-fitting copula out of a group of more than 10 different copulas and their rotations using the maximum likelihood estimation and the Bayesian Information Criteria (BIC). A full overview including definitions about the most commonly used copulas can be found in Nelsen (2007). The best fitted marginal distributions and selected copulas for March-November averages of temperature and precipitation over the historical period in Germany are listed in Table 2. All selected copulas passed standard goodness-of-fit tests (P > 0:1). Note that we inverted precipitation in this table such that hot and dry conditions are at the upper part of the distribution. Generally, in all cases Japan Agency for Marine-Earth Science and Technology, Atmosphere and Ocean Research Institute (The University of Tokyo), and National Institute for Environmental Studies copulas with tail dependence in the hot and dry tails are selected. This means that extremely hot conditions and extremely dry conditions are likely to co-occur. For the original data, the fit is close to the independent copula, as the data is only very weakly correlated (P ¼ 0:06 for Kendall's τ). For the anomalies (see Section 2.2.2), the upper tail dependence λ is 0.41 and 0.28 based on empirical (EMP) and modelled (FIT) marginal distributions, respectively (derived from the copula fit). If the sample size is very large, exceedance probabilities, or hazard probabilities (i.e., (5)-(8)), can be estimated by simple counting. We follow this approach for the multi-model climate simulations (after appropriate bias adjustment). We compute hazard probabilities in this way for all climate model simulations as well as only for those simulations that capture the observations well (see Section 2.2.4).

GMT scaling
For observations and climate model simulations, we estimate GMT scaling by a simple linear regression (Gudmundsson and Seneviratne, 2016). More specifically, where Y is regional average precipitation or temperature, a is the intercept and b the fitted linear scaling coefficient. We test whether the variance also changes with GMT based on a linear regression with nonconstant variances (using lmvar from the R package lmvar). Both for temperature and precipitation, a linear model with constant variances is preferred. We further test whether the dependence structure, that is, the copula between the first 50 years and last 50 years is significantly different based on the test designed by R� emillard and Scaillet (2009) (see also Section 2.2.4). This is not the case (P > 0:1). Despite the relatively long observational record covering 139 years, the linear fit may be somewhat affected by internal variability. We account for the uncertainty of the linear fit (based on the standard error of the estimated scaling parameter) when extrapolating the relationship for different GMT warming targets. Fitted scaling coefficients and their uncertainties based on observations fall within the multi-model range over the same time period (Fig. 1a and b). Overall, observed temperature scaling falls well in the center of the climate model range (Fig. 1a) at around 1.4 � C/ � C GMT. Observed precipitation scaling is, albeit positive, not significantly different from zero (P > 0:05) but falls at the top end of the climate model range (Fig. 1b), which is symmetrically distributed around 0. Due to the changing role of radiative forcing agents, the scaling may in principle not be identical in the future. However, a comparison between the scaling coefficients in climate models estimated for the historical observational period versus the future period until the end of the 21st century highlights that linear scaling can be relatively robustly estimated for temperature based on the historical period whereas it is more uncertain for precipitation ( Fig. 1c and d). In particular, scaling coefficients for precipitation tend to be smaller in the future period. This may either relate to the fact that the historical trend is affected by internal variability or the regional precipitation is forcing dependent, e.g. affected by aerosol forcing. Thus, here we combine the purely climate model-based projections and the observation-based scaling as two alternative lines of evidence, which both have their strength and weaknesses. Throughout the manuscript we refer to the  original observations as ''original data'' and the residuals after GMT scaling as ''anomalies''. We use the established scaling relationship to project bivariate distributions of temperature and precipitation under different GMT targets as follows. Using (2) we can sample from bivariate temperatureprecipitation distributions based on the fitted parametric distributions to the anomalies and the fitted copula (Table 2). We do this with the function mvdc from the R package copula (Yan, 2007). Following this approach, we sample bivariate temperature-precipitation data given baseline 0 � C, and the GMT targets þ1 � C, þ1.5 � C, þ2 � C and þ3 � C compared to 1881-1900 following the scaling relationships derived by (9). We take the uncertainty of the scaling relationships into account by sampling 10,000 points for each GMT target by sampling temperature and precipitation values, respectively, from the joint posterior distribution of the regression vector and the standard deviation with a non-informative prior.

Conditional sampling of temperature
Extremely hot and dry conditions will become more frequent if temperatures increase and precipitation stays constant, which is largely the case for Germany. We quantify this effect by estimating the probability that temperature is higher than in 2018 given precipitation is at least as low as observed in 2018, i.e. P ðT > T 2018 jP < P 2018 Þ, at different warming levels. This probability can be derived by sampling from the fitted copulas and the scaling relationship derived in Section 2.2.2.
We compare these conditional probability estimates with estimates based on multi-model climate simulations. To estimate the conditional probabilities from climate model simulations, we pooled 50-year periods centered around a range of GMT targets for multiple models. We then compute the relative number of years that are warmer than 2018 given that precipitation is smaller than in 2018. Uncertainties are estimated using bootstrapping.

Comparison between observations and climate models
We compare observations and CMIP5 model simulations of temperature and precipitation with respect to the original values and anomalies (after regressing against GMT). To account for mean biases in CMIP5 models we apply a very simple bias adjustment by subtracting the difference between observed and simulated means of the corresponding model (1881-2019) in temperature and precipitation from the simulated values. For the bias adjusted model output and the anomalies we perform three tests. We test whether both marginal distributions are captured well by the models using the Kolmogorov-Smirnov test. Furthermore, we compare dependence structures between temperature and precipitation between observations and models. For this task we use the test developed by R� emillard and Scaillet (2009), which assesses whether the empirical copulas between two bivariate distributions significantly deviate from each other. The test has been implemented by the authors in the R package TwoCop. The test is very reliable for sample sizes n > 100 (we have n ¼ 139 samples) (R� emillard and Scaillet, 2009).
For all statistical tests, the significance threshold is chosen as α ¼ 0:05.
We refer to the subset of CMIP5 models that pass all three tests as ''good models'' (good).

Rarity of 2018
Averaged over the extended growing season from March through November, 2018 was hotter and drier than any other corresponding extended growing season period in Germany since measurements started in 1881 (Fig. 2a). Based on fitted marginal distributions to a parametric distribution (FIT, Table 2), the best estimate for the likelihood that a year is both warmer and drier than 2018 is 1=6495 if a stationary climate is assumed (Table 3). However, uncertainties are very large and range from 1=431841 to 1=2123 (95% range). This probability increases to 1=546 ½1=19600; 1=350� if empirical quantiles are used to represent the marginal distributions (EMP). These numbers are so low because temperature and precipitation from March to November are only very weakly correlated (Kendall's rank correlation is À 0.11, P ¼ 0:06). Consequently, it is very unlikely that a given year breaks both the temperature and dryness record in the same year. The probability of exceeding a 2018-like year is much smaller when the marginal distributions are modelled explicitly (FIT) because the parametric fits assigns a much lower probability to the year 2018 as it deviates substantially from its most closest values, which is the case both for temperature and precipitation (Fig. 2a). An empirical transformation into quantiles does not account for such differences in spacing between the samples. In fact, the empirical univariate return period for the 2018 event is 140 (since p ¼ 1=ðn þ 1Þ with n the number of available years) both for temperature and precipitation, whereas it is > 1000 years for temperature and > 400 years for precipitation based on the parametric fits. This difference highlights the uncertainties associated with the transformation of marginals into quantiles, which are often used to fit copulas, in particular for very rare events. It should be noted that parametric distributions that are fitted to the full distribution may also not be particularly well suited to estimate extreme return periods. Hence, the two estimates reported here cover a substantial range of the uncertainty associated with estimating the likelihood of very rare bivariate events. In addition, large uncertainties are also related to the choice of copula. We estimated these uncertainties via bootstrapping and refitting copulas to each bootstrapped sample (Table 3). If we focus on summer only (June-August), 2018 was the second hottest and second driest summer on record (Fig. 3a). Kendall's rank correlation between temperature and precipitation in summer is À 0.29 (P < 0:01).
Regional temperatures and GMT have been increasing substantially over at least a century in most parts of the world, resulting in nonstationary temperature time series. Temperature records are therefore expected to be broken with much higher frequency than expected in a stationary climate (Meehl et al., 2009;Rahmstorf and Coumou, 2011). To account for this non-stationarity, we estimate the rarity of the 2018 spring to autumn period by adjusting for the increase in GMT. Regressing against GMT, the likelihood of an event that is both warmer and drier than 2018 (p AND ) increases to 1=237 ½1=702; 1=177� (EMP) and 1=1456 ½1=4257; 1=1031� (FIT), respectively. Inspecting the distribution of the anomalies relative to the GMT increase, it is evident that 2018 was still record-breaking dry and the fourth hottest year on the record even after the effect of long-term warming signal has been removed (Fig. 2b).
Similar to the estimate derived for the original values, the large difference between p EMP AND and p FIT AND is related to the fact that by fitting the marginal distributions to a parametric model, the comparably large difference between 2018 results in a very low likelihood for 2018, in particular for precipitation. As expected, return periods based on the OR scenario are substantially smaller (Tables 3, 4). Confined to summer only, 2018 was the 5th warmest and 2nd driest when adjusting for the GMT increase (Fig. 3b).
In all four casesaveraging over March to November or June to August, with and without adjusting for GMT increasethe dependence between temperature and precipitation is best captured by copulas that have a tail dependence in the hot and dry tails (Table 2). This means that extremely hot and dry years are likely to co-occur and the likelihood of an extremely hot and dry growing season/summer (p AND ) is much larger than if temperature and precipitation would be tail independent.
In the following we compare observation-based estimates of the rarity of the 2018 event against empirical estimates from climate model simulations. Generally, model simulations capture the observed data distributions relatively well. 79 and 73 out of 86 model simulations pass all three tests with respect to the original values and the anomalies, respectively. Fig. 4 illustrates all pooled modelled years for the simulations of the ''good'' models and highlights the exceedance thresholds as defined by the observations from 2018. The return periods of 2018like events based on copula modelling are typically smaller for the AND scenario than the estimates based on counting exceedance probabilities in multi-model simulations for both the original values and anomalies (Tables 3 and 4, respectively). For the OR scenario, the estimate based on model simulations are in between the estimates based on copulas for the original values and slightly higher for the anomalies. The difference induced by using all model simulations in comparison to only those that capture the observed behaviour well (''good models'', see Section 2.2.4) is generally small, and put the climate model estimates Table 3 Return periods (in years) for the four hazard probabilities p defined by the year 2018. Different estimates are based on empirical quantiles (EMP) and fitted parametric distributions (FIT) to estimate the marginal distributions as well as simple counting of years that fall into the different hazard regions in the CMIP5 model ensemble. We differentiate between all simulations (MOD) and only simulations that capture both marginal distributions and the dependence structure well (MOD good , see Section 2.2.4). Numbers in brackets show the 95% confidence interval estimated through bootstrapping.  Fig. 3. As Fig. 2 but for summer only (June-August).

Table 4
As Table 3  closer to the copula-based estimates.

Climate change projections
The projection based on extrapolation of the observed scaling suggests that a 2018-like event becomes somewhat less likely in future. Based on the copula fit with parametric marginals, we estimate that at 1 � C global warming (slightly warmer than current conditions) a 2018-like event (AND condition) would occur approximately every 1500 years, compared to every 2000 years in a 1.5 � C world. This increase in return period is due to a small increasing trend in March-November precipitation, which leads to reduced probability of the observed dry anomaly even though the warm anomaly recorded in 2018 is expected to become more common due to a strong warming (Fig. 5a). Multi-model projections yield a larger uncertainty range in particular for the warm tail of the multi-model distribution. Since some climate models also project a reduction in March-November precipitation (cf. Fig. 1b, d), very dry conditions as observed in 2018 become even more likely in the future in the pooled climate model ensemble (Fig. 5c). The discrepancy between the extrapolation of the observed scaling yielding on average a lower probability of 2018 conditions and the multi-model mean projecting higher probability, primarily arises from discrepancies in the precipitation trend, as a result of a different precipitation response to anthropogenic climate change or internal variability. It is documented that central Europe including Germany is an area with large uncertainty in precipitation changes located between the Mediterranean, which is expected to experience drying and Scandinavia, which is expected to experience a wetting trend (Rowell and Jones, 2006;Kr€ oner et al., 2017;Orth et al., 2016). Furthermore, a tendency to drier summer conditions and wetter winter conditions maybe partly compensating in the growing season March-November. In summary, for the growing season in Germany analyzed here, the uncertainties in the precipitation trends may relate to a tug of war between drying and wetting signals.
If we focus on summer only, climate model projections and observation-based scaling yields more consistent results and 2018 is projected to become more likely. Pattern scaling and climate models both project a drying trend and strong warming (Fig. 5b, d). Some climate models project extremely warm summers with very dry conditions in Germany at high levels of global warming (above 2 � C).
If a period March-November is as least as dry as 2018, the conditional probability that it is at least as hot as in 2018 increases rapidly once GMT has warmed to about 0.5 � C (Fig. 6). In current climate conditions at about 1 � C GMT increase, the conditional probability for temperatures at least as high as 2018 is 53% (5-95% uncertainty range: 0-100%). In a 1.5 � C warmer world, that probability is nearly 92% (19-100%), compared to a 2 � C warmer world where it reaches 100% (71-100%). Hence, if global warming reaches 2 � C, every year that is at least as dry as 2018 is projected to be as hot as 2018 or hotter. Again, uncertainties are large for these estimates. Empirical estimates of conditional probabilities based on climate model simulations are more challenging to compute because of the extremeness of the event. We pooled 50 model years of time periods centered around a range of GMT targets and estimated uncertainties through bootstrapping. In general, the conditional exceedance probability is increasing more slowly with global warming for CMIP5 models, which can be explained by the larger model spread under future conditions compared to the observation-based scaling (Fig. 5).

Discussion
There is no question that the year 2018 in Germany was climatically extraordinary. Yet exactly how extraordinary is difficult to determine. Exceptional high return periods have also been estimated for univariate events in the past (e.g. Sch€ ar et al., 2004;Luterbacher et al., 2004;Barriopedro et al., 2011) albeit with large uncertainties. Taking climate change trends into account, those numbers usually become much less exceptional (Coelho et al., 2008). In the bivariate case, however, a second condition is added so that return periods are larger by definition (e.g. AghaKouchak et al., 2014). In such a case, precisely estimating return periods of rare events becomes even more challenging. Uncertainties in the estimation of univariate quantiles are multiplied with uncertainties related to the choice of the best copula. We estimated that the very high temperatures and very low precipitation of the 2018 spring-to-autumn in Germany would happen at most every several hundred years under present-day conditions. Extreme multivariate return periods might be estimated with higher confidence using multivariate extreme value theory (Davison and Huser, 2015;Engelke and Ivanovs, 2021). Hereby, the extremes in the marginal distributions are typically modelled with an extreme value distribution. However, 139 samples, as in our case, are not enough for fitting a multivariate parametric distribution on the tails only.
It is interesting to note that in all studied cases a copula with tail dependence in the hot and dry tails is chosen as a the best representation of the data. The distribution of the multi-model ensemble for the projections of the summer also seem to have this feature (Fig. 5d). Nevertheless, climate model-based return periods are much more extreme (Tables 3 and 4) suggesting that the dependence in the hot and dry tail is not as strong in climate models as compared to the observations even though the linear correlation between summer temperature and precipitation is quite well captured in the northern hemisphere . Thus, projections based on climate model simulations may underestimate the risk of co-occurring hot and dry extremes. The particularly strong dependence in the hot and dry tail may emerge from an amplification of the overall Fig. 6. Conditional probability that temperature in Germany averaged from March through November is higher than in 2018 given that precipitation is at least as low as in 2018. Exceedance probabilities are plotted against different GMT targets relative to 1881-1900. (a) Observation-based scaling. Shading encompasses 5-95% uncertainty range propagated from the uncertainty in the scaling coefficient of temperature against GMT. (b) CMIP5 climate models. All data from the ''good'' models are pooled for 50 year periods around given GMT targets. 5-95% uncertainty bands are estimated via bootstrapping. dependence by land-atmosphere interactions, which only emerges when soil dryness is below a certain threshold (Seneviratne et al., 2010;Vidale et al., 2007). The stronger dependence in the hot and dry tails might lead to an underestimation of the likelihood of concurrent hot and dry conditions if standard statistical models based on linear correlations are fitted to the entire distribution. The challenge to put precise numbers on the frequency of rare events makes it difficult to assess the risks associated with compound events  and therefore limits our ability to adapt to the impacts of such events.
In order to estimate future changes in the probability of compound hot and dry conditions as observed in March-November 2018, we have used two different lines of evidence, (a) a pattern scaling extrapolation based on observed scaling with GMT, and (b) CMIP5 multi-model projections. By combining copula theory with pattern scaling, we have illustrated how multivariate climate distributions can be projected conditional on GMT by not only scaling their marginals but also maintaining their dependence structure. In the projections of March-November based on an extrapolation of observation-based scaling, a dry and hot March-November like 2018 is projected to become less likely in a warmer climate because precipitation shows a slightly positive trend over the observational record. In contrast, multi-model projections based on CMIP5 yield a larger uncertainty range, also including very dry future conditions (Fig. 5). The fact that there is a drying signal in many climate models but not in the observed scaling may be due to a different precipitation response between climate models and observations or due to the fact that the precipitation scaling is different in the historical period and the future, for instance as a result of forcing dependence or internal variability. When comparing the estimated scaling coefficients between the historical and the entire period up to 2100, we illustrate that for precipitation, scaling coefficients based on the historical period are rather overestimated (Fig. 1), that is, climate models generally project less increase and often even a decrease in precipitation for warmer conditions compared to an extrapolation based on historical time period. Hence, even if no drying trend has yet been observed, a drying trend might still occur in the future. On the other hand, however, Vogel et al. (2018) have analyzed climate projections for the summer in central Europe and found that those climate models that project the driest and hottest conditions show unrealistic present-day conditions. This illustrates that using a observation-based scaling can provide an important alternative source of information. It is important to note that for the extremely hot and dry summer conditions climate models projections and extrapolation of observation-based scaling yield consistent results projecting that conditions as in summer 2018 become more likely.
Even if precipitation shows no change in a warmer world, the likelihood that extremely dry years are extremely hot at the same time increases (Fig. 6). This is simply due to the fact that the entire temperature distribution is shifted towards warmer conditions. At 2 � C global warming we estimate that more or less every spring-autumn that is at least as dry as 2018 will be also at least as hot as experienced in 2018 in Germany.

Conclusions
Concurrent hot and dry extremes frequently cause large impacts to human society, economy and natural ecosystems. Here we analyse the rarity of the extremely hot and dry 2018 growing season (March-November) in Germany. The event was record-breaking both in its high temperatures and low precipitation amount. Estimates of return periods range from several hundreds to several thousands of years depending on the definition of the event, and are highly uncertain. Our findings reveal a positive tail dependence in the hot and dry tails between temperature and precipitation averaged over summer and averaged over the growing season in Germany. We use climate model-based projections and observation-based scaling to project the bivariate temperature and precipitation distribution at different global warming targets. The observed 2018 event becomes more likely in climate model projections but slightly less likely in the projections based on the observation-based scaling due to a small increase in precipitation in the observed time period. For summer only (June-August), observation-based scaling and climate models consistently project more frequent compound hot and dry conditions as in summer 2018. Our analysis illustrates how both approaches offer two complementary lines of evidence to project multivariate probability distributions.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.