Documented and Simulated Warm Extremes during the Last 600 Years over Monsoonal China

: In this study, we present an analysis of warm extremes over monsoonal China (21–45 ◦ N, 106–124 ◦ E) during the last 600 years based on Chinese historical documents and simulations from the Paleoclimate Modelling Intercomparison Project Phase 3 (PMIP3) and the Coupled Model Intercomparison Project Phase 5 (CMIP5). The Chinese historical documents indicate that extreme warm records become more frequent after ~1650 CE in North China and ~1850 CE in the Yangtze River Valley. Our analyses of two threshold extreme temperature indices also illustrate that warm extremes have become more frequent since the 17th century in North China and the mid-19th century in Yangtze River Valley in good agreement with the changes in warm extremes revealed in the historical documents. This agreement suggests potential mechanisms behind the shift of periods, which should be further investigated in the future.


Introduction
The last 600 years is a crucial period that links the last millennium together with the recent warming climate. It generally can be divided into two periods, the Little Ice Age (LIA; 1400-1849 CE) and the 20th Century Warming Period (20C WP; 1850-2005 CE). The LIA was first mentioned by Matthes who intended to describe glacial regrowth in the Sierra Nevada, California [1]. Later, this concept was used to describe the latest and extensive occurrence of glacial activities in the late Holocene [2]. Paleoclimate proxies and climate simulations indicated that the global mean surface air temperatures (SAT) during the LIA were about 0.5-1.5 • C colder than the early 1800s [3,4]. However, the onset, duration and temperature changes during the LIA seem to be highly regional-dependent [5][6][7][8]. After the end of the LIA, a warming trend occurred that lasted into the 20C WP [9,10]. Within the 20C WP, the globally averaged surface temperatures over land and ocean showed a linear warming trend of 0.85 • C (according to the Intergovernmental Panel on Climate Change a 90% likelihood of varying between 0.65 • C and 1.06 • C) from 1880 to 2012 CE [11].
During the last 600 years, global and regional climates show remarkable variations as demonstrated in many proxy records as well as historical documents [12][13][14][15][16]. There are prominent cold periods; the middle 15th century, the early 17th and the early 19th century can be found during the LIA [17][18][19]. The global climate during the 20C WP includes a slight cooling (late 19th century) then the early twentieth century warming (ETCW;1901-1950 [20] and finally the recent warming hiatus (1998 [21,22]. According to previous studies, the increased frequency and severity of climate extremes can be found during the 20C WP [23][24][25].

Historical Documents Data
Documented records have unique advantages for the reconstruction of the past climate. They have the characteristics of high spatial and temporal resolution with accurate age control and clear climate meaning [41]. A few comprehensive integrated achievements such as the compilation of documentary records and electronic databases have provided a reference for analyzing the last millennium climate. Numerous studies of climate extremes using historical documents have been done [42][43][44][45].
Two classical books are used in this study, "A Collection Of Meteorological Records For The Past 3000 Years In China" [46] and "Meteorological Disaster Of China" [47]. The first book was written by Deer Zhang in 2004 and served as a chronological record. It is the symbol of the success of Chinese historical climatology. It contains records of local weather or climate conditions and related disaster records from the 13th century BC to 1911 CE. A total of 8228 kinds of relevant records of ancient documents was collected and organized in the process of compiling this book. The second book was written by Kegang Wen in 2005. It records droughts, heatwaves, low temperatures, typhoons and other meteorological disasters taking years as the unit. There are 32 volumes (including comprehensive and local volumes) in this book. We mainly focus on the volumes related to eastern China provinces. The original records of these two works mainly come from the "zai xiang (灾祥 )" or "xiang yi (祥异 )" part of the local chronicles, which mainly cover the special astronomical phenomena and agrometeorological events that occurred each year. The modelling outputs came from the last millennium (LM) experiments in PMIP3 [48] and the historical experiments in CMIP5 [49] simulated with CCSM4, BCC-CSM1.1, CSIRO-Mk3L-1-2, MPI-ESM-P, MRI-CGCM3 and IPSL-CM5A-LR (Table 1). All models with daily outputs were employed in this study except for MIROC-ESM due to its abnormal warming in the LM experiment [3,50,51]. Only the first run (tagged with r1i1p1 in the Earth System Grid Federation database, ESGF) for each model was used in this study. Basic information of the six models is shown in Table 1 and these data can be downloaded at the website of https://esgf-node.llnl.gov/projects/cmip5/ (accessed on 5 January 2021). Note IPSL-CM5A-LR only provides the daily outputs from 1500 to 2005 CE. Detailed climate forcing and experimental designs are introduced at the PMIP website (https: //pmip3.lsce.ipsl.fr/wiki/doku.php/pmip3:design:lm:final/ accessed on 13 December 2020) and the CMIP website (pcmdi.llnl.gov/mips/cmip5/ accessed on 18 December 2020).
Although there is still bias between observations and simulations, the performance of these six models had a modest improvement compared with CMIP3 in simulating the factors that modulate temperature variations such as the EI Niño Southern Oscillation [52], Atlantic Multidecadal Oscillation and Pacific Decadal Oscillation [53].

Observation Data
To validate the simulated warm extremes, we used the modern observational dataset CN05.1 (Figure 1). CN05.1 is a high-quality dataset of gridded Tmax and Tmin established from 2416 stations over China by the National Climate Center of the China Meteorological Administration [54,55].

Warm Records in Chinese Historical Documents
We used a semantic judgment method to select extreme warm records in the historical documents. These keywords can be classified into three categories. The first is the direct descriptions of heat. Many Chinese words, for example, kushu (酷暑), shengshu (盛 暑), shenshu (甚暑), yan (炎), re (热), xunzhuo (熏灼) and yv (燠), indicate that it was very hot. The second is the description of the sun and winds in hot scenarios. For example, sunshine like fire (ri ru hu, 日如火), the sun is shining and it is very dry (kang yang, 亢 阳), hot winds like fire (yan feng ru huo, 炎风如火) and hot air burns skins (jiao qi chu ren, 焦气触人). The third is the description of the biological responses of humans or animals, for example, people die of heatstroke (ren ye si, 人暍死), female farmers die of heatstroke (tian fu ye si, 田妇暍死), seedling of cereal crops wither (he miao jin ku, 禾苗尽枯 ) due to hot and dry conditions and fish die in drying rivers (he yv si, 河鱼死).
In principle, the keywords in the first category were necessary to identify warm extremes. However, when the keywords from the first category were missing, the co-occurring of keywords from the second and the third category were considered as auxiliary discriminant criteria.

Extreme Temperature Indices
Three indices, recommended by the Expert Team on Climate Change Detection and

Warm Records in Chinese Historical Documents
We used a semantic judgment method to select extreme warm records in the historical documents. These keywords can be classified into three categories. The first is the direct descriptions of heat. Many Chinese words, for example, kushu ( (酷暑 )), shengshu (盛暑 ), shenshu (甚暑 ), yan (炎 ), re (热 ), xunzhuo (熏灼 ) and yv (燠 ), indicate that it was very hot. The second is the description of the sun and winds in hot scenarios. For example, sunshine like fire (ri ru hu, 日如火 ), the sun is shining and it is very dry (kang yang, 亢阳 ), hot winds like fire (yan feng ru huo, 炎风如火 ) and hot air burns skins (jiao qi chu ren, 焦气触人 ). The third is the description of the biological responses of humans or animals, for example, people die of heatstroke (ren ye si, 人暍死 ), female farmers die of heatstroke (tian fu ye si, 田妇暍死), seedling of cereal crops wither (he miao jin ku, 禾苗尽枯 ) due to hot and dry conditions and fish die in drying rivers (he yv si, 河鱼死 ).
In principle, the keywords in the first category were necessary to identify warm extremes. However, when the keywords from the first category were missing, the cooccurring of keywords from the second and the third category were considered as auxiliary discriminant criteria.

Extreme Temperature Indices
Three indices, recommended by the Expert Team on Climate Change Detection and Indices (ETCCDI), were used to represent simulated warm extremes in this study [61][62][63][64]. They were the two threshold indices TX90p and TN90p and the duration index WSDI. TX90p (TN90p) was calculated as the percentage of days when Tmax (Tmin) was greater than a 90th percentile, representing the proportion of warm extremes. WSDI, which represented the duration of warm extremes, was calculated as an annual count of days with at least six consecutive days when Tmax was greater than a 90th percentile. The definitions of these three indices can be found at the ETCCDI website (http://etccdi.pacificclimate.org/ list_27_indices.shtml/ accessed on 9 January 2021). For each simulation and the CN05.1 Atmosphere 2021, 12, 362 5 of 18 observations, the values between 1961 and 1990 CE were used as the reference to get the 90th percentile for these three indices.

Model Validation
Taylor diagrams [65] were employed to validate the ability of models to simulate spatial variation and interannual variability of the extreme temperature indices. Taylor diagrams need three parameters, the correlation coefficient (CC), the root-mean-square (RMS) difference between the two time series and the ratio of the standard deviations (RSD) of the two patterns. In the Taylor diagrams of spatial variations, we calculated the three parameters based on the mean field of three indices ( Figures

Analyses of Simulated Warm Extremes
We used wavelet analyses to illustrate periods in the indices. The wavelet transform is a mathematical tool to supply time-frequency representations of an analyzed signal in a time domain [66]. In comparison with classical spectral analyses, wavelet analyses can quantify temporal variabilities on different scales but do not need a stationary series [67]. Thus, the method can analyze irregularly distributed events and time series that contain non-stationary power at different frequencies [68]. The mother function of wavelet analyses is the Morlet wavelet, which provides a balance between time and frequency localization and includes the exponential function modulated by a Gaussian wavelet [69].
We also employed trend analyses to show the trend of extreme temperature indices. As most of the extreme temperature indices did not follow a Gaussian distribution [70], the linear least squares estimation, though popular, could not be used in the trend analyses of extreme temperature indices. Instead, we used the non-parametric Theil-Sen method [71] to calculate the trend. Meanwhile, we used the Mann-Kendall test [72,73] to examine if the trend was statistically significant at the 95% confidence level. Figure 2d shows the frequency of documented extreme warm records within a five decadal interval. For example, 0.3 means three years in the decadal interval. There were 151 records of warm extremes over monsoonal China during the last 600 years. Among them, 128 records appeared in NC and 22 happened in YRV while only one was from SC ( Table 2). We found that warm records of monsoonal China became more frequent after 1850 CE as shown in Figure 2d. A similar feature can be found in Figure 2a-c. There was a significant increasing trend of the simulated Tmax, Tmin and reconstructed annual mean temperatures since 1850 CE. Previous studies have proved warm extremes became more frequent under the background of global warming [25]. During the LIA, documented warm extremes mainly concentrated on 1710-1760 CE whereas the frequency of warm extremes was low during 1440-1480 CE (Figure 2d, black bars). Previous studies based on historical documents have found that the middle of the 15th century was the cold period [15] while 1710-1760 CE was the relatively warm period [74]. Note that the two characteristic periods can be found in the simulated results (Figure 2a,b) and reconstructions by Hao et al. [75] (Figure 2c).

Warm Extremes in Historical Documents
For three regions of monsoonal China, the frequency of warm extremes was regionallydependent. The statistics of records per decade showed that extreme warm events became more frequent after~1650 CE in NC than before (red bars). In YRV, a change in frequency occurred after 1850 CE (blue bars). However, there was only one record in SC (green bar).

Warm Extremes in Simulations
3.2.1. Changes of Warm Extremes from the LIA to 20C WP Figure 3 shows the histograms of simulated summer mean Tmax and Tmin temperatures. Almost all of the models produced right-side shifts of post-1850 temperature distributions relative to pre-1850, indicating a remarkable recent climate warming in monsoonal China. Compared with the 90th percentile threshold in the reference period (1961( -1990, the pre-1850 temperature distributions often showed fewer areas above the threshold while the post-1850 temperature distributions showed more areas above the threshold, suggesting a higher frequency of warm extremes during the 20th century [62,64,76]. However, MRI-CGCM3 behaved differently when compared with the other five models. The right-side shift of the post-1850 Tmax relative to the pre-1850 was smaller than the other five models.

Validation of Indices
The model validation (Figure 4a) demonstrated that models had better abilities in simulating the spatial variation in the TX90p and TN90p indices than in the WSDI index. In the Taylor diagram, the simulated TX90p and TN90p indices had spatial correlation coefficients (dotted radial lines) 0.95-0.99 to the CN05.1 observations (during 1961-2005 CE) while the spatial correlation coefficients for the simulated WSDI index were lower, ranging between 0.7-0.95. Almost half of the models used here simulated smaller variances (black dotted circular lines) of the spatial distributions of these indices (with a ratio between 0.75 Atmosphere 2021, 12, 362 9 of 18 and 1) than the CN05.1 observation whereas other models reproduced larger variances (ranging between 1 and 1.5). TX90p and TN90p showed smaller RMS differences (less than 0.5) while the RMS differences of WDSI ranged between 0.5 and 1. ences (less than 0.5) while the RMS differences of WDSI ranged between 0.5 and 1.
In the case of the interannual variability (Figure 4b), TN90p showed relatively higher temporal correlation coefficients (ranging between 0.7-0.35) than the CN05.1 observation, relative to TX90p. However, the relatively high correlation could be due to global warming and it needs further study. Although most models reproduced similar temporal variances (varying between 1.0-1.25) to the CN05.1 observation, the RMS differences were large (larger than 0.5) and showed a remarkable spread. CCSM4, BCC-CSM1.1 and IPSL-CM5A-LR provided relatively better simulations in the interannual variability of TN90p.
Therefore, these six models had a better performance in simulating the threshold indices (TX90p and TN90p). In the analyses below, we only diagnosed the two threshold indices.  In the case of the interannual variability (Figure 4b), TN90p showed relatively higher temporal correlation coefficients (ranging between 0.7-0.35) than the CN05.1 observation, relative to TX90p. However, the relatively high correlation could be due to global warming and it needs further study. Although most models reproduced similar temporal variances (varying between 1.0-1.25) to the CN05.1 observation, the RMS differences were large (larger than 0.5) and showed a remarkable spread. CCSM4, BCC-CSM1.1 and IPSL-CM5A-LR provided relatively better simulations in the interannual variability of TN90p.
Therefore, these six models had a better performance in simulating the threshold indices (TX90p and TN90p). In the analyses below, we only diagnosed the two threshold indices.

Changes of Indices during 1400-2005 CE
The wavelet analyses of TN90p and TX90p suggested a shift in warm extremes since the 17th century in NC (as indicated by a weakened~60-year period in TN90p around 1600 CE simulated with CSIRO-Mk3L-1-2 and MPI-ESM-P in Figure 5i,j), a strengthened 32-128-year period in TN90p from around 1600 CE simulated with MRI-CGCM3 in Figure 5k, a 32-64-year period in TX90p around the 17th century simulated with CCSM4 in Figure 5a, a strengthened~60-year period in TX90p around the year 1650 CE simulated with BCC-CSM1.1 and MRI-CGCM3 in Figure 5b,e and a weakened~60-year period in TX90p from the 17th century simulated with CSIRO-Mk3L-1-2 in Figure 5c.
In SC, no obvious shift could be found during 1400-2005 CE. A few models still simulated weak oscillations ( Figure S6). For example, MRI-CGCM3 and IPSL-CM5A-LR reproduced a 64-year period in TN90p during 1800-1900 CE. However, this period had low powers or short durations.
According to the above analyses, two stages could be distinguished in NC and YRV. The trend of the two stages is shown in Table 3. The trend in TX90p and TN90p averaged (area-weighted) over NC, which was weak and insignificant before 1650 CE (ranging in −0.03~+0.034 and −0.038~+0.024) while it became significant after 1650 CE (ranging in +0.07~+0.174 and +0.07~+0.17). Moreover, the trend in TX90p and TN90p averaged over YRV was weak and insignificant before 1850 CE (ranging in −0.011~+0.001 and −0.007~+0.001) while it became significant after 1850 CE (ranging in +0.048~+0.159 and +0.124~+0.477). The spatial distribution of the trends in TX90p and TN90p looked consistent for all models (Figures S7 and S8).
powers or short durations.
According to the above analyses, two stages could be distinguished in NC and YRV. The trend of the two stages is shown in Table 3. The trend in TX90p and TN90p averaged (area-weighted) over NC, which was weak and insignificant before 1650 CE (ranging in −0.03~ + 0.034 and −0.038~ + 0.024) while it became significant after 1650 CE (ranging in +0.07~ + 0.174 and +0.07~ + 0.17). Moreover, the trend in TX90p and TN90p averaged over YRV was weak and insignificant before 1850 CE (ranging in −0.011~ + 0.001 and −0.007~ + 0.001) while it became significant after 1850 CE (ranging in +0.048~ + 0.159 and +0.124~ + 0.477). The spatial distribution of the trends in TX90p and TN90p looked consistent for all models (Figures S7 and S8). (a-f) Simulated TX90p by six models. (g-l) Simulated TN90p by six models. The shaded area shows the periods with confidence levels higher than 95% and the cone of influence is shown as the black line.  Bold values represent the trends with the confidence level higher than 95% due to the Mann-Kendall test.

Discussion and Summary
Our diagnoses on these CMIP5 simulations revealed a large model spread in simulating the interannual variability of warm temperature extremes. It seemed that this model spread was caused by climate forcings employed in LM experiments and climate feedbacks considered in models. The LM experiment often considered solar and volcanic forcings as well as changes in greenhouse gas levels and land use. There are four series for solar forcings that can be used in the LM experiment [77]; the reconstruction done by Vieira et al. [78] (hereafter VSK), by Steinhilber et al. [79] (hereafter SBF), by Delaygue and

Discussion and Summary
Our diagnoses on these CMIP5 simulations revealed a large model spread in simulating the interannual variability of warm temperature extremes. It seemed that this model spread was caused by climate forcings employed in LM experiments and climate feedbacks considered in models. The LM experiment often considered solar and volcanic forcings as well as changes in greenhouse gas levels and land use. There are four series for solar forcings that can be used in the LM experiment [77]; the reconstruction done by Vieira et al. [78] (hereafter VSK), by Steinhilber et al. [79] (hereafter SBF), by Delaygue and Bard [28] (hereafter DB) and by Wang et al. [80] (hereafter WLS). Two series of volcanic forcings can be used. One is the reconstruction done by Gao et al. [81] (hereafter GRA) and the other was created by Crowley et al. [82] (hereafter CEA). The six models analyzed in this study employed different combinations of forcings (Table S1). Furthermore, even though a few models used the same combination of forcings, for example, BCC-CSM1.1 and IPSL-CM5A-LR employed the same solar and volcanic forcing, a remarkable model spread still could be observed, which was related to climate feedbacks simulated in the model. CMIP5 models often use different or simplified schemes to consider clouds and water vapor [83], thus underestimating the cloud cover [84], stratospheric water vapor [85] and neglect their negative feedbacks [86]. As revealed by early studies, high cloud changes are the largest contributor to the model spread in longwave and shortwave cloud feedbacks [87] when the concentration of carbon dioxide increases. Lapse rate feedback can also cause different responses in the balance of low and high latitude surface warming [88].
However, we found model-document mismatches ( Figures S9 and S10). In China, historical documents show extreme warm records of 33 years in total in NC and 15 years in total in YRV (Table 2). However, TX90p simulated with CCSM4, BCC-CSM1.1 and CSIRO-Mk3L-1-2 showed 16, 12 and 11 years that agreed with the historical records in NC ( Figure S9). TN90p simulated with BCC-CSM1.1, MRI-CGCM3 and CSIRO-Mk3L-1-2 illustrated 8, 4 and 2 years, which were in agreement with historical records in NC. Meanwhile, in YRV the TX90p indices simulated with CSIRO-Mk3L-1-2, CCSM4 and MPI-ESM-P showed 7, 4 and 3 years that were consistent with the historical documents ( Figure S10). TN90p simulated with CSIRO-Mk3L-1-2 could produce a few years with warm extremes recorded in the historical documents. These model-document mismatches were not surprising because our model validation demonstrated that these global climate models had limited abilities in simulating the interannual variability of extreme warm indices.
Mismatches could also be found during the Medieval Climate Anomaly (MCA). As illustrated in an early study [75], the reconstructed annual mean temperature was 0.3 • C higher than the reference period (900-1900 CE) in Figure S11c. However, the clear warming was not replicated in the simulations in Figure S11a,b. The underestimation was also revealed in the Fifth Assessment Report of the Intergovernmental Panel on Climate Change [89]. Moreover, the reconstructions showed the warmth during the MCA was similar to the 20C WP whereas the simulated temperature during the 20C WP was higher than the MCA. Therefore, the models used in this study could not reproduce the warm MCA period in the reconstruction.
Mismatches appeared not only in the time series but also in the spatial distributions of extreme warm events. During the last 700 years, 1743 CE was the hottest year in NC [30]. There are 52 records (the largest number of records in Table 2) found in the historical documents. These records describe that the extreme hotness occurred in the Beijing, Tianjin, Hebei, Shanxi and Shandong provinces [30]. A conversion from the temperature measurements using a Réaumur-scale thermometer showed that the daily temperature reached 44.4 • C in Beijing in 1743 CE [30]. However, many models could not simulate this extreme warm event well (Figure 7). It seemed that BCC-CSM1.1 reproduced the best simulations relative to the other five models. The BCC-CSM1.1 modelled TX90p field (Figure 7b) for 1777 CE generally agreed with the documented distribution of this warm event (Figure 7g) from historical documents.
Despite the notable model-data mismatches, the simulated shifts in the periods of warm extremes were authentic. In YRV, the shift in periods happened around 1850 CE, which is clearly associated with global warming after the industrial revolution [90]. In NC, however, it seemed that the shift in the periods of warm extremes occurred in the middle 17th century, which is earlier than the industrial revolution. Here, models simulated the shifts in the periods of warm extremes in NC after the 17th century ( Figure 5). It was obvious that the model responses to the climate forcings considered in the LM experiment were modified after the 17th century indicating a few potential mechanisms behind the shift of periods. As suggested by a few previous studies [91][92][93], many internal responses, for example, the intensified western Pacific sub-tropical high, the weakened Northeast China cold vortex or the negative phase of the North Atlantic Oscillation, can cause the high temperatures in summer in China. In addition to the historical records, tree ring data from Japan (30.333 • N, 130.5 • E) [94], which is located at the same latitude as NC, also show more frequent warm extremes after the 17th century. Hence, the consistency between the simulations and the reconstructions suggested this shift of periods should not be neglected and the potential mechanisms behind them should be further investigated in the future. Despite the notable model-data mismatches, the simulated shifts in the periods of warm extremes were authentic. In YRV, the shift in periods happened around 1850 CE, which is clearly associated with global warming after the industrial revolution [90]. In NC, however, it seemed that the shift in the periods of warm extremes occurred in the middle 17th century, which is earlier than the industrial revolution. Here, models simulated the shifts in the periods of warm extremes in NC after the 17th century ( Figure 5). It was obvious that the model responses to the climate forcings considered in the LM experiment were modified after the 17th century indicating a few potential mechanisms behind the shift of periods. As suggested by a few previous studies [91][92][93], many internal responses, for example, the intensified western Pacific sub-tropical high, the weakened Northeast China cold vortex or the negative phase of the North Atlantic Oscillation, can cause the high temperatures in summer in China. In addition to the historical records, tree ring data from Japan (30.333° N, 130.5° E) [94], which is located at the same latitude as NC, also show more frequent warm extremes after the 17th century. Hence, the consistency between the simulations and the reconstructions suggested this shift of periods should not In summary, Chinese historical documents show that extreme warm records became more frequent after the 17th century in NC and after the mid-19th century in YRV. Although CMIP5 global climate models still have limited abilities in reproducing interannual temperature variabilities and the single extreme warm events in East Asia, models can simulate the shifts in the periods of warm extremes. This model-document agreement suggests that it remains interesting and important to further investigate the mechanism behind the shifts in warm extremes in monsoonal China. Simulated TN90p by six models. The shaded area show the periods with confidence levels higher than 95% and the cone of influence is shown as black line., Figure S7: Trend distribution of summer extreme temperature indices in days per decade from 1650-2005 CE. The dots represent the trends that are significant at a 95% confidence level, Figure S8: Trend distribution of summer extreme temperature indices in days per decade from 1850-2005 CE. The dots represent the trends that are significant at a 95% confidence level, Figure S9