Dataset of trend-preserving bias-corrected daily temperature, precipitation and wind from NEX-GDDP and CMIP5 over the Qinghai-Tibet Plateau

A bias-corrected dataset containing daily meteorological data over the Qinghai-Tibet Plateau has been generated using a trend-preserving bias correction, the Inter-Sectoral Impact Model Intercomparison Project (ISI-MIP) approach, with a high-quality gridded meteorological dataset based on ground observations (CN05.1). The dataset contains daily bias-corrected values of maximum/minimum near-surface air temperature, precipitation and mean near-surface wind speed from 15 models from the Fifth Phase of the Coupled Model Intercomparison Project (CMIP5) and a downscaled high-resolution dataset (NEX-GDDP), based on CMIP5 models, over the Qinghai-Tibet Plateau (QTP) during 1986–2095. This dataset provides an important reference for the study of future climate change and its impacts in the Qinghai-Tibet Plateau region.


Specifications table
Subject: Global and Planetary Change; Geology
Specific subject area: Climate change; Natural disasters
Type of data: NetCDF
How data were acquired: NEX-GDDP/CMIP5 data were downloaded from their websites (links are provided in the "Description of data collection" section). A trend-preserving bias correction was applied to calibrate them. They were further processed to produce annual and seasonal mean and extreme values, to assess the skill of the bias correction. MATLAB 2019b and ArcGIS 10.3 were the main tools used for data processing.
Data format: Raw and analyzed
Parameters for data collection: NEX-GDDP: 0.25° × 0.25° spatial resolution, training period 1986-2005, future period 2006-2095/2099/2100 (varies by model), daily temporal resolution, raster data. CMIP5: spatial resolution varies by model (see Table 1

Value of the data
• This dataset includes daily maximum/minimum near-surface air temperature, precipitation and mean near-surface wind speed from 15 models under the RCP4.5 and RCP8.5 scenarios. Its accuracy is further improved by bias correction against local observation data (CN05.1), which contains more observation gauges over the study area.
• This dataset is of great value to researchers who study the impacts and risks of climate change in the QTP. Its spatial and temporal resolution allows potential users to develop their impact or risk assessment models at a higher resolution in this region.
• The dataset can support further experiments, for example predicting future extreme weather events or crop yields. Furthermore, using data from 15 models enables multi-model ensemble analysis, which is considered essential in climate projection and climate risk analysis.

Data Description
In the following, "NEX-GDDP/GCMs before bias correction" refers to the original data that we downloaded from the relevant websites and "NEX-GDDP/GCMs after bias correction" refers to the data that we have bias-corrected by applying the ISI-MIP approach.
The raw dataset contains daily maximum/minimum near-surface air temperature, precipitation and mean near-surface wind speed from 15 models (see Table 1), which were bias-corrected using the ISI-MIP approach [1] under two RCP scenarios (RCP4.5 and RCP8.5). The spatial resolution is 0.25° × 0.25°. The data are stored in NetCDF format. The download links are provided in the section "Data accessibility".
For the analyzed data, we aggregated the daily bias-corrected data into annual, winter (December to the following February, DJF) and summer (June to August, JJA) values and then calculated the differences between the NEX-GDDP/GCM data before/after bias correction and CN05.1 (i.e., multi-model minus CN05.1) to show the improvement gained from the bias correction. For mean values, we use the annual, winter and summer averages of Tmax, Tmin, Pr and Wind. For climate extreme values, we use the 95th percentile of Tmax, the 5th percentile of Tmin, the 95th percentile of Pr and the 95th percentile of Wind. The trend values are represented by the slope of a simple (one-variable) linear regression of the observed/multi-model time series against time.
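The summary statistics described above (seasonal means, percentile-based extremes and a linear trend slope) can be illustrated with a minimal pure-Python sketch. The function names, the synthetic Tmax series and the percentile interpolation rule are assumptions made for illustration only; the article does not state which percentile convention or software routines were used.

```python
from datetime import date, timedelta

def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between ranks (one common convention)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def season_year(d):
    """Assign December to the following year so that a DJF winter stays contiguous."""
    return d.year + 1 if d.month == 12 else d.year

def seasonal_mean(dates, values, months):
    """Mean of the daily values per season-year, restricted to the given months."""
    groups = {}
    for d, v in zip(dates, values):
        if d.month in months:
            groups.setdefault(season_year(d), []).append(v)
    return {y: sum(v) / len(v) for y, v in sorted(groups.items())}

def linear_trend(years, series):
    """Least-squares slope of the series against the years."""
    n = len(years)
    mean_y = sum(years) / n
    mean_s = sum(series) / n
    num = sum((y - mean_y) * (s - mean_s) for y, s in zip(years, series))
    den = sum((y - mean_y) ** 2 for y in years)
    return num / den

# Synthetic daily Tmax for the training period: a step of 0.05 degrees per year.
start = date(1986, 1, 1)
n_days = (date(2005, 12, 31) - start).days + 1
dates = [start + timedelta(days=k) for k in range(n_days)]
tmax = [10.0 + 0.05 * (d.year - 1986) for d in dates]

jja = seasonal_mean(dates, tmax, {6, 7, 8})    # summer means, one per year
djf = seasonal_mean(dates, tmax, {12, 1, 2})   # first and last winters are incomplete
jja_trend = linear_trend(list(jja.keys()), list(jja.values()))
tmax_p95 = percentile(tmax, 95)                # extreme-value indicator for Tmax
```

In this sketch the first and last DJF groups contain only part of a winter; whether such incomplete seasons are dropped is a choice the user has to make.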
The rest of this section shows the differences between the NEX-GDDP/GCM data before/after bias-correction and the reference data.

Differences between NEX-GDDP/GCM data and observations (CN05.1) before and after bias correction during the training period (1986-2005)
As can be seen from Figs. 1 and 2, there are large spatial differences between the models and CN05.1 before the bias correction is applied. Although the NEX-GDDP data have already been bias-corrected using GMFD (Global Meteorological Forcing Dataset) during their generation process, it is still necessary to correct the biases in this dataset using CN05.1, which contains more observation gauges over the study area. After bias correction, the agreement between the dataset and CN05.1 for the training period has substantially improved.

Changes during the future period (2006-2095) under RCP4.5, after bias correction
For the future period, the bias correction process adjusts the values so that they are more consistent with the historical values (Figs. 3a and 4a), while the long-term trend, which is represented by the linear regression slope, is well preserved (Figs. 3b and 4b). The spatial variation of the differences between the models before and after bias correction is shown in Figs. 3c and 4c.
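A small numerical check illustrates how a correction of the monthly mean can leave the trend intact. The sketch below uses a synthetic annual series and hypothetical correction constants (`offset_C`, `factor_c`); it covers only the mean adjustment, not the daily-variability step. A constant additive offset leaves the regression slope unchanged, whereas a constant multiplicative factor rescales the slope by that factor, so the trend relative to the mean is kept.

```python
def ols_slope(x, y):
    """Least-squares slope of y against x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

years = list(range(2006, 2096))
raw = [15.0 + 0.03 * (y - 2006) for y in years]   # synthetic uncorrected series

offset_C = -2.5   # hypothetical additive offset (temperature-like variable)
factor_c = 0.8    # hypothetical multiplicative factor (precipitation-like variable)

shifted = [v + offset_C for v in raw]
scaled = [v * factor_c for v in raw]

slope_raw = ols_slope(years, raw)
slope_shifted = ols_slope(years, shifted)   # equals slope_raw
slope_scaled = ols_slope(years, scaled)     # equals factor_c * slope_raw
```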

Experimental Design, Materials and Methods
The models to be bias-corrected contain two sets of simulated data. Fifteen models were selected, with data from a training period (1986-2005), a validation period (1966-1985) and a future period (2006-2095/2099/2100) under two Representative Concentration Pathway (RCP) scenarios, 4.5 and 8.5. Tmax, Tmin and Pr were obtained from NEX-GDDP and Wind was obtained from CMIP5. Tmax and Tmin are at a height of 2 meters and Wind is at a height of 10 meters. Near-surface wind is important in climate change research because of its role in carbon exchange, water transport, evapotranspiration, etc., but NEX-GDDP does not contain this variable. We aim to provide a dataset containing interpolated and bias-corrected wind speeds that are convenient to use in research. NEX-GDDP is a statistically downscaled dataset based on CMIP5. In our dataset, Tmax, Tmin and Pr (from NEX-GDDP) and Wind (from CMIP5) come from the same 15 models and the same ensemble member (r1i1p1).
The reference dataset is CN05.1 [4], a daily gridded dataset released by the National Climate Center of the China Meteorological Administration, including Tmax, Tmin, Pr and Wind. Tmax and Tmin are at a height of 2 meters and Wind is at a height of 10 meters. The spatial resolution is 0.25° × 0.25° and the data cover the period from 1 January 1966 to 31 December 2005.
1) Correction for monthly mean
Following the algorithm, a constant offset C, the difference between the 20-year means of the monthly mean temperature of CN05.1 and the GCMs, was used to bias-correct temperature:
$$C = \frac{1}{20}\sum_{i=1}^{20} T_i^{CN05.1} - \frac{1}{20}\sum_{i=1}^{20} T_i^{GCM}$$
A multiplicative correction factor c, the ratio of the 20-year monthly mean precipitation/wind of CN05.1 to that of the GCMs, was used to bias-correct precipitation and wind. Taking precipitation as an example,
$$c = \frac{\sum_{i=1}^{20} P_i^{CN05.1}}{\sum_{i=1}^{20} P_i^{GCM}}$$
in which $P_i^{CN05.1}$ and $P_i^{GCM}$ were the monthly mean precipitation of CN05.1 and the GCMs in year i.
2) Correction for daily variability
In the second step, daily variability was corrected by mapping daily residuals. For temperature, the residual temperature $\delta T_{ij}^{GCM}$ of year i, day j was simply the difference between the daily value and its corresponding monthly value:
$$\delta T_{ij}^{GCM} = T_{ij}^{GCM} - T_i^{GCM}$$
For precipitation and wind, the daily residual was the ratio between the daily value and the corresponding monthly value. Taking precipitation as an example,
$$\delta P_{ij}^{GCM} = \frac{P_{ij}^{GCM}}{P_i^{GCM}}$$
The variability of the daily residuals of the GCMs was adjusted using transfer functions. For temperature, we had the function
$$f\left(\delta T_{ij}^{GCM}\right) = B \cdot \delta T_{ij}^{GCM}$$
where B was the slope of the linear regression on the rank-ordered daily residuals of the CN05.1 data ($\delta T^{CN05.1}$) and the GCM data ($\delta T^{GCM}$). The daily variability of precipitation and wind was adjusted using a nonlinear regression,
$$g\left(\delta\hat{P}^{GCM}\right) = \left(a + b\left(\delta\hat{P}^{GCM} - \delta\hat{P}^{GCM}_{min}\right)\right) \times \left(1 - \exp\left(-\frac{\delta\hat{P}^{GCM} - \delta\hat{P}^{GCM}_{min}}{\tau}\right)\right)$$
where $\delta\hat{P}^{GCM}$ was the rank-ordered daily residual of the GCMs, $\delta\hat{P}^{GCM}_{min}$ was its lowest value, and a, b and τ were the parameters of the function. The hat (ˆ) denoted data for which the frequency of dry days had been corrected (see Ref. [1]).
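The two correction steps can be sketched in pure Python for a single grid cell and calendar month. This is an illustration under stated assumptions, not the authors' implementation: all function names and numbers are hypothetical, the regression for B is an ordinary least-squares fit whose intercept is discarded, and the dry-day frequency correction and the fitting of a, b and τ (see Ref. [1]) are omitted.

```python
import math

def monthly_offset(obs_monthly, gcm_monthly):
    """Additive offset C: difference of the multi-year means of the monthly means."""
    return sum(obs_monthly) / len(obs_monthly) - sum(gcm_monthly) / len(gcm_monthly)

def monthly_factor(obs_monthly, gcm_monthly):
    """Multiplicative factor c: ratio of the multi-year sums of the monthly means."""
    return sum(obs_monthly) / sum(gcm_monthly)

def temperature_residuals(daily, monthly_mean):
    """Daily residual as a difference (temperature)."""
    return [t - monthly_mean for t in daily]

def ratio_residuals(daily, monthly_mean):
    """Daily residual as a ratio (precipitation, wind)."""
    return [p / monthly_mean for p in daily]

def rank_slope(obs_resid, gcm_resid):
    """Slope B of a least-squares fit of rank-ordered observed on rank-ordered GCM residuals."""
    x = sorted(gcm_resid)
    y = sorted(obs_resid)
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

def precip_transfer(x, x_min, a, b, tau):
    """Nonlinear transfer function g for ratio residuals; a, b, tau are fitted elsewhere."""
    return (a + b * (x - x_min)) * (1.0 - math.exp(-(x - x_min) / tau))

# Step 1: monthly-mean correction (three synthetic years instead of twenty).
C = monthly_offset([10.0, 11.0, 12.0], [8.0, 9.5, 9.5])
c = monthly_factor([30.0, 50.0, 40.0], [60.0, 100.0, 80.0])

# Step 2: daily residuals and the linear transfer function for temperature.
t_resid = temperature_residuals([9.0, 10.0, 11.0], 10.0)
p_resid = ratio_residuals([0.0, 2.0, 4.0], 2.0)
gcm_resid = [-1.0, 0.5, 0.0, 1.5, -1.0]
obs_resid = [1.0, -2.0, 3.0, 0.0, -2.0]   # twice the GCM spread, in a different order
B = rank_slope(obs_resid, gcm_resid)
adjusted = [B * r for r in gcm_resid]      # residuals rescaled to the observed variability
```

Because both residual series are sorted before the fit, B compares the two distributions rather than day-to-day pairs, which is why the order of `obs_resid` does not matter here.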