Creation and Veriﬁcation of a High-Resolution Multi-Parameter Surface Meteorological Assimilation Dataset for the Tibetan Plateau for 2010–2020 Available Online

: The Qinghai–Tibet Plateau (QTP) is a crucial component of the global climate system, inﬂuencing the regional and global climate through complex thermal and dynamic mechanisms. The high-altitude region, which is the largest part of the extra-polar cryosphere, encompasses extensive mountain glaciers, permafrost, and seasonally frozen land, making it highly sensitive to global climate change. However, the challenging environmental conditions, such as the harsh terrain and high altitude, coupled with sparse weather station distribution and weak observatory representation, make it difﬁcult to accurately quantify the atmospheric conditions and land–atmosphere coupling systems and their effects on the surrounding areas. To address these challenges, we utilized the Weather Research and Forecasting (WRF) model and a three-dimensional variational (3DVAR) assimilation method to create a high-resolution assimilated dataset (HRAD). The QTP-HRAD, covering the spatial range of 70 to 110 ◦ E and 25 to 40 ◦ N, was validated using both surface weather station observations and the European Center for Medium-Range Weather Forecasts Reanalysis V5, and can now be utilized for further studies on land–atmosphere interactions, water cycling and radiation energy transfer processes, and extreme weather events in the region.


Introduction
The Qinghai-Tibet Plateau (QTP) is widely recognized as the world's highest and largest plateau, with a highly complex terrain that includes mountain ranges, vast plains, and numerous lakes and rivers. This unique geographical region, sometimes referred to as the "Roof of the World," is situated in the heart of Asia and covers an area of over 2.5 million square kilometers. Due to its high elevation, harsh climate, and unique ecology, the QTP is often referred to as the "third pole" of the Earth, after the North and South Poles. The region plays a crucial role in global climate, as it is a major source of freshwater and a key driver of atmospheric circulation patterns that affect weather patterns around the world [1]. It serves as the origin of numerous major Asian rivers, making it a vital "hydrological tower" in Asia [2,3]. However, the plateau is particularly susceptible to the impacts of global climatic alteration, exhibiting a heightened rate of temperature amplification in comparison to other regions [4,5]. The impacts of global warming on the Tibetan Plateau have become a subject of great scientific interest in recent years [6][7][8]. Multiple observational and monitoring projects were implemented on the QTP. To better understand the mechanism of the land-atmosphere interaction and land surface processes on the Plateau, researchers have conducted numerical simulation studies [6].
The Tibetan Plateau's thermal impact on the atmosphere significantly impacts the land-atmosphere circulation of China, East Asia, and the entire world [9][10][11][12][13][14][15][16]. The Plateau's cryospheric processes, such as the absorption of solar radiation and seasonal variations in surface heat and water fluxes, lead to complex interactions between the land surface and atmosphere that make it challenging to fully understand and simulate the energy and water cycle of the QTP and its impact on global climate change [10,12,17]. Although there has been a growing amount of research on land-atmosphere interactions over the QTP, observational experiments to determine terrestrial and atmospheric parameters are limited by the harsh environmental conditions, high altitudes, and sparse distribution of weather stations. Additionally, most observational experiments are conducted only in summer at a few locations, leaving gaps in our quantitative understanding of the local land-atmosphere coupling systems and their effects on the surrounding areas [18].
A series of atmospheric field experiments have been conducted over the Tibetan Plateau (TP) since the 1970s. Notable examples include the first Qinghai-Xizang Plateau Meteorology Experiment (QXPMEX) [19], the Global Energy and Water Cycle Exchanges (GEWEX) Asian Monsoon Experiment (GAME)/Tibet intensive observation [20], and the Coordinated Enhanced Observing Period (CEOP) Asia-Australia Monsoon Project on the Tibetan Plateau (CAMP/Tibet) [21]. Through these experiments, several field observational stations were established and kept in operation. After decades of effort and with an optimized scientific design and layout, the level of atmospheric observation has greatly improved with respect to observation infrastructure, technology, and meteorological elements observed. Since 2014, the third Tibetan Plateau Atmospheric Science Experiment (TIPEX-III) has been underway. As part of this experiment, routine automatic sounding systems were deployed in the western plateau region, filling the gap of routine operational sounding stations previously lacking in this area. Additionally, observational networks for soil temperature and moisture in the central and western TP were established. TIPEX-III also conducted plateau and regional-scale boundary layer observations, measured cloud-precipitation microphysical characteristics through the use of multiple radars and aircraft campaigns, and collected tropospheric-stratospheric atmospheric compositions at multiple sites [22].
With the development of numerical simulations and data assimilation techniques, regional numerical models have become widely used for studying climate change over the QTP [23][24][25][26]. However, existing numerical models have defects in reflecting the earthatmosphere coupling process, particularly the physical process of clouds and precipitation, and their influence on complex terrain conditions [27][28][29]. The simulation of the 2 m temperature over the QTP using regional climate and global circulation models tends to have a consistent underestimation or cold bias. This systematic error affects the accuracy of the climate model simulations for the QTP region [30], and the cold bias of different regional models over the QTP is between −11 • C and −0.3 • C [31][32][33][34][35]. At present, there are large uncertainties in the numerical simulation data of the precipitation in high-altitude mountainous areas. The rugged and steep terrain, location of observation sites, wind-blown snow, and low-temperature environments hinder the establishment of high-quality grid precipitation data [36]. Because of the coarse resolution, model physics, and dynamics, the global climate model showed a substantial bias over the QTP, and the simulated precipitation was significantly overestimated over this area. The dynamic downscaling simulation of precipitation is better than that of the global climate model, but the overestimation of precipitation still exists, with the highest value of 35% [34]. Atmospheric reanalysis data and various remote sensing products in the QTP region also have significant uncertainties [37][38][39][40]. These factors cause a low accuracy of regional numerical models in the QTP region, which seriously affects the assessment of climate change in this region [41].
With advances in computational resources, dynamic downscaling has become a candidate for climate studies with small-scale information, which can now be simulated over decades at a 1-10 km resolution [42,43]. The regional model can generate high spatial and temporal resolutions and climate-relevant long-term series of local climatic factors. In addition, the consistency of model physical processes can be used to generate datasets across different climatic scales [44]. Therefore, with the continuous improvement of model resolution and the establishment of a rapid assimilation update cycle system [45], ground, radiosonde, satellite, and aerial observation data with high spatial and temporal resolutions are playing an increasingly important role in assimilation systems [46,47].
In the assimilation process, the assimilation of large amounts of ground observation data not only makes the ground elements of the model closer to the actual ground observations, but also affects the simulation of the boundary layer and above the boundary layer, as well as the vertical motion and structure related to the surface features observed by these ground elements [46,48]. Therefore, effectively integrating multi-source observational data into the numerical model system improves the numerical simulation results of the complex underlying surface of the QTP, enhances the simulation accuracy of the numerical model, and uses the assimilation data results to analyze the occurrence and evolution of extreme temperature and precipitation events. Understanding and exploring the formation mechanism and response to global warming of extreme weather and climate events in the southeastern plateau of the QTP is of significant importance.
For the first time, we generated high spatial and temporal resolution (horizontal 5 × 5 km, once every 1 h) assimilation data in the QTP (QTP-HRAD) over 11 years (2010-2020). The assimilation data include 28 near-surface meteorological variables, including the radiation and energy field variables. The hourly values of the 2 m temperature, land surface temperature, relative humidity, surface pressure, water vapor mixing ratio, 10 m U and V wind components, precipitation, and dew point temperature at 5 × 5 km are available online [49].
Our compiled QTP-HRAD is expected to help more users to understand the spatialtemporal characteristics of key attributes in the QTP region of China and be applied in different fields. In addition, the QTP-HRAD can provide data references for the comprehensive evaluation of plateau ecological environment, extreme weather, and climatology. The flowchart for constructing the QTP-HRAD is shown in Figure 1.

Introduction to WRF Model Configuration
This dataset is generated and produced by the Weather Research and Forecasting (WRF) model version 4.2.2 and the three-dimensional variation (3DVAR) module. The at-

Introduction to WRF Model Configuration
This dataset is generated and produced by the Weather Research and Forecasting (WRF) model version 4.2.2 and the three-dimensional variation (3DVAR) module. The atmospheric vertical layer has 35 layers, the atmospheric pressure at the upper boundary of the WRF model is 50 hPa, and the "CONUS" physics suite was selected as the parameterization series scheme of the simulation and assimilation. The "CONUS" series scheme is a combination of physical options that are highly tested and show reasonable results. The configuration is as follows: The microphysics was characterized using the Thompson scheme [50], and the Rapid Radiative Transfer Model for General Circulation Models (RRTMG) [51,52] was utilized to determine the calculation of longwave and shortwave radiations and their atmospheric transfer. The cumulus clouds were modeled using the "Tiedtke and Zhang" convection scheme [53,54]. The Noah land surface model (Noah-LSM) is particularly effective in simulating soil temperature and moisture in multiple layers, which allows for a more precise representation of the vertical structure of the land surface within the WRF, taking into account soil temperature and moisture in four layers, fractional snow cover, and frozen soil physics. This helps to accurately represent processes over ice sheets and snow-covered areas [55]. The Mellor-Yamada-Janjic (MYJ) scheme was utilized to resolve planetary boundary layer processes [56]. The thermal roughness length and standard similarity functions were obtained from look-up tables in the surface layer scheme, which is based on the Monin-Obukhov theory with Zilitinkevich and used in the Eta model [57]. During long simulations, the model offers the option to input time-varying data and continuously update these fields [58,59]. The 12-month values of vegetation fraction and albedo were obtained from the geogrid program to be used in the WRF model, but it does not have the capability of predicting sea surface temperature, vegetation fraction, albedo, or sea ice. To overcome this limitation, time-varying sea surface temperature (SST) and sea ice fields must be read into the model, allowing for updates of these fields during long simulations.
The WRF configuration is presented in Table 1. In this study, a double-nested grid system was utilized, with a horizontal grid spacing of 25 km in domain 1 and 5 km in domain 2 ( Figure 2). The two domains used the geographic data resolution of the "usgs_lakes + default" option. The central latitude and longitude of domain 2 are 87.76 • E and 33.41 • N, and the latitude and longitude of domain 2 ranges from 72.3 to 103.2 • E and from 25.9 to 40.3 • N, respectively. The National Centers for Environmental Prediction (NCEP) Final Analysis with a 6-h resolution and 1 degree × 1 degree spatial resolution were used as the initial and lateral boundary conditions for the WRF model. The data were obtained from the Global Forecast System (GFS). The simulation covered a period of 10 years, starting on 1 January 2010, at 00:00 UTC and ending on 31 December 2020, at 23:00 UTC.

Introduction to NCEP_ADP Data and Assimilation Methods
For the assimilation data source, global observation data from the National Centers for Environmental Prediction (NCEP) Automated Data Processing (ADP) was utilized. The timeframe of this data spans from 1 January 2010 to 31 December 2020, and was incorporated into the WRF-3DVAR system for assimilation.
The NCEP-ADP data contain surface synoptic observations (SYNOP), radio-sounding data (SOUNDING), satellite observations (SATOBs), and a few aircraft reports (AIRREPs) and the Meteorological Terminal Air Report (METAR). Satellite observations (SATOBs) track cloud and wind data using cloud motion obtained using space-borne infrared derivations or water vapor imagers. These data and reports include air pressure, geopotential height, air temperature, dew point temperature, wind direction, and wind speed, with up to 20 layers of upper-air observations available ranging from 1000 to 10 hPa. The reporting intervals range from hourly to 12 h (https://rda.ucar.edu/datasets/ds461.0/, accessed on 31 December 2020). and 33.41°N, and the latitude and longitude of domain 2 ranges from 72.3 to 103.2°E and from 25.9 to 40.3°N, respectively. The National Centers for Environmental Prediction (NCEP) Final Analysis with a 6-h resolution and 1 degree × 1 degree spatial resolution were used as the initial and lateral boundary conditions for the WRF model. The data were obtained from the Global Forecast System (GFS). The simulation covered a period of 10 years, starting on 1 January 2010, at 00:00 UTC and ending on 31 December 2020, at 23:00 UTC.   The NCEP-ADP was used in the NCEP-FNL analysis. However, due to the resolution of FNL data being 1 × 1 degree, it is difficult to reconcile multiple observation data within a grid when assimilating these dense observation data. Therefore, when using the WRF model with 3DVAR method for assimilation, we not only increase the grid resolution, but also consider some control and constraint options when setting assimilation parameters to better iterate and converge. For example: (1) The multi-level q adjustment should be made while ensuring the conservation of column-integrated water vapor, considering the need to verify the supersaturation (RH > 95%) and the minimum relative humidity (RH < 11%) requirements. (2) The assimilation of surface observations will rely on the surface similarity theory within the planetary boundary layer instead of using the lowest model level's first guess. (3) Any stations where the model's interpolated height differs from the actual observation station's height by more than 100 m will be deemed invalid and excluded. (4) This specifies the method used to interpolate the background into the observation space. The method involves selecting the neighboring model grid point with the smallest height difference, while disregarding the grid points over water. Figure 2 shows a schematic diagram of the simulated region, underlying land use/vegetation types, and distribution of the NCEP_ADP observation data. It can be seen that the surface SYNOP and SOUNDING sites are mostly distributed in the eastern part of the QTP, whereas there are only a few ground and radio-sounding sites in the western part of the plateau. SATOBs were evenly distributed in the entire simulated region. A few AIRREPs and the METAR are available for the areas south of the Himalayas.
The three-dimensional variational assimilation theory entails the minimization of the cost function as follows [61,62]: where y 0 is the observed data, x is the model state, x b is the background guess, and B is the background error covariance matrix. The first term of the cost function represents the difference between the observed data and the model state, and the second term represents the difference between the model state and the background. The purpose of the assimilation process is to minimize the cost function, which means that the difference between the observed data and the model state is reduced as much as possible, and the background error is controlled [63]. The last stage involves updating the lower boundary and inserting the analysis obtained from the WRF data assimilation system into the WRF model. This study utilizes a cycling assimilation approach that incorporates a quasi-isentropic coordinate for the analysis increment. By utilizing the potential temperature structure around the observation, this method dynamically adjusts the impact of observations. The 6 h update cycle guarantees that the analysis and short-term forecast remain current [46]. Every 6 h, the cycling 3DVAR technique assimilates recent observations by utilizing the previous 6 h model output as a reference to generate a fresh estimate of 3D atmospheric fields. The approach involves analyzing the observation-minus-simulation residuals, or innovations, to obtain a 3D multivariate error field estimate known as the analysis increment. This increment is then combined with the 6 h forecast background to produce the updated analysis. Through the model's filter, the 6 h forecast incorporates insights from previous observations into the current analysis [46]. The 3DVAR system operates in cycling mode, which allows the background cloud water and rainwater from the previous cycle to be incorporated into the current cycle. By including these variables in the 3DVAR system, their information can be transferred to the subsequent cycle, reducing the time required for the model's cloud water and rainwater spin-up during the integration from the analysis stage [64]. At the same time, the "SST update" option is turned on to update the lower boundary conditions in the WRF model, including sea surface temperature (SST), background albedo (ALBBCK), green vegetation fraction (GVF), and underlying soil temperature (TMN) once every 6 h. These variables are updated to the forecast file via "warm start" after the assimilation.
A schematic diagram of the cycling assimilation process is shown in Figure 3.
background error covariance matrix. The first term of the cost function represents the difference between the observed data and the model state, and the second term represents the difference between the model state and the background. The purpose of the assimilation process is to minimize the cost function, which means that the difference between the observed data and the model state is reduced as much as possible, and the background error is controlled [63]. The last stage involves updating the lower boundary and inserting the analysis obtained from the WRF data assimilation system into the WRF model. This study utilizes a cycling assimilation approach that incorporates a quasi-isentropic coordinate for the analysis increment. By utilizing the potential temperature structure around the observation, this method dynamically adjusts the impact of observations. The 6 h update cycle guarantees that the analysis and short-term forecast remain current [46]. Every 6 h, the cycling 3DVAR technique assimilates recent observations by utilizing the previous 6 h model output as a reference to generate a fresh estimate of 3D atmospheric fields. The approach involves analyzing the observation-minus-simulation residuals, or innovations, to obtain a 3D multivariate error field estimate known as the analysis increment. This increment is then combined with the 6 h forecast background to produce the updated analysis. Through the model's filter, the 6 h forecast incorporates insights from previous observations into the current analysis [46]. The 3DVAR system operates in cycling mode, which allows the background cloud water and rainwater from the previous cycle to be incorporated into the current cycle. By including these variables in the 3DVAR system, their information can be transferred to the subsequent cycle, reducing the time required for the model's cloud water and rainwater spin-up during the integration from the analysis stage [64]. At the same time, the "SST update" option is turned on to update the lower boundary conditions in the WRF model, including sea surface temperature (SST), background albedo (ALBBCK), green vegetation fraction (GVF), and underlying soil temperature (TMN) once every 6 h. These variables are updated to the forecast file via "warm start" after the assimilation.
A schematic diagram of the cycling assimilation process is shown in Figure 3.  The schematic diagram of cycling assimilation data in the WRF model. The bottom brackets mean every cycle of assimilation, and the bold arrows mean the observations are assimilated from the time window "00", "06", "12", and "18".

Introduction to Observational Data and ERA-5 Data
The National Meteorological Information Center (NMIC) of the China Meteorological Administration (CMA) published the China surface climatic daily dataset (V3.0) on http://data.cma.cn/, accessed on 29 March 2023. This dataset comprises daily meteorological observations from 91 stations located in the simulated QTP region. This dataset includes the daily averaged surface pressure (hPa), 2 m temperature ( • C), 2 m relative humidity (%), precipitation (mm), evaporation (mm), 10 m wind direction ( • ), wind speed (m/s), and ground surface temperature ( • C) from 1 January 2010 to 31 December 2019. Before being utilized, the NMIC conducts quality control and inspection on the data. We used these observation data for the model evaluation.
The European Center for Medium-Range Weather Forecasts Reanalysis V5 (ERA-5) reanalysis data were used to provide additional information on atmospheric variables such as temperature, humidity, wind speed, wind direction, and precipitation in the QTP region [65,66]. These data were used in conjunction with the China surface climatic daily dataset to evaluate the performance of the WRF model simulations. To encompass the timeframe from January 2010 to December 2020, monthly mean data from ERA-5 were employed in this investigation. These data cover a 30 km grid and span 137 atmospheric levels, up to 80 km in altitude from the surface [67].
The QTP-HRAD from the WRF model was averaged to a daily time scale for comparison and error analysis with observations from meteorological stations. We also compared the QTP-HRAD with the gridded dataset of ERA-5, comparing variables such as temperature, 2 m temperature, surface temperature, surface pressure, 2 m specific humidity, 10 m wind speed, and total precipitation. We used patch interpolation and the Earth System Modeling Framework (ESMF) re-gridding toolbox in the National Center for Atmospheric Research (NCAR) Command Language (NCL) (https://www.ncl.ucar.edu/ Document/Functions/ESMF/ESMF_regrid.shtml, accessed on 8 June 2022) and calculated the difference between the two datasets.

Comparison of Observations and QTP-HRAD Simulations from 10 Weather Stations
We selected 10 stations (Qamdo, Garz, Hotan, Nyingchi, Nagqu, Pulan, Qumacai, Zoige, Tuotuohe, and Yushu) at different locations in and around the QTP as representative stations to compare the daily mean values of the 2 m temperature, relative humidity, precipitation, and evaporation. Figure 4 shows the time series comparison of the 2 m temperature and relative humidity observations and the QTP-HRAD data at 10 stations from 2010 to 2019. It can be seen that the observed values of the 2 m temperature at the Garz, Nyingchi, and Pulan stations are all higher than the simulated values throughout the year, and the observed values at the other stations are in good agreement with the QTP-HRAD values. It can also be seen from the error analysis table ( Table 2) that the RMSE between the observed and simulated daily mean 2 m temperatures of the Garz, Nyingchi, and Pulan stations are higher than 6.8 • C, and the bias is below −6.6 • C. The RMSE and bias of the other stations between the observations and simulations were lower in the Hotan, Nagqu, Zoige, and Tuotuohe stations, at approximately 2 • C (RMSE)and between −1 and 1 • C (bias), respectively. The correlation coefficients of the 2 m temperature between the observed and simulated values were all from 0.95 to 0.99 at these 10 stations, which all passed the 95% significance test.
The simulated value of the relative humidity also varied annually. Except for the Hotan station, the simulated values of the other stations showed that the relative humidity peaks in the summer and valleys in the winter, mainly due to the higher temperature in the summer and more water vapor evaporating into the air. The Hotan station is located at the edge of the Taklimakan Desert. Owing to the lack of adequate water supply on the underlying surface, less water vapor evaporates into the atmosphere in the summer. The relative humidity increases in the winter when the temperatures are lower, and therefore, the annual variation in the relative humidity and temperature is reversed. Both the observed and simulated results reflect this trend. The average annual relative humidity and specific humidity at the Hotan station are 37% and 4.1 g/kg, respectively, which are much lower than those at the other stations. The statistical error results ( Table 2) show that the RMSE and bias of the specific humidity in the Hotan station are the lowest among the 10 observation stations (0.75 and 0.26 g/kg, respectively). The Garz station, located in southeast Tibet, has the highest annual average relative humidity of 79% and a specific humidity of 5.1 g/kg.
In general, the WRF model showed a good simulation effect on the 2 m temperature and relative humidity in the QTP region. The daily averaged simulated values were consistent with the observed values, and accurately reflected the annual and interannual variations in the temperature and humidity. For some stations located in the complex terrain area, the simulation errors for the temperature and humidity were slightly larger than those in the flat terrain area.   The simulated value of the relative humidity also varied annually. Except for the Hotan station, the simulated values of the other stations showed that the relative humidity peaks in the summer and valleys in the winter, mainly due to the higher temperature in the summer and more water vapor evaporating into the air. The Hotan station is located at the edge of the Taklimakan Desert. Owing to the lack of adequate water supply on the underlying surface, less water vapor evaporates into the atmosphere in the summer. The   Figure 5 shows a comparison of the daily total precipitation and the daily mean actual evaporation at 10 stations in the QTP region. The red and blue column lines represent the observed and simulated daily total precipitation, respectively, and the gray line represents the actual simulated daily evaporation. As no actual evaporation was observed, only the simulated evaporation was plotted on the graph. It can be seen that the annual variations of the simulated and observed precipitation were relatively consistent, with more precipitation in the summer and less in the winter. Except for the Hotan station, the maximum daily precipitation of the other stations can reach between 30 and 40 mm in the summer, or even 50 mm, which indicates heavy rain and rainstorm. The simulated value reflects the peak value of summer precipitation. At the Pulan station, more precipitation days were observed than those that were simulated. At the Nyingchi station, the observed precipitation in the winter was slightly greater than the simulated value. The annual evaporation was positively correlated with the precipitation. The daily average value of evaporation in the summer is approximately 2-4 mm/d, and the evaporation in the winter tends to be zero. In general, the stations in the eastern and southern parts of the QTP, such as the Qamdo, Garz, Nyingchi, and Zoige stations, have more precipitation and evaporation, while the western stations and the stations in the arid region on the north side of the plateau, such as Hotan and Tuotuohe, have less precipitation and evaporation. The variation trend of precipitation was simulated well by the WRF model, which was in good agreement with the observed values. The observed and simulated hourly values of 2 m air temperature, dew point temperature, and precipitation at 10 representative stations on the plateau were also compared (figure omitted). The hourly simulated values of some stations are closer to the observed values, such as in the Hotan, Nagqu, Zoige, Tuotuohe, and Yushu stations, but the simulated values of some stations are higher than the observed values by about 3 to 5 °C, such as the Garz, Nyingchi, and Pulan stations. The simulated hourly precipitation can also reflect sporadic precipitation events over the plateau in the summer. At the Hotan and Pulan stations, precipitation events rarely occur, and the simulated value is almost zero. The large bias between the hourly simulation values and the observation values at stations such as Garz, Nyingchi, and Pulan is caused by the large difference between the model grid altitude and the station altitude under complex terrain. Therefore, it is recommended to make corrections to the "pressure-altitude" height difference in order to make The observed and simulated hourly values of 2 m air temperature, dew point temperature, and precipitation at 10 representative stations on the plateau were also compared (figure omitted). The hourly simulated values of some stations are closer to the observed values, such as in the Hotan, Nagqu, Zoige, Tuotuohe, and Yushu stations, but the simulated values of some stations are higher than the observed values by about 3 to 5 • C, such as the Garz, Nyingchi, and Pulan stations. The simulated hourly precipitation can also reflect sporadic precipitation events over the plateau in the summer. At the Hotan and Pulan stations, precipitation events rarely occur, and the simulated value is almost zero.
The large bias between the hourly simulation values and the observation values at stations such as Garz, Nyingchi, and Pulan is caused by the large difference between the model grid altitude and the station altitude under complex terrain. Therefore, it is recommended to make corrections to the "pressure-altitude" height difference in order to make better comparisons between the assimilation data and the station observation data.
It can be seen from Table 2 that the RMSE between the observed and simulated 2 m temperatures at the Garz, Pulan, and Nyingchi stations are higher at 8.39, 8.66, and 6.84 • C, respectively, and the bias is negative, indicating that the simulated values are lower than the observed values. Nevertheless, the correlation coefficient was above 0.95 at these 10 stations and passed the 95% confidence test, indicating that the daily variations in the simulated and observed values are in good agreement. The RMSE between the observed and simulated values of the specific humidity at the 10 stations was all below 1.24 g/kg, and the bias was also negative. The minimum bias value appeared in Zoige (−0.22 g/kg). The error statistics of the surface pressure show that the observed and simulated values of the RMSE and bias in the Nagqu, Hotan, and Zoige stations are lower than those at the other stations, which are approximately 10 hPa, and the RMSE and bias between the simulated and observed values of the 2 m temperature and specific humidity were also lower in these three stations. The RMSE between the simulated and observed surface pressure in the Garz, Pulan, and Nyingchi stations were higher, which was above 60 hPa, and the simulated 2 m temperature and specific humidity also showed a larger bias compared with the observations. This may be due to the large difference in altitude between the observation site and the grid corresponding to the model, which increases the simulation errors of other meteorological elements, such as temperature and humidity. The simulated value can be corrected using the actual altitude of the observation site to reduce the errors. From the simulation error results of the 10 m wind speed, the correlation coefficient of only two sites failed the 95% confidence test, and the RMSE and bias of the simulated and observed values of the 10 sites were between 1 and 2 m/s. For high altitude complex terrain mountain areas, the location and orientation of the weather stations, as well as the type of underlying surface, can significantly impact the simulation accuracy of the temperature, humidity, wind direction, and wind speed. Table 2 shows that the root mean square error (RMSE) and bias of the temperature, humidity, and wind speed are also significant when there is a large difference between the simulated and observed atmospheric pressure values at the stations. For instance, the bias of the observed and simulated surface pressure values at the Garz, Nyingchi, and Pulan stations reaches −93.84, −74.89, and −62.44 hPa, respectively. Meanwhile, the bias of the observed and simulated 2 m temperature values at the three stations also reached −8.11, −6.61, and −8.24 • C, respectively, with the RMSE values of 8.39, 6.84, and 8.66 • C. This may be due to the significant difference in elevation between the simulated grid and the observation site. Despite generating 5 × 5 km horizontal simulation data in the QTP region, the grid distance of 5 × 5 km is still relatively coarse compared to these sites. At the Hotan, Nagqu, and Zoige stations, there was little difference between the simulated surface pressure and observed values, and the simulated 2 m temperature, relative humidity, and wind speed were also consistent with the observed values. The RMSE and bias of the 2 m temperature and humidity were the lowest among the ten representative stations. This is likely because the three stations are not obstructed by high mountains and the terrain is relatively flat, and the elevation of the observation stations is close to that of the model grid points. When using these simulation data, it is better to correct the altitude to reduce the error.

Comparison of Observations and QTP-HRAD over QTP Region
We selected 91 meteorological stations located in the QTP from 2010 to 2019 to compare the statistical errors between the simulated and observed daily mean values of the 2 m temperature, surface temperature, surface pressure, relative humidity, and 10 m wind speed, and plotted the regional distribution of the correlation coefficients, RMSE, and bias. It can be seen from Figure 6a that the correlation coefficients of the 2 m temperature between the observed and simulated values are all above 0.9, and those in the north of the plateau are above 0.97. The RMSE ranged from 1.4 to 12 • C. The RMSE of some stations in the eastern and northern parts of the QTP was lower, while the RMSE of stations along the Himalayan Mountains, Tanggula Mountains, and Qilian Mountains was higher. This may be caused by the large difference between the station altitude and the model grid altitude owing to the complex topography of the mountains. A total of 70% of the stations (64 stations) had negative bias values, indicating that the simulated values were lower than the observed values, and only some stations in the southeastern plateau and north of the Kunlun Mountains had a positive bias. The statistical error distribution of the land surface temperature was similar to the 2 m temperature (Figure 6b), and the correlation coefficient between the simulations and the observations was slightly lower than that of the 2 m temperature. However, the correlation coefficient for almost all stations exceeded 0.9. The RMSE and bias of the surface temperature were larger than those of the 2 m temperature. The RMSE of 70% of the stations (64 stations) was above 4 • C, and the bias of 85% of the stations (76 stations) was negative. This may be due to the underestimation of the air and surface temperatures in the QTP by the NCEP data of the forcing field in the WRF model. Studies have shown that the temperature in the QTP is underestimated in the NCEP data [68,69]. According to the research conducted by Frauenfeld et al. [70], there was an approximate difference of 7 • C between the ERA-40 reanalysis data and the observations. Similarly, Baolin et al. [71] discovered that the NCEP reanalysis data significantly underestimated the observational data for the QTP.
As shown in Figure 6c, the correlation coefficient between the observed and simulated values of the surface pressure was above 0.9. There were 17 stations with the RMSE exceeding 80 hPa, accounting for 19% of all stations, and only 15 stations had an RMSE between 1 and 10 hPa, accounting for 16.7% of the total. The distribution of bias also showed that the values of the simulated surface pressure at 68 stations were lower than the observed values, and 50% of the stations had a bias between −20 and −75 hPa. The low simulated surface temperature values can be explained by the simulation error of the surface pressure. The average elevation of the model grid cell was higher than the elevation of the observation station point, which led to a low simulated surface pressure and low simulated 2 m and surface temperatures.
The correlation coefficients between the observed and simulated values of relative humidity between 0.35 and 0.9, all passing the 95% significance level test, are illustrated in Figure 6d. The RMSE with 60 stations (66.7%) was less than 15%, and only 14 stations had an RMSE between 17% and 30%. The bias of stations along the Himalayan Mountains and near the Hengduan Mountains was positive (30 stations, accounting for 33%), indicating that the simulated value was higher than the observed value. The simulation error of the 10 m wind speed showed that the simulated value of the wind speed at most stations was approximately 2 m/s higher, and 21 stations had a correlation coefficient higher than 0.5, accounting for 24% of the total (Figure 6e). The RMSE ranged between 0.5 and 3.2 m/s, and the bias error statistics showed that the bias in most of the stations was positive (60 sites), and the simulated value was higher than the observed value. The simulated wind speed in the WRF model was high, possibly because the model underestimated the roughness of the underlying surface. This is because the simulation of the wind speed requires the consideration of many parameterization schemes, such as turbulence parameterization schemes and boundary layer parameterization schemes, and the inaccurate estimation of parameters, such as surface roughness, within these schemes can cause errors. Due to the uncertainty and randomness of atmospheric turbulence, the current mesoscale meteorological models have a lower simulation accuracy for the wind speed compared to other variables such as temperature, humidity, and atmospheric pressure [72]. Due to the complex terrain and the limited data input into the meteorological model, the deviation of the average wind speed between the simulation and observation can reach 2 m/s, and the RMSE can be as high as 6 m/s. Even with better models, reducing uncertainties in the simulations of near-surface wind speed and direction can be challenging, as they stem from random or turbulent fluctuations. Furthermore, the randomness is compounded by variations in sub-grid topography and land use [72][73][74].
Remote Sens. 2023, 15, x FOR PEER REVIEW 13 of 21 the 2 m temperature. However, the correlation coefficient for almost all stations exceeded 0.9. The RMSE and bias of the surface temperature were larger than those of the 2 m temperature. The RMSE of 70% of the stations (64 stations) was above 4 °C, and the bias of 85% of the stations (76 stations) was negative. This may be due to the underestimation of the air and surface temperatures in the QTP by the NCEP data of the forcing field in the WRF model. Studies have shown that the temperature in the QTP is underestimated in the NCEP data [68,69]. According to the research conducted by Frauenfeld et al. [70], there was an approximate difference of 7 °C between the ERA-40 reanalysis data and the observations. Similarly, Baolin et al. [71] discovered that the NCEP reanalysis data significantly underestimated the observational data for the QTP.  Overall, the QTP-HRAD showed good results for the QTP. Except for the wind speed, the correlation coefficients between the simulated and observed values of the temperature, humidity, and surface pressure were higher than 0.8. However, the QTP-HRAD underestimated the air temperature and surface temperature in the QTP region, and underestimated the surface pressure at most observation stations, with simulated biases ranging from −20 to −180 hPa. The model overestimated the relative humidity at the southern and southeastern stations and underestimated the relative humidity in the northern part of the plateau. The simulated wind speed value of the QTP-HRAD was higher, and the bias was between 1 and 2.8 m/s.

Comparison of QTP-HRAD and ERA-5 Data over QTP Region
We compared the QTP-HRAD data with the ERA-5 data for 2020 and calculated the difference between the QTP-HRAD and ERA-5 data in the QTP. To calculate the difference, we used bilinear interpolation to scale up the QTP-HRAD data from 5 × 5 km to 0.25 • × 0.25 • , corresponding to the grid size of the ERA-5 data, and then subtracted. Six meteorological elements, including 2 m temperature, land surface temperature, surface pressure, 2 m specific humidity, 10 m wind speed, and annual accumulated precipitation were compared. Figure 7 shows the spatial distribution of the QTP-HRAD and ERA-5. As can be seen from Figure 7, the spatial distribution characteristics of the meteorological elements of the QTP-HRAD are consistent with those of the ERA-5, which reflects the distribution of the 2 m temperature, surface pressure, specific humidity, and 10 m wind speed over the QTP. As the spatial resolution of the QTP-HRAD is 5 × 5 km, the distribution of near-surface meteorological elements is finer than that of the ERA-5, which can better demonstrate the differences caused by large topographic fluctuations.
The spatial distribution of the 2 m temperature data is depicted in Figure 7a. In the plateau, the annual average temperature at this height is below 0 • C, with the western part of the plateau having a temperature of around −10 • C, and the eastern part having a temperature of around 0 • C. The 2 m temperature of the ERA-5 is more consistent with the spatial distribution of the QTP-HRAD, but is not as fine as that of the QTP-HRAD. The difference diagram reveals that, with the exception of the central and western regions of the plateau, the difference value is around −4 • C and negative, while the rest of the regions show positive values. The spatial distribution of the surface temperature is comparable to that of the 2 m air temperature. Both the QTP-HRAD and ERA-5 are capable of capturing the extremely low surface temperatures in the western part of the plateau, near the Karakoram Mountains, which are around −15 • C. As the average altitude of the Karakoram Mountains is above 5500 m, the annual 0 • C isotherm is approximately the same as the 4200 m contour line, which is caused by low temperatures throughout the year in the vast mountainous area. From the difference map of the surface temperature, it can be seen that the difference between the central and western parts of the plateau and the Qaidam Basin in Qinghai Province was also found to be negative, and the lowest value was under −5 • C. This value was positive in the central and northern parts of the plateau and the southeastern Hengduan Mountains (6-10 • C).
The regional distribution of the annual average surface pressure is as follows: the surface pressure in the plateau area is low (below 600 hPa), where the lowest in the central and western parts of the plateau can reach 500 hPa, and the surface pressure in the Qaidam Basin on the north side of the plateau is approximately 700 hPa. The areas with significant discrepancies between the QTP-HRAD and ERA-5 are primarily located along the southern Himalayas and the northern Kunlun Mountains, Altun Mountains, and Qilian Mountains. The difference in surface pressure is between −50 and 50 hPa (Figure 7c). The 2 m specific humidity of the QTP-HRAD is a high value area in the south of the Himalayan Mountains, above 10 g/kg, while the annual average specific humidity in the plateau area is approximately 5 g/kg. The specific humidity of the ERA-5 is lower than that of the QTP-HRAD in the plateau area at approximately 3 g/kg, and the difference value is positive (Figure 7d). The 10 m wind speed of the QTP-HRAD in the eastern part of the plateau is below 3 m/s, and the wind speed in other parts of the plateau is higher. The ERA-5 data only reflect that the wind speed in the central and western regions of the plateau is higher than 3 m/s, whereas the wind speed in the Pamir Plateau and Hengduan Mountains is lower than 1 m/s ( Figure 7e). As there are only a few surface observation stations in the central and western parts of the plateau, the QTP-HRAD and ERA-5 data have large differences in the 10 m wind speed simulation of the Pamir region, and more observational data need to be obtained in the future for verification. The spatial distribution of the total annual precipitation is shown in Figure 7f. Along the Himalayan Mountains, the precipitation on the southern side can reach more than 2000 mm/year, which occurs on the windward slope and Hengduan Mountains in the southern QTP, while the precipitation in the hinterland of the plateau is less than 200 mm/year. There is slightly more precipitation in the eastern part of the plateau, which is between 500 and 1000 mm/year, and the difference between these two datasets is positive in the southeastern part of the plateau, and negative in other areas, indicating that the total annual precipitation simulated by the QTP-HRAD is higher than that of the ERA-5 (Figure 7f).  Figure 7 shows the spatial distribution of the QTP-HRAD and ERA-5. As can be seen from Figure 7, the spatial distribution characteristics of the meteorological elements of the QTP-HRAD are consistent with those of the ERA-5, which reflects the distribution of the 2 m temperature, surface pressure, specific humidity, and 10 m wind speed over the QTP. As the spatial resolution of the QTP-HRAD is 5 × 5 km, the distribution of near-surface meteorological elements is finer than that of the ERA-5, which can better demonstrate the differences caused by large topographic fluctuations. The spatial distribution of the 2 m temperature data is depicted in Figure 7a. In the plateau, the annual average temperature at this height is below 0 °C, with the western part of the plateau having a temperature of around −10 °C, and the eastern part having a temperature of around 0 °C. The 2 m temperature of the ERA-5 is more consistent with the spatial distribution of the QTP-HRAD, but is not as fine as that of the QTP-HRAD. The difference diagram reveals that, with the exception of the central and western regions of the plateau, the difference value is around −4 °C and negative, while the rest of the regions show positive values. The spatial distribution of the surface temperature is comparable to that of the 2 m air temperature. Both the QTP-HRAD and ERA-5 are capable of capturing the extremely low surface temperatures in the western part of the plateau, near the Kara-

Data Records and Availability
The assimilation data in the QTP were generated with a spatial resolution of 5 × 5 km at a temporal resolution of once every hour from 2010 to 2020. The assimilation data include 28 near-surface meteorological variables, including the radiation and energy field variables. The latitude and longitude of the dataset area range from 72.3 • E to 103.2 • E and from 25.9 • N to 40.3 • N. This dataset began at 00:00 (UTC) on 1 January 2010 and ended at 23:00 (UTC) on 31 December 2020, which includes the 2 m temperature (K), land surface temperature (K), relative humidity (%), surface pressure (Pa), 2 m water vapor mixing ratio (kg/kg), 10 m U and V wind components (m/s), 10 m wind speed (m/s), wind direction ( • ), precipitation (mm), and dew point temperature ( • C), are available online (https://doi.org/10.57760/sciencedb.01840, accessed on 31 December 2020).
The structure of the dataset is as follows: <Year-Short_name>.nc. Here, "Year" represents the data of a certain year, and "Short_name" represents the short name of the variable as presented in Table 3. The "Short_Name", "Long_name", "Missing_value", and the "Unit" of the variables are also summarized in Table 3.

Conclusions
The WRF model and its assimilation system can precisely replicate the near-surface meteorological parameters in the QTP region, and the simulation data time series demonstrated good correspondence with the observed data. The simulation results reflect the interannual variations and regional distribution characteristics of meteorological field elements such as temperature, humidity, precipitation, evaporation, air pressure, and wind speed. A comparison between the observations and simulations shows that the simulated values are more consistent with the observations in flat areas, whereas the statistical error is larger in areas with complex terrain conditions. The main conclusions of this study are as follows:

1.
We selected 10 representative weather stations in the QTP region and compared the observed and simulated values of the 2 m temperature, relative humidity, and precipitation for the period 2011-2020. It was found that the simulated surface pressure bias and RMSEs of the stations located in complex mountainous and terrain areas on the plateau were significantly different from the observed values. The bias between the observed and simulated surface pressure values at the Garz, Nyingchi, and Pulan stations were −93.84 hPa, −74.89 hPa, and −62.44 hPa, respectively, and the temperature and relative humidity bias of these sites were also large. However, at the stations with relatively flat terrain on the plateau such as Hotan, Nagqu, and Zoige, the disparities between the simulated and observed surface pressure values were minor, and the simulated values of the 2 m temperature, relative humidity, and wind speed were mostly in agreement with the observed values.

2.
The annual variations in the simulated and observed precipitation are relatively consistent. The simulated values can reflect the peak periods and magnitudes of summer precipitation on the plateau. The daily average evaporation and precipitation at the 10 stations show a significant positive correlation. The daily average evaporation in the summer is about 2-4 mm/d, while in the winter, it tends to be zero. Overall, there is more precipitation and evaporation at the stations in the eastern and southern parts of the QTP, such as Qamdo, Garz, Nyingchi, and Zoige, while the stations in the western and northern arid areas of the plateau, such as Hotan and Tuotuohe, have less precipitation and evaporation.

3.
An error statistical analysis was conducted on the observed and simulated values of the 2 m temperature, surface temperature, surface pressure, relative humidity, and 10 m wind speed at all 91 meteorological stations in the QTP region. It was found that the correlation coefficients of the 2 m temperature, surface temperature, and surface pressure were relatively high, all above 0.9. The correlation coefficient between the observed and simulated values of relative humidity was around 0.7 for most stations on the plateau, with positive bias at the stations in the southern part of the plateau and negative bias at the stations in the northern part. The correlation coefficient between the observed and simulated values of wind speed was relatively low, around 0.5, with positive bias at most stations on the plateau, indicating that the simulated values of the near-surface wind speed were higher than the observed values.

4.
We also compared the QTP-HRAD with ERA-5 on a regional scale and calculated their differences. The spatial distribution patterns of the two datasets for the 2 m temperature and surface temperature are quite consistent, with the QTP-HRAD values slightly lower than the ERA-5 in the southwest region of the plateau. The difference in the surface pressure between the two datasets shows clear positive and negative differences along the plateau boundary. The QTP-HRAD simulated specific humidity is about 2-5 g/kg higher than the ERA-5. The simulated values of the 10 m wind speed near the ground are also higher than the ERA-5, except for the central plateau area, with a difference of 1-3 m/s. The simulated precipitation amount shows the largest difference along the southern side of the plateau, particularly along the Himalayan range, and the annual total simulated precipitation amount by the QTP-HRAD is higher than that of the ERA-5.
When using the QTP-HRAD, it should be noted that the grid data near the Himalayan Mountains in the south, the Hengduan Mountains in the southeast, and the Kunlun Mountains in the northwest of the plateau may be inaccurate, which is the limitation of using this dataset.

Data Availability Statement:
The hourly values of 2 m temperature, land surface temperature, relative humidity, surface pressure, water vapor mixing ratio, 10 m U and V wind components, precipitation, and dew point temperature at 5 × 5 km are available online at https://doi.org/10.577 60/sciencedb.01840 (accessed on 31 December 2020). Due to the extensive amount of work required to validate and compare the dataset with other reanalyzed datasets, this will be our main focus for future research. After publishing this dataset online, we welcome researchers to validate and compare the dataset, and provide us with feedback for further improvement. If there are any questions during the use of the dataset, please contact wenzerg@126.com.