A GRNN-Based Model for ERA5 PWV Adjustment with GNSS Observations Considering Seasonal and Geographic Variations

Abstract: Precipitable water vapor (PWV) is an important parameter in numerical weather forecasting and climate research. However, existing PWV adjustment models lack comprehensive consideration of seasonal and geographic factors. This study utilized the General Regression Neural Network (GRNN) algorithm and Global Navigation Satellite System (GNSS) PWV in China to construct and evaluate European Centre for Medium-Range Weather Forecasts (ECMWF) Atmospheric Reanalysis (ERA5) PWV adjustment models for various seasons and subregions based on meteorological parameters (GMPW model) and non-meteorological parameters (GFPW model). A linear model (GLPW model) was established for model accuracy comparison. The results show that: (1) taking GNSS PWV as a reference, the Bias and root mean square error (RMSE) of the GLPW, GFPW, and GMPW models are about 0 mm and 1 mm, respectively, indicating that the systematic error of ERA5 PWV is effectively weakened. The overall Bias of the GLPW, GFPW, and GMPW models in the Northwest (NWC), North China (NC), Tibetan Plateau (TP), and South China (SC) subregions is approximately 0 mm after adjustment. The adjusted overall RMSE values of the GLPW, GFPW, and GMPW models in the four subregions are 0.81/0.71/0.62 mm, 1.15/0.95/0.77 mm, 1.66/1.26/1.05 mm, and 2.11/1.35/0.96 mm, respectively. (2) The accuracy of the three models is tested using GNSS PWV that was not involved in the modeling. The adjusted overall RMSE values of the GLPW, GFPW, and GMPW models in the four subregions are 0.89/0.85/0.83 mm, 1.61/1.58/1.27 mm, 2.11/1.75/1.68 mm, and 3.65/2.48/1.79 mm, respectively. Overall, the GFPW and GMPW models adjust ERA5 PWV more accurately than the linear model GLPW. Therefore, the GFPW and GMPW models can effectively contribute to water vapor monitoring and the integration of multiple PWV datasets.


Introduction
Atmospheric water vapor plays an important role in global atmospheric radiation, water cycle, and energy balance. High-precision water vapor data are helpful in monitoring and forecasting severe weather such as rainstorms, cold currents, typhoons, major droughts, and waterlogging disasters [1][2][3]. Besides being an indicator of climate change, water vapor in the atmosphere affects the refractivity of radio signals and thus becomes a major error source in radio-based geodetic techniques [4].
Precipitable water vapor (PWV) is commonly used to describe the content of water vapor in the atmosphere and can be retrieved from observations and models [5,6]. With the development of PWV detection technology, the data sources of PWV products are becoming more abundant, such as radiosonde (RS), Global Navigation Satellite System (GNSS), satellite sensors, and atmospheric reanalysis data [7][8][9][10][11][12]. While RS sites can provide high-precision PWV data and are commonly used to assess the accuracy of other PWV products, they are limited in their ability to depict the spatiotemporal variation of water vapor due to their high cost, sparse distribution, and low temporal resolution [13,14]. The GNSS water vapor inversion method proposed by Bevis [15] is widely applied due to its advantages, including all-weather observation, high spatiotemporal resolution, and low cost [16,17]. Despite being high-precision point data, both RS and GNSS data require spatial interpolation for their extended application in climate research [18]. Remote sensing water vapor inversion can depict the regional distribution of atmospheric water vapor, but it has limited temporal resolution and requires clear, cloudless observation conditions [19]. The PWV products of atmospheric reanalysis data have a long time scale and comprehensive spatial coverage, but their accuracy and reliability in areas with few or missing assimilation data need further improvement [20].
A single detection method cannot provide high-precision water vapor information because of systematic Bias stemming from differences in temporal and spatial resolution and coverage. Therefore, multisource data fusion is introduced to overcome these limitations, including systematic Bias adjustment and the optimization of unbiased datasets [21,22]. Khaniani et al. [23] found that the deviation between Moderate-Resolution Imaging Spectroradiometer (MODIS) PWV and the PWV derived at 38 GNSS sites in Iran is linear with site height and established a highly correlated linear model, which reduces the error of MODIS PWV. Bai et al. [24] evaluated the MODIS PWV and the fifth-generation European Centre for Medium-Range Weather Forecasts (ECMWF) Atmospheric Reanalysis (ERA5) PWV at 260 Chinese GNSS sites and established a MODIS PWV linear adjustment model considering climate and regional differences. Zhu et al. [25] evaluated the MODIS PWV against GNSS PWV and the ERA5 PWV from 2013 to 2018 in China and established a Chinese regional PWV adjustment model according to the annual and semi-annual characteristics of MODIS PWV and ERA5 PWV deviations at each grid point. Wang et al. [26] proposed a new empirical PWV grid model called ASV-PWV, which uses the zenith wet delay from the Askne model and is improved by spherical harmonics and vertical adjustment. Alshawaf et al. [27] used the fixed-rank Kriging interpolation method to fuse PWV datasets, including GNSS, Interferometric Synthetic Aperture Radar (InSAR), and the Weather Research and Forecasting (WRF) modeling system. Shikhovtsev et al. [28] proposed a PWV correction method that takes the underlying surface into account, using the exponential decay of precipitable water vapor content with altitude and the average altitude of the grid nodes around the location.
Moreover, scholars have studied PWV adjustment models based on machine learning algorithms because of the inevitable deviation caused by imperfect assumptions behind the interpolation methods [29,30]. The quality of MODIS PWV data in China is poor and not suitable for studying subtle variations in PWV. Wang et al. [31] pointed out that the comprehensive utilization rate and applicability of MODIS products vary significantly across different subregions, with particularly low utilization rates of MODIS data in the Qinghai-Tibet Plateau. Lu et al. [32] constructed a convolutional neural network (CNN) fusion model of MODIS PWV and ERA5 PWV on the west coast of America and reported that the fused PWV was in good agreement with the GNSS PWV. Taking GNSS as the reference value, Xiong et al. [33] employed three methods, namely Random Forest (RF), Generalized Regression Neural Network (GRNN), and Back Propagation Neural Network (BPNN), to construct MODIS PWV adjustment models for the Chinese region separately on both monthly and annual scales. Ma et al. [34] combined the water vapor data from the FengYun-3 meteorological satellite's Moderate-Resolution Spectral Imager (MERSI) with various geospatial data to establish a PWV GRNN model for the San-Jiang-yuan area of China and proposed a daily systematic error adjustment method for reconstructing MERSI PWV based on GNSS PWV.
Currently, linear models that adjust ERA5 [35] and MODIS [36] PWV with GNSS have evolved from single-factor to multi-factor linear models, but these models require a large amount of factor data and have low accuracy [37]. With the development of artificial intelligence, machine learning has shown strong nonlinear model-building advantages with superior performance [38,39]. This study constructs adjustment models with meteorological and non-meteorological parameters based on the General Regression Neural Network (GRNN) method for the entire China region and four subregions by utilizing PWV values derived from GNSS and ERA5. Moreover, we analyze the correlation between multiple meteorological parameters and PWV values to establish a suitable meteorological adjustment model, considering the complex seasonality and geographic variations of water vapor in China. To test the performance of the model, the 10-fold cross-validation (CV) technique [40] is employed. Bias, standard deviation (STD), root mean square error (RMSE), and correlation coefficient (R) are used as criteria to assess the performance [41]. The aim of this study is to provide a useful reference for the construction of regional moisture fusion products in China.

GNSS PWV
The diverse topography and intricate climate of China's land area result in complex variations in PWV. To facilitate the discussion on spatiotemporal PWV variability and the accuracy of the water vapor adjustment model, the China region is divided into four subregions: North China (NC), South China (SC), Tibetan Plateau (TP), and Northwest China (NWC). The study area and site distribution are shown in Figure 1.
Adhering to the criterion of a data time series completeness greater than 60%, this study selected hourly observations of 339 GNSS sites from the Crustal Movement Observation Network of China (CMONC) from 2016 to 2018. For each GNSS site, we selected the closest meteorological (MET) site of the China Meteorological Administration (CMA) and used its hourly meteorological parameters both for correlation analysis and as input factors for building the ERA5 PWV adjustment models.
The GNSS zenith total delay (ZTD) is the sum of the zenith hydrostatic delay (ZHD) and the zenith wet delay (ZWD). The ZHD can be obtained from the Saastamoinen model [24].
where $P_s$, $\varphi$, and $h_0$ are the surface pressure (hPa) at the MET site, the latitude (radians), and the elevation (km) of the GNSS site, respectively.
where $\rho_w$ is the density of liquid water ($1 \times 10^{3}$ kg/m$^{3}$), $R_v$ is the water vapor gas constant (461.495 J·kg$^{-1}$·K$^{-1}$), and $k'_2$ and $k_3$ are empirical constants ($k'_2 = 22.13 \pm 2.20$ K/hPa, $k_3 = (3.739 \pm 0.012) \times 10^{5}$ K$^{2}$/hPa). $T_m$ is the atmospheric weighted mean temperature calculated by the Chinese empirical regionalization model; additional information about the procedure can be found in Huang et al. [25].
$$T_m(T_s, h_0, \varphi, \mathrm{DOY}) = a_0 + a_1 T_s + a_2 h_0 + a_3 \varphi + b_1 \cos\left(2\pi\,\frac{\mathrm{DOY}}{365.25}\right) + \cdots$$
where $T_s$ is the surface temperature at the MET site and DOY is the day of the year. $a_0$, $a_1$, $a_2$, $a_3$, $b_1$, $b_2$, $b_3$, and $b_4$ are the model coefficients.
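The conversion from ZTD to PWV described in this section can be sketched as follows. This is a minimal illustration rather than the processing chain used in the study: the Saastamoinen expression and the conversion factor are written in their commonly used textbook forms, which is an assumption because the corresponding equations are not reproduced here, and the function names are hypothetical. The constants are the values quoted above.

```python
import math

RHO_W = 1.0e3     # density of liquid water, kg/m^3 (value quoted in the text)
R_V = 461.495     # water vapor gas constant, J/(kg K) (value quoted in the text)
K2_PRIME = 22.13  # K/hPa (value quoted in the text)
K3 = 3.739e5      # K^2/hPa (value quoted in the text)


def saastamoinen_zhd_mm(p_s_hpa, lat_rad, h0_km):
    """ZHD in mm from surface pressure (hPa), latitude (rad), and height (km).

    Assumed standard Saastamoinen form; not copied from the source.
    """
    denom = 1.0 - 0.00266 * math.cos(2.0 * lat_rad) - 0.00028 * h0_km
    return 2.2768 * p_s_hpa / denom


def conversion_factor(tm_kelvin):
    """Dimensionless factor mapping ZWD to PWV (assumed standard form)."""
    # Convert the refractivity constants from per-hPa to per-Pa (SI units).
    k2 = K2_PRIME / 100.0
    k3 = K3 / 100.0
    return 1.0e6 / (RHO_W * R_V * (k3 / tm_kelvin + k2))


def gnss_pwv_mm(ztd_mm, p_s_hpa, lat_rad, h0_km, tm_kelvin):
    """PWV (mm) = factor * (ZTD - ZHD)."""
    zwd_mm = ztd_mm - saastamoinen_zhd_mm(p_s_hpa, lat_rad, h0_km)
    return conversion_factor(tm_kelvin) * zwd_mm
```

With a weighted mean temperature near 275 K the factor is roughly 0.16, so a ZWD of 100 mm corresponds to a PWV of about 16 mm.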

ERA5 PWV
ERA5 reanalysis data are ECMWF's fifth-generation global climate reanalysis data set and its latest generation of atmospheric reanalysis products [42]. ERA5 is generated using the 4D-Var data assimilation scheme in cycle CY41R2 of ECMWF's Integrated Forecasting System (IFS), with high spatial and temporal resolution. Its temporal resolution is 1 h, and its horizontal spatial resolution is 0.25° × 0.25° (longitude × latitude, about 31 km) [43][44][45]. With GNSS data as the reference for evaluating ERA5 PWV, the correlation coefficient, Bias, and RMSE in China are 0.99, 0.38 mm, and 1.99 mm, respectively [46]. When the evaluation is restricted to South China, however, the performance differs considerably from this national average. In addition, we vertically correct and horizontally interpolate the PWV of the four closest ERA5 grid points around the GNSS site [47].
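The horizontal interpolation step can be illustrated with a short sketch. The interpolation scheme is not specified in this section, so inverse distance weighting over great-circle distances is used here purely as an assumption; the function names are hypothetical, and the four grid values are assumed to have already been vertically corrected to the height of the GNSS site.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lon1, lat1, lon2, lat2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2.0) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2.0) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def idw_pwv(site_lon, site_lat, grid_points, power=2.0):
    """Interpolate PWV at a site from (lon, lat, pwv_mm) grid points.

    Inverse distance weighting is an assumed choice, not the paper's stated method.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lon, lat, pwv in grid_points:
        d = great_circle_km(site_lon, site_lat, lon, lat)
        if d == 0.0:
            return pwv  # the site coincides with a grid node
        w = 1.0 / d ** power
        weighted_sum += w * pwv
        weight_total += w
    return weighted_sum / weight_total
```

For a site inside a 0.25° cell, `grid_points` would hold the four surrounding ERA5 nodes; the result always lies between the smallest and largest of the four values.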

RS PWV
The RS dataset, collected by radiosondes daily at 00:00 and 12:00 UTC, includes vertical atmospheric data. Each radiosonde site provides stratified meteorological parameters such as air pressure, temperature, and relative humidity from the ground to about 30 km above the Earth's surface [48,49]. The meteorological data of each pressure layer directly provided by the radiosonde are used to obtain the PWV at the site surface through integration and summation. In this study, we used the PWV of the RS stratification in 2018 as the reference value to evaluate the accuracy of the vertical adjustment models. RS profiles can be accessed from the upper air archives on the University of Wyoming (UW) website [50][51][52].
where $P_i$ and $P_{i+1}$ are the pressure (Pa) at the lower and upper layers, respectively, $g$ is the acceleration of gravity, $e$ is the vapor pressure (Pa), and $T_d$ is the dew point temperature (°C).
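The layer-by-layer integration of a radiosonde profile can be sketched as below. This is only an illustration under stated assumptions: the vapor pressure is computed from the dew point with a Magnus-type approximation (Bolton's coefficients), the specific humidity uses the usual 0.622 and 0.378 factors, and layers are combined with the trapezoidal rule. None of these choices is confirmed by the text, and the function names are hypothetical.

```python
import math

G = 9.80665     # acceleration of gravity, m/s^2
RHO_W = 1000.0  # density of liquid water, kg/m^3


def vapor_pressure_pa(td_c):
    """Vapor pressure (Pa) from dew point (deg C); assumed Magnus-type formula."""
    return 611.2 * math.exp(17.67 * td_c / (td_c + 243.5))


def specific_humidity(p_pa, td_c):
    """Specific humidity (kg/kg) from pressure (Pa) and dew point (deg C)."""
    e = vapor_pressure_pa(td_c)
    return 0.622 * e / (p_pa - 0.378 * e)


def rs_pwv_mm(pressures_pa, dewpoints_c):
    """Integrate PWV (mm) over levels ordered from the surface upward."""
    total = 0.0
    for i in range(len(pressures_pa) - 1):
        q_lower = specific_humidity(pressures_pa[i], dewpoints_c[i])
        q_upper = specific_humidity(pressures_pa[i + 1], dewpoints_c[i + 1])
        # Trapezoidal rule over the layer between P_i and P_{i+1}.
        total += 0.5 * (q_lower + q_upper) * (pressures_pa[i] - pressures_pa[i + 1])
    return total / (RHO_W * G) * 1000.0  # column depth in m converted to mm
```

A coarse five-level profile is enough to exercise the function; a real sounding would supply many more levels.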
where $PWV_t$ and $PWV_r$ are the PWV values (mm) at the heights $h_t$ and $h_r$ (km), respectively, and $r$ is the PWV lapse rate (mm/km), which is the key factor for vertical water vapor adjustment. $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ are the model coefficients.
The distribution of water vapor differs significantly in the vertical direction, following an exponential decay trend with increasing height [53]. The PWV vertical adjustment model is therefore crucial to the study of water vapor, and an appropriate vertical adjustment model must be selected to ensure data accuracy before constructing the water vapor adjustment model [54]. We established water vapor adjustment models for the whole of China and for each subregion (CT-PWV) based on the PWV profiles of 88 RS sites from 2012 to 2017. Furthermore, RS PWV profiles for 2018 were used as the true values, against which the CT-PWV model, the traditional adjustment model (E-PWV) with a constant lapse rate of −0.5 mm/km [46], and the adjustment model C-PWV [55] built on the ERA5 dataset were compared. The coefficients of the CT-PWV model, fitted using the least squares method, and the accuracy statistics are shown in Tables 1 and 2, respectively. As shown in Table 2, the mean Bias and mean RMSE of the E-PWV, C-PWV, and CT-PWV models in China are −3.32/5.4 mm, −0.26/1.59 mm, and 0.30/1.74 mm, respectively. For Bias, the C-PWV model shows a small fluctuation range, and its adjustment accuracy in China and the four subregions is more stable than that of the E-PWV and CT-PWV models. For RMSE, the C-PWV model shows better adjustment accuracy in the NWC, NC, and SC subregions than the other two models, while the CT-PWV model shows the best adjustment accuracy in the TP subregion. In the SC and NC subregions, the maximum RMSE after vertical adjustment by the C-PWV and CT-PWV models is in the range of 3-4 mm, while the maximum RMSE of the E-PWV model is 11.89 mm. Therefore, this study selected the C-PWV model to perform spatial adjustment of PWV.

Generalized Regression Neural Network
The General Regression Neural Network (GRNN) method shows good accuracy in building water vapor adjustment models [33,56].GRNN is a regression neural network designed for nonlinear mapping and pattern recognition tasks proposed by Donald F. Specht in 1991 [57].The algorithm comprehends intricate nonlinear relationships inherent in complex datasets and excels in capturing the underlying patterns that may elude traditional linear models.GRNN employs a radial basis function to model the input-output relationships, with a unique smoothing parameter that controls the spread of the radial basis functions [58].Moreover, GRNN demonstrates rapid overall convergence characteristics as it operates as a feedforward neural network, eliminating the need for the iterative process employed in backpropagation networks.These advantages have made GRNN a powerful tool for regression, approximation, fitting, and prediction.
GRNN consists of four layers: an input layer, a pattern layer, a summation layer, and an output layer. The input layer has $p$ neurons, matching the dimension of the input vector $X$, and is connected to the pattern layer. The pattern layer has $n$ nodes, equal to the number of training samples, and each node evaluates the Gaussian function for one sample. The output of the pattern layer is sent to the summation layer, which calculates $\sum_i Y_i K(X, X_i)$ and $\sum_i K(X, X_i)$ and sends both to the output layer. The number of neurons in the output layer equals the dimension of the output vector in the training samples. The output layer receives the output of the summation layer and estimates $\hat{Y}(X)$:
$$\hat{Y}(X) = E[y \mid X] = \frac{\int_{-\infty}^{\infty} y\, f(X, y)\, dy}{\int_{-\infty}^{\infty} f(X, y)\, dy} \quad (9)$$
In reality, the PDF $f(x, y)$ is unknown and can be obtained through non-parametric estimation based on observed samples of $x$ and $y$. $f(X, Y)$ can be estimated by:
$$\hat{f}(X, Y) = \frac{1}{n\,(2\pi)^{(p+1)/2}\,\sigma^{p+1}} \sum_{i=1}^{n} \exp\left[-\frac{(X - X_i)^{T}(X - X_i)}{2\sigma^{2}}\right] \exp\left[-\frac{(Y - Y_i)^{2}}{2\sigma^{2}}\right] \quad (10)$$
where $n$ is the number of sample observations, $p$ is the dimensionality of the random variable $x$, and $\sigma$ is the spread parameter, which is the only unknown parameter in GRNN. $X_i$ and $Y_i$ represent the observed values of the random variables $X$ and $Y$, respectively. Then, define a scalar function $D_i^2$ and a Gaussian kernel:
$$D_i^{2} = (X - X_i)^{T}(X - X_i) \quad (11)$$
$$K(X, X_i) = \exp\left(-\frac{D_i^{2}}{2\sigma^{2}}\right) \quad (12)$$
By combining Equations (9)-(12), we obtain:
$$\hat{Y}(X) = \frac{\sum_{i=1}^{n} Y_i\, K(X, X_i)}{\sum_{i=1}^{n} K(X, X_i)}$$
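The four-layer structure amounts to a kernel-weighted average of the training targets, which can be sketched in a few lines. The function below is an illustrative implementation, not the code used in the study; `grnn_predict` and its argument names are hypothetical. Subtracting the smallest squared distance before exponentiating leaves the ratio unchanged but avoids a zero denominator when σ is very small.

```python
import math


def grnn_predict(x, train_x, train_y, sigma):
    """Estimate Y at input x from training samples with a Gaussian kernel."""
    # Pattern layer: squared Euclidean distance to every training sample.
    d2 = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in train_x]
    d2_min = min(d2)
    # Shifting by d2_min rescales numerator and denominator by the same factor.
    k = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]
    # Summation layer: weighted sum of targets and sum of kernel values.
    numerator = sum(ki * yi for ki, yi in zip(k, train_y))
    denominator = sum(k)
    # Output layer: ratio of the two sums.
    return numerator / denominator
```

A small σ makes the estimate follow the nearest training samples closely, while a large σ drives it toward the mean of all training targets, which is why the spread parameter has to be tuned.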

Correlation Analysis and Parameter Selection
The variations in PWV are influenced by geographical factors, such as latitude, sea and land distribution, and altitude, exhibiting distinct temporal characteristics, including seasonality and periodic changes [59].Therefore, we first used longitude, latitude, altitude, Day of Year (DOY), and Hour of Day (HOD) as basic model-influencing factors.
In the context of the atmospheric moisture cycle, the transportation of water vapor resulting from rainfall, evaporation, and atmospheric circulation contributes to fluctuations in PWV. The atmospheric precipitation and evaporation processes are influenced by four fundamental meteorological elements: pressure, temperature, humidity, and wind. In this study, we employed the Spearman correlation coefficient (R) [60,61] to analyze the correlation between PWV and the meteorological parameters. Correlation between parameters is evaluated using the absolute value of R, with 0.7-1 defined as strong correlation, 0.4-0.7 as moderate correlation, and 0-0.4 as weak correlation. In Figure 2, the correlation coefficient between GNSS PWV and ERA5 PWV is 0.99, indicating that the two datasets are highly correlated, which ensures data quality for training the GRNN models. The correlation coefficients of the ERA5 PWV and GNSS PWV data sets with air pressure are 0.76/0.75, with temperature 0.44/0.43, and with precipitation 0.38/0.39. The correlation coefficient between relative humidity and both PWV data sets is 0.68. Higher values of relative humidity are robustly associated with increased atmospheric water vapor content, explaining the correlation between the meteorological parameters and PWV. The complex atmospheric thermodynamic circulation mechanism and heavy rainfall drive dramatic changes in atmospheric water vapor, which may be the main reason for the weak correlation between PWV and precipitation [62,63]. Therefore, we introduced three meteorological elements, namely pressure, temperature, and relative humidity, into the construction of the PWV adjustment model.
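The correlation screening can be illustrated with a small pure-Python sketch of the Spearman rank correlation and the three strength classes. The helper names are hypothetical, and assigning the boundary values 0.4 and 0.7 to the upper class is an assumption, since the stated ranges share their endpoints.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def strength(r):
    """Classify |R| as strong, moderate, or weak (boundary handling assumed)."""
    a = abs(r)
    if a >= 0.7:
        return "strong"
    if a >= 0.4:
        return "moderate"
    return "weak"
```

Applied to the coefficients reported above, pressure (0.76/0.75) falls in the strong class, temperature (0.44/0.43) and relative humidity (0.68) in the moderate class, and precipitation (0.38/0.39) in the weak class.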

Model Construction
Utilizing the GRNN algorithm [64], we selected 80% of the 339 GNSS sites to construct the ERA5 PWV adjustment models: the linear model (GLPW), the foundational PWV adjustment model excluding meteorological factors (GFPW model), and the PWV adjustment model integrating multisource parameters (GMPW model) [65,66]. We used the remaining 20% of GNSS site data to test the accuracy of these ERA5 PWV adjustment models. The architecture of the GFPW and GMPW models is shown in Figure 3. The input layer of the GFPW model consists of six neurons: latitude, longitude, altitude, DOY, HOD, and ERA5 PWV. The input layer of the GMPW model includes nine neurons representing latitude, longitude, altitude, DOY, HOD, ERA5 PWV, P, T, and RH. The output layer of the two models consists of one neuron: GNSS PWV.
Before constructing the PWV adjustment model using the GRNN algorithm, it is essential to perform registration and quality control between different datasets. We employed a threshold of three times the standard deviation of the Bias between GNSS PWV and ERA5 PWV to eliminate outliers and obtain the GNSS-ERA5 data pairs. In addition, the input vectors used for model construction are normalized to [−1, 1] to eliminate the influence of different units and value ranges on the modeling results. Moreover, constructing the PWV adjustment model based on GRNN requires determining the optimal spread parameter σ, which significantly influences the model's performance. Building upon previous studies, we specified the range of σ from 0.01 to 0.1 with steps of 0.01 and determined the optimal value of σ as the one giving the smallest RMSE in a 10-fold cross-validation [67,68]. The variation of RMSE with σ during 10-fold cross-validation for the GFPW model in the spring NWC subregion is shown in Figure 4.
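The quality control and normalization steps can be sketched as follows. This is an illustrative reading of the description rather than the study's code: the threshold is applied to the deviation of each ERA5 minus GNSS difference from the mean difference, which is an assumption, and the function names are hypothetical.

```python
import statistics


def keep_within_three_sigma(gnss_pwv, era5_pwv, k=3.0):
    """Return indices of GNSS-ERA5 pairs whose difference is not an outlier."""
    diffs = [e - g for g, e in zip(gnss_pwv, era5_pwv)]
    mean = statistics.mean(diffs)
    std = statistics.pstdev(diffs)
    # Assumed rule: keep pairs within k standard deviations of the mean difference.
    return [i for i, d in enumerate(diffs) if abs(d - mean) <= k * std]


def normalize_column(values):
    """Min-max scale one input factor to the interval [-1, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no information
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]
```

Each input factor (for example latitude, DOY, or ERA5 PWV) would be scaled separately, and the minimum and maximum found on the modeling data would need to be reused when the model is applied to the test sites.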
As shown in Figure 4, the RMSE initially decreases and then increases as σ increases, reaching its minimum at σ = 0.02. Consequently, σ = 0.02 is chosen to construct the optimal PWV adjustment model based on GRNN. Given the regional disparities and seasonal fluctuations in PWV, the GFPW and GMPW models were constructed and compared for the different geographical subregions and seasons. The four seasons are defined as follows: spring is from March to May, summer is from June to August, fall is from September to November, and winter is from December to February. The optimal spread parameters σ and their corresponding accuracies for the GFPW and GMPW models in the four subregions and four seasons are presented in Tables 3 and 4, respectively.
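The search for the spread parameter can be sketched as a grid search scored by 10-fold cross-validation. The code is a self-contained illustration under assumptions: folds are formed by taking every tenth sample, the RMSE is pooled over all held-out samples, and the function names are hypothetical. The candidate grid of 0.01 to 0.1 in steps of 0.01 follows the text.

```python
import math


def grnn_predict(x, train_x, train_y, sigma):
    """Kernel-weighted average of training targets (Gaussian kernel)."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in train_x]
    d2_min = min(d2)  # shift for numerical stability; the ratio is unchanged
    k = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]
    return sum(ki * yi for ki, yi in zip(k, train_y)) / sum(k)


def cv_rmse(xs, ys, sigma, folds=10):
    """Pooled RMSE over all held-out samples of a k-fold cross-validation."""
    squared_error = 0.0
    for fold in range(folds):
        train = [i for i in range(len(xs)) if i % folds != fold]
        test = [i for i in range(len(xs)) if i % folds == fold]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        for i in test:
            squared_error += (grnn_predict(xs[i], tx, ty, sigma) - ys[i]) ** 2
    return math.sqrt(squared_error / len(xs))


def select_sigma(xs, ys, candidates=None, folds=10):
    """Return the candidate sigma with the smallest cross-validated RMSE."""
    if candidates is None:
        candidates = [round(0.01 * j, 2) for j in range(1, 11)]
    scores = {s: cv_rmse(xs, ys, s, folds) for s in candidates}
    best = min(scores, key=scores.get)
    return best, scores
```

Running the search separately for each subregion and season would yield one σ per model, in the spirit of the per-season, per-subregion values reported in the tables.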
To better compare the results with those of other similar studies [69], we also established a classic ERA5 PWV linear adjustment model (GLPW), a linear function of ERA5 PWV whose model coefficients a and b are listed in Table 5.
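A linear adjustment of this kind can be fitted by ordinary least squares, as in the sketch below. The roles of the coefficients are an assumption made for illustration: `a` is treated as the slope and `b` as the intercept of a model mapping ERA5 PWV to GNSS PWV, and the function names are hypothetical.

```python
def fit_linear_adjustment(era5_pwv, gnss_pwv):
    """Least-squares fit of gnss ~ a * era5 + b (a: slope, b: intercept, assumed)."""
    n = len(era5_pwv)
    mean_x = sum(era5_pwv) / n
    mean_y = sum(gnss_pwv) / n
    sxx = sum((x - mean_x) ** 2 for x in era5_pwv)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(era5_pwv, gnss_pwv))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def apply_linear_adjustment(era5_pwv, a, b):
    """Adjust ERA5 PWV values with the fitted coefficients."""
    return [a * x + b for x in era5_pwv]
```

Because the model has only two coefficients per subregion and season, it cannot represent location- or time-dependent differences within a subregion, which is the limitation the GRNN-based models are meant to address.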

Overall Accuracy
Taking GNSS PWV as the reference, the Bias, MAE, and RMSE [70] of the unadjusted ERA5 PWV and of the ERA5 PWV adjusted by the GLPW, GFPW, and GMPW models are compiled in Table 6. The variations of these metrics across different subregions and four seasons are depicted in Figures 5 and 6.
Table 6. The accuracy statistics between the unadjusted ERA5 PWV and ERA5 PWV adjusted by GLPW, GFPW, and GMPW models using GNSS PWV as the reference (the unit is mm, the asterisk (*) represents '×10⁻³', and UA represents 'Unadjusted').
As shown in Table 6, the Bias values after adjustment by the GLPW, GFPW, and GMPW models are close to 0 mm, which shows that the systematic differences between ERA5 PWV and GNSS PWV are largely eliminated. For MAE, the ranges of the GLPW model in the NWC, TP, NC, and SC subregions are 0.68-1.71 mm, 1.07-2.02 mm, 0.74-2.00 mm, and 1.35-2.28 mm, respectively. The ranges of the GFPW model in the NWC, TP, NC, and SC subregions are 0.41-0.72 mm, 0.60-0.86 mm, 0.55-1.70 mm, and 0.75-1.03 mm, respectively, while those of the GMPW model are 0.02-0.49 mm, 0.03-0.64 mm, 0.18-1.33 mm, and 0.10-0.55 mm. Compared with unadjusted ERA5 PWV, the MAE is reduced by all three models, and the GMPW model performs best.
As shown in Figure 5, the adjusted Bias values of the GLPW, GFPW, and GMPW models decrease significantly in each season, indicating that the systematic differences between ERA5 PWV and GNSS PWV have been effectively eliminated. By contrast, the unadjusted ERA5 PWV exhibits significant negative Biases in spring, summer, and fall in the TP and SC subregions. The adjusted GFPW and GMPW results generally show positive Biases in the four subregions and four seasons. The Bias of the GMPW model is lower than that of the GFPW model, indicating that the stability of the GMPW model is better. This also shows that the accuracy of ERA5 PWV before adjustment varies between subregions in each season and is unevenly distributed.
As shown in Figure 6, the RMSE of unadjusted ERA5 PWV in the NWC subregion relative to GNSS PWV and the RMSE after adjustment by the GLPW model across the four seasons are 1.43/0.83 mm (spring), 2.09/1.07 mm (summer), 1.40/0.84 mm (fall), and 0.92/0.53 mm (winter), respectively. Compared to the unadjusted RMSE, the RMSE of the GFPW and GMPW models in the corresponding seasons decreased by 0.72/0.86 mm (50.34%/60.13%), 1.11/1.26 mm (53.11%/60.28%), 0.75/0.76 mm (53.57%/54.28%), and 0.44/0.46 mm (47.82%/50.00%), respectively. The optimization performance of the GMPW model is comparable across the four seasons, and the accuracy optimization of the GFPW model is more pronounced in summer and winter. The RMSE improvement of the GMPW model in the four seasons exceeds that of the GFPW model by 0.14 mm (9.79%), 0.15 mm (7.17%), 0.01 mm (0.71%), and 0.02 mm (2.17%), respectively. This indicates that the accuracy of the GMPW model is better in spring and summer, and the accuracy of the two models is equivalent in fall and winter.
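The accuracy criteria used in this comparison can be computed as in the sketch below. It is an illustration with hypothetical function names, and the sign convention (estimate minus reference, so a negative Bias means ERA5 is drier than GNSS) is an assumption. The second helper expresses an RMSE decrease in mm and as a percentage of the unadjusted RMSE, which is how the seasonal reductions above are quoted.

```python
import math


def accuracy_metrics(estimate, reference):
    """Bias, MAE, RMSE, STD, and correlation coefficient R against a reference."""
    n = len(estimate)
    diffs = [e - r for e, r in zip(estimate, reference)]
    bias = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    std = math.sqrt(sum((d - bias) ** 2 for d in diffs) / n)
    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimate, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimate)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    r_coef = cov / math.sqrt(var_e * var_r)
    return {"bias": bias, "mae": mae, "rmse": rmse, "std": std, "r": r_coef}


def rmse_reduction(rmse_before, rmse_after):
    """RMSE decrease in mm and as a percentage of the unadjusted RMSE."""
    drop = rmse_before - rmse_after
    return drop, 100.0 * drop / rmse_before
```

Because the published percentages were presumably derived from unrounded RMSE values, recomputing them from the rounded figures in the text can differ in the last digit.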

The results indicate that the accuracy of ERA5 PWV has been significantly improved after model adjustment, with the systematic differences between ERA5 PWV and GNSS PWV almost disappearing.Moreover, the GMPW model outperforms the GFPW model overall, demonstrating a significant improvement in RMSE accuracy during the summer and a slight enhancement in the fall and winter.The larger PWV errors in summer might result from higher water vapor content, while the reduced errors in fall and winter could stem from less variability in meteorological parameters and water vapor.Thus, PWV adjustment models that account for seasonal differences exhibit better performance and contribute to the production of high-quality PWV products in China.

Spatiotemporal Properties Analysis
To evaluate the temporal and spatial characteristics of the optimization performance of the PWV adjustment model, we applied seasonal PWV subregion adjustment models to adjust the corresponding ERA5 PWV. Figures 7 and 8 show the distributions of Bias and RMSE between ERA5 PWV and GNSS PWV before and after optimization. The Bias and RMSE between unadjusted ERA5 PWV and GNSS PWV are notably high, especially in the southeastern coastal subregions. The adjusted Bias is nearly zero, and the overall RMSE has significantly decreased, suggesting that the PWV adjustment model effectively reduces systematic errors.
As shown in Figure 7, the unadjusted ERA5 PWV in the NC subregion shows a pronounced positive Bias in spring and winter while exhibiting a significant negative Bias in the other subregions. This indicates that the accuracy of unadjusted ERA5 PWV varies among subregional sites and exhibits notable regional distribution characteristics. Moreover, the Bias in the adjusted ERA5 PWV has significantly decreased across all seasons, indicating that the spatial variability and land-sea variability of PWV have improved from high latitudes to low latitudes and from coastal to inland areas. Therefore, it can be concluded that the GFPW and GMPW models can effectively adjust the Bias between ERA5 PWV and GNSS PWV.
As shown in Figure 8, the RMSE after adjustment by the GLPW model is basically consistent with that before adjustment. This shows that the linear model is not very suitable for ERA5 PWV adjustment in China. After adjustment by GFPW and GMPW, the seasonal RMSE of ERA5 PWV in each subregion is significantly reduced to a range of 0-2 mm, with an especially notable improvement in the SC subregion. It can be seen from Figure 7 that, in the NC subregion in summer, the adjustment effect of the GMPW model is weaker than that of the GFPW model. This may be because summer climate variations are complex and the GMPW model is not well suited to adjusting ERA5 PWV at this site. The variations in PWV across different subregions are associated with geographical conditions, typically exhibiting a significant decrease from the southeastern coastal areas to the northwestern inland subregions, with the highest values observed in the southeastern coastal areas.
Furthermore, the GMPW model effectively adjusted the PWV differences between coastal and inland areas in the SC subregion, resulting in a more consistent RMSE distribution of ERA5 PWV across these areas compared to the unadjusted ERA5 PWV. The RMSE of ERA5 PWV adjusted by the GFPW model in the southwestern coastal subregion shows a significant difference compared to the inland areas, especially in the three seasons other than winter.
The Bias and RMSE of unadjusted ERA5 PWV across the entire area range from −1.42 to 0.31 mm and 0.92 to 2.86 mm, respectively. After adjustment by the GLPW model, the RMSE distribution of ERA5 PWV in each subregion is generally consistent, ranging from 0.83 to 2.15 mm. For the GFPW model, the RMSE of adjusted ERA5 PWV in the NC, TP, and NWC subregions is concentrated within the range of 0.41 mm to 1.74 mm. After adjustment by the GMPW model, the RMSE distribution of ERA5 PWV in each subregion is generally consistent, ranging from 0.14 to 1.17 mm. In the SC subregion, following adjustment by the GFPW model, the RMSE values differ significantly between the southwest coastal area and the inland sites, with differences ranging from 1 to 2 mm. Compared to the unadjusted ERA5 PWV in the SC subregion, the RMSE differences for the GFPW and GMPW models in spring, summer, fall, and winter are 1.34/0.78, 1.42/0.83, 1.48/0.85, and 1.19/0.06 mm, respectively. This may be related to the pronounced climate variations in the SC subregion and the stronger water vapor fluctuations in the three seasons other than winter [71].
In conclusion, the GLPW model has the worst accuracy in adjusting ERA5 PWV. Both the GFPW and GMPW adjustment models effectively address the accuracy differences between ERA5 PWV and GNSS PWV, demonstrating robust accuracy and applicability. The GMPW model notably reduces the accuracy disparities between land and sea in ERA5 PWV and exhibits superior applicability in the SC subregion.
To evaluate the PWV adjustment model more comprehensively, we randomly selected 70 sites that did not participate in the modeling and applied seasonal PWV subregion adjustment models to adjust the corresponding ERA5 PWV. Taking the GNSS PWV as the reference, the Bias, MAE, and RMSE of the unadjusted ERA5 PWV and of the ERA5 PWV adjusted by the GLPW, GFPW, and GMPW models are summarized in Table 7.
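The three statistics used in this external validation can be sketched as follows. This is a minimal illustration rather than the authors' processing code: the function name `accuracy_stats`, the sign convention (ERA5 minus GNSS), and the sample values are assumptions introduced here.

```python
import math


def accuracy_stats(era5_pwv, gnss_pwv):
    """Return (Bias, MAE, RMSE) of ERA5 PWV against GNSS PWV, all in mm.

    Assumed sign convention: difference = ERA5 - GNSS, so a negative Bias
    means ERA5 underestimates the GNSS reference.
    """
    if len(era5_pwv) != len(gnss_pwv) or len(era5_pwv) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [e - g for e, g in zip(era5_pwv, gnss_pwv)]
    n = len(diffs)
    bias = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    return bias, mae, rmse


# Hypothetical hourly PWV values (mm) at one external site.
era5 = [10.0, 12.0, 15.0, 9.0]
gnss = [11.0, 12.0, 13.0, 10.0]
bias, mae, rmse = accuracy_stats(era5, gnss)
print(f"Bias={bias:.2f} mm, MAE={mae:.2f} mm, RMSE={rmse:.2f} mm")
```

In this example the differences cancel, so the Bias is zero while the MAE and RMSE are not, which is why a near-zero Bias alone does not establish that an adjustment model is accurate.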
Table 7. External accuracy statistics of the unadjusted ERA5 PWV and the ERA5 PWV adjusted by the GLPW, GFPW, and GMPW models, using GNSS PWV as the reference (unit: mm; UA denotes "Unadjusted").
As shown in Table 7, the Bias values after adjustment by the GFPW and GMPW models are close to 0 mm, which shows that the systematic differences between ERA5 PWV and GNSS PWV are basically eliminated. The Bias after adjustment by the GLPW model is slightly lower than before adjustment, but a residual Bias remains, which shows that the GLPW model is less effective in adjusting ERA5 PWV. For MAE, the ranges of the GLPW model in the NWC, TP, NC, and SC subregions are 0.71-2.54 mm, 1.29-2.39 mm, 1.02-3.36 mm, and 1.76-3.95 mm, respectively. The corresponding ranges of the GFPW model are 0.68-1.82 mm, 1.24-2.23 mm, 0.76-2.05 mm, and 1.47-2.60 mm, while those of the GMPW model are 0.65-1.75 mm, 0.96-1.89 mm, 0.74-2.00 mm, and 1.43-2.28 mm. Compared with the unadjusted PWV, the MAE is reduced after adjustment by the GLPW, GFPW, and GMPW models, and the GMPW model performs better.
Figures 9 and 10 show the spatial distributions of Bias and RMSE between ERA5 PWV and GNSS PWV at the external sites before and after adjustment. In Figure 9, the Bias after adjustment by the GLPW model is basically the same as before the adjustment, which shows that the adjustment accuracy of the linear model is poor and that it is not suitable for ERA5 PWV adjustment. In addition, the Bias of the ERA5 PWV adjusted by the GFPW and GMPW models is significantly reduced in each subregion, and the GMPW model adjustment results are better. After adjustment by GFPW and GMPW, the seasonal Bias of ERA5 PWV in each subregion was significantly reduced to approximately 0 mm, with the NC subregion improving particularly significantly. This shows that both the GFPW and GMPW models can adjust the ERA5 PWV of external sites well.
As shown in Figure 10, the RMSE of ERA5 PWV at the external sites after adjustment by the GLPW model is basically the same as before adjustment, which indicates that the linear model is not suitable for PWV adjustment in China. After adjustment by the GFPW and GMPW models, the seasonal RMSE of ERA5 PWV in each subregion was significantly reduced to the range of 1-2 mm, with the NC subregion improving particularly significantly in fall and winter. In spring and summer, the adjusted RMSE of the GLPW model is around 2.5 mm, and the adjusted RMSE of the GFPW and GMPW models is around 2.0 mm. In addition, for the GFPW model in summer, the RMSE at the QHTR site on the west side of the TP subregion is very large, about 5 mm. This may be because the site is surrounded by high mountains, resulting in complex precipitation there [72]. Therefore, the adjusted RMSE of ERA5 PWV is larger. In conclusion, the GFPW and GMPW models are more suitable for China ERA5 PWV adjustment and have higher accuracy than the GLPW model.

Conclusions
Machine learning algorithms proficiently discern the intricate nonlinear associations inherent in various PWV datasets, facilitating the enhancement of PWV datasets and the production of PWV products characterized by high precision and resolution. Current PWV adjustment models inadequately address the comprehensive assessment of error distribution patterns among diverse PWV datasets and the spatiotemporal applicability of adjustment models. Therefore, this study constructs PWV adjustment models based on meteorological parameters (GMPW model) and non-meteorological parameters (GFPW model) by introducing the GRNN algorithm. A linear model, GLPW, was established to compare model accuracy. Through model comparison and accuracy verification, the RMSE of seasonal ERA5 PWV in each subregion was significantly reduced, and the overall RMSE before and after adjustment was about 0-4 mm and 0-2 mm, respectively. The adjusted overall ERA5 PWV Bias of the GLPW, GFPW, and GMPW models in the four subregions is approximately 0 mm.
In conclusion, considering the seasonal and geographical characteristics of water vapor in China, the PWV adjustment model based on the GRNN algorithm can effectively reduce the systematic Bias in ERA5 PWV data and exhibits good accuracy and reliability. Compared with unadjusted ERA5 PWV and ERA5 PWV adjusted by the GLPW and GFPW models, the GMPW model has superior spatial stability and can significantly reduce the coastal-inland differences between ERA5 PWV and GNSS PWV. The GFPW and GMPW models can generate PWV products with high spatial and temporal resolution, providing a reference for GNSS-ERA5 PWV fusion research. Future work will explore different ERA5 PWV adjustment models and then use high-precision adjustment models for PWV fusion and water vapor monitoring.
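For readers unfamiliar with GRNN, the following minimal sketch shows the standard textbook form of the algorithm: a Gaussian-kernel weighted average of training targets controlled by a single spread parameter σ, with σ chosen by k-fold cross-validation. It is not the authors' implementation; the function names, the synthetic one-dimensional data, and the candidate σ values are all assumptions made for illustration.

```python
import math


def grnn_predict(x_train, y_train, x_query, sigma):
    """GRNN output: kernel-weighted mean of training targets."""
    weights = []
    for xi in x_train:
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x_query))
        weights.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # All kernels underflowed (sigma far too small): use the plain mean.
        return sum(y_train) / len(y_train)
    return sum(w * y for w, y in zip(weights, y_train)) / total


def cv_rmse(x, y, sigma, k=10):
    """RMSE over all held-out samples from an interleaved k-fold split."""
    n = len(x)
    squared_error = 0.0
    for fold in range(k):
        held_out = set(range(fold, n, k))
        x_tr = [x[i] for i in range(n) if i not in held_out]
        y_tr = [y[i] for i in range(n) if i not in held_out]
        for i in held_out:
            squared_error += (grnn_predict(x_tr, y_tr, x[i], sigma) - y[i]) ** 2
    return math.sqrt(squared_error / n)


# Synthetic data standing in for (features, PWV difference) pairs.
x = [[i / 10.0] for i in range(40)]
y = [math.sin(v[0]) for v in x]

candidates = [0.05, 0.2, 1.0, 5.0]  # hypothetical spread values
best_sigma = min(candidates, key=lambda s: cv_rmse(x, y, s))
print("selected sigma:", best_sigma)
```

A small σ makes the prediction follow nearby training samples closely, while a large σ pulls every prediction toward the global mean, which is why the spread is tuned separately for each subregion and season.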

Figure 1. Distribution of 339 GNSS and MET sites and four subregions in China from 2016 to 2018.
correlation coefficient as a metric to investigate the correlations among China's hourly GNSS PWV, ERA5 PWV, and four highly associated meteorological elements at meteorological sites from 2016 to 2018: temperature (T), pressure (P), relative humidity (RH), and precipitation (PRE). The correlation values of the above six factors are shown in Figure 2.
meteorological elements: pressure, temperature, and relative humidity, to participate in constructing the PWV adjustment model.

Figure 2. Cross-correlations among the GNSS PWV, ERA5 PWV, and multisource meteorological parameters of meteorological sites from 2016 to 2018 in China.

Figure 4. GFPW model for the NWC subregion in spring: changes in RMSE values under different distribution parameters σ generated by 10-fold cross-validation.

Figure 5. The mean Bias of different models in different seasons and subregions from 2016 to 2018 (Q1 and Q3 of the box represent the first and third quartiles, respectively; the distance between Q1 and Q3 reflects the degree of fluctuation of the data; Q2 is the median value, which reflects the average level of the data; Q4 represents the outliers).

Figure 6. The mean RMSE of different models in different seasons and subregions from 2016 to 2018 (Q1 and Q3 of the box represent the first and third quartiles, respectively; the distance between Q1 and Q3 reflects the degree of fluctuation of the data; Q2 is the median value, which reflects the average level of the data; Q4 represents the outliers).

Figure 7. Site distribution map of Bias between ERA5 PWV and GNSS PWV before and after adjustment (UA is the unadjusted result; GLPW, GFPW, and GMPW are different adjustment models).

Figure 8. Site distribution map of RMSE between ERA5 PWV and GNSS PWV before and after adjustment (UA is the unadjusted result; GLPW, GFPW, and GMPW are different adjustment models).

Figure 9. External sites distribution map of Bias between ERA5 PWV and GNSS PWV before and after adjustment (UA is the unadjusted result; GLPW, GFPW, and GMPW are different adjustment models).

Figure 10. External sites distribution map of RMSE between ERA5 PWV and GNSS PWV before and after adjustment (UA is the unadjusted result; GLPW, GFPW, and GMPW are different adjustment models).

Table 1. Coefficients of the vertical adjustment model CT-PWV.

Table 3. The optimal spread parameters σ of the GMPW and GFPW models in various subregions and four seasons.

Table 4. The cross-validation (CV) accuracy of the GMPW and GFPW models in various subregions and four seasons (unit: mm; the asterisk (*) denotes ×10⁻³).

Table 5. The model coefficients (a and b) of the GLPW model in various subregions and four seasons.