Intercomparison of Downscaling Techniques for Satellite Soil Moisture Products

Center for Built Environment, Sungkyunkwan University, Suwon, Gyeonggi-do, Republic of Korea
Institute for Atmospheric and Climate Science, ETH Zürich, Zürich, Switzerland
School of Earth, Ocean and the Environment, University of South Carolina, Columbia, SC, USA
School of Urban and Environmental Engineering, UNIST, Ulsan, Republic of Korea
Graduate School of Water Resources, Sungkyunkwan University, Suwon, Gyeonggi-do, Republic of Korea


Introduction
Remotely sensed soil moisture (SM) offers increased spatial coverage and improved temporal continuity and has thus substantially changed our understanding of the global water cycle [1,2]. Nevertheless, the relatively coarse spatial resolution of approximately 10 km for passive/active microwave satellite remote sensing datasets is the main reason they cannot be effectively applied to hydrological studies at a regional scale [3]. The issue of scale mismatch between remotely sensed and in situ SM has also been considered unavoidable and has been critically evaluated using coarse satellite measurements, particularly in areas with nonhomogeneous land cover [4]. Thus, downscaling techniques that improve the spatial resolution of remotely sensed SM are important for matching it with in situ datasets and enabling practical applications.
This approach is based on the relationship among SM, land surface temperature (T_s), and the normalized difference vegetation index (NDVI), which theoretically forms a triangular shape because of the evaporative cooling effect [10,11]. However, the downscaling methods based on this relationship are considered semiempirical. Previous SM downscaling research has consisted largely of variations in the regression formula based on these three related variables. Chauhan et al. [6] introduced surface albedo into this method to strengthen the relationship between SM and land parameters and applied it to 25 km SM data from the Special Sensor Microwave Imager and 1 km land parameters from the Advanced Very High Resolution Radiometer. A comparison of the 1 km SM and in situ SM revealed fairly similar trends, with a root mean square error (RMSE) that ranged from 0.005 to 0.037 m³·m⁻³. The use of surface albedo was later adopted by Yu et al. [7] and Choi and Hur [5]. Piles et al. [8] introduced brightness temperature (TB) instead of surface albedo to downscale, together with other variables, the SM from the Soil Moisture and Ocean Salinity (SMOS) mission.
The polynomial regression formula applied in previous downscaling studies has been shown to perform well. However, the method features innate errors that result from regressing the highly complex and nonlinear relationship between T_s under nonhomogeneous vegetation conditions and SM into a polynomial model [12,13]. Thus, there is a need to find and employ a different regression model that better captures this inherent complexity.
A support vector machine (SVM), an alternative method to downscale the SM, is a machine-learning algorithm that provides a nonlinear generalization solution to datasets through structural risk minimization and is based on the solid theoretical foundation of Vapnik-Chervonenkis theory [14][15][16][17].
The initial applications of SVM targeted optical character recognition and object recognition tasks using support vector (SV) classifiers [18][19][20]; it was subsequently applied to regression and time series prediction [18]. Support vector regression (SVR) in remote sensing research has often been applied to predict variables that appear as responses to other input variables [21,22]. Kaheil et al. [23] suggested a downscaling algorithm for the Southern Great Plains 1997 (SGP 97) experiment that combined SVM with the assimilation of ground SM measurements. The SVM method was specifically used to tune the downscaled image based on the relationship between the original and approximated coarse-scale images. Keramitsoglou et al. [24] applied SVR to downscale the Meteosat Second Generation T_s using Moderate Resolution Imaging Spectroradiometer (MODIS) NDVI and emissivity and compared it with other regression methods to find the preferred methodology.
These studies using SVR have evaluated SM downscaling methods by comparing them within identically structured calculation methods in which only the input variables varied [5,8]. However, SVR has rarely been applied to remote sensing datasets in East Asia. Thus, a comparison of downscaling methods in this area is necessary because the various methods for downscaling SM have not been adequately evaluated there.
In this study, a methodology to downscale active microwave SM based on T_s and NDVI using SVR is suggested. It builds an optimized regression model that considers the spatial pattern of the original dataset to obtain a finer, more accurate SM distribution relative to the conventional VIS/IR downscaling methods. This research is unique because it offers a cross comparison between the newly suggested SVR downscaling method and conventional methods. The downscaled SM was evaluated against in situ measurements from nine measurement sites within a 150 km × 125 km study area of the Korean Peninsula from March to November 2012. The polynomial regression downscaling method was also applied in the same study area for comparative evaluation.

Study Area.
The study area in southwestern South Korea extends from 35.0 to 36.3°N and from 126.6 to 128.4°E, covering a total of 18,750 km² (Figure 1). Cropland and mixed forest are the dominant land covers. The area was selected for its representative land cover characteristics and the availability of in situ measurements, the locations and characteristics of which are described in Table 1.
The land cover types were considered because surface properties such as vegetation type, soil type, land use, and topography can affect SM retrieval algorithms that are based on microwave sensor observations [25,26].
The annual precipitation at the measurement sites ranges from 1300 to 1800 mm, with the heaviest rainfall occurring during the summer, and the annual mean temperature ranges from 10.6 to 13.2°C [27]. The western part of the study area generally consists of plains that are used as cropland, while the eastern part is of higher altitude and mostly forested (Figure 1). The land cover was classified using the MODIS yearly land cover type data with the International Geosphere-Biosphere Programme (IGBP) global vegetation classification scheme [28].

In Situ SM Measurements.
The nine in situ SM measurement stations used in this study were installed by the Rural Development Administration (RDA), Korea.
The measurement sites were distributed so as to approximately cover the study area, and SM was measured at 0 to 10 cm depth with time-domain reflectometry (TDR) at an hourly time step from March to November 2012. TDR and frequency-domain reflectometry (FDR) sensors are the most commonly used techniques to measure soil water content [29].
TDR measures the propagation time of an electromagnetic wave along a transmission line to determine the dielectric permittivity, while FDR measures the capacitance. Previous studies have demonstrated good agreement in SM measurements between the two approaches [30][31][32]. Note that the difference between the measurement depths of microwave satellite SM data and in situ data is an unavoidable limitation [4]. However, because the geophysical variables adopted in this study represent surface properties observed using an optical satellite sensor, the difference in measuring depth was disregarded.

Advances in Meteorology

Advanced Scatterometer.
ASCAT measures the radar backscatter at C-band (5.255 GHz) with vertical transmit-vertical receive (VV) polarization. Since the measurement is performed using two satellite tracks, dual 550 km-wide swaths are produced, covering 82% of the Earth daily. While the SM retrieval from passive microwave observations with 12.5 km resolution is mainly based on the linkage between TB and geophysical variables, the retrieval of SM from ASCAT uses a time series-based approach that scales the backscattering coefficient between its lowest and highest values, which is presented as the degree of saturation [33,34].
The SM values are estimated as the relative variation between the wettest (100%) and driest (0%) values.
The dataset used in this study was the daily ASCAT relative SM processed by the Integrated Climate Data Centre in Hamburg. Relative SM was converted to volumetric SM (m³·m⁻³) by applying the porosity of each soil texture to enable comparison with the in situ measurements (Table 2).
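The conversion from relative to volumetric SM can be illustrated with a minimal Python sketch. It assumes that volumetric SM is the degree of saturation multiplied by the porosity; the function name and the example porosity are hypothetical and are not taken from Table 2.

```python
def relative_to_volumetric(relative_sm_percent, porosity):
    """Convert degree of saturation (0-100 %) to volumetric SM (m3/m3).

    Assumption: volumetric SM = (relative SM / 100) * porosity.
    """
    if not 0.0 <= relative_sm_percent <= 100.0:
        raise ValueError("relative SM must lie in [0, 100] %")
    if not 0.0 < porosity < 1.0:
        raise ValueError("porosity must lie in (0, 1)")
    return relative_sm_percent / 100.0 * porosity


# Hypothetical example: 60 % saturation in a soil with a porosity of 0.45
volumetric_sm = relative_to_volumetric(60.0, 0.45)
print(round(volumetric_sm, 3))
```

Under this assumption the wettest ASCAT value (100%) maps to the porosity itself, so the choice of soil texture directly sets the upper bound of the converted SM.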

Moderate Resolution Imaging Spectroradiometer.
The MODIS instruments on board the Earth Observing System (EOS) Terra (10:30/22:30) and Aqua (01:30/13:30) satellites use 36 spectral bands to observe characteristics of the atmosphere, land, and ocean. The MODIS products used in this study were the 1 km resolution daily daytime T_s (MOD11A1) and the 1 km resolution 16-day NDVI (MOD13A2) from the Terra satellite. T_s is retrieved from TB using the generalized split-window algorithm [35]. Cloudy pixels are excluded from the T_s retrieval process since thermal infrared signals do not penetrate clouds and are thus confounded with the cloud-top temperature. The NDVI is calculated as the normalized ratio of the near-infrared and red bands, reflecting the chlorophyll and mesophyll in the vegetation canopy [36,37].
The level 2 daily surface reflectance product, from which the 16-day MOD13A2 NDVI product is generated, is corrected for ozone absorption, molecular scattering, and aerosols [38]. To establish statistically significant regression models, only days with more than 90% cloud-free pixels were used.
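The two operations described in this subsection, the NDVI band ratio and the cloud-free screening of daily scenes, can be sketched as follows. The function names are hypothetical, and the sketch assumes that cloudy T_s pixels are stored as NaN, which is a convention chosen here rather than a property of the MODIS files.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized difference of near-infrared and red reflectance."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Leave NaN where the denominator is zero instead of dividing by zero.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


def is_usable_scene(ts, min_clear_fraction=0.9):
    """True if more than 90 % of the T_s pixels are cloud free.

    Assumption: cloudy pixels are coded as NaN in the `ts` array.
    """
    clear_fraction = np.mean(~np.isnan(np.asarray(ts, dtype=float)))
    return clear_fraction > min_clear_fraction
```

A scene would then enter the regression only if `is_usable_scene` returns true for its daily T_s image.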

Preprocessing of Remote Sensing Images.
The T_s and NDVI from MODIS, which originally have 1 km spatial resolution, were uniformly disaggregated to a spatial resolution of 500 m and then aggregated to 12.5 km resolution by applying arithmetic means as follows:
$$\mathrm{NDVI}_{12.5} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\mathrm{NDVI}_{ij}, \tag{1}$$
$$T_{s,12.5} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}T_{s,ij}, \tag{2}$$
where NDVI_12.5 is the 12.5 km averaged NDVI, T_s,12.5 is the 12.5 km averaged T_s, and m and n are the numbers of rows (index i) and columns (index j) of 500 m pixels within a 12.5 km ASCAT pixel, respectively. Downscaling the 12.5 km ASCAT SM requires the difference between the 500 m and 12.5 km values of the LST and NDVI datasets; thus, the 500 m LST and NDVI products were upscaled to 12.5 km resolution to calculate the difference between the products at the two resolutions.
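The resampling chain described here (1 km to 500 m by uniform disaggregation, then 500 m to 12.5 km by arithmetic mean) can be sketched with NumPy. The function names are hypothetical; the block size of 25 follows from 12.5 km / 500 m, and ignoring NaN (cloudy) pixels in the mean is an assumption of this sketch.

```python
import numpy as np


def disaggregate_uniform(field_1km, factor=2):
    """1 km -> 500 m: repeat every pixel `factor` times along both axes."""
    field_1km = np.asarray(field_1km, dtype=float)
    return np.repeat(np.repeat(field_1km, factor, axis=0), factor, axis=1)


def aggregate_mean(field_500m, block=25):
    """500 m -> 12.5 km: arithmetic mean over block x block pixels.

    Assumption: NaN pixels are skipped when averaging.
    """
    field_500m = np.asarray(field_500m, dtype=float)
    rows, cols = field_500m.shape
    if rows % block or cols % block:
        raise ValueError("grid size must be a multiple of the block size")
    blocks = field_500m.reshape(rows // block, block, cols // block, block)
    return np.nanmean(blocks, axis=(1, 3))


def fine_minus_coarse(field_500m, block=25):
    """Difference between each 500 m pixel and its 12.5 km block mean."""
    coarse = aggregate_mean(field_500m, block)
    coarse_on_fine = np.repeat(np.repeat(coarse, block, axis=0), block, axis=1)
    return np.asarray(field_500m, dtype=float) - coarse_on_fine
```

The last helper returns the fine-minus-coarse difference field that the text says is required for downscaling.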

SM Downscaling Using Polynomial Regression.
The performance of the suggested downscaling method using SVR was evaluated against the downscaled SM calculated with the conventional polynomial method using the same input variables. Carlson et al. [10] suggested a polynomial regression formula for the relationship among SM, NDVI, and T_s under different climatic conditions and land cover types as follows:
$$\mathrm{SM} = \sum_{i=0}^{n}\sum_{j=0}^{n} a_{ij}\,\mathrm{NDVI}^{i}\,T_s^{j}, \tag{3}$$
where n is a number chosen reasonably for the dataset and a_ij is the regression coefficient for the specific day and scene under analysis.
In this study, the equation is applied with n = 2 and i + j ≤ 2 to yield the second-order polynomial equation as follows:
$$\mathrm{SM} = a_{00} + a_{10}\,\mathrm{NDVI} + a_{01}\,T_s + a_{20}\,\mathrm{NDVI}^{2} + a_{11}\,\mathrm{NDVI}\,T_s + a_{02}\,T_s^{2}. \tag{4}$$
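A second-order polynomial with i + j ≤ 2 has six coefficients, which can be estimated per scene by ordinary least squares. The sketch below is one plausible implementation, not the authors' code: the function names are hypothetical, the coefficients are fitted on coarse-scale pixels and then evaluated on fine-scale inputs, and the difference-based correction mentioned in the preprocessing section is not reproduced.

```python
import numpy as np


def design_matrix(ndvi, ts):
    """Columns: 1, NDVI, T_s, NDVI^2, NDVI*T_s, T_s^2 (all terms with i + j <= 2)."""
    ndvi = np.ravel(np.asarray(ndvi, dtype=float))
    ts = np.ravel(np.asarray(ts, dtype=float))
    return np.column_stack(
        [np.ones_like(ndvi), ndvi, ts, ndvi**2, ndvi * ts, ts**2]
    )


def fit_polynomial(sm_coarse, ndvi_coarse, ts_coarse):
    """Least-squares estimate of the six regression coefficients for one scene."""
    a = design_matrix(ndvi_coarse, ts_coarse)
    coeffs, *_ = np.linalg.lstsq(a, np.ravel(sm_coarse), rcond=None)
    return coeffs


def predict_polynomial(coeffs, ndvi_fine, ts_fine):
    """Evaluate the fitted polynomial on fine-resolution NDVI and T_s grids."""
    a = design_matrix(ndvi_fine, ts_fine)
    return (a @ coeffs).reshape(np.shape(ndvi_fine))
```

Because the coefficients are re-estimated for every day and scene, each daily fit uses only that scene's 12.5 km pixels as samples.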

SM Downscaling Using Support Vector Regression.
The SM downscaling procedure using SVR consisted of two parts. The remote sensing images (ASCAT SM, MODIS T_s, and NDVI) were preprocessed for use in the SVR process, and high-resolution SM data were then produced using the training and prediction procedure of the SVR. The downscaling methodology suggested in this study combines the conventional VIS/IR synergistic downscaling method with the image approximation concept by introducing the locational information of latitude and longitude as additional input variables. Figure 2 shows the entire procedure for the suggested downscaling.
The SVM is a machine-learning method based on nonlinear transformations of the covariates that was developed by Vapnik in the early 1990s [39]. The SVM was later extended to regression by Vapnik [14].
This model includes a training phase in which the associated input and target output datasets are learned based on statistical learning theory [40].
Of the various SVM tools, LibSVM, built by Chang and Lin [41], was used in this study. The radial basis function (RBF) was selected as the kernel as follows:
$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right), \tag{5}$$
where γ is the bandwidth parameter that controls under- or overfitting [42]. The input x consists of {u, v, T_s, NDVI}, where u is the x position of the pixel and v is the y position of the pixel. The RBF kernel was selected based on previous studies that showed its superiority over other kernel functions for both classification and regression tasks [43][44][45]. Two parameters, gamma and the penalty, were optimized using a grid search algorithm and n-fold cross validation, both of which have been widely used in the literature [46][47][48]. In n-fold cross validation, the original sample is randomly divided into n subsamples of equal size. A single subsample is retained as validation data for assessing the model, and the remaining n − 1 subsamples are used as training data. The cross-validation process is then repeated n times so that each of the n subsamples is used exactly once for validation.
This approach has been widely used in SVM research [49][50][51] and is provided as a basic function of the LibSVM tool mentioned above. Three-fold cross validation was used, and the selected parameters are shown in Table 3. Since the geophysical variables (T_s and NDVI) were selected on theoretical grounds [3,6], a variable selection step was omitted. All of the variables were scaled to [0, 1] to even out quantitative differences among them. To obtain valid regression models with a sufficient number of samples, only satellite images from cloud-free days were used; thus, a total of 55 days from March to November 2012 were available.
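The training procedure described above (RBF kernel, inputs scaled to [0, 1], grid search over gamma and the penalty with three-fold cross validation) can be sketched with scikit-learn, whose `SVR` class wraps LibSVM. This is an illustrative setup rather than the configuration of Table 3: the function name, the grid values, and the `epsilon` setting are assumptions, and only the inputs (not the target SM) are rescaled here.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR


def fit_svr(features, target):
    """Fit an RBF-kernel SVR with a grid search and 3-fold cross validation.

    features: array (n_samples, 4) with columns u, v, T_s, NDVI
    target:   array (n_samples,) with the coarse-scale SM
    """
    model = make_pipeline(MinMaxScaler(), SVR(kernel="rbf", epsilon=0.01))
    # Illustrative search grid for the penalty (C) and gamma.
    grid = {"svr__C": [1, 10, 100], "svr__gamma": [0.1, 1, 10]}
    folds = KFold(n_splits=3, shuffle=True, random_state=0)
    search = GridSearchCV(model, grid, cv=folds,
                          scoring="neg_mean_squared_error")
    search.fit(features, target)
    return search


# Usage with synthetic data standing in for one day's coarse-scale pixels.
rng = np.random.default_rng(0)
x_coarse = rng.random((60, 4))
y_coarse = 0.2 + 0.1 * x_coarse[:, 3] - 0.1 * x_coarse[:, 2]
fitted = fit_svr(x_coarse, y_coarse)
sm_fine = fitted.predict(rng.random((10, 4)))  # fine-scale {u, v, T_s, NDVI}
```

Placing the scaler inside the pipeline means it is refitted on each training fold, so the validation fold does not leak into the scaling.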

Statistical Analysis Methods.
Four indices were used for the statistical evaluation; in their definitions, SM_satellite/model is the satellite-observed or modeled SM and SM_in situ is the ground-measured SM. In this study, the R value averaged over the sites, rather than the R value of each individual site, was employed in accordance with previous studies [5][52][53][54] because of the limited number of SM samples from both the ground and the satellites. In the case of in situ measurements, it is difficult to obtain data simultaneous with the overpass time of the satellite; conversely, cloud cover causes gaps in the visible band-based MODIS land data (LST and NDVI), making it impossible to obtain a downscaled SM. The index of agreement (IOA) ranges from 0 to 1, with higher values indicating a smaller mean square error and better agreement between the modeled values and observations [54].
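The formulas for the indices are not reproduced above, so the following sketch assumes that the four indices are the bias, RMSE, correlation coefficient (R), and IOA, with the IOA in Willmott's form. The function name is hypothetical, and dropping pairs that contain NaN is a choice made for this sketch.

```python
import numpy as np


def evaluation_indices(sm_est, sm_obs):
    """Bias, RMSE, R, and IOA of estimated SM against in situ SM."""
    est = np.asarray(sm_est, dtype=float)
    obs = np.asarray(sm_obs, dtype=float)
    valid = ~(np.isnan(est) | np.isnan(obs))  # keep complete pairs only
    est, obs = est[valid], obs[valid]
    diff = est - obs
    obs_mean = obs.mean()
    # Willmott's index of agreement (assumed form of the IOA).
    potential_error = np.sum(
        (np.abs(est - obs_mean) + np.abs(obs - obs_mean)) ** 2
    )
    return {
        "bias": diff.mean(),
        "rmse": np.sqrt(np.mean(diff**2)),
        "r": np.corrcoef(est, obs)[0, 1],
        "ioa": 1.0 - np.sum(diff**2) / potential_error,
    }
```

In this form the IOA equals 1 only when the estimates match the observations exactly, which is why it penalizes occasional large errors more strongly than R does.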

Results and Discussion
The polynomial regression and SVR models were established to perform daily-scale evaluations of SM variability.
The averaged linear correlation coefficient between the original and downscaled products was 0.55 for both models.
This indicates that the models performed relatively well in disaggregating the coarse-scale original SM product; thus, both models were considered suitable to downscale the original SM dataset.

Evaluation of Downscaled SM Compared with In Situ SM.
The original ASCAT SM and the output of each downscaling algorithm were compared against nine in situ SM measurements in the study region. Figure 3 shows the temporal variation in the in situ SM measurements, the 12.5 km SM from ASCAT, and the 1 km SM downscaled using SVR and polynomial regression, together with their response to daily rainfall events. Although the characteristics of the temporal patterns are site-specific, all three remotely sensed measurements approximately followed the patterns of the in situ SM. The 1 km SM downscaled using polynomial regression also showed a temporal pattern similar to that of the 12.5 km and in situ SM measurements but on occasion substantially underestimated some values. This was most visible in comparison with the downscaled SM from SVR. In Geumsan, Yeongdong, Wanju, and Jeonju, the underestimation in the polynomial downscaling results is more apparent than elsewhere. Overall, compared with the in situ SM, the 1 km SM downscaled using SVR had more realistic trends than that downscaled using polynomial regression, and its pattern was very similar to that of the original 12.5 km SM.
Since the downscaled SM methods showed clear differences, particularly at some sites, the correlation between the two independent variables (NDVI and T_s) and the residual magnitudes of each regression model's prediction of the 12.5 km SM was analyzed, together with its statistical significance (p value), to evaluate the manner in which the variables affect the regression models. Figure 4 shows the time series of R between the residual magnitude and the variables with the corresponding significance. The statistical results are summarized in Table 4. The p value is the marginal significance level, that is, the probability of the given result occurring under the assumption that the null hypothesis is true, and the slope indicates whether there is a linear relationship between the independent variable x and the dependent variable y. For the polynomial regression model, the residual magnitude showed, on average, a significant and comparatively strong correlation with both independent variables, with average R values of 0.32 and 0.41. By comparison, the SVR model showed a weaker and insignificant correlation, with averaged R values of 0.10 and 0.17. The difference between the degrees of correlation of the two models was likely caused by the methodological difference that the SVR model additionally considers locational weighting.
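The residual analysis described above amounts to a simple linear regression of the residual magnitude on each independent variable for each day. A minimal sketch using `scipy.stats.linregress` is given below; the function name is hypothetical, and the 0.05 significance threshold is an assumption, since the level used for Table 4 is not stated here.

```python
import numpy as np
from scipy import stats


def residual_dependence(residual_magnitude, variable, alpha=0.05):
    """Slope, R, and p value of residual magnitude regressed on one variable.

    residual_magnitude: |12.5 km SM - model prediction| per coarse pixel
    variable:           NDVI or T_s for the same pixels
    alpha:              assumed significance threshold
    """
    x = np.ravel(np.asarray(variable, dtype=float))
    y = np.ravel(np.asarray(residual_magnitude, dtype=float))
    valid = ~(np.isnan(x) | np.isnan(y))
    fit = stats.linregress(x[valid], y[valid])
    return {
        "slope": fit.slope,
        "r": fit.rvalue,
        "p_value": fit.pvalue,
        "significant": fit.pvalue < alpha,
    }
```

Applying this function to every available day yields the kind of R and p-value time series that Figure 4 summarizes.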
Meanwhile, the signs of the slope coefficients for each variable were opposite between the two regression models. For the polynomial regression model, NDVI showed a positive relationship and T_s showed a negative relationship with the SM residual magnitude, while the signs were reversed for the SVR model. Because the relationship between the two variables and the residual magnitudes of the SVR models was insignificant for most dates, with large p values (average values of 0.41 and 0.24, respectively), the signs were meaningful only for the polynomial regression models.
The positive slope coefficients between the NDVI and the residual magnitude can be explained by the increased uncertainty of the microwave SM retrieval for areas with denser vegetation [55]. The opposite, negative signs of the slope coefficients between T_s and the residual magnitude were partially due to the relationship of T_s with vegetation. While the two variables were assumed to be mutually independent in the regression models, under similar water stress conditions they are negatively correlated, partly because of evaporative cooling [12,56]. In addition, the highest R values were found during the growing season, from mid-May to mid-September, demonstrating that seasonal patterns occurred in the residuals of the polynomial regression models. This result was also partly explained by the pronounced underestimating tendency of the polynomial regression model found at some sites. In the case of Jeonju, this pattern could be attributed to the highest annual mean air temperature (Table 1), based on the observed significant positive correlation between the regression model's residual magnitude and T_s. In addition, in a pixel-by-pixel inspection of days with extreme underestimation at each site (not shown here), the underestimations were found to have occurred in pixels in which T_s was substantially higher than the average for that particular day.
As shown in Tables 5-7, each of the remotely sensed SM measurements was quantitatively evaluated against the in situ SM measurements. The average value over the nine in situ SM measurement sites was 0.23 m³·m⁻³; among the remotely sensed products, the 1 km SM downscaled using polynomial regression had the nearest average value (0.24 m³·m⁻³) but the highest RMSE. The average downscaled SM using SVR was 0.26 m³·m⁻³ with a standard deviation (SD) of 0.05 m³·m⁻³, similar to that of the 12.5 km ASCAT SM measurement (0.05 m³·m⁻³) (Table 7). The R values between the in situ SM and the 12.5 km ASCAT SM, the downscaled SM using polynomial regression, and the downscaled SM using SVR were 0.66, 0.62, and 0.68, respectively.
Figure 5 presents the overall error distribution for each remotely sensed SM measurement. Since a difference in SM indicates an error in the remotely sensed SM relative to the in situ SM measurement, an ideal histogram would have a steep and narrow form centered on zero, thus indicating a normal distribution with a zero mean [57]. The original coarse-scale SM had a positive bias on average with an RMSE of 0.072 m³·m⁻³; for the downscaled SM using polynomial regression, the RMSE was the same as that of the original ASCAT SM but with a higher SD (0.072 m³·m⁻³). The SVR-downscaled SM also had a positive bias but with a decreased RMSE (0.065 m³·m⁻³) and SD (0.056 m³·m⁻³) (Figure 5). Thus, these results indicate that SVR offers better performance in reducing the error of the downscaled satellite SM. The R values between the satellite SM and the corresponding in situ measurements were also better for the SVR downscaling method, with an increase from 0.62 to 0.68 as previously mentioned (Tables 6 and 7). The IOA values, which are more sensitive to extreme values in estimating the model agreement, differed from the R values at some sites (Jeongeup and Wanju). However, on average, they also indicated that the SVR results were a better estimate of the in situ SM.

Table 5: Comparison between in situ and original scale (12.5 km) ASCAT SM.
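The error statistics quoted for the SVR-downscaled SM can be cross-checked with a short calculation. Assuming that the reported SD is the standard deviation of the error (satellite minus in situ SM) computed with the population formula, the mean square error decomposes into the squared bias plus the error variance, which gives the implied magnitude of the bias:

```latex
\mathrm{RMSE}^{2} = \mathrm{bias}^{2} + \mathrm{SD}^{2}
\quad\Longrightarrow\quad
|\mathrm{bias}| = \sqrt{0.065^{2} - 0.056^{2}}
               = \sqrt{0.004225 - 0.003136}
               = \sqrt{0.001089}
               = 0.033\ \mathrm{m^{3}\,m^{-3}}
```

Under that assumption, the result agrees with the difference between the mean SVR-downscaled SM (0.26 m³·m⁻³) and the mean in situ SM (0.23 m³·m⁻³).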
Figure 6 shows the two-dimensional Taylor diagram [58] summarizing the statistics for the three ASCAT SM products compared with the in situ SM measurements from nine sites.
This diagram shows the statistics of the original SM and of the SM downscaled using SVR and the polynomial method relative to the in situ data. While the ranges of the R values for the three SM products were similar, from approximately 0.4 to 0.9, there were apparent differences in the distributions of the SD ratio and RMSE. Although statistical resemblances were found between the results of the 12.5 km ASCAT SM (diamond) and the 1 km SVR SM (circle), the diagram indicates a clearly higher SD for the 1 km polynomial SM (triangle). In particular, the SVR SM points were clustered most closely around the ideal arc drawn with a dashed line.
The results of the polynomial downscaling were more sparsely distributed on the diagram, with an isolated point representing the result at the Hapcheon site; the larger RMSE there was probably a result of the geophysical characteristics of the site, since the corresponding ASCAT pixel contained a mixed land cover of forest and cropland. In general, the SM retrievals showed weak agreement with the in situ measurements at some sites, such as Hapcheon; however, the R values of the downscaled SM were largely improved, even if the range of that improvement was small.
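The three quantities that place a product on a Taylor diagram can be computed as in the sketch below. The function name is hypothetical, and normalizing by the SD of the in situ series is assumed from the "normalized standard deviation" wording of the Figure 6 caption.

```python
import numpy as np


def taylor_statistics(sm_est, sm_obs):
    """SD ratio, correlation, and centered RMSE used in a Taylor diagram."""
    est = np.asarray(sm_est, dtype=float)
    obs = np.asarray(sm_obs, dtype=float)
    sd_est, sd_obs = est.std(), obs.std()  # population SDs
    r = np.corrcoef(est, obs)[0, 1]
    # Centered RMSE: RMSE after removing the mean of each series.
    centered = (est - est.mean()) - (obs - obs.mean())
    crmse = np.sqrt(np.mean(centered**2))
    return {
        "sd_ratio": sd_est / sd_obs,
        "r": r,
        "centered_rmse": crmse,
        "normalized_centered_rmse": crmse / sd_obs,
    }
```

These statistics satisfy the law-of-cosines relation crmse² = sd_est² + sd_obs² − 2·sd_est·sd_obs·R, which is what allows all three to be read from a single point on the diagram; a product with an SD ratio near 1 lies close to the reference arc.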

Spatial Distribution of Downscaled SM.
The 12.5 km ASCAT SM and the 1 km ASCAT SM obtained using the two downscaling methods were spatially compared using daily maps of each type of data on dry and wet days (Figure 7). The overall spatial variations of the 1 km ASCAT SM were approximately similar to those of the 12.5 km data but more finely distributed. While the eastern part of the study area with forested land cover had a higher average SM of approximately 0.5-0.6 m³·m⁻³, the western part with primarily cropland land cover showed more temporal variation in response to meteorological events. Compared with the 1 km SM map from polynomial regression, the 1 km SM map from SVR was clearly more similar to the 12.5 km SM map, because the SVR downscaling algorithm uses each pixel's position as a predictive variable. Under wet conditions (Figures 7(c) and 7(d)), the spatial patterns of the downscaled SM from the polynomial regression are more evenly distributed and rely more on the distribution of T_s than under dry conditions (Figures 7(a) and 7(b)). Under wet conditions, the original ASCAT SM shows relatively dry patterns in the western part of the study area, while T_s and the downscaled SM using polynomial regression show neither higher temperatures nor drier patterns, respectively, in the same region. Piles et al. [8] also reported a more consistent and similar spatial variability of the downscaled SM product relative to the original SMOS SM under dry soil conditions. Thus, considering positional weighting would allow a substantial performance improvement in SM downscaling based on T_s-NDVI.
Figure 8 shows the distribution of the seasonal mean differences between the uniformly disaggregated 1 km ASCAT SM and the downscaled SM generated using each methodology. On average, the difference between the original data and the downscaled SM was clearly smaller for SVR than for polynomial downscaling. Although there were clear per-pixel discrepancies in the polynomial-downscaled SM, the differences in the SVR-downscaled SM were more evenly distributed, regardless of location. This characteristic resulted from the methodological difference that the SVR downscaling considers positional weight while the polynomial downscaling does not.
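Seasonal mean difference maps of the kind shown in Figure 8 can be computed as sketched below. The function name is hypothetical; the sign convention (downscaled minus original) and the spring and fall month groupings are assumptions, with only the June to August summer taken from the text.

```python
import numpy as np

# Assumed season definition for the March-November study period.
SEASONS = {"spring": (3, 4, 5), "summer": (6, 7, 8), "fall": (9, 10, 11)}


def seasonal_mean_difference(original_1km, downscaled_1km, months):
    """Per-pixel seasonal mean of (downscaled - original) SM.

    original_1km:   array (n_days, rows, cols), uniformly disaggregated SM
    downscaled_1km: array (n_days, rows, cols), downscaled SM
    months:         array (n_days,), calendar month of each day
    """
    diff = np.asarray(downscaled_1km, float) - np.asarray(original_1km, float)
    months = np.asarray(months)
    return {
        name: np.nanmean(diff[np.isin(months, season_months)], axis=0)
        for name, season_months in SEASONS.items()
    }
```

With this sign convention, an underestimating downscaling method produces negative values in the resulting maps.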
The difference in the polynomial-downscaled SM was negatively biased and was larger in the southeastern region of the study area, where the elevation is higher and the land cover is dominated by mixed forests. It is probable that the higher uncertainty in the NDVI for densely vegetated areas erroneously affected the regression model. The largest differences in SM for both products were found during the summer (June to August), when vegetation growth reached its peak, and this might have affected the relationship between SM and T_s. Similar seasonal differences in the error pattern of the downscaled SM were also found by Merlin et al. [3], who adopted a separate downscaling algorithm to reduce the seasonal discrepancy in downscaling performance by considering the controlling variable of SM for each pixel. In addition, in a future study, the depth discrepancy between satellite- and ground-based SM measurements should be corrected when comparing the downscaled SM product with in situ data by estimating profile SM values from the satellite data [59,60].

Conclusions
Downscaling methods for remotely sensed SM datasets are among the most important topics in related research fields since they provide a solution to low spatial resolution. This study proposed a new downscaling method using SVR and evaluated it by comparison with in situ SM measurements and the results of a conventional downscaling method. After downscaling using SVR, the RMSE decreased from 0.08 to 0.07 m³·m⁻³ and the R increased from 0.66 to 0.68, while the bias remained the same at 0.03 m³·m⁻³. Considering that the improvements and deteriorations of the downscaled SM evened out on average, the valid improvements in accuracy at the nine sites selected for validation should be noted.
The statistics were better than those of the polynomial downscaling method, which had an RMSE of 0.09 m³·m⁻³, an R of 0.62, and a bias of −0.02 m³·m⁻³. In the correlation analysis between the independent variables (NDVI and T_s) and the residual magnitude between the 12.5 km ASCAT SM and the SM predicted by each regression method, only the polynomial regression residual magnitudes showed significant results, being positively correlated with NDVI and negatively correlated with T_s. In a spatial comparison of the SM maps at the two scales, the 1 km SM using SVR followed the spatial distribution of the original scale (12.5 km) better than the 1 km SM using polynomial regression. In the spatial distribution of the seasonally averaged differences between the original and the downscaled SM, the SVR downscaling method showed a more consistent performance across seasons. Based on these results, the suggested SVR downscaling method can be used to improve the spatial resolution of satellite SM while offering better performance than the conventional downscaling method. However, this study had several limitations: first, the remote sensing data were difficult to obtain because of missing products; second, it took considerable time to preprocess the dataset and execute the model to obtain the downscaled SM; and lastly, the algorithm's complexity required considerable memory for a wide range of tasks [61]. In a future study, these limitations will be addressed by applying various remote sensing and assimilation datasets. This method can be extended to various fields that require fine-resolution SM datasets, such as the study of large-scale water-related natural disasters.
This is because antecedent SM information can be effectively used to predict landslides, droughts, dust outbreaks, and agricultural water deficiencies [62,63].

Figure 1: (a) Study area on the Korean Peninsula and (b) the in situ sites (white circles).

Figure 2: Flow charts of (a) the conventional and (b) the support vector regression (SVR) downscaling algorithms.

Figure 6: Taylor diagram (correlation coefficient, normalized standard deviation, and root mean square error) of each SM measurement from 12.5 km ASCAT, 1 km ASCAT with SVR, and 1 km ASCAT SM with polynomial.

Figure 7: Spatial distributions of the original and downscaled SM, NDVI, and T s on dry days on April 16 and October 16 and wet days on April 26 and September 10.

Figure 8: Seasonal spatial distributions during 2012 of the residual between the original and downscaled SM using polynomial downscaling for (a) spring, (b) summer, and (c) fall and SVR downscaling for (d) spring, (e) summer, and (f) fall.

Table 1: Descriptions of the nine in situ sites.

Table 2: Green-Ampt infiltration parameters for various soil textures.

Table 3: Parameter characteristics of the SVR model.

Table 4: Correlation analysis between the independent variables and residual magnitude.