SPACEBORNE GNSS-R RETRIEVING ON GLOBAL SOIL MOISTURE APPROACHED BY SUPPORT VECTOR MACHINE LEARNING

GNSS Reflectometry (GNSS-R) is an effective technique for sensing soil moisture content. In recent years, GNSS-R has been used to detect soil moisture content, but many difficulties remain, most notably the impact of vegetation. Soil moisture observation is key to enhancing the sustainability of Earth system processes. Retrieving soil moisture from spaceborne GNSS-R has been challenging with respect to the system, the retrieval model and the geophysical parameters. In this research, we use the Support Vector Machine (SVM) method to retrieve global soil moisture (SM) on a daily basis. The TDS-1 Delay Doppler Map (DDM) and the AVHRR Normalized Difference Vegetation Index (NDVI) imagery are the inputs, and the Soil Moisture and Ocean Salinity (SMOS) soil moisture data are the reference. The squared correlation coefficient (R²) is higher when TDS-1 data are fused with NDVI than when the DDM is used alone. This indicates that the impact of vegetation is effectively reduced. The approach demonstrates the feasibility of spaceborne GNSS-R soil moisture retrieval.


INTRODUCTION
GNSS-R technology has achieved considerable success in Earth observation over recent decades. Spaceborne GNSS-R platforms such as TDS-1 and CYGNSS have been successfully launched and have provided useful information for ocean and land applications. In the GNSS-R concept, the GNSS transmitter and the receiver together form a bistatic radar that captures signals reflected from the Earth's surface. This configuration can operate as either an altimeter or a scatterometer. It can investigate surface characteristics such as roughness, dielectric properties and sub-surface features. GNSS-R has proved highly productive and efficient for monitoring the Earth's environment. Applications include measuring significant wave heights, wind speeds, surface roughness, ice and snow thicknesses and soil moisture conditions. Despite these advantages, challenges remain in retrieval techniques, data assimilation, verification and accuracy assessment across temporal and spatial scales.

Many techniques have been employed to determine soil moisture, ranging from multi-disciplinary data sources to machine learning approaches. GNSS-R soil moisture observation has been demonstrated from ground-based and airborne platforms. Spaceborne GNSS-R sensitivity to soil moisture was initially investigated with TechDemoSat-1 data, and preliminary results showed a good correlation with soil moisture. Recently, Senyurek, V., et al. (2020) compared and analysed soil moisture retrievals from NASA CYGNSS data. They used multiple validation strategies and machine learning methods; SVM achieved an RMSE of 0.065 cm³/cm³.

Soil moisture is a key variable in hydrology because it influences evapotranspiration, run-off and infiltration at the land surface. Improved soil moisture prediction is significant for addressing short-, medium- and large-scale climate changes and hazardous events, and for disaster mitigation. Raghu Garg et al. (2019) applied several machine learning techniques to extract knowledge from big data for sustainability in plant-related studies. In this research, we use the SVM machine learning method to retrieve global soil moisture, taking the TDS-1 DDM as input and the SMOS soil moisture content (SMC) as the reference. The paper is organized as follows. Section 2 describes the data preparation. Section 3 describes the data processing, followed by the results and discussion. Section 4 concludes.

DATA PREPARATION
In this research, we used UK TDS-1 data and AVHRR NDVI data as inputs. Data from SMOS, an ESA Earth Explorer Opportunity mission, served as the reference. All data were prepared monthly from January to March 2017 and collocated.

Soil Moisture and Ocean Salinity (SMOS)
SMOS is an ESA mission whose satellite carries a radiometer operating in the microwave L-band to acquire brightness temperature images. Images over land are used to derive global maps of SMC every three days, with an accuracy of 4% at a spatial resolution of about 50 km. The CATDS centre has released a reprocessed SMOS surface soil moisture (SSM) product, the SMOS Level 3 product (SMOSL3), provided as global gridded maps of SSM. This daily product contains surface soil moisture and land information. Two products are generated per day: one for ascending and one for descending orbits. Finally, the products for the three months were aggregated to serve as the reference for soil moisture retrieval from spaceborne observations.
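The aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CATDS processing chain. It assumes each daily product has already been read into a flat list of SMC values, one per grid cell, with `None` where no retrieval exists. It averages the ascending and descending passes of each day, then averages each cell over all available days. The helper names `merge_passes` and `temporal_mean` are hypothetical.

```python
from statistics import mean
from typing import List, Optional

# One SMC value (m3/m3) per grid cell; None means no valid retrieval (assumed layout)
Grid = List[Optional[float]]


def merge_passes(ascending: Grid, descending: Grid) -> Grid:
    """Combine the ascending and descending products of one day, cell by cell."""
    merged: Grid = []
    for a, d in zip(ascending, descending):
        valid = [v for v in (a, d) if v is not None]
        merged.append(mean(valid) if valid else None)
    return merged


def temporal_mean(daily_grids: List[Grid]) -> Grid:
    """Average each cell over all days that have a valid value."""
    result: Grid = []
    for cell_values in zip(*daily_grids):
        valid = [v for v in cell_values if v is not None]
        result.append(mean(valid) if valid else None)
    return result


# Two toy days on a three-cell grid
day1 = merge_passes([0.20, None, 0.30], [0.22, None, None])
day2 = merge_passes([None, 0.10, 0.34], [0.24, 0.12, 0.30])
print(temporal_mean([day1, day2]))
```

Averaging the two passes first gives each day equal weight regardless of how many passes it contributed. This weighting is a design assumption; weighting each pass equally would be an equally valid choice.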
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2020, 2020 XXIV ISPRS Congress (2020 edition)

Normalized Difference Vegetation Index (NDVI)
NOAA satellites carrying the AVHRR sensor provide the data for a daily NDVI product, which supplies time-series data for vegetation observation. NDVI is a widely used vegetation index. In remote sensing, vegetation is monitored through time-series analysis and big data processing systems. These systems may use pixel- or object-based algorithms to examine vegetation health, evapotranspiration and other ecosystem functions of agricultural concern. NDVI observation therefore supports the sustainable management of the Earth's vegetation and the monitoring of climatic conditions.
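For reference, NDVI is computed from red and near-infrared reflectances as NDVI = (NIR - Red) / (NIR + Red). For AVHRR, these come from channel 1 (red) and channel 2 (near-infrared). The sketch below shows the per-pixel formula using a hypothetical function name, `ndvi`. The reflectance values are illustrative only.

```python
from typing import Optional


def ndvi(red: float, nir: float) -> Optional[float]:
    """NDVI = (NIR - Red) / (NIR + Red); returns None if both reflectances are zero."""
    denominator = nir + red
    if denominator == 0:
        return None
    return (nir - red) / denominator


# Dense green vegetation reflects strongly in the NIR and absorbs red light,
# so it yields a high NDVI; equal reflectances (e.g. some bare surfaces) give 0.
print(ndvi(red=0.05, nir=0.45))
print(ndvi(red=0.30, nir=0.30))
```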

UK TechDemoSat-1 (TDS-1)
The spaceborne GNSS-R receiver flying on board the TechDemoSat-1 satellite receives the reflected signals. It can process several simultaneous tracks from multiple GNSS transmitters. GNSS reflections are sensitive not only to the ocean but also to land surfaces. This opens new opportunities for remote sensing, such as estimating sea-ice thickness, snow depth and soil moisture, and classifying vegetation cover. Previous work has shown that vegetation attenuates the GNSS signal, reducing the reflection coefficient and the sensitivity to soil moisture. The remaining sensitivity is still sufficient for remote sensing from space. A detailed analysis of vegetation impacts on SM showed that as vegetation cover increases (higher NDVI), three quantities decrease. These are the reflected SNR, the sensitivity to soil moisture and the Pearson correlation coefficient; the correlation nevertheless remains significant. Camps, A., et al. (2018) analysed surface and subsurface soil moisture retrieval using four years of TDS-1 and SMOS data. They worked at different spatial scales, for both global and regional observations.
They found that the sensitivity of GNSS-R observations to soil moisture is ~0.09 dB/%. However, this value depends on the spatial scale used for the ground truth and on the selected region. Park, H., et al. (2018) developed a DDM simulator and applied it to TDS-1 and CYGNSS data. They showed that the DDM over land varies with the characteristics of the glistening zone, e.g., the mixture of surface types, topography, soil moisture and vegetation.
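For a given delay τ and Doppler frequency f, the average DDM power is commonly modelled with the bistatic radar equation:

$$\langle |Y(\tau, f)|^2 \rangle = \frac{\lambda^2 T_i^2}{(4\pi)^3} \iint_A \frac{P_t G_t \, G_r \, \chi^2 \, \sigma^0}{R_t^2 \, R_r^2} \, dA$$

The terms are defined as follows:

- $T_i$ is the coherent integration time.
- $P_t G_t$ is the transmitter's equivalent isotropic radiated power (EIRP).
- $G_r$ is the receiver antenna gain pattern.
- $R_t$ and $R_r$ are the distances between the nominal specular point and the transmitter and receiver, respectively.
- $\chi^2$ is the Woodward Ambiguity Function (WAF), which describes the range and Doppler selectivity of the coherent radar.
- $\sigma^0$ is the normalized bistatic radar cross-section (BRCS) of the rough surface.
- $\lambda$ is the carrier wavelength.
- $A$ is the glistening surface.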
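The DDM power for one delay-Doppler bin follows $\langle |Y(\tau, f)|^2 \rangle = \frac{\lambda^2 T_i^2}{(4\pi)^3} \iint_A \frac{P_t G_t G_r \chi^2 \sigma^0}{R_t^2 R_r^2} dA$, using the terms listed above. It can be evaluated numerically by splitting the glistening zone into small surface patches and summing their contributions. The sketch assumes linear (not dB) units and purely illustrative inputs. The class `SurfaceCell` and function `ddm_power` are hypothetical names. Receiver noise and TDS-1 instrument processing are deliberately left out.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SurfaceCell:
    """One patch of the glistening zone; all quantities in linear SI units."""
    area: float     # dA, surface area of the patch (m^2)
    gain_rx: float  # G_r, receiver antenna gain towards the patch
    r_t: float      # R_t, transmitter-to-patch distance (m)
    r_r: float      # R_r, patch-to-receiver distance (m)
    waf_sq: float   # chi^2 at this patch's delay/Doppler offset from the bin
    sigma0: float   # normalized bistatic radar cross-section of the patch


def ddm_power(cells: Iterable[SurfaceCell], eirp: float,
              wavelength: float, t_i: float) -> float:
    """Approximate <|Y(tau, f)|^2> for one DDM bin as a sum over surface patches."""
    prefactor = wavelength ** 2 * t_i ** 2 / (4.0 * math.pi) ** 3
    total = 0.0
    for c in cells:
        total += (eirp * c.gain_rx * c.waf_sq * c.sigma0 * c.area
                  / (c.r_t ** 2 * c.r_r ** 2))
    return prefactor * total


# Illustrative values only: GPS L1 wavelength (~0.19 m), 1 ms coherent integration
patch = SurfaceCell(area=1.0e6, gain_rx=10.0, r_t=2.0e7, r_r=7.0e5,
                    waf_sq=1.0, sigma0=5.0)
print(ddm_power([patch], eirp=500.0, wavelength=0.19, t_i=1.0e-3))
```

Because the power is linear in $\sigma^0$, wetter soil (higher reflectivity) raises the DDM power of the bins it falls into. Vegetation attenuation acts in the opposite direction.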

Support Vector Machine (SVM)
SVM is a supervised learning technique used for regression and classification analysis. In other words, SVM is a prediction tool that maximizes predictive accuracy while automatically avoiding overfitting to the data. SVMs are trained with a learning algorithm that implements a learning bias derived from statistical learning theory. SVMs are widely used in machine learning research because they achieve accuracy comparable to that of complex neural networks. A hypothesis is proposed that classifies the training samples accurately, and learning algorithms are designed to find such a precise fit to the data. SVM training involves finding the maximum-margin hyperplane that divides the two classes. This is the hyperplane with the largest separation from the nearest training data points. Figure 5 shows the margins for an SVM.

Consider a training set $(x_i, y_i)$, $i = 1, \dots, l$, where $x_i$ is an $n$-dimensional vector. The label is $y_i = 1$ if $x_i$ is in class 1 and $y_i = -1$ if $x_i$ is in class 2. A standard SVM finds a hyperplane $w \cdot x - b = 0$ that correctly separates the training data points. Its margin is the distance between the two hyperplanes $w \cdot x - b = 1$ and $w \cdot x - b = -1$, which equals $2/\|w\|$, as shown in Figure 5. The optimal hyperplane with maximum margin is obtained by solving the following quadratic programming problem:

$$\min_{w,\,b,\,\xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i \quad \text{subject to} \quad y_i (w \cdot x_i - b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; 1 \le i \le l$$

Here $C$ is the soft-margin parameter and $\xi_i$ is a slack variable for the non-separable case. The optimal decision function is obtained as

$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) - b \right)$$

where $\alpha_i$ is the Lagrange multiplier and $K(x_i, x)$ is the kernel function. A standard SVM is a two-class classifier whose output is 1 or -1. The classes may not be linearly separable. In that case, the data points in the original finite-dimensional space are mapped into a higher-dimensional feature space where they can be separated more easily.
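To make the dual form concrete, the sketch below evaluates the decision function $f(x) = \operatorname{sign}(\sum_i \alpha_i y_i K(x_i, x) - b)$. It also evaluates the soft-margin primal objective $\frac{1}{2}\|w\|^2 + C\sum_i \xi_i$ with $\xi_i = \max(0, 1 - y_i(w \cdot x_i - b))$. The sketch uses a linear kernel and a hand-solved two-point toy problem:

- $x_1 = (1, 0)$ is in class 1 and $x_2 = (-1, 0)$ is in class 2.
- $\alpha_1 = \alpha_2 = 0.5$ and $b = 0$.
- This gives $w = (1, 0)$ and a margin $2/\|w\| = 2$.

The code does not train an SVM, and all function names are hypothetical.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(u: Vector, v: Vector) -> float:
    """K(u, v) = u . v"""
    return sum(a * b for a, b in zip(u, v))


def decision_value(x: Vector, support_vectors: Sequence[Vector],
                   alphas: Sequence[float], labels: Sequence[int],
                   b: float, kernel: Kernel = linear_kernel) -> float:
    """sum_i alpha_i * y_i * K(x_i, x) - b  (same sign convention as w.x - b)."""
    return sum(a * y * kernel(sv, x)
               for sv, a, y in zip(support_vectors, alphas, labels)) - b


def classify(x: Vector, support_vectors: Sequence[Vector],
             alphas: Sequence[float], labels: Sequence[int],
             b: float, kernel: Kernel = linear_kernel) -> int:
    """Two-class output: +1 or -1."""
    value = decision_value(x, support_vectors, alphas, labels, b, kernel)
    return 1 if value >= 0 else -1


def primal_objective(w: Vector, b: float, X: Sequence[Vector],
                     y: Sequence[int], C: float) -> float:
    """0.5 * ||w||^2 + C * sum_i xi_i with hinge slacks xi_i."""
    slack = sum(max(0.0, 1.0 - yi * (linear_kernel(w, xi) - b))
                for xi, yi in zip(X, y))
    return 0.5 * linear_kernel(w, w) + C * slack


X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]
alphas = [0.5, 0.5]
print(classify((2.0, 0.0), X, alphas, y, b=0.0))
print(primal_objective((1.0, 0.0), 0.0, X, y, C=1.0))
```

Only the support vectors, with $\alpha_i > 0$, contribute to the sum. This is why the trained model can be sparse.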
The accuracy of an SVM classifier depends on the choice of kernel, the kernel parameters and the soft-margin parameter C. Two common kinds of RBF kernel are the Gaussian and the exponential kernel. In this research, we use the Gaussian radial basis function kernel:

$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right) \qquad (1.4)$$

where $\sigma$ is the kernel width.
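Equation (1.4) can be sketched directly, assuming plain Python sequences for the feature vectors. LIBSVM and scikit-learn write the same kernel as $\exp(-\gamma\|x_i - x_j\|^2)$. The helper `sigma_to_gamma` (a hypothetical name) therefore converts the kernel width with $\gamma = 1/(2\sigma^2)$.

```python
import math
from typing import Sequence


def gaussian_rbf(u: Sequence[float], v: Sequence[float], sigma: float) -> float:
    """K(u, v) = exp(-||u - v||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigma_to_gamma(sigma: float) -> float:
    """Equivalent gamma for libraries using exp(-gamma * ||u - v||^2)."""
    return 1.0 / (2.0 * sigma ** 2)


# Identical points give K = 1; the value decays towards 0 as points move apart
print(gaussian_rbf((0.0, 0.0), (0.0, 0.0), sigma=1.0))
print(gaussian_rbf((0.0, 0.0), (1.0, 1.0), sigma=1.0))
print(sigma_to_gamma(0.5))
```

A small $\sigma$ (large $\gamma$) makes the kernel very local and the model more complex. A large $\sigma$ smooths the decision function.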
After the model has been trained, it is applied to new samples to predict their target values.

Controlling Complexity
SVM is a robust technique that learns from training samples and generalizes well to new data. Model complexity, governed largely by the choice of kernel, affects performance on new datasets. To control complexity, the SVM parameters should be determined by cross-validation on the given dataset. Choosing a kernel appropriate to the problem or application improves SVM performance. The following diagram illustrates how complexity control is carried out.
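Cross-validated parameter selection can be sketched with scikit-learn, assuming scikit-learn and NumPy are installed. The reference SMC is a continuous variable, so the sketch uses `SVR` (support vector regression) with an RBF kernel. The following are illustrative assumptions, not the settings used in this work:

- the parameter grid;
- the 5-fold split;
- the `epsilon` value;
- the synthetic two-feature data, which stand in for DDM SNR and NDVI.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVR


def tune_rbf_svr(X, y, n_splits=5, seed=0):
    """Select C and gamma of an RBF-kernel SVR by k-fold cross-validation."""
    param_grid = {
        "C": [0.1, 1.0, 10.0],        # soft-margin parameter
        "gamma": [0.01, 0.1, 1.0],    # RBF width, gamma = 1 / (2 sigma^2)
    }
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    search = GridSearchCV(SVR(kernel="rbf", epsilon=0.01), param_grid,
                          cv=cv, scoring="neg_mean_squared_error")
    search.fit(X, y)  # refits the best model on all data by default
    return search.best_params_, -search.best_score_, search.best_estimator_


# Synthetic stand-in data: two features and a noisy linear target
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = 0.3 * X[:, 0] - 0.1 * X[:, 1] + rng.normal(0.0, 0.02, 200)

best_params, cv_mse, model = tune_rbf_svr(X, y)
print(best_params, cv_mse)
```

Each parameter pair is scored only on held-out folds. The selected `C` and `gamma` therefore reflect generalization rather than fit to the training data.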

SVM Classification:
Although neural networks may be considered more readily applicable than SVMs, they sometimes give unsatisfactory results. A classification task usually involves separating data into training and testing sets. Each instance in the training set contains one target value and several attributes. SVM aims to build a model, based on the training data, that predicts the target values of the test instances given only their attributes. SVM classification is an example of supervised learning. The known labels show whether the system is performing accurately. They provide the feedback needed to validate the system and improve its accuracy.

SVM prediction on Soil Moisture:
In this research, Support Vector Machines (SVMs) are used to develop predictive soil moisture models, which then drive soil moisture retrieval in time-series analysis. SVMs are based on the statistical learning theory developed by Vapnik (1995). They can be used to predict a quantity forward in time after being trained on historical data. Two critical factors govern the generalization capability of a learning machine. The first is the training error rate; the second is the capacity of the machine, as measured by its Vapnik-Chervonenkis (VC) dimension. Ahmad, S., et al. (2010) estimated soil moisture with SVM at selected test sites, obtaining RMSE values of 0.34 to 0.77. They compared the results with other techniques, such as an Artificial Neural Network (ANN) and a Multiple Linear Regression (MLR) model, and reported a difference of about 2% in prediction performance. Ren, C., et al. (2019) used GNSS Interferometric Reflectometry data, exploiting the correlation between received SNR and soil moisture. They estimated soil moisture with the least-squares SVM (LS-SVM) method. They concluded that LS-SVM based on multi-satellite fusion estimates soil moisture more accurately than single-satellite retrieval at a single station. Garg, R., et al. (2019) used big data on soil nutrient composition for sustainability studies. They comparatively analysed machine learning techniques, including SVM with polynomial and radial basis function (RBF) kernels.

Data Processing
The following figure presents the processing flow of this work. SMC is retrieved using both TDS-1 and NDVI data. For comparison, the retrieval is also carried out using TDS-1 data alone. The data used in this work were collected from January to March 2017. The data processing is carried out as follows.

First, the TDS-1 DDM SNR data are filtered according to two rules:

(1) Observations whose antenna gain is below 10 dB are discarded to guarantee data quality.

(2) Observations located over the ocean are discarded.

Second, the TDS-1 DDM and SMOS SMC data are matched in time and space to ensure a consistent retrieval. Next, the TDS-1 DDM and the NDVI imagery are fused as input, with the SMOS soil moisture data as the reference. For comparison, the TDS-1 DDM SNR alone is also used as input, again with SMOS data as the reference. Soil moisture is then retrieved from each input configuration with the SVM method, which achieved better accuracy with the fused input. The retrieval also involves feature selection, which removes features with low weights to reach a specified level of data sparsity. Using only the features retained after feature selection, a representation of the full training set is then created. An RBF-kernel SVM in this fixed feature space is applied to the retained test data. This approach exploits the memory freed by the increased sparsity to include more training data while keeping memory consumption constant. At the same time, the possible negative impact of the reduced feature space on performance is monitored and calibrated. In total, 19701 of the 24626 samples were used for training and 4925 for testing.
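The filtering, collocation and splitting steps can be sketched as below. This is a simplified illustration under stated assumptions. The `Reflection` record, the land/ocean flag, the regular 0.25° latitude/longitude grid and the same-day matching window are all hypothetical. They do not reproduce the actual TDS-1 or SMOS L3 file formats. The reported split of 24626 samples into 19701 training and 4925 testing samples corresponds to holding out 20% for testing. `train_test_split` reproduces those counts.

```python
import random
from dataclasses import dataclass
from typing import Dict, Iterable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
CellKey = Tuple[int, int, int]  # (row, column, day) on a simplified grid


@dataclass
class Reflection:
    """One TDS-1 specular-point observation (illustrative fields)."""
    lat: float
    lon: float
    day: int                # day index within January-March 2017
    snr_db: float           # DDM SNR
    antenna_gain_db: float  # receiver antenna gain towards the specular point
    is_land: bool           # hypothetical flag from a land/ocean mask


def quality_filter(obs: Iterable[Reflection],
                   min_gain_db: float = 10.0) -> List[Reflection]:
    """Rule (1): drop gain below min_gain_db; rule (2): drop ocean points."""
    return [o for o in obs if o.antenna_gain_db >= min_gain_db and o.is_land]


def grid_key(lat: float, lon: float, day: int, cell_deg: float) -> CellKey:
    """Map a point to a regular lat/lon cell on a given day."""
    return (int((lat + 90.0) // cell_deg), int((lon + 180.0) // cell_deg), day)


def collocate(obs: Iterable[Reflection], smos: Dict[CellKey, float],
              cell_deg: float = 0.25) -> List[Tuple[Reflection, float]]:
    """Pair each observation with the SMOS SMC of the same cell and day, if any."""
    pairs = []
    for o in obs:
        smc = smos.get(grid_key(o.lat, o.lon, o.day, cell_deg))
        if smc is not None:
            pairs.append((o, smc))
    return pairs


def train_test_split(samples: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle, then hold out int(len * test_fraction) samples for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


obs = [Reflection(10.0, 20.0, 0, 5.0, 12.0, True),
       Reflection(10.0, 20.0, 0, 5.0, 8.0, True)]   # second one: gain too low
pairs = collocate(quality_filter(obs), {grid_key(10.0, 20.0, 0, 0.25): 0.23})
train, test = train_test_split(range(24626))
print(len(pairs), len(train), len(test))
```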

Results and Discussion
UK TDS-1 and AVHRR NDVI data were used as inputs, and SMOS data as the reference, to train the SVM-based machine learning classifier.

Training with SVM Classification (without NDVI):
The following figures present the results obtained using TDS-1 data alone as input and SMOS data as the reference for the training and testing tasks. For the training dataset, the mean squared error is 0.0827153 and the squared correlation coefficient is 0.102284. For the testing dataset, they are 0.0779197 and 0.0960314, respectively.
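The two reported metrics can be computed as follows, assuming the usual definitions. The first is the mean squared error between reference and predicted SMC. The second is the squared Pearson correlation coefficient between them. This squared correlation differs from the coefficient of determination, because it ignores any bias or scaling error in the predictions. Function names are illustrative.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of squared differences between reference and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def squared_correlation(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Squared Pearson correlation coefficient between reference and prediction."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation undefined for constant input")
    return cov * cov / (var_t * var_p)


reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 3.0, 2.0, 4.0]
print(mean_squared_error(reference, predicted), squared_correlation(reference, predicted))
```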

Training with SVM Classification (with NDVI):
The following figures present the results obtained using TDS-1 data fused with NDVI data as input and SMOS data as the reference for the training and testing tasks. As shown in Figures 10 and 11, the optimization finished after 45525 iterations with the following values: nu = 0.995864, obj = -3830.046966, rho = -0.126590, nSV = 19642 and nBSV = 19603. Here obj is the optimal value of the dual objective and rho is the bias term of the decision function. nSV and nBSV are the numbers of support vectors and bounded support vectors, respectively. For the training dataset, the mean squared error is 0.0764551 and the squared correlation coefficient is 0.177215. For the testing dataset, they are 0.0727246 and 0.16478, respectively. The squared correlation coefficient (R²) values with NDVI are higher than those without NDVI. A higher squared correlation coefficient indicates closer agreement between the retrieved and reference soil moisture. In both configurations, the squared correlation coefficient is higher for the training set than for the testing set.
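As a worked comparison, the relative changes between the two configurations can be computed directly from the reported values; no new data are involved. The dictionary keys are only labels.

```python
from typing import Tuple

# Reported values: (mean squared error, squared correlation coefficient)
REPORTED = {
    ("DDM only", "train"): (0.0827153, 0.102284),
    ("DDM only", "test"): (0.0779197, 0.0960314),
    ("DDM + NDVI", "train"): (0.0764551, 0.177215),
    ("DDM + NDVI", "test"): (0.0727246, 0.16478),
}


def relative_changes(split: str) -> Tuple[float, float]:
    """Percent MSE reduction and percent R^2 increase when NDVI is added."""
    mse_0, r2_0 = REPORTED[("DDM only", split)]
    mse_1, r2_1 = REPORTED[("DDM + NDVI", split)]
    return 100.0 * (mse_0 - mse_1) / mse_0, 100.0 * (r2_1 - r2_0) / r2_0


for split in ("train", "test"):
    mse_drop, r2_gain = relative_changes(split)
    print(f"{split}: MSE reduced by {mse_drop:.1f}%, R^2 increased by {r2_gain:.1f}%")
```

With these numbers, adding NDVI reduces the MSE by about 7.6% (training) and 6.7% (testing). It raises the squared correlation coefficient by about 73% and 72%, respectively, although the absolute values stay below 0.18.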

CONCLUSION
This work focuses on using TDS-1 DDM data and SMOS data to remotely sense global soil moisture with an SVM learning approach. Support Vector Machines are among the most effective methods for data classification. They combine generalization control with a technique for controlling dimensionality. In classification problems, generalization control is obtained by maximizing the margin, which corresponds to minimizing the weight vector in a canonical framework. The solution is obtained as a set of support vectors that can be sparse. However, retrieval performance still needs improvement. Ground-based and airborne observations have shown good results for 0-5 cm soil moisture and vegetation water content. By contrast, spaceborne GNSS-R signals had been assumed to be too weak and uncertain for soil moisture sensing. In this research, too, soil moisture retrieval from spaceborne GNSS-R data is not yet highly significant. The data range should be extended to a yearly analysis. Data, features and models are the essential components of multidisciplinary data fusion driven by the SVM method for time-series analysis. GNSS-R technology can complement existing space-based Earth observation techniques, such as SAR and other spaceborne sensors.