Data-Driven Calibration of Soil Moisture Sensor Considering Impacts of Temperature: A Case Study on FDR Sensors

Commercial soil moisture sensors have been widely applied to the measurement of soil moisture content. However, the accuracy of such sensors varies with the employed sensing technique and the working conditions. In this study, the impact of temperature on soil moisture sensor readings was first analyzed. Next, a pioneering study of data-driven calibration of soil moisture sensors that accounts for the impact of temperature was carried out. Different data-driven models, including multivariate adaptive regression splines and Gaussian process regression, were applied to develop the calibration methods. To verify the efficacy of the proposed methods, tests were conducted on four commercial soil moisture sensors of the frequency domain reflection (FDR) type. The numerical results demonstrate that the proposed methods can greatly improve the measurement accuracy of the investigated sensors.


Introduction
The determination of soil moisture content is of great importance to agricultural management and hydrological engineering [1,2]. In modern agriculture, soil moisture is frequently monitored to better schedule irrigation [3,4]. A variety of approaches have been developed to measure soil moisture content, based on techniques including the thermo-gravimetric technique, the calcium carbide technique, the neutron scattering technique, dielectric techniques, electrical impedance sensors, and the thermal dissipation block technique [5-7]. These techniques differ in measuring principle, accuracy, and complexity. The thermo-gravimetric technique measures soil moisture content by drying a soil sample in an oven. It provides accurate measurements of soil moisture and is often used as the standard reference. However, thermo-gravimetric measurement is time-consuming and requires special equipment, which constrains its application, especially for in-situ measurements. Other techniques exploit the physical and chemical properties of soil to measure its moisture content.
In this study, soil from the tillage layer (0-20 cm) was used for the experiments; its characteristic parameters are given in Table 2. The experimental procedure consisted of the following steps: (1) the soil sample was dried and sieved through a 2 mm sieve; (2) wet soil samples with different levels of soil moisture content, such as 9.58%, 18.25%, and 27.01%, were prepared using the soil mixing method; and (3) the sensor readings were recorded at the considered temperatures (0, 5, 10, 15, 20, 22, 25, 30, 35, 40, and 45 °C) in temperature-controlled chambers. During the experiment, the samples were sealed with film to avoid the impact of water evaporation. The considered test temperatures fall within the range of environmental temperatures during the growth of wheat and corn in Northern China.
Sensors 2019, 19, 4381 3 of 11
The standard gravimetric technique was applied to obtain the actual soil moisture [19]. The soil samples were dried in an oven at 105 °C for 48 h to a constant weight before the soil moisture was computed. Each sensor had 70 test data points. Examples of records of operating temperature (T), sensor reading, and actual (reference) soil moisture content are listed in Table 3. Table 3 shows that the sensor measurements of soil moisture content contain errors and that the sensor reading can differ considerably across operating temperatures. To better illustrate the temperature impacts on the measurements, the sensor readings at various temperatures for different levels of actual soil moisture content are shown in Figure 1. The sensor reading tends to increase with increasing temperature, while the trend of variation depends strongly on the level of actual soil moisture content and on the type of soil moisture sensor. This is partially because increasing temperature strengthens the polarization of the soil and the movement of water molecules, resulting in a larger soil dielectric constant.
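Both the gravimetric reference and the temperature trend discussed above reduce to simple formulas. The sketch below assumes that the reference is gravimetric (mass-based) water content, i.e., water mass divided by oven-dry soil mass. It summarizes the temperature dependence of one sensor at one moisture level with an ordinary least-squares slope of reading versus temperature. The helper names `gravimetric_water_content` and `temperature_slope` are hypothetical, and the paper does not state that it fitted such slopes.

```python
# Hypothetical helpers (not from the paper); pure Python, no dependencies.
from typing import Sequence


def gravimetric_water_content(wet_mass_g: float, dry_mass_g: float) -> float:
    """Mass-based water content in percent: (wet - oven-dry) / oven-dry * 100."""
    if dry_mass_g <= 0:
        raise ValueError("oven-dry mass must be positive")
    return 100.0 * (wet_mass_g - dry_mass_g) / dry_mass_g


def temperature_slope(temps: Sequence[float], readings: Sequence[float]) -> float:
    """Least-squares slope of sensor reading versus temperature
    (reading units per deg C) for one sensor at one moisture level."""
    n = len(temps)
    if n < 2 or n != len(readings):
        raise ValueError("need at least two paired (T, reading) values")
    t_mean = sum(temps) / n
    r_mean = sum(readings) / n
    sxx = sum((t - t_mean) ** 2 for t in temps)
    if sxx == 0:
        raise ValueError("temperatures must not all be equal")
    sxy = sum((t - t_mean) * (r - r_mean) for t, r in zip(temps, readings))
    return sxy / sxx
```

A positive slope corresponds to readings that rise with temperature, as in Figure 1. Comparing slopes across moisture levels and sensors is one way to quantify how strongly the trend depends on both.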
From the above analysis, it is worthwhile to investigate calibration methods for soil moisture sensors that consider the impact of temperature.

Methodology
The framework of the proposed data-driven calibration methods is depicted in Figure 2. It consists of the following four steps:
(1) Prepare the training dataset: Collect both the soil moisture sensor data and the actual soil moisture content via experiments.
(2) Develop the calibration model: Train a regression model based on the multivariate adaptive regression splines (MARS) and Gaussian process regression (GPR) algorithms on the training dataset.
(3) Model evaluation: Compute the calibration errors using the learned model and the test dataset.
(4) Model application: Collect new sensor data and apply the calibration model to yield the calibrated soil moisture content.
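The four steps can be sketched as a small pipeline. This is a minimal sketch under stated assumptions. The names `fit_offset` and `run_framework` are hypothetical, and samples are `(reading, temperature, reference)` tuples. The bias-offset model stands in for the MARS or GPR regressor only so that the skeleton runs without dependencies; it ignores temperature entirely.

```python
# Minimal sketch of the four-step framework; `fit` is any function that
# learns a calibration model mapping (reading r, temperature T) -> moisture.
# The bias-offset model is a hypothetical stand-in, NOT MARS or GPR.
from typing import Callable, List, Tuple

Sample = Tuple[float, float, float]  # (sensor reading r, temperature T, reference y)
Model = Callable[[float, float], float]


def fit_offset(train: List[Sample]) -> Model:
    """Stand-in model: subtract the mean bias (r - y) seen in training."""
    bias = sum(r - y for r, _, y in train) / len(train)
    return lambda r, t: r - bias


def run_framework(train: List[Sample], test: List[Sample],
                  fit: Callable[[List[Sample]], Model]) -> Tuple[Model, float]:
    # Step 1: the training dataset `train` is assumed to be already collected.
    # Step 2: develop the calibration model on the training data.
    model = fit(train)
    # Step 3: evaluate the learned model on the test dataset (MAE here).
    mae = sum(abs(model(r, t) - y) for r, t, y in test) / len(test)
    # Step 4: return the model so it can be applied to new sensor readings.
    return model, mae
```

Swapping `fit_offset` for a MARS or GPR fitting routine changes only Step 2. This pluggable structure is the main point of the sketch.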
The development of the data-driven calibration methods based on the MARS and GPR models is described as follows.

Multivariate Adaptive Regression Splines
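The MARS model was built to generate more accurate soil moisture content from the original sensor readings. Given the original sensor reading $r$, the calibrated soil moisture $y$ was obtained according to Function (1):
$$y = f(r, T) \tag{1}$$
where $T$ is the environmental temperature and $f$ is the MARS model. The MARS model is described as [15]:
$$f(\mathbf{x}) = \beta_0 + \sum_{m=1}^{M} \beta_m B_m(\mathbf{x}), \qquad \mathbf{x} = [r, T] \tag{2}$$
where $B_m(\mathbf{x})$ are basis functions constructed from hinge functions of the inputs and $\beta_0, \ldots, \beta_M$ are coefficients estimated by minimizing the squared error
$$\min_{\beta_0, \ldots, \beta_M} \sum_{i=1}^{n} \left(y_i - f(\mathbf{x}_i)\right)^2 \tag{3}$$
on the training data set. The forward and backward pass procedures [15], together with generalized cross-validation, were applied to avoid over-fitting, yielding a more robust model.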
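A fitted MARS model is cheap to evaluate: it is an intercept plus a weighted sum of basis functions, each a product of hinge functions. The evaluator below is a hypothetical sketch of that structure. The term encoding, knots, and coefficients are illustrative assumptions; the fitted models of Equations (14)-(17) are not reproduced here.

```python
# Hypothetical MARS-style evaluator; knots and coefficients are made up.
from typing import Dict, List, Tuple

# A hinge is (input name, knot c, direction): +1 -> max(0, x - c),
# -1 -> max(0, c - x).
Hinge = Tuple[str, float, int]


def hinge(x: float, knot: float, direction: int) -> float:
    """Evaluate one hinge function at x."""
    if direction == 1:
        return max(0.0, x - knot)
    if direction == -1:
        return max(0.0, knot - x)
    raise ValueError("direction must be +1 or -1")


def mars_predict(inputs: Dict[str, float], intercept: float,
                 terms: List[Tuple[float, List[Hinge]]]) -> float:
    """f(x) = beta_0 + sum_m beta_m * B_m(x), where each basis function B_m
    is a product of hinge functions (a single hinge for additive terms)."""
    total = intercept
    for beta, hinges in terms:
        basis = 1.0
        for name, knot, direction in hinges:
            basis *= hinge(inputs[name], knot, direction)
        total += beta * basis
    return total


# Illustrative model with a reading term, a temperature term, and an
# r-by-T interaction term (all values hypothetical).
terms = [(0.8, [("r", 15.0, 1)]),
         (-0.1, [("T", 20.0, 1)]),
         (0.02, [("r", 15.0, 1), ("T", 20.0, -1)])]
calibrated = mars_predict({"r": 25.0, "T": 30.0}, 5.0, terms)
```

Evaluating the model requires only comparisons, multiplications, and additions. This is consistent with the later remark that MARS suits embedded devices.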

Gaussian Process Regression
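To improve the accuracy of soil moisture content measurement, a GPR model was applied to capture the relationship among the operating temperature $T$, the original soil moisture sensor reading $r$, and the actual (reference) soil moisture content, as in (4) [17]:
$$y = f(\mathbf{x}) + \zeta \tag{4}$$
where $y$ represents the actual (reference) soil moisture content, $\mathbf{x} = [r, T]$, and $\zeta \sim \mathcal{N}(0, \delta^2)$. In Equation (4), $f(\mathbf{x})$ is a latent function drawn from a Gaussian process, which is a collection of random variables, any finite number of which follow a joint Gaussian distribution. A Gaussian process is specified by its mean function $m(\mathbf{x})$ and its covariance function $g(\mathbf{x}, \mathbf{x}')$ as $f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), g(\mathbf{x}, \mathbf{x}'))$. In practice, the data are generally normalized to have zero mean, so that $m(\mathbf{x}) = 0$. Consider the training data set $D = (X, \mathbf{y})$, where $X = \{\mathbf{x}_i \mid \mathbf{x}_i \in \mathbb{R}^d\}_{i=1}^{n}$ denotes the predictor samples and $\mathbf{y} = \{y_i \mid y_i \in \mathbb{R}\}_{i=1}^{n}$ denotes the response samples. In Equation (4), the response $y$ results from the additive combination of the Gaussian variables $f(\mathbf{x})$ and $\zeta$; hence, $y$ also follows a Gaussian distribution. The GPR model associated with the training data set can be expressed as:
$$\mathbf{y} \sim \mathcal{N}\left(\mathbf{0},\; G(X, X) + \delta^2 I\right) \tag{5}$$
where $G(X, X)$ denotes the covariance matrix with $G_{ij} = g(\mathbf{x}_i, \mathbf{x}_j)$ and $I$ is an identity matrix. The Gaussian kernel (6) is frequently used in applications of GPR models:
$$g(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \exp\left(-\frac{1}{2}(\mathbf{x} - \mathbf{x}')^{\top} M (\mathbf{x} - \mathbf{x}')\right) \tag{6}$$
In Equation (6), $\sigma_f^2$ is the signal variance, and the diagonal matrix $M = \mathrm{diag}[1/\lambda_1^2, 1/\lambda_2^2, \ldots, 1/\lambda_d^2]$ contains the characteristic length scales $\lambda_1, \ldots, \lambda_d$ of the inputs. The hyperparameters are derived by maximizing the logarithm of the marginal likelihood function based on the training data set, as in Equation (7):
$$\log p(\mathbf{y} \mid X) = -\frac{1}{2}\mathbf{y}^{\top}\left[G(X, X) + \delta^2 I\right]^{-1}\mathbf{y} - \frac{1}{2}\log\left|G(X, X) + \delta^2 I\right| - \frac{n}{2}\log 2\pi \tag{7}$$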
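The Gaussian kernel with a diagonal length-scale matrix and the log marginal likelihood maximized during training can be written out directly. The pure-Python sketch below assumes zero-mean normalized data, as stated above. It only evaluates the objective for given hyperparameters `sigma_f`, `lengths`, and `delta` (hypothetical parameter names); an optimizer would still be needed to maximize it. Because $M$ is diagonal, $(\mathbf{x}-\mathbf{x}')^{\top} M (\mathbf{x}-\mathbf{x}')$ reduces to a sum of squared, length-scaled differences.

```python
# Sketch of the ARD Gaussian kernel and the GP log marginal likelihood.
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x1: Vector, x2: Vector, sigma_f: float,
                    lengths: Vector) -> float:
    """g(x, x') = sigma_f^2 * exp(-0.5 * sum_k ((x_k - x'_k) / lambda_k)^2)."""
    s = sum(((a - b) / l) ** 2 for a, b, l in zip(x1, x2, lengths))
    return sigma_f ** 2 * math.exp(-0.5 * s)


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def log_marginal_likelihood(X: List[Vector], y: Vector, sigma_f: float,
                            lengths: Vector, delta: float) -> float:
    """log p(y | X) = -0.5 y^T K^-1 y - 0.5 log|K| - n/2 log(2 pi),
    with K = G(X, X) + delta^2 I (zero-mean GP assumed)."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], sigma_f, lengths)
          + (delta ** 2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    # Forward substitution L z = y gives y^T K^-1 y = z^T z.
    z: List[float] = []
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * sum(v * v for v in z) - 0.5 * log_det - 0.5 * n * math.log(2 * math.pi)
```

The Cholesky factorization supplies both the quadratic term and the log-determinant, avoiding an explicit matrix inverse.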
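For a test point $\mathbf{x}_*$, the joint distribution of the response $y_*$ and the training responses follows:
$$\begin{bmatrix} \mathbf{y} \\ y_* \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0}, \begin{bmatrix} G(X, X) + \delta^2 I & G(X, \mathbf{x}_*) \\ G(\mathbf{x}_*, X) & G(\mathbf{x}_*, \mathbf{x}_*) + \delta^2 \end{bmatrix}\right) \tag{8}$$
According to the theory of the joint Gaussian distribution, the predictive distribution of $y_*$ is written as:
$$y_* \mid X, \mathbf{y}, \mathbf{x}_* \sim \mathcal{N}(\mu, \Sigma) \tag{9}$$
where $\mu = G(\mathbf{x}_*, X)\left[G(X, X) + \delta^2 I\right]^{-1}\mathbf{y}$ and $\Sigma = G(\mathbf{x}_*, \mathbf{x}_*) + \delta^2 - G(\mathbf{x}_*, X)\left[G(X, X) + \delta^2 I\right]^{-1} G(X, \mathbf{x}_*)$. The prediction of the response was taken to be the predictive mean:
$$\hat{y}_* = \mu = G(\mathbf{x}_*, X)\left[G(X, X) + \delta^2 I\right]^{-1}\mathbf{y} \tag{10}$$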
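The predictive mean and variance at a test point follow from two linear solves against $G(X, X) + \delta^2 I$. The pure-Python sketch below assumes that the hyperparameters are already known, for example from maximizing the log marginal likelihood. The function names and the toy inputs in the usage lines are hypothetical.

```python
# Sketch of GP prediction at a test point x*: predictive mean and variance.
import math
from typing import List, Sequence, Tuple


def kernel(a: Sequence[float], b: Sequence[float], sigma_f: float,
           lengths: Sequence[float]) -> float:
    """ARD Gaussian kernel with length scales lambda_k."""
    s = sum(((p - q) / l) ** 2 for p, q, l in zip(a, b, lengths))
    return sigma_f ** 2 * math.exp(-0.5 * s)


def solve_spd(A: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve A v = b for symmetric positive-definite A via Cholesky."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n
    for i in range(n):  # forward substitution: L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    v = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T v = z
        v[i] = (z[i] - sum(L[k][i] * v[k] for k in range(i + 1, n))) / L[i][i]
    return v


def gp_predict(X: List[Sequence[float]], y: Sequence[float], x_star: Sequence[float],
               sigma_f: float, lengths: Sequence[float], delta: float) -> Tuple[float, float]:
    """mu  = G(x*,X) [G(X,X) + delta^2 I]^-1 y
    var = G(x*,x*) + delta^2 - G(x*,X) [G(X,X) + delta^2 I]^-1 G(X,x*)"""
    n = len(X)
    K = [[kernel(X[i], X[j], sigma_f, lengths) + (delta ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [kernel(x_star, xi, sigma_f, lengths) for xi in X]
    mu = sum(k * a for k, a in zip(k_star, solve_spd(K, y)))
    w = solve_spd(K, k_star)
    var = kernel(x_star, x_star, sigma_f, lengths) + delta ** 2 - sum(
        k * v for k, v in zip(k_star, w))
    return mu, var
```

Far from all training points, `k_star` vanishes, so the mean reverts to zero and the variance to $\sigma_f^2 + \delta^2$. This is why the zero-mean normalization mentioned above matters in practice.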

Performance Metrics
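In this study, error metrics including the mean bias error (MBE), mean absolute error (MAE), and root mean square error (RMSE) were employed to verify the effectiveness of the proposed methods in improving the accuracy of soil moisture content measurements by considering the impacts of temperature:
$$\mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i) \tag{11}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left|\hat{y}_i - y_i\right| \tag{12}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2} \tag{13}$$
where $\hat{y}$ and $y$ indicate the modeled and actual soil moisture content, respectively, and $n$ is the number of test data points.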
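The three metrics can be computed in one pass over the errors. This minimal sketch assumes the error is defined as modeled minus actual, so a positive MBE indicates over-estimation; `error_metrics` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple


def error_metrics(y_hat: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (MBE, MAE, RMSE) with errors e_i = y_hat_i - y_i."""
    n = len(y)
    if n == 0 or len(y_hat) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - a for p, a in zip(y_hat, y)]
    mbe = sum(errors) / n                           # signed bias
    mae = sum(abs(e) for e in errors) / n           # average error magnitude
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # penalizes large errors
    return mbe, mae, rmse


mbe, mae, rmse = error_metrics([12.0, 18.0, 27.0], [10.0, 20.0, 27.0])
```

In this toy call, errors of +2 and -2 cancel, so MBE is zero while MAE and RMSE stay positive. This is why MBE is reported together with MAE and RMSE rather than alone.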

Experimental Results
The proposed calibration models were developed for each device. Because the available data set was small (70 data points per sensor), the cross-validation technique was applied to comprehensively verify the efficacy of the proposed methods on the entire data set, using the following steps: (Step 1) the entire data set $D$ is randomly partitioned into $k$ folds, $D = \{D_1, \ldots, D_k\}$; (Step 2) a MARS/GPR model is trained on the data set $P_i = D \setminus D_i$, the complement of $D_i$, for $i = 1, \ldots, k$; (Step 3) the MARS/GPR model from Step 2 is applied to predict the soil moisture content on $D_i$, for $i = 1, \ldots, k$; (Step 4) the modeling errors are calculated on the entire data set $D$. Table 4 lists one set of GPR model parameters for the test sensors, while the prediction equations of the MARS models are given in Equations (14)-(17).
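The four cross-validation steps can be sketched as a function that returns one out-of-fold prediction per sample. The name `k_fold_predictions`, the shuffle-then-stride fold assignment, and the `seed` parameter are assumptions for illustration; the paper does not report its value of $k$ or how folds were drawn.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (reading r, temperature T, reference y)
Model = Callable[[float, float], float]


def k_fold_predictions(data: Sequence[Sample], k: int,
                       fit: Callable[[List[Sample]], Model],
                       seed: int = 0) -> List[float]:
    """Out-of-fold predictions for every sample (Steps 1-3); the caller
    then computes errors against the references on all of D (Step 4)."""
    if not 2 <= k <= len(data):
        raise ValueError("k must be between 2 and the number of samples")
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)           # Step 1: random partition
    folds = [idx[f::k] for f in range(k)]
    preds = [0.0] * len(data)
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]  # P_i = D \ D_i
        model = fit(train)                     # Step 2: train on P_i
        for i in fold:                         # Step 3: predict on D_i
            r, t, _ = data[i]
            preds[i] = model(r, t)
    return preds
```

Every sample is predicted exactly once by a model that never saw it. With only 70 points per sensor, this uses the whole data set for evaluation without leaking training data.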
In Equations (14)-(17), the basis functions are defined in Functions (18) and (19), where $x$ is the input variable, either $r$ (moisture sensor reading) or $T$ (temperature):
$$h_1(x) = \max(0,\; x - c) \tag{18}$$
$$h_2(x) = \max(0,\; c - x) \tag{19}$$
where $c$ denotes a knot location.
To illustrate the effectiveness of the proposed calibration methods considering the impacts of temperature, the performance of the proposed methods was compared with the raw sensor reading as well as with data-driven methods developed using only the original sensor reading.
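The comparison between calibration with and without temperature can be illustrated on synthetic data. This sketch rests on several assumptions. Ordinary least-squares linear models stand in for MARS and GPR. The synthetic readings are constructed to drift linearly with temperature, so they are not the paper's data. The names `least_squares` and `rmse_of_fit` are hypothetical.

```python
# Hypothetical ablation: fit y ~ a + b*r (reading only) versus
# y ~ a + b*r + c*T (reading + temperature) and compare training RMSE.
import math
from typing import List, Sequence


def least_squares(A: List[List[float]], y: Sequence[float]) -> List[float]:
    """Solve the normal equations (A^T A) w = A^T y by Gauss-Jordan elimination."""
    p = len(A[0])
    M = [[sum(row[i] * row[j] for row in A) for j in range(p)]
         + [sum(row[i] * t for row, t in zip(A, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]


def rmse_of_fit(features: List[List[float]], y: Sequence[float]) -> float:
    """Fit a linear model with intercept and return its RMSE on the same data."""
    A = [[1.0] + f for f in features]  # prepend intercept column
    w = least_squares(A, y)
    preds = [sum(wi * ai for wi, ai in zip(w, row)) for row in A]
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y))


# Synthetic readings that drift by 0.2 per deg C around 20 deg C (assumed).
rows = [(y, T, y + 0.2 * (T - 20.0)) for y in (10.0, 20.0, 30.0)
        for T in (0.0, 20.0, 40.0)]
targets = [y for y, _, _ in rows]
rmse_reading_only = rmse_of_fit([[r] for _, _, r in rows], targets)
rmse_with_temp = rmse_of_fit([[r, T] for _, T, r in rows], targets)
```

Here the reading-only model cannot separate moisture from temperature drift, while adding $T$ as an input removes the drift almost entirely. This mirrors, in idealized form, the motivation for the temperature-aware models.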
The modeling performances of the different methods are compared in Table 5. Table 5 shows that large differences exist between the measured and actual soil moisture data. The data-driven calibration methods yielded more accurate soil moisture in terms of MBE, MAE, and RMSE. Moreover, accounting for temperature impacts substantially improved the modeling accuracy of the data-driven models. Therefore, both the MARS model and the GPR model are effective for developing data-driven calibration methods for soil moisture sensors that consider temperature impacts. Between the MARS model and the GPR model, neither dominated the other across all three metrics and all four sensors. Table 6 gives an example of the modeling performance at various temperatures. At extremely high and low temperatures, incorporating temperature information improved the modeling accuracy much more than using the sensor reading alone. In China, the ambient temperature can be around 0 °C during the growth period of winter wheat, while it can exceed 35 °C during the growth period of summer corn. Hence, improving measurement accuracy at high and low temperatures can be of great importance for scheduling irrigation during the growth periods of crops in different seasons. The boxplot of the bias error is shown in Figure 3. It further demonstrates that the proposed methods substantially improved the accuracy compared to the sensor reading, while also reducing the variation of the bias errors. To further demonstrate the performance of the proposed methods, the calibrated and actual soil moisture at 30 °C are depicted in Figure 4. The soil moisture modeled by the proposed data-driven calibration methods agreed well with the actual soil moisture.
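A per-temperature breakdown like the one in Table 6 can be produced by grouping the errors by operating temperature. This is a minimal sketch. It assumes errors are defined as modeled minus actual, and `metrics_by_temperature` is a hypothetical name. The ungrouped per-sample errors are also the values a bias-error boxplot summarizes.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def metrics_by_temperature(temps: Sequence[float], y_hat: Sequence[float],
                           y: Sequence[float]) -> Dict[float, Tuple[float, float]]:
    """Group errors (y_hat - y) by operating temperature and return
    {T: (MBE, MAE)}, sorted by temperature."""
    groups: Dict[float, List[float]] = defaultdict(list)
    for t, p, a in zip(temps, y_hat, y):
        groups[t].append(p - a)
    return {t: (sum(e) / len(e), sum(abs(v) for v in e) / len(e))
            for t, e in sorted(groups.items())}
```

Running this once with raw readings as `y_hat` and once with calibrated values shows where along the temperature range the calibration helps most.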
From the above analysis, the proposed data-driven calibration of soil moisture sensors considering the impact of temperature can greatly improve the accuracy of soil moisture content measurement. The MARS and GPR models were used because of their strong capability for nonlinear modeling with a limited training dataset. Owing to its lower model complexity, the MARS model can be implemented more efficiently on embedded devices than the GPR model, while the latter achieved better performance in most cases in this study. Hence, the trade-off between modeling accuracy and ease of implementation should be considered when selecting calibration models in practice. In the future, other machine learning algorithms such as boosted regression trees and neural networks [20] could also be applied to sensor calibration when richer data are available.

Conclusions
In this paper, data-driven methods based on multivariate adaptive regression splines (MARS) and Gaussian process regression (GPR) models were developed to calibrate soil moisture sensors considering the impact of temperature. The effectiveness of the proposed methods was verified on four commercial soil moisture sensors of the frequency domain reflection (FDR) type. The numerical results demonstrate that the proposed methods can greatly reduce measurement errors. This study supports the application of data-driven models to the calibration of soil moisture sensors to improve their measurement accuracy.