Article

Simultaneous Prediction of Soil Properties Using Multi_CNN Model

1 College of Information Science and Engineering, Ocean University of China, Qingdao 266000, China
2 Pilot National Laboratory for Marine Science and Technology, Qingdao 266000, China
* Author to whom correspondence should be addressed.
Sensors 2020, 20(21), 6271; https://doi.org/10.3390/s20216271
Submission received: 28 September 2020 / Revised: 30 October 2020 / Accepted: 1 November 2020 / Published: 3 November 2020
(This article belongs to the Special Issue New Sensors for Monitoring of Soil Parameters)

Abstract

Soil nutrient prediction based on near-infrared spectroscopy has become a main research direction for the rapid acquisition of soil information, and the development of deep learning has greatly improved on the prediction accuracy of traditional modeling methods. To address the low efficiency and low accuracy of current soil prediction models, this paper proposes an intelligent method for predicting multiple soil attributes based on convolutional neural networks. The proposed dual-stream convolutional neural network, Multi_CNN, combines one-dimensional and two-dimensional convolution: it extracts features of soil attributes from spectral sequences and from spectrograms, respectively, and predicts multiple attributes simultaneously through feature fusion. The model was evaluated for multi-attribute prediction on two soil near-infrared spectroscopy datasets of different scales. On the small dataset, the $R_P^2$ values for Total Carbon, Total Nitrogen, and Alkaline Nitrogen are 0.94, 0.95, and 0.87, respectively; on the LUCAS dataset, the $R_P^2$ values for Organic Carbon, Nitrogen, and Clay are 0.95, 0.91, and 0.83, respectively. Compared with traditional regression models and the newer prediction methods commonly used in soil nutrient prediction, the proposed multi-task model is more accurate.

1. Introduction

Soil is an important natural resource, and rapid acquisition of the content and spatial distribution of soil properties is of great value for agriculture and for global change research. However, collecting soil samples is costly, so predicting soil nutrient content has become a hot topic in soil research. Visible and near-infrared (Vis-NIR) spectroscopy, with advantages such as rapid, non-destructive, non-polluting, and real-time detection, has an extensive research and application foundation in soil nutrient content prediction [1,2,3]. However, spectral data are susceptible to interference from stray light, noise, baseline drift, and other factors that degrade model performance, so the spectra must be preprocessed before modeling to improve the predictive ability and robustness of the model. Because of the complex characteristics of spectral data, traditional mathematical modeling methods can achieve a certain degree of analysis and prediction, but they face technical bottlenecks in achieving more accurate and more general predictions. With the development of machine learning, many new regression algorithms for spectral modeling have been proposed and applied [4,5,6]. Compared with traditional mathematical modeling and machine learning methods, neural network models offer higher computational efficiency and stronger modeling capability, and they can autonomously extract effective features from complex spectral data. The purpose of this paper is to establish a more efficient, robust, and accurate spectral prediction model for soil nutrients, which is of great significance for accelerating agricultural informatization in China, improving the level of scientific agricultural management, and developing China's agricultural economy.
Early research found that soil organic matter can be estimated from the reflectance values of the soil reflectance spectrum, and that the response of soil properties can be identified from spectral characteristics. In 2006, Rossel et al. compared predictions of various soil properties from visible (VIS, 400–700 nm), near-infrared (NIR, 700–2500 nm), and mid-infrared (MIR, 2500–5000 nm) spectra, demonstrating that soil information can be obtained more effectively using VIS, NIR, and MIR spectroscopy [7]. Later, because of the complexity of vis-NIR spectra, a variety of methods were applied to preprocess soil spectra, such as Savitzky-Golay smoothing, standardization, and normalization [8,9]. In 2016, Lin et al. combined S-G smoothing with scatter correction to process soil spectral data, minimizing irrelevant information in the spectrum and increasing the correlation between the spectrum and the measured values [10]. Choosing the best combination of preprocessing methods for soil vis-NIR data not only removes interference factors to the greatest extent, but also exploits the complementarity between preprocessing methods to improve the prediction accuracy of the network model. In the existing literature, researchers have mostly focused on preprocessing spectral data, and few have proposed or improved the spectral regression models themselves. A high-performance spectral modeling method can reduce the preprocessing requirements of spectral data and is also key to ensuring the accuracy of spectral prediction [11]. With the development of regression prediction, more and more linear regression methods have been applied to soil nutrient prediction, such as the principal component regression (PCR) of Chang [12] and the partial least squares regression (PLSR) of McCarty [13].
Subsequently, random forest, genetic algorithms, least squares support vector machines (LS-SVM), and the Cubist method from machine learning were also used to improve prediction ability [14,15,16,17]. Because deep neural networks are good at automatically extracting useful feature representations from large amounts of data, they have clear advantages over shallow models and linear methods, and they have become a focus of machine learning research in recent years. In 2015, Veres et al. applied deep learning to soil spectroscopy for the first time, demonstrating the feasibility of a CNN model for estimating certain properties from LUCAS soil data [18]. In 2017, Ruder pointed out that multi-task models can reduce the risk of overfitting while improving the efficiency of model training [19]. In 2019, Padarian et al. used a convolutional neural network (CNN) and a multi-task CNN to predict various soil properties from the LUCAS dataset, verifying the effectiveness of multi-task learning for soil property prediction; however, their deep learning method is suitable only for large-scale spectral datasets and performs poorly on small samples [20]. Ndikumana et al. then treated spectral data as time series and fed them into a long short-term memory (LSTM) network for soil prediction, achieving good results. However, they applied linear PCA dimensionality reduction to the data before training, which may discard nonlinear correlations between samples and prevent the model from fitting the data characteristics well [21].
To address the low efficiency and low accuracy of current soil prediction models, this paper proposes a new multi-task model that predicts multiple soil attributes simultaneously from near-infrared spectroscopy data. Because spectral values vary nonlinearly with wavelength, this paper treats wavelength as the time axis and the spectrum as a non-stationary time series signal. First, the spectra are processed with three preprocessing methods, S-G smoothing, multivariate scatter correction, and centering, to construct a stable spectral sequence. In parallel, the original spectral data are windowed and Fourier-transformed to generate a spectrogram. A dual-stream Multi_CNN network with multiple input channels then takes the spectral sequence and the spectrogram simultaneously, fusing one-dimensional and two-dimensional convolution to realize multiple inputs and multiple outputs. In addition, the model supports adaptive input selection: it uses a single input or multiple inputs according to the characteristics of the two soil spectral datasets of different scales. Because the Small dataset has few samples and a short wavelength range, only the one-dimensional convolutional branch of Multi_CNN is used, with a single input, for attribute prediction, whereas the LUCAS dataset uses multiple inputs with the complete Multi_CNN model. The results show that on small-sample data the single-input network outperforms traditional machine learning algorithms, and on the large-sample dataset the Multi_CNN model outperforms existing newer models.
The rest of this article is organized as follows. Section 2 introduces the two soil spectral datasets and the preprocessing methods used, as well as the multi-input multi-output network Multi_CNN. Section 3 compares the results on the two datasets of different scales and discusses them. Section 4 concludes the article.

2. Materials and Methods

2.1. The Soil Dataset

Deep learning methods require a large number of samples for training, but building large national or even global soil datasets takes a long time, so local soil spectral datasets are generally small. This article aims to make the most accurate predictions possible for datasets of different scales, and it uses two soil vis-NIR spectroscopy datasets of different scales for predictive modeling.
The Small dataset used in this paper consists of 180 soil samples obtained from 19 sampling sites in Qingdao's South District, Shibei District, Laoshan District, Huangdao District, and Jiaozhou City. Sampling points were selected within areas of consistent color and vegetation coverage, so that the soil nutrients at each sampling point are relatively uniform; the soils are mainly sandy loam or silt loam. After the soil samples were air-dried and sieved, spectra were collected using a DH-2000 light source (Ocean Optics, Dunedin, FL, USA) connected to a QE-65000 spectrometer (Ocean Optics, Dunedin, FL, USA) via a Y-type optical fiber. The spectral range of the spectrometer is 200–1100 nm, the sampling interval was set to 1 nm, and the integration time was set to 600 ms. To eliminate random factors, each soil sample was measured 5 times and the average spectrum was used. Because the reflectance spectrum is heavily disturbed at the beginning and end of the range, only the reflectance data from 225 nm to 975 nm were retained, so each sample contains 750 data points. The basic information of each sample is shown in Tables A1–A4. The physical and chemical values of the soil nutrients, namely the concentrations of total nitrogen (TN), total carbon (TC), and alkali-hydrolyzable nitrogen (AN), were then obtained by laboratory methods. TN and TC were measured directly with a Perkin-Elmer 2400 carbon and nitrogen analyzer (PerkinElmer, Waltham, MA, USA), and AN was measured by the alkaline hydrolysis diffusion method: the sample is first hydrolyzed with sodium hydroxide to convert the available nitrogen into ammonia, which is absorbed with boric acid and titrated with standard acid, and the alkali-hydrolyzable nitrogen content is then calculated.
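The averaging and cropping steps for the Small dataset can be sketched as follows. This is a minimal NumPy illustration, assuming a 1 nm grid from 200 to 1100 nm and a half-open 225–975 nm window (chosen so that exactly 750 points remain, as stated above); the function name and the random scans are hypothetical.

```python
import numpy as np

def average_and_crop(repeats, wavelengths, lo=225.0, hi=975.0):
    """Average repeated scans of one sample and keep lo <= wavelength < hi.

    repeats: array of shape (n_repeats, n_wavelengths), e.g. 5 scans.
    wavelengths: array of shape (n_wavelengths,) in nm.
    """
    mean_spectrum = np.asarray(repeats, dtype=float).mean(axis=0)
    # Half-open window (an assumption) so that a 1 nm grid yields 750 points.
    mask = (wavelengths >= lo) & (wavelengths < hi)
    return wavelengths[mask], mean_spectrum[mask]

# Hypothetical 1 nm grid covering the spectrometer range 200-1100 nm.
wl = np.arange(200.0, 1101.0)
scans = np.random.default_rng(0).random((5, wl.size))  # 5 repeated scans
wl_kept, spec = average_and_crop(scans, wl)
```

Whether the original pipeline used a half-open or closed interval is not stated; a closed 225–975 nm interval on a 1 nm grid would give 751 points.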
The large-scale LUCAS soil dataset consists of 19,036 soil samples collected from 23 European countries, covering cultivated land, grassland, woodland, and other land types, so the samples are highly diverse [8]. These soil samples were also air-dried and sieved and then measured with a FOSS XDS near-infrared spectrometer (Foss, Denmark). Spectra were recorded with a spectral resolution of 0.5 nm over 400–2500 nm, giving 4200 data points per sample. The LUCAS dataset includes measurements of many soil properties, such as coarse fragment content, organic carbon, nitrogen, potassium, phosphorus, pH, and clay; this article selects three of them: organic carbon (OC), nitrogen (N), and clay. Basic information on the two datasets is shown in Table 1, which illustrates the diversity of the LUCAS samples. Figure 1 shows the original soil vis-NIR spectra of the two datasets, where the horizontal axis is wavelength and the vertical axis is the absorbance of the soil sample.

2.2. Data Preprocessing

During soil spectrum collection, environmental factors such as temperature and humidity, the instrument state, manual operation, and the uneven physical state of the soil can introduce interference such as noise, scattering, and baseline drift into the spectra. Noise that masks spectral features reduces the accuracy of soil carbon content prediction. Preprocessing the original spectra therefore reduces the interference that affects the analysis results and helps establish a more accurate prediction model.
Many spectral preprocessing methods exist. According to their effects, this paper adopts three types of algorithms, smoothing, scatter correction, and scaling, to suppress high-frequency noise and other interference in the spectral signals. Savitzky-Golay (S-G) smoothing is essentially a weighted moving average: each smoothed point is obtained by a least-squares polynomial fit to the data in the smoothing window, and this weighting reduces the loss of spectral information during smoothing. Multivariate scatter correction (MSC) is a commonly used spectral preprocessing algorithm. It effectively removes spectral differences caused by different scattering levels, as well as baseline shifts caused by scattering differences between samples, thereby enhancing the correlation between the spectra and the data and improving the signal-to-noise ratio of the original absorbance spectra [22]. In addition, to eliminate the influence of differences in dimension and value range between variables, regression problems usually require the data to be centered: the mean spectrum is subtracted from each spectrum, removing the absolute level of absorbance. Figure 2a,d show the spectra of the two datasets after S-G smoothing, which removes noise peaks while retaining useful spectral information. After MSC (Figure 2b,e), the spectra overlap more closely, indicating that the influence of scattering on the original spectra is reduced. Figure 2c,f show that after centering, the data are distributed around the origin, eliminating interference from differences in magnitude.
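The three preprocessing steps can be illustrated with a minimal NumPy sketch. It assumes a window of 11 points and a second-order polynomial for S-G smoothing (the paper does not give these values here), edge padding at the spectrum ends, and the mean spectrum as the MSC reference; in practice `scipy.signal.savgol_filter` is the usual library route for the smoothing step.

```python
import numpy as np

def savgol_coeffs(window, polyorder):
    """S-G weights: least-squares polynomial fit evaluated at the window centre."""
    half = window // 2
    x = np.arange(-half, half + 1)
    vander = np.vander(x, polyorder + 1, increasing=True)  # columns x^0 .. x^p
    # Row 0 of the pseudo-inverse yields the fitted constant term, i.e. the value at x = 0.
    return np.linalg.pinv(vander)[0]

def savgol_smooth(spectra, window=11, polyorder=2):
    """Smooth each row; edge padding keeps the original length (a simplification)."""
    coeffs = savgol_coeffs(window, polyorder)
    half = window // 2
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    return np.stack([np.convolve(row, coeffs[::-1], mode="valid") for row in padded])

def msc(spectra, reference=None):
    """Regress each spectrum on the reference (row ~ a + b * ref), then undo a and b."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, deg=1)  # highest degree first
        corrected[i] = (row - intercept) / slope
    return corrected

def mean_center(spectra):
    """Subtract the mean spectrum from every spectrum."""
    return spectra - spectra.mean(axis=0)

def preprocess(spectra):
    """S-G smoothing, then MSC, then centering (the combination found best in Section 3.1)."""
    return mean_center(msc(savgol_smooth(spectra)))
```

Because S-G smoothing is linear and its weights sum to one, spectra that differ only by an additive offset and a multiplicative scale collapse onto the same curve after MSC, which is exactly the scatter effect MSC is meant to remove.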
In addition, because this paper treats vis-NIR spectra as time series data, the signal can be divided into segments and Fourier-transformed to obtain a spectrogram of the soil spectral data. In this experiment, a Hamming window is applied before the Fourier transform. On the Small dataset, the frame length is 50 and the overlap is 20 points, yielding 180 spectrograms in total. The LUCAS dataset yields 19,036 spectrograms, with a frame length of 100 and an overlap of 50. Finally, each sample is represented by a 64 × 64 spectrogram. Figure 3 shows spectrograms of soil samples randomly selected from the two datasets, where the horizontal axis represents wavelength and the vertical axis represents frequency.
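The windowed Fourier transform described above can be sketched with NumPy alone. The frame lengths and overlaps follow the text; the log-power scaling and the bilinear resize to 64 × 64 are assumptions, since the paper does not say how the raw short-time spectrum (for example 26 frequency bins × 24 frames for a 750-point spectrum) was mapped to a 64 × 64 image.

```python
import numpy as np

def resize_bilinear(img, out_h, out_w):
    """Separable bilinear resize of a 2-D array (assumed resizing method)."""
    in_h, in_w = img.shape
    rows = np.linspace(0, in_h - 1, out_h)
    cols = np.linspace(0, in_w - 1, out_w)
    tmp = np.stack([np.interp(cols, np.arange(in_w), r) for r in img])       # (in_h, out_w)
    return np.stack([np.interp(rows, np.arange(in_h), c) for c in tmp.T]).T  # (out_h, out_w)

def spectrogram_image(signal, frame_len, overlap, out_size=64):
    """Hamming-windowed short-time Fourier transform, log power, resized to a square image."""
    step = frame_len - overlap
    n_frames = 1 + (len(signal) - frame_len) // step
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * step:i * step + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, frame_len // 2 + 1)
    log_power = np.log10(power + 1e-12).T               # rows: frequency, columns: position along wavelength
    return resize_bilinear(log_power, out_size, out_size)

rng = np.random.default_rng(0)
small_img = spectrogram_image(rng.random(750), frame_len=50, overlap=20)    # Small dataset settings
lucas_img = spectrogram_image(rng.random(4200), frame_len=100, overlap=50)  # LUCAS settings
```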

2.3. The Multiple-Input Multiple-Output Network

A convolutional neural network (CNN) is a nonlinear model whose convolution and pooling structure can extract essential features from complex input information, which gives it excellent representational capability. It is usually composed of multiple convolutional layers, pooling layers, and fully connected layers. A convolutional layer is a linear layer that convolves a series of kernels with the input data; by exploiting the locality and position independence of features in the input, it reduces the number of parameters of the whole model. A pooling layer is mainly used for feature dimensionality reduction, compressing the amount of data and the number of parameters to reduce overfitting and improve the fault tolerance of the model. A fully connected layer uses neurons to fit the data distribution and increases the learning capacity of the model. The Temporal Convolutional Network (TCN) is a network structure that distills best practices from convolutional neural networks into a model suited to sequence data by combining one-dimensional fully convolutional layers with causal convolution [23]. The TCN residual block combines causal and dilated convolution; its structure is shown in Figure 4. It contains two dilated convolutional layers, each followed by weight normalization (WeightNorm) and Dropout for regularization. The dilation rate of each dilated convolutional layer increases exponentially with depth, so that the convolution kernels cover every input in the effective history and a deep network can capture a very long effective history.
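The causal and dilated convolutions used in TCN residual blocks can be illustrated with a small NumPy sketch. The helper names are hypothetical; `receptive_field` assumes two convolutions per residual level, as in the block described above, and shows why exponentially growing dilation rates cover a long history with few layers.

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation):
    """y[t] = sum_j kernel[j] * x[t - j * dilation]; left zero-padding keeps it causal."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([sum(kernel[j] * xp[t + pad - j * dilation] for j in range(k))
                     for t in range(len(x))])

def receptive_field(kernel_size, dilations, convs_per_level=2):
    """Number of past inputs visible to one output of a stack of dilated causal convolutions."""
    return 1 + convs_per_level * (kernel_size - 1) * sum(dilations)

# An impulse at position 10 shows which outputs depend on it.
x = np.zeros(32)
x[10] = 1.0
kernel = np.ones(3)
y = causal_dilated_conv1d(x, kernel, dilation=2)   # nonzero at t = 10, 12, 14 only
y2 = causal_dilated_conv1d(y, kernel, dilation=4)  # dilation 2 then 4, as in Multi_CNN_1D
```

Outputs before position 10 stay zero, so no output depends on future inputs; with kernel size 3 and dilations 1, 2, 4, 8 (two convolutions per level), one output already sees 61 inputs.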
A multi-task method is a natural choice when the goal is to obtain predictions for several tasks at the same time. A multi-task network mainly consists of hidden layers shared among all tasks plus several task-specific output layers, each associated with one task. This paper builds a multi-task network, Multi_CNN, that fuses one-dimensional and two-dimensional convolutional networks to achieve multiple inputs and multiple outputs. Using the one-dimensional spectral sequence and the two-dimensional spectrogram as feature inputs at the same time allows the model to fit the spectral features better, thereby improving the prediction accuracy of soil properties.
The first input of the model is the preprocessed spectral sequence. Following the temporal convolutional network, which is suited to time series modeling, a one-dimensional convolutional network named Multi_CNN_1D is built to predict multiple soil attributes. Its first layer is a one-dimensional convolutional layer with 64 filters, using ReLU as the activation function, with weight normalization of the convolution kernels to make training more stable. It is followed by a batch normalization (BN) layer and a max pooling layer for downsampling. The third layer is a convolutional layer with 128 filters. A residual module follows, containing two dilated convolutional layers with dilation rates of 2 and 4 and a one-dimensional convolutional layer with the same parameters. A fully connected layer then connects all outputs of the previous layer to all inputs of the next layer to integrate the information. The spectrogram is processed by three two-dimensional convolutional layers with a pooling layer in between, which reduces the parameter dimension and prevents overfitting. The features extracted by the two CNN streams are then flattened by a Flatten layer, and the predictions for the three soil attributes are output by three fully connected branches. The network structure is shown in Figure 5, and the specific model parameters are given in Table 2.
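The dual-stream layout can be sketched with the Keras functional API. This is a hypothetical reconstruction, not the authors' code: kernel sizes, the 32/64 filters of the 2-D stream, the 128- and 64-unit dense layers, the 1 × 1 convolution on the residual shortcut, the Adam optimizer, and the MSE losses are all assumptions, because Table 2 is not reproduced here. Weight normalization is omitted to keep the sketch to core Keras layers. The `use_spectrogram` flag mimics the adaptive choice between the single-input Multi_CNN_1D and the full dual-input model.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_multi_cnn(seq_len, img_size=64, n_tasks=3, use_spectrogram=True):
    """Dual-stream multi-task CNN sketch; layer sizes are assumptions, not Table 2 values."""
    # 1-D stream (Multi_CNN_1D): the preprocessed spectral sequence.
    seq_in = keras.Input(shape=(seq_len, 1), name="spectrum_sequence")
    x = layers.Conv1D(64, 3, padding="causal", activation="relu")(seq_in)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling1D(2)(x)
    x = layers.Conv1D(128, 3, padding="causal", activation="relu")(x)
    # TCN-style residual module: dilation rates 2 and 4, 1x1 convolution on the shortcut.
    r = layers.Conv1D(128, 3, padding="causal", dilation_rate=2, activation="relu")(x)
    r = layers.Conv1D(128, 3, padding="causal", dilation_rate=4, activation="relu")(r)
    shortcut = layers.Conv1D(128, 1, padding="same")(x)
    x = layers.Activation("relu")(layers.Add()([r, shortcut]))
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)
    inputs, features = [seq_in], x

    if use_spectrogram:
        # 2-D stream: three convolutional layers with one pooling layer in between.
        img_in = keras.Input(shape=(img_size, img_size, 1), name="spectrogram")
        z = layers.Conv2D(32, 3, padding="same", activation="relu")(img_in)
        z = layers.MaxPooling2D(2)(z)
        z = layers.Conv2D(64, 3, padding="same", activation="relu")(z)
        z = layers.Conv2D(64, 3, padding="same", activation="relu")(z)
        z = layers.Flatten()(z)
        inputs.append(img_in)
        features = layers.Concatenate()([x, z])

    # One fully connected branch per soil attribute (hard parameter sharing below this point).
    outputs = [layers.Dense(1, name=f"task_{k}")(layers.Dense(64, activation="relu")(features))
               for k in range(n_tasks)]
    model = keras.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer="adam", loss=["mse"] * n_tasks)
    return model

# Usage with dummy inputs of the Small-dataset length (750 points).
model = build_multi_cnn(seq_len=750)
preds = model.predict([np.zeros((2, 750, 1)), np.zeros((2, 64, 64, 1))], verbose=0)
model_1d = build_multi_cnn(seq_len=750, use_spectrogram=False)
```

For LUCAS the same function would be called with `seq_len=4200`; flattening the full-length 1-D feature map makes the first dense layer large, so a real implementation might downsample more aggressively.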

3. Results and Discussion

The regression fit and the prediction accuracy on held-out samples are the most important measures of model performance, reflecting the model's ability to predict unknown samples after training. To verify the effectiveness of the proposed multi-task network, its performance was investigated in terms of both regression fit and prediction accuracy, using standard evaluation criteria: the coefficient of determination ($R^2$), the root mean square error of calibration ($RMSE_C$), the root mean square error of prediction ($RMSE_P$), and the residual prediction deviation ($RPD$). $R^2$ reflects the agreement between measured and predicted values. $RMSE_C$ and $RMSE_P$ reflect the deviation between measured and predicted values on the training set and the test set, respectively. $RPD$ reflects the predictive power of the model. Larger $R^2$ and $RPD$ values and smaller $RMSE$ values indicate a better prediction model.
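The evaluation criteria can be written as short NumPy functions. This sketch assumes the common definitions $R^2 = 1 - SS_{res}/SS_{tot}$, $RMSE = \sqrt{\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2}$, and $RPD = SD(y)/RMSE_P$ with the sample standard deviation (`ddof=1`); the paper does not spell out its exact formulas. $RMSE_C$ and $RMSE_P$ are the same function applied to the training and test sets.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y_true, y_pred):
    """Root mean square error (RMSE_C on training data, RMSE_P on test data)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rpd(y_true, y_pred):
    """Residual prediction deviation: standard deviation of measured values over RMSE_P."""
    return float(np.std(np.asarray(y_true, float), ddof=1) / rmse(y_true, y_pred))

# Toy example: one prediction off by 1.
measured, predicted = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]
scores = (r2(measured, predicted), rmse(measured, predicted), rpd(measured, predicted))
```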

3.1. Comparison of Pretreatment Methods

Preliminary experiments showed that a suitable preprocessing method lets the model converge faster and improves computational efficiency. In this paper, three single methods (Savitzky-Golay smoothing, multivariate scatter correction, and centering) and four combinations of these methods are used to process the spectral sequences. Because overlaid processed spectra are difficult to read, three representative samples from the Small dataset are plotted for a clearer comparison, as shown in Figure 6. Figure 6a,b show S-G smoothing combined with MSC and with centering, respectively; Figure 6c shows MSC combined with centering; and Figure 6d shows S-G smoothing, MSC, and centering combined.
The Small and LUCAS datasets processed by the 3 single methods and the 4 combinations are then fed into the Multi_CNN_1D network, which is suited to time series modeling, for training. The laboratory-measured soil attribute values serve as the ground-truth labels, and the training samples are fed to the network in batches. Figure 7 shows the resulting evaluation on the test set, where the vertical axis represents $R^2$; the closer it is to 1, the higher the prediction accuracy. Figure 7 shows that for both datasets, all three soil attributes achieve the highest $R^2$ with the combination of S-G smoothing, MSC, and centering, meaning that the gap between measured and predicted values is smallest. This shows that the combination of S-G smoothing, MSC, and centering is the best preprocessing method for the spectral sequences of both datasets.
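The comparison covers every non-empty subset of the three methods: 3 single methods plus 4 combinations (three pairs and the full triple). A minimal sketch of that enumeration and of picking the variant with the highest test $R^2$ follows; `score` stands for a hypothetical callable that trains Multi_CNN_1D on the preprocessed data and returns test $R^2$, and applying the methods in the fixed order S-G, MSC, centering is an assumption.

```python
from itertools import combinations

METHODS = ("S-G", "MSC", "centering")  # applied in this fixed order when combined

def preprocessing_variants(methods=METHODS):
    """All non-empty subsets of the methods: singles first, then pairs, then the triple."""
    variants = []
    for size in range(1, len(methods) + 1):
        variants.extend(combinations(methods, size))
    return variants

def select_best(variants, score):
    """Return the variant with the highest score (e.g. test-set R^2 from a hypothetical trainer)."""
    return max(variants, key=score)

variants = preprocessing_variants()
```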

3.2. Experimental Comparison on Small Dataset

With a small spectral dataset and the best preprocessing method selected, the proposed Multi_CNN_1D model is compared with other methods. Conventional regression methods are selected for comparison: partial least squares regression (PLSR), random forest regression (RFR), and gradient boosting decision tree regression (GBR). Because the Small dataset has too few samples, no validation set is split off; instead, the dataset is randomly divided in a ratio of approximately 7:3 into 125 training samples and 55 test samples, which are then used to train and test the Multi_CNN_1D model. This yields the predicted sequence for each soil attribute. The fitting accuracy on the training set and the prediction accuracy on the test set are quantified by computing, for the three soil properties TC, TN, and AN, the deviations between the measured and predicted values on each set, and the relative merits of each method are evaluated on this basis. The results are shown in Table 3, where $R_C^2$ and $RMSE_C$ are training set results and $R_P^2$, $RMSE_P$, and $RPD$ are test set results.
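Both random splits used in this paper (125/55 for the Small dataset and 11,420/3808/3808 for LUCAS) can be produced by one helper that shuffles indices and cuts them at fixed sizes. This is a minimal sketch; the function name and the seed are hypothetical, and the authors' actual shuffling procedure is not specified.

```python
import numpy as np

def split_indices(n_samples, sizes, seed=0):
    """Shuffle 0..n_samples-1 and cut the permutation into consecutive parts of the given sizes."""
    if sum(sizes) != n_samples:
        raise ValueError("sizes must add up to n_samples")
    perm = np.random.default_rng(seed).permutation(n_samples)
    return np.split(perm, np.cumsum(sizes)[:-1])

small_train, small_test = split_indices(180, [125, 55])                        # no validation set
lucas_train, lucas_val, lucas_test = split_indices(19036, [11420, 3808, 3808])  # roughly 6:2:2
```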
Figure 8 is a box plot of the evaluations obtained from multiple experiments on the test set with the different regression models, where the solid line in the middle of each box marks the median $R^2$. Table 3 and Figure 8 show that PLSR performs best among the conventional regression methods, and that the Multi_CNN_1D model outperforms all of them in terms of $R^2$ and $RPD$. For TC, the $RMSE_P$ of Multi_CNN_1D is 0.24, 0.74, and 0.99 lower than that of PLSR, RFR, and GBR, respectively. For TN, $RMSE_P$ is reduced by 0.02, 0.11, and 0.06. Although the $R_P^2$ and $RPD$ of AN (0.87 and 2.76) are slightly lower than those of the other two attributes because of the large spread in AN values, the $RMSE_P$ of AN is reduced by 2.64, 5.22, and 1.21 compared with the other methods. This indicates that the deep learning method predicts soil properties from vis-NIR spectral data more accurately than the general regression methods. In addition, when the Small dataset samples are used to train the full Multi_CNN network, the network overfits. This is because the Small dataset has too few samples and a short wavelength range, which suits a simple one-dimensional convolutional network rather than a multi-input network. Therefore, for the Small dataset, the single-input network gives better predictions than the dual-stream network.
Figure 9 compares the predicted TC, TN, and AN contents of the test set soil samples with the actual values obtained by laboratory analysis. The scattered points are closely and evenly distributed on both sides of the regression line, and the predicted values of all three soil properties are positively correlated with the actual values, showing that the proposed multi-task network is effective at predicting soil properties with a single input.

3.3. Experimental Comparison on LUCAS Dataset

Sampling for the LUCAS dataset spans the European continent, so the soil samples are very diverse. In this paper, the 19,036 LUCAS samples are shuffled and divided into training, validation, and test sets in a ratio of 6:2:2: the training set contains 11,420 samples, and the validation and test sets contain 3808 samples each. Figure 10 shows scatter plots of the predicted versus actual values of the test set soil samples using the Multi_CNN model. The points for OC, N, and Clay are evenly distributed on both sides of the regression line. The results show that, for large soil spectral datasets, the proposed Multi_CNN model can effectively extract the characteristic information of soil vis-NIR spectra. It achieves a good regression fit and high regression accuracy, learns well from the available data, and closely approximates the laboratory-measured values.
To show the effectiveness of the proposed network models more directly, both proposed models are compared on the LUCAS dataset with existing advanced models: the CNN and multi-task CNN models of Padarian et al. [20], the LSTM model of Ndikumana et al. [21], and the traditional PLSR model. Table 4 compares the evaluation results of each model.
The results show that the proposed Multi_CNN and the LSTM network [21] both achieve an $R_P^2$ of 0.91 for soil N, but the $RMSE_P$ of Multi_CNN is 0.06 lower, and Multi_CNN also outperforms the LSTM network [21] for OC and Clay. Compared with CNN_multi [20], the proposed model substantially improves $R_P^2$ and $RMSE_P$ for all three attributes, OC, N, and Clay. This is because this paper treats spectral data as time series to learn short- and long-term dependencies in the sample data, and fusing one-dimensional and two-dimensional convolution fits the features better and makes the predictions more accurate. In addition, Multi_CNN also predicts all three attributes better than the proposed single-input Multi_CNN_1D network. Therefore, the adaptive Multi_CNN network built in this paper obtains good prediction results on datasets of different scales.
To further verify the effectiveness of the proposed algorithm, this paper uses Qingdao soil spectral data measured in a different period, containing 500 samples, with the same soil attributes TC, TN, and AN. These samples are preprocessed and input into the adaptive Multi_CNN network. The resulting $R_P^2$ values for the three attributes are 0.91, 0.98, and 0.95, and the $RMSE_P$ values are 0.17, 0.02, and 2.21. Figure 11 shows scatter plots of the predicted values against the laboratory-measured values for the three soil attributes. The points are evenly distributed on both sides of the regression line, indicating that the Multi_CNN network has high predictive and generalization ability.

4. Conclusions

This paper proposes a new intelligent network architecture that predicts multiple soil attributes simultaneously within a single multi-task network. The framework is based on soil vis-NIR spectral signals and uses a dual-stream convolutional network to predict various soil characteristics. Combining preprocessing methods on the spectral signal and converting the original data into spectrograms allow the network to extract more detailed soil characteristic information. In addition, this paper discusses the predictive ability of the network on soil datasets of different scales. Because the Small dataset has few samples and a short wavelength range, it is better suited to a single one-dimensional convolutional input than to the more complex multi-input multi-output structure. For the large-scale LUCAS dataset, however, the multi-input multi-output network significantly improves prediction accuracy, and its results are better than those of existing methods. These results demonstrate the feasibility and accuracy of multi-task networks for soil attribute prediction.

Author Contributions

The manuscript was written through contributions of all authors, and all authors contributed equally. Conceptualization, R.L. and B.Y.; methodology, R.L. and B.Y.; validation, Y.C.; visualization, R.L. and Z.D.; supervision, Y.C. and Z.D.; writing—original draft, R.L.; writing—review and editing, R.L. and B.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Key R&D projects of Shandong Province (2019JMRH0109) and the National Natural Science Foundation of China (61972367).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Basic information on samples from S1 to S55.
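Soil Sample | TC (g·kg⁻¹) | TN (g·kg⁻¹) | AN (mg·kg⁻¹) | Minimum | Maximum | Mean | Median
S1 | 2.90 | 0.33 | 32 | 5.20 | 55.16 | 32.18 | 37.02
S2 | 3.99 | 0.53 | 39.5 | 4.89 | 52.26 | 30.43 | 35.04
S3 | 2.29 | 0.27 | 20.6 | 5.74 | 57.38 | 34.28 | 39.98
S4 | 5.04 | 0.54 | 64.6 | 4.68 | 49.23 | 28.38 | 32.13
S5 | 4.64 | 0.53 | 90.4 | 5.20 | 52.12 | 30.65 | 35.17
S6 | 3.18 | 0.38 | 93.2 | 5.01 | 55.68 | 32.50 | 37.36
S7 | 3.07 | 0.45 | 33.2 | 4.93 | 54.29 | 31.34 | 35.94
S8 | 3.10 | 0.42 | 52.7 | 5.02 | 56.59 | 32.84 | 37.77
S9 | 3.50 | 0.50 | 34.7 | 4.57 | 51.58 | 29.45 | 33.50
S10 | 1.90 | 0.21 | 18 | 4.93 | 56.00 | 32.76 | 38.12
S11 | 2.94 | 0.32 | 31.3 | 4.90 | 50.68 | 29.34 | 33.42
S12 | 3.15 | 0.34 | 43.4 | 5.38 | 53.90 | 31.88 | 36.57
S13 | 3.16 | 0.48 | 34.6 | 4.91 | 53.69 | 30.95 | 35.30
S14 | 3.41 | 0.40 | 30.3 | 4.99 | 55.59 | 31.95 | 36.30
S15 | 4.37 | 0.47 | 38.4 | 6.31 | 58.38 | 35.37 | 41.19
S16 | 2.98 | 0.33 | 35.3 | 4.96 | 54.08 | 31.00 | 35.21
S17 | 2.73 | 0.35 | 33.8 | 5.16 | 56.83 | 32.92 | 37.85
S18 | 3.16 | 0.44 | 37.8 | 4.70 | 51.18 | 29.49 | 33.71
S19 | 3.70 | 0.46 | 42.6 | 4.59 | 52.60 | 29.64 | 33.42
S20 | 3.14 | 0.40 | 36 | 5.01 | 54.16 | 31.41 | 35.85
S21 | 4.66 | 0.62 | 50.3 | 5.93 | 54.07 | 32.44 | 37.60
S22 | 2.83 | 0.43 | 32.6 | 5.12 | 56.00 | 32.87 | 38.00
S23 | 3.61 | 0.44 | 38.5 | 5.07 | 50.55 | 30.20 | 35.48
S24 | 3.84 | 0.56 | 43.6 | 4.87 | 51.30 | 30.05 | 34.71
S25 | 2.79 | 0.43 | 27.7 | 4.99 | 53.59 | 31.60 | 36.65
S26 | 3.37 | 0.43 | 35.5 | 5.13 | 52.99 | 31.39 | 36.52
S27 | 3.03 | 0.47 | 35.3 | 4.65 | 49.66 | 28.88 | 33.18
S28 | 3.96 | 0.52 | 38.7 | 5.05 | 56.36 | 32.69 | 37.41
S29 | 3.00 | 0.47 | 32.3 | 5.06 | 52.85 | 31.49 | 36.68
S30 | 3.58 | 0.56 | 60.6 | 4.87 | 53.40 | 31.19 | 35.86
S31 | 4.00 | 0.53 | 46.4 | 4.82 | 50.56 | 29.73 | 34.56
S32 | 4.01 | 0.57 | 47 | 5.18 | 55.00 | 32.57 | 37.75
S33 | 2.96 | 0.34 | 33 | 4.98 | 56.31 | 33.30 | 38.50
S34 | 3.56 | 0.47 | 38.8 | 5.02 | 55.82 | 32.81 | 37.87
S35 | 2.96 | 0.36 | 24.7 | 4.93 | 55.55 | 32.60 | 37.37
S36 | 4.07 | 0.55 | 35.4 | 4.82 | 54.93 | 32.01 | 36.74
S37 | 3.51 | 0.42 | 30.7 | 4.74 | 55.52 | 32.28 | 36.73
S38 | 2.99 | 0.41 | 25.5 | 4.85 | 54.91 | 32.42 | 37.56
S39 | 3.12 | 0.38 | 29.4 | 5.27 | 56.88 | 33.90 | 39.23
S40 | 3.96 | 0.54 | 38.2 | 4.93 | 53.43 | 31.38 | 36.14
S41 | 2.74 | 0.33 | 19.8 | 4.76 | 55.77 | 32.53 | 37.22
S42 | 3.43 | 0.47 | 25.7 | 4.59 | 52.40 | 30.68 | 35.32
S43 | 3.64 | 0.48 | 33 | 4.55 | 51.33 | 30.04 | 34.41
S44 | 3.52 | 0.54 | 32.2 | 4.62 | 52.08 | 30.57 | 35.03
S45 | 2.76 | 0.31 | 23.2 | 4.57 | 53.49 | 30.98 | 35.27
S46 | 4.30 | 0.48 | 40.9 | 4.42 | 49.98 | 29.37 | 33.67
S47 | 3.11 | 0.40 | 28.7 | 4.67 | 54.10 | 32.04 | 37.17
S48 | 3.63 | 0.39 | 32.1 | 4.74 | 50.69 | 30.17 | 34.77
S49 | 2.05 | 0.29 | 17.2 | 4.53 | 52.20 | 30.92 | 35.76
S50 | 3.59 | 0.40 | 34.1 | 4.45 | 51.93 | 30.59 | 35.36
S51 | 3.20 | 0.38 | 32.2 | 4.39 | 50.65 | 29.52 | 33.69
S52 | 4.07 | 0.47 | 47 | 5.71 | 57.35 | 35.39 | 41.25
S53 | 2.80 | 0.33 | 30.9 | 4.54 | 53.88 | 31.33 | 35.79
S54 | 3.21 | 0.39 | 27.6 | 4.29 | 51.97 | 30.28 | 34.58
S55 | 3.39 | 0.47 | 34.6 | 4.39 | 47.43 | 28.10 | 32.49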
Table A2. Basic information on samples from S56 to S105.
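Soil Sample | TC (g·kg⁻¹) | TN (g·kg⁻¹) | AN (mg·kg⁻¹) | Minimum | Maximum | Mean | Median
S56 | 3.20 | 0.40 | 36.8 | 4.25 | 51.35 | 29.92 | 34.34
S57 | 4.59 | 0.59 | 45.3 | 4.33 | 52.67 | 31.06 | 35.93
S58 | 2.79 | 0.43 | 30.4 | 4.49 | 56.21 | 32.82 | 37.72
S59 | 3.18 | 0.36 | 30.2 | 4.65 | 56.45 | 33.27 | 38.14
S60 | 2.76 | 0.32 | 21.4 | 4.18 | 51.39 | 29.72 | 33.83
S61 | 5.49 | 0.57 | 80.6 | 4.65 | 41.94 | 25.24 | 29.81
S62 | 5.57 | 0.61 | 64.8 | 5.24 | 45.97 | 27.99 | 33.04
S63 | 4.27 | 0.52 | 47.4 | 4.65 | 36.73 | 22.33 | 26.22
S64 | 5.15 | 0.62 | 80.2 | 4.50 | 40.33 | 24.34 | 28.79
S65 | 4.54 | 0.57 | 74 | 5.06 | 41.88 | 25.56 | 30.03
S66 | 5.08 | 0.59 | 91 | 4.83 | 42.41 | 26.03 | 31.08
S67 | 4.56 | 0.56 | 73.1 | 4.66 | 41.23 | 25.12 | 29.94
S68 | 4.58 | 0.57 | 74.6 | 4.50 | 40.36 | 24.50 | 29.05
S69 | 4.68 | 0.52 | 67.7 | 4.88 | 43.72 | 26.51 | 31.30
S70 | 4.84 | 0.59 | 68.6 | 5.04 | 43.56 | 26.77 | 31.79
S71 | 4.25 | 0.48 | 61.9 | 4.95 | 40.68 | 24.75 | 29.39
S72 | 4.31 | 0.51 | 55.7 | 4.72 | 39.26 | 23.69 | 27.91
S73 | 4.73 | 0.60 | 83 | 4.80 | 41.69 | 25.33 | 30.10
S74 | 4.42 | 0.51 | 59.6 | 4.98 | 41.96 | 25.51 | 30.18
S75 | 4.38 | 0.52 | 60.9 | 4.94 | 43.08 | 26.14 | 30.83
S76 | 5.43 | 0.69 | 92.9 | 4.65 | 42.18 | 25.52 | 30.10
S77 | 5.30 | 0.65 | 85.7 | 5.01 | 41.97 | 25.96 | 30.84
S78 | 4.15 | 0.53 | 58 | 4.77 | 41.15 | 24.81 | 29.42
S79 | 4.74 | 0.61 | 67.4 | 5.22 | 42.88 | 26.26 | 31.20
S80 | 4.48 | 0.52 | 60.3 | 4.91 | 42.52 | 25.83 | 30.47
S81 | 4.31 | 0.54 | 51.4 | 4.93 | 39.94 | 24.17 | 28.33
S82 | 4.14 | 0.51 | 69 | 5.06 | 44.39 | 27.02 | 32.22
S83 | 4.29 | 0.53 | 79.3 | 4.95 | 42.10 | 25.98 | 30.92
S84 | 4.94 | 0.55 | 74.5 | 4.69 | 41.62 | 25.01 | 29.52
S85 | 5.13 | 0.65 | 81.5 | 4.41 | 41.22 | 24.69 | 29.26
S86 | 4.48 | 0.59 | 69.7 | 4.92 | 42.57 | 25.91 | 30.68
S87 | 4.91 | 0.54 | 65.7 | 5.11 | 43.86 | 27.02 | 32.20
S88 | 4.49 | 0.53 | 62.4 | 4.91 | 43.47 | 26.38 | 31.34
S89 | 4.46 | 0.56 | 76.1 | 4.82 | 40.40 | 24.67 | 29.24
S90 | 4.92 | 0.54 | 61.4 | 4.47 | 40.76 | 24.38 | 28.62
S91 | 4.65 | 0.53 | 66.8 | 4.74 | 38.86 | 23.64 | 27.82
S92 | 5.66 | 0.59 | 60.6 | 4.53 | 40.86 | 24.49 | 28.84
S93 | 4.43 | 0.44 | 39.4 | 4.50 | 38.49 | 23.04 | 27.07
S94 | 4.22 | 0.48 | 45.1 | 4.28 | 37.14 | 22.22 | 26.10
S95 | 4.32 | 0.50 | 61.6 | 4.78 | 41.85 | 25.32 | 29.81
S96 | 5.21 | 0.58 | 60.6 | 4.77 | 42.98 | 25.87 | 30.53
S97 | 4.99 | 0.55 | 49.7 | 4.35 | 36.58 | 22.01 | 25.87
S98 | 4.39 | 0.49 | 45 | 4.46 | 38.88 | 23.18 | 27.19
S99 | 4.46 | 0.52 | 53.3 | 4.64 | 39.50 | 23.98 | 28.44
S100 | 4.79 | 0.54 | 49.8 | 4.53 | 38.32 | 22.85 | 26.76
S101 | 4.96 | 0.54 | 54.5 | 4.43 | 39.78 | 23.67 | 27.81
S102 | 4.48 | 0.53 | 56.6 | 4.45 | 40.27 | 24.03 | 28.36
S103 | 5.03 | 0.55 | 48.8 | 4.48 | 38.69 | 23.26 | 27.34
S104 | 4.80 | 0.58 | 47.7 | 4.40 | 38.79 | 23.11 | 27.11
S105 | 4.38 | 0.67 | 71.8 | 4.66 | 41.52 | 24.97 | 29.43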
Table A3. Basic information on samples from S106 to S150.
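Soil Sample | TC (g·kg⁻¹) | TN (g·kg⁻¹) | AN (mg·kg⁻¹) | Minimum | Maximum | Mean | Median
S106 | 4.06 | 0.72 | 66.8 | 4.76 | 43.43 | 26.13 | 30.94
S107 | 4.69 | 0.58 | 64 | 4.76 | 43.24 | 25.96 | 30.51
S108 | 4.22 | 0.51 | 59.1 | 4.56 | 41.57 | 24.91 | 29.33
S109 | 4.43 | 0.52 | 63.3 | 4.31 | 40.09 | 24.02 | 28.22
S110 | 4.67 | 0.60 | 71 | 4.57 | 41.64 | 25.23 | 29.78
S111 | 4.18 | 0.46 | 56.6 | 4.59 | 39.27 | 23.89 | 28.09
S112 | 4.55 | 0.55 | 69.4 | 4.57 | 42.13 | 25.22 | 29.76
S113 | 4.15 | 0.53 | 61.7 | 4.42 | 40.93 | 24.82 | 29.41
S114 | 4.71 | 0.52 | 49.8 | 4.69 | 40.64 | 24.79 | 29.36
S115 | 4.61 | 0.55 | 69.8 | 4.41 | 39.32 | 23.77 | 27.82
S116 | 4.65 | 0.49 | 56.9 | 4.12 | 38.17 | 22.49 | 26.25
S117 | 4.58 | 0.56 | 55.5 | 3.89 | 38.82 | 22.66 | 26.40
S118 | 5.14 | 0.60 | 67.9 | 4.55 | 46.39 | 27.65 | 32.65
S119 | 4.84 | 0.53 | 50.7 | 3.90 | 38.10 | 22.23 | 25.98
S120 | 5.60 | 0.64 | 95.7 | 4.80 | 44.19 | 27.46 | 32.81
S121 | 10.12 | 1.37 | 108 | 4.27 | 44.00 | 25.32 | 29.45
S122 | 11.27 | 1.54 | 120 | 4.34 | 44.77 | 25.91 | 30.04
S123 | 11.70 | 1.61 | 130 | 4.58 | 47.10 | 27.56 | 32.23
S124 | 12.57 | 1.55 | 131 | 4.18 | 41.94 | 23.84 | 27.22
S125 | 7.31 | 1.02 | 80.7 | 4.13 | 45.00 | 25.68 | 29.85
S126 | 7.26 | 1.00 | 96.2 | 4.30 | 45.74 | 26.28 | 30.81
S127 | 12.27 | 1.58 | 125 | 4.31 | 40.96 | 23.80 | 27.58
S128 | 9.52 | 1.22 | 90.7 | 4.10 | 43.54 | 24.96 | 28.90
S129 | 12.64 | 1.61 | 130 | 4.08 | 41.27 | 23.82 | 27.47
S130 | 12.52 | 1.53 | 136 | 4.74 | 44.82 | 26.32 | 30.68
S131 | 10.80 | 1.25 | 102 | 4.41 | 42.96 | 24.80 | 28.52
S132 | 13.40 | 1.55 | 115 | 4.48 | 41.44 | 24.10 | 27.82
S133 | 12.30 | 1.57 | 115 | 4.68 | 49.12 | 28.70 | 33.51
S134 | 12.47 | 1.63 | 122 | 4.30 | 42.54 | 24.52 | 28.47
S135 | 7.21 | 1.01 | 69.8 | 4.10 | 45.77 | 26.08 | 30.46
S136 | 12.69 | 1.63 | 118 | 4.73 | 48.43 | 28.34 | 33.07
S137 | 6.01 | 0.82 | 61.8 | 4.05 | 48.30 | 27.21 | 31.74
S138 | 12.31 | 1.56 | 106 | 4.45 | 44.62 | 26.21 | 30.81
S139 | 4.20 | 0.72 | 54.5 | 4.24 | 51.14 | 29.16 | 34.03
S140 | 12.06 | 1.37 | 104 | 4.51 | 44.22 | 25.76 | 29.88
S141 | 8.40 | 1.13 | 83.5 | 4.17 | 45.20 | 25.70 | 29.72
S142 | 10.99 | 1.49 | 109 | 4.37 | 43.85 | 25.49 | 29.86
S143 | 9.85 | 1.29 | 97.3 | 4.40 | 42.44 | 24.64 | 28.55
S144 | 11.24 | 1.50 | 103 | 4.32 | 47.09 | 26.68 | 30.71
S145 | 10.68 | 1.49 | 107 | 4.75 | 49.05 | 28.48 | 33.19
S146 | 11.98 | 1.57 | 121 | 4.69 | 46.77 | 27.60 | 32.46
S147 | 9.94 | 1.32 | 106 | 4.53 | 49.55 | 28.66 | 33.51
S148 | 12.47 | 1.64 | 116 | 4.60 | 46.26 | 26.93 | 31.34
S149 | 9.28 | 1.27 | 96.7 | 4.02 | 46.39 | 26.26 | 30.65
S150 | 11.78 | 1.45 | 160 | 4.20 | 43.71 | 25.00 | 28.77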
Table A4. Basic information on samples from S151 to S180.
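Soil Sample | TC (g·kg⁻¹) | TN (g·kg⁻¹) | AN (mg·kg⁻¹) | Minimum | Maximum | Mean | Median
S151 | 8.28 | 1.03 | 79.7 | 4.18 | 45.43 | 25.95 | 30.16
S152 | 12.77 | 1.59 | 122 | 4.51 | 44.94 | 25.98 | 30.10
S153 | 11.61 | 1.61 | 118 | 4.40 | 42.26 | 24.43 | 28.31
S154 | 11.93 | 1.61 | 116 | 4.22 | 42.20 | 24.07 | 27.85
S155 | 12.94 | 1.60 | 114 | 4.28 | 43.31 | 24.95 | 28.83
S156 | 11.10 | 1.47 | 133 | 4.33 | 46.08 | 26.15 | 30.20
S157 | 12.94 | 1.74 | 136 | 4.56 | 47.11 | 27.32 | 31.94
S158 | 12.51 | 1.65 | 125 | 4.42 | 43.79 | 25.11 | 28.89
S159 | 5.18 | 0.78 | 61.1 | 4.36 | 48.26 | 27.47 | 31.84
S160 | 10.61 | 1.50 | 109 | 4.03 | 45.55 | 25.92 | 30.24
S161 | 12.27 | 1.65 | 125 | 4.37 | 42.85 | 24.86 | 28.93
S162 | 11.99 | 1.51 | 111 | 4.28 | 42.76 | 24.63 | 28.64
S163 | 12.24 | 1.60 | 124 | 4.34 | 42.79 | 24.67 | 28.52
S164 | 12.51 | 1.72 | 122 | 4.37 | 45.00 | 26.04 | 30.34
S165 | 9.95 | 1.35 | 97.7 | 4.37 | 44.02 | 25.22 | 29.39
S166 | 10.68 | 1.46 | 112 | 4.52 | 47.42 | 27.20 | 31.54
S167 | 11.12 | 1.51 | 113 | 4.17 | 42.86 | 24.45 | 28.27
S168 | 9.59 | 1.32 | 99.4 | 4.31 | 45.04 | 25.90 | 30.39
S169 | 10.03 | 1.41 | 100 | 4.11 | 41.29 | 23.71 | 27.46
S170 | 8.05 | 1.13 | 79.8 | 4.18 | 46.09 | 26.39 | 30.67
S171 | 3.79 | 0.74 | 51.2 | 3.97 | 49.46 | 27.71 | 32.25
S172 | 11.08 | 1.52 | 116 | 4.51 | 45.33 | 26.04 | 30.09
S173 | 9.22 | 1.18 | 94.1 | 4.02 | 42.33 | 24.23 | 28.25
S174 | 7.84 | 1.03 | 79.3 | 4.37 | 49.09 | 27.95 | 32.50
S175 | 12.19 | 1.61 | 116 | 4.41 | 43.37 | 24.93 | 28.97
S176 | 11.76 | 1.60 | 117 | 4.62 | 48.21 | 28.05 | 32.87
S177 | 11.70 | 1.52 | 116 | 4.25 | 39.88 | 23.12 | 26.70
S178 | 8.56 | 1.17 | 85.8 | 4.24 | 45.89 | 26.50 | 31.09
S179 | 12.19 | 1.66 | 129 | 4.64 | 46.80 | 27.23 | 31.75
S180 | 11.26 | 1.61 | 116 | 4.41 | 44.57 | 25.54 | 29.50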

References

1. Yan, L.; Escobar, M.S.; Kaneko, H.; Funatsu, K. Detection of Nonlinearity in Soil Property Prediction Models Based on Near-infrared Spectroscopy. Chemom. Intell. Lab. Syst. 2017, 167, 139–151.
2. Schimann, H.; Joffre, R.; Roggy, J.C.; Lensi, R.; Domenach, A.M. Evaluation of the recovery of microbial functions during soil restoration using near-infrared spectroscopy. Appl. Soil Ecol. 2007, 37, 223–232.
3. Zhang, Y.; Li, M.; Zheng, L.; Zhao, Y.; Pei, X. Soil nitrogen content forecasting based on real-time NIR spectroscopy. Comput. Electron. Agric. 2016, 124, 29–36.
4. Nawar, S.; Buddenbaum, H.; Hill, J.; Kozak, J.; Mouazen, A.M. Estimating the soil clay content and organic matter by means of different calibration methods of vis-NIR diffuse reflectance spectroscopy. Soil Tillage Res. 2016, 155, 510–522.
5. Nawar, S.; Mouazen, A.M. Comparison between Random Forests, Artificial Neural Networks and Gradient Boosted Machines Methods of On-Line Vis-NIR Spectroscopy Measurements of Soil Total Nitrogen and Total Carbon. Sensors 2017, 17, 2428.
6. Vohland, M.; Besold, J.; Hill, J.; Fründ, H.C. Comparing different multivariate calibration methods for the determination of soil organic carbon pools with visible to near infrared spectroscopy. Geoderma 2011, 166, 198–205.
7. Rossel, R.A.V.; Walvoort, D.J.J.; McBratney, A.B.; Janik, L.J.; Skjemstad, J.O. Visible, near infrared, mid infrared or combined diffuse reflectance spectroscopy for simultaneous assessment of various soil properties. Geoderma 2006, 131, 59–75.
8. Stevens, A.; Nocita, M.; Tóth, G.; Montanarella, L.; Van, W.B. Prediction of Soil Organic Carbon at the European Scale by Visible and Near InfraRed Reflectance Spectroscopy. PLoS ONE 2013, 8, e66409.
9. Vasques, G.M.; Grunwald, S.; Sickman, J.O. Modeling of Soil Organic Carbon Fractions Using Visible–Near-Infrared Spectroscopy. Soil Sci. Soc. Am. J. 2009, 73, 176–184.
10. Lin, Z.D.; Wang, Y.B.; Wang, R.J.; Wang, L.S.; Lu, C.P.; Zhang, Z.Y.; Song, L.T.; Liu, Y. Improvements of Vis-NIRS Model in The Prediction of Soil Organic Matter Content Using Wavelength Optimization. J. Appl. Spectrosc. 2017, 84, 529–534.
11. Liu, S.; Shen, H.; Chen, S.; Zhao, X.; Biswas, A.; Jia, X.; Shi, Z.; Fang, J. Estimating forest soil organic carbon content using vis-NIR spectroscopy: Implications for large-scale soil carbon spectroscopic assessment. Geoderma 2019, 348, 37–44.
12. Chang, C.W.; Laird, D.; Mausbach, M.; Hurburgh, C. Near-Infrared Reflectance Spectroscopy–Principal Components Regression Analyses of Soil Properties. Soil Sci. Soc. Am. J. 2001, 65, 480–490.
13. McCarty, G.W.; Reeves, J.B.; Reeves, V.B.; Follett, R.F.; Kimble, J.M. Mid-Infrared and Near-Infrared Diffuse Reflectance Spectroscopy for Soil Carbon Measurement. Soil Sci. Soc. Am. J. 2002, 66, 640–646.
14. Rossel, R.A.V.; Behrens, T. Using data mining to model and interpret soil diffuse reflectance spectra. Geoderma 2010, 158, 46–54.
15. Hosseini, M.; Agereh, S.R.; Khaledian, Y.; Zoghalchali, H.J.; Brevik, E.C.; Naeini, S.A.R.M. Comparison of multiple statistical techniques to predict soil phosphorus. Appl. Soil Ecol. 2017, 114, 123–131.
16. Morellos, A.; Pantazi, X.E.; Moshou, D.; Alexandridis, T.; Whetton, R.; Tziotzios, G.; Wiebensohn, J.; Bill, R.; Mouazen, A.M. Machine learning based prediction of soil total nitrogen, organic carbon and moisture content by using VIS-NIR spectroscopy. Biosyst. Eng. 2016, 152, 104–116.
17. Yang, M.; Xu, D.; Chen, S.; Li, H.; Shi, Z. Evaluation of Machine Learning Approaches to Predict Soil Organic Matter and pH Using vis-NIR Spectra. Sensors 2019, 19, 263.
18. Veres, M.; Lacey, G.; Taylor, G.W. Deep Learning Architectures for Soil Property Prediction. In Proceedings of the 2015 12th Conference on Computer and Robot Vision, Halifax, NS, Canada, 3–5 June 2015; pp. 8–15.
19. Ruder, S. An Overview of Multi-Task Learning in Deep Neural Networks. arXiv 2017, arXiv:1706.05098.
20. Padarian, J.; Minasny, B.; McBratney, A.B. Using deep learning to predict soil properties from regional spectral data. Geoderma Reg. 2019, 16, e00198.
21. Ndikumana, E.; Minh, D.H.T.; Baghdadi, N.; Courault, D.; Hossard, L. Deep Recurrent Neural Network for Agricultural Classification using multitemporal SAR Sentinel-1 for Camargue, France. Remote Sens. 2018, 10, 1217.
22. Sum, S.T. Spectral Signal Correction for Multivariate Calibration. Ph.D. Thesis, University of Delaware, Newark, DE, USA, 1998.
23. Bai, S.; Kolter, J.Z.; Koltun, V. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv 2018, arXiv:1803.01271.
Figure 1. Original spectrogram of (a) Small dataset (b) LUCAS dataset.
Figure 2. Spectra of the Small dataset (a–c) and the LUCAS dataset (d–f) preprocessed by Savitzky-Golay (S-G) smoothing, Multivariate Scattering Correction (MSC), and centralization.
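Figure 2 compares three preprocessing steps, and Figure 6(d) combines them as S-G+MSC+Centralization. The following is a minimal pure-Python sketch of that chain, not the authors' implementation. The window length (5 points), polynomial order (2), leaving the two edge points unsmoothed, the MSC reference (mean spectrum) and the order S-G, then MSC, then centralization are all assumptions, and the function names are hypothetical.

```python
# 5-point quadratic Savitzky-Golay smoothing coefficients (normalized by 35).
SG_COEFFS = (-3, 12, 17, 12, -3)


def savgol5(spectrum):
    """Smooth one spectrum; the two points at each edge are left unchanged (simplification)."""
    out = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        window = spectrum[i - 2:i + 3]
        out[i] = sum(c * v for c, v in zip(SG_COEFFS, window)) / 35.0
    return out


def msc(spectra):
    """Scatter correction: regress each spectrum on the mean spectrum, then undo offset and slope."""
    n_wl = len(spectra[0])
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_wl)]
    ref_mean = sum(ref) / n_wl
    denom = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_wl
        # Least-squares fit s ~= a + b * ref
        b = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / denom
        a = s_mean - b * ref_mean
        corrected.append([(v - a) / b for v in s])
    return corrected


def centralize(spectra):
    """Subtract the mean spectrum (column-wise mean centering across samples)."""
    n = len(spectra)
    mean = [sum(col) / n for col in zip(*spectra)]
    return [[v - m for v, m in zip(s, mean)] for s in spectra]


def preprocess(spectra):
    """Assumed order: S-G smoothing -> MSC -> centralization."""
    return centralize(msc([savgol5(s) for s in spectra]))
```

Quadratic S-G smoothing reproduces any quadratic exactly, and MSC maps spectra that differ only by an additive offset and multiplicative scaling onto the same reference spectrum. Both properties make convenient sanity checks for such a sketch.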
Figure 3. Spectrograms of samples from (a) the Small dataset and (b) the LUCAS dataset. (Yellower colors indicate a higher spectral frequency density at a given wavelength; bluer colors indicate a lower density.)
Figure 4. The residual module of Temporal Convolutional Network (TCN).
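Figure 4 shows the residual module of the Temporal Convolutional Network (TCN) [23], whose key ingredients are dilated causal convolutions and an identity skip connection. The following is a minimal single-channel pure-Python sketch of that idea, not the paper's layer. It omits weight normalization, dropout, multiple channels, and the 1×1 convolution that a TCN uses when input and output widths differ. The weights and function names are hypothetical.

```python
def dilated_causal_conv(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k*dilation]; indices before t = 0 count as zero (left padding)."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def relu(values):
    return [max(0.0, v) for v in values]


def residual_block(x, w1, w2, dilation):
    """Simplified TCN residual block: ReLU(x + F(x)), with F = two dilated causal convolutions + ReLU."""
    h = relu(dilated_causal_conv(x, w1, dilation))
    h = relu(dilated_causal_conv(h, w2, dilation))
    return relu([a + b for a, b in zip(x, h)])


# An impulse at t = 0 reappears at t = 2 when the second tap is weighted and dilation = 2.
print(dilated_causal_conv([1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0], 2))
```

Because each output depends only on the current and earlier positions, changing later inputs never alters earlier outputs. Stacking blocks with growing dilation (as with dr = 2 and dr = 4 in Table 2) widens the context without extra parameters.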
Figure 5. The Multi_CNN network structure.
Figure 6. Preprocessed spectra of (a) S-G+MSC (b) S-G+Centralization (c) MSC+Centralization (d) S-G+MSC+Centralization.
Figure 7. Comparison of preprocessing effects on the test set. (S-G: Savitzky-Golay smoothing; MSC: Multivariate Scattering Correction; Cent: centralization.)
Figure 8. Box plots comparing the $R^2$ of the predicted values of (a) Total Carbon, (b) Total Nitrogen, and (c) Alkaline Nitrogen.
Figure 9. Actual vs. predicted values of the proposed framework: (a) Total Carbon, (b) Total Nitrogen, (c) Alkaline Nitrogen.
Figure 10. Actual vs. predicted values of the Multi_CNN model: (a) Organic Carbon, (b) Nitrogen, (c) Clay.
Figure 11. Actual vs. predicted values of the proposed framework on the additional Qingdao dataset (500 samples): (a) Total Carbon, (b) Total Nitrogen, (c) Alkaline Nitrogen.
Table 1. Basic information of soil data set.
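Dataset | Attribute | Minimum | Maximum | Mean | Median | St.Dev.
Small dataset | TC (g·kg⁻¹) | 1.90 | 13.40 | 6.21 | 4.62 | 3.42
Small dataset | TN (g·kg⁻¹) | 0.21 | 1.74 | 0.80 | 0.56 | 0.46
Small dataset | AN (mg·kg⁻¹) | 17.2 | 160.00 | 69.51 | 62.15 | 33.18
LUCAS dataset | OC (g·kg⁻¹) | 0.00 | 586.80 | 50.00 | 20.80 | 91.31
LUCAS dataset | N (g·kg⁻¹) | 0.00 | 38.60 | 2.92 | 1.70 | 3.76
LUCAS dataset | Clay (g·kg⁻¹) | 0.00 | 79.00 | 18.88 | 17.00 | 13.00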
Table 2. Multi_CNN network specific parameter settings.
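Layer | Kernel Size | Filters | Layer | Kernel Size | Filters
Conv1D | 3 | 64 | Conv2D | 5 × 5 | 64
Maxpool1D | 5 | - | Maxpool2D | 2 × 2 | -
Conv1D | 3 | 128 | Conv2D | 3 × 3 | 128
Conv1D | 3, dr = 2 | 64 | Maxpool2D | 2 × 2 | -
Conv1D | 3, dr = 4 | 64 | Conv2D | 3 × 3 | 256
Conv1D | 3 | 64 | Maxpool2D | 2 × 2 | -
FC1 | - | 128
FC2 | - | 64
FC3 | - | 1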
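Table 2 lists kernel sizes and dilation rates (dr) for the two branches. The following sketch works out the receptive field of each branch's last layer, i.e. how many consecutive input points (1D) or pixels per axis (2D) one output unit can see. It uses the standard recursion $r_l = r_{l-1} + d_l (k_l - 1)\, j_{l-1}$, $j_l = j_{l-1} s_l$, starting from $r_0 = j_0 = 1$. It assumes that every convolution has stride 1, that each max-pooling layer has a stride equal to its size, and that dr denotes the dilation rate; none of these are stated in the table itself.

```python
def receptive_field(layers):
    """layers: list of (kernel, stride, dilation). Returns the receptive field in input samples."""
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        effective = dilation * (kernel - 1) + 1  # span of a dilated kernel
        rf += (effective - 1) * jump
        jump *= stride                           # spacing of this layer's outputs in input samples
    return rf


BRANCH_1D = [
    (3, 1, 1),  # Conv1D, kernel 3
    (5, 5, 1),  # Maxpool1D, size 5 (stride 5 assumed)
    (3, 1, 1),  # Conv1D, kernel 3
    (3, 1, 2),  # Conv1D, kernel 3, dr = 2
    (3, 1, 4),  # Conv1D, kernel 3, dr = 4
    (3, 1, 1),  # Conv1D, kernel 3
]
BRANCH_2D = [  # per spatial axis of the spectrogram
    (5, 1, 1),  # Conv2D 5 x 5
    (2, 2, 1),  # Maxpool2D 2 x 2
    (3, 1, 1),  # Conv2D 3 x 3
    (2, 2, 1),  # Maxpool2D 2 x 2
    (3, 1, 1),  # Conv2D 3 x 3
    (2, 2, 1),  # Maxpool2D 2 x 2
]
print(receptive_field(BRANCH_1D))  # 87
print(receptive_field(BRANCH_2D))  # 24
```

Under these assumptions, the two dilated layers contribute 60 of the 87 input points seen by the 1D branch. This illustrates why dilation is a cheap way to cover a longer stretch of the spectrum.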
Table 3. Comparison of evaluation indexes of each model based on Small dataset.
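Attribute | Metric | PLSR | RF | RGBR | Multi_CNN_1D
TC | $R_C^2$ | 0.96 | 0.97 | 0.97 | 0.99
TC | $\mathrm{RMSE}_C$ | 0.74 | 0.50 | 0.57 | 0.23
TC | $R_P^2$ | 0.89 | 0.81 | 0.73 | 0.94
TC | $\mathrm{RMSE}_P$ | 1.04 | 1.54 | 1.79 | 0.80
TC | RPD | 3.01 | 2.30 | 1.95 | 4.23
TN | $R_C^2$ | 0.98 | 0.96 | 0.97 | 0.99
TN | $\mathrm{RMSE}_C$ | 0.07 | 0.10 | 0.09 | 0.04
TN | $R_P^2$ | 0.93 | 0.82 | 0.90 | 0.95
TN | $\mathrm{RMSE}_P$ | 0.11 | 0.20 | 0.15 | 0.09
TN | RPD | 3.80 | 2.34 | 3.20 | 4.71
AN | $R_C^2$ | 0.99 | 0.96 | 0.89 | 0.95
AN | $\mathrm{RMSE}_C$ | 3.59 | 5.98 | 11.86 | 7.36
AN | $R_P^2$ | 0.81 | 0.76 | 0.83 | 0.87
AN | $\mathrm{RMSE}_P$ | 14.26 | 16.84 | 12.83 | 11.62
AN | RPD | 2.31 | 2.04 | 2.43 | 2.76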
Table 4. Comparison of model evaluations on the LUCAS dataset.
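Attribute | Metric | Multi_CNN_1D | Multi_CNN | CNN [20] | CNN_multi [20] | LSTM [21] | PLSR
OC | $R_P^2$ | 0.89 | 0.95 | 0.88 | 0.69 | 0.94 | 0.54
OC | $\mathrm{RMSE}_P$ | 29.49 | 22.69 | 32.14 | 31.86 | 23.25 | 68.12
N | $R_P^2$ | 0.78 | 0.91 | 0.83 | 0.60 | 0.91 | 0.55
N | $\mathrm{RMSE}_P$ | 1.76 | 1.09 | 1.54 | 1.59 | 1.15 | 1.73
Clay | $R_P^2$ | 0.72 | 0.83 | 0.70 | 0.68 | 0.80 | 0.50
Clay | $\mathrm{RMSE}_P$ | 7.17 | 5.53 | 7.55 | 7.78 | 5.95 | 8.93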
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Li, R.; Yin, B.; Cong, Y.; Du, Z. Simultaneous Prediction of Soil Properties Using Multi_CNN Model. Sensors 2020, 20, 6271. https://doi.org/10.3390/s20216271

AMA Style

Li R, Yin B, Cong Y, Du Z. Simultaneous Prediction of Soil Properties Using Multi_CNN Model. Sensors. 2020; 20(21):6271. https://doi.org/10.3390/s20216271

Chicago/Turabian Style

Li, Ruixue, Bo Yin, Yanping Cong, and Zehua Du. 2020. "Simultaneous Prediction of Soil Properties Using Multi_CNN Model" Sensors 20, no. 21: 6271. https://doi.org/10.3390/s20216271

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
