Research on Hyperspectral Inversion of Soil Alkaline Hydrolysis Nitrogen Content and pH Value Based on DWD

To address the shortcomings of traditional methods for determining soil alkaline hydrolysis nitrogen (AHN) content and pH value, which are time-consuming and labor-intensive, this paper proposes a rapid quantitative inversion method for AHN content and pH value based on hyperspectral analysis. The spectra are pretreated with db4 discrete wavelet denoising (DWD) and with wavelet denoising followed by normalization (DWD-N), Pearson correlation analysis is carried out, and two methods, Ridge regression and Partial Least Squares Regression (PLSR), are compared for the accuracy of hyperspectral inversion of soil AHN content and pH value. The experiments show that, for the AHN content prediction model, Ridge regression has a good modeling effect under DWD-N, with R² = 0.647 and RMSE = 7.067 mg/kg, and PLSR has a good prediction effect under DWD-N, with the highest R² of 0.792 and an RMSE of 3.438 mg/kg. For the pH prediction model, full-band PLSR modeling under DWD pretreatment performs best: the R² of the modeling set and the prediction set is 0.826 and 0.875, and the RMSE is 0.217 and 0.191, respectively.


Introduction
Soil AHN, also called available nitrogen, comprises the inorganic nitrogen and the organic nitrogen that crops can take up directly or in the near term. Its content depends on the soil organic matter content and the amount of nitrogen fertilizer applied. AHN is extremely unstable and changes readily under the influence of the hydrothermal environment and biological activity, but its content reflects the recent nitrogen supply capacity of the soil. Soil pH varies with topography and region and has an important effect on crop planting; soil with a pH above 7.5 is alkaline, which is unfavorable for crop growth and development. Hyperspectral technology can obtain the AHN content and pH value of soil quickly and accurately, greatly saving manpower and material resources and providing a new approach for the study of soil properties.
In hyperspectral inversion of soil properties, spectral preprocessing, selection of characteristic bands and comparison of inversion methods are important directions for improving the accuracy of soil attribute prediction. Di Wupengyao et al. [2] summarized spectral preprocessing methods, including optimization of the number of partial least squares (PLS) factors, the first and second derivatives, Savitzky-Golay (SG) smoothing and its window parameters, the wavelet function and decomposition scale of the continuous wavelet transform, multiplicative scatter correction, standard normal variate, centering, maximum  [3] et al. compared the denoising effects of different wavelet bases and decomposition levels, determined db4 to be the best wavelet base, and obtained good results with the optimal 4-level wavelet decomposition. Luan Fuming et al. [4] applied multi-level decomposition with four wavelet functions to the first derivative of the logarithm of soil reflectance and used PLSR to establish an inversion model of soil AHN content. Shen Congwang et al. [5] used principal component regression, support vector regression and PLSR to establish models of pH value and total potassium content for purple soil and paddy soil, and compared the hyperspectral inversion accuracy. Lifei Wei et al. [6] selected characteristic bands from different spectral index transformations by Pearson correlation analysis, taking bands with a correlation coefficient greater than 0.7 as characteristic bands. They then built inversion models with ridge regression, kernel ridge regression, Bayesian ridge regression and the AdaBoost algorithm, and found that the model built with AdaBoost gave the best results.
Bao Qingling [7] modeled soil organic matter with the random forest algorithm and found that combining machine learning with wavelet decomposition effectively improves the screening of characteristic bands and the prediction accuracy for organic matter while reducing noise.
The special topography of the Mu Us sandy land has produced desertified local soil, whose water and fertilizer use efficiency is lower than that of the former farmland. Optimizing fertilizer use can improve the living environment and living standards of farmers. Because few studies have addressed hyperspectral inversion of soil pH and AHN content in the Mu Us sandy area, this article uses two methods, Ridge and PLSR, to invert the AHN content and pH value of the local desertified soil and to provide a basis for their rapid measurement.

Material
Soil properties and other nutrient elements were measured in the Mengda Industrial Park (38°5'N, 109°0'E) in Wushen Banner from 2019 to 2020. The previous crops in the test field were helianthus, castor, hemp and other crops. Soil was collected at depths of 0-20 cm and 20-40 cm. The main soil type was sandy soil.

Soil sample processing
To avoid the influence of moisture and soil roughness, the soil is first collected and prepared. Sampling at the demonstration base follows the crops grown: soil at 0-20 cm and 20-40 cm was collected under each crop, giving a total of 136 sets of samples. The soil was placed in ziplock bags, brought indoors, air-dried naturally and passed through a 0.850 mm sieve to remove impurities. Each sample was then split into two duplicates, one for the determination of physical and chemical properties and the other for hyperspectral analysis.
AHN was measured mainly by the alkaline hydrolysis diffusion method [8]. The pH value was determined as follows: weigh the soil into a beaker, add deionized water, stir with an 85-2A thermostatic magnetic stirrer for 1 min to fully disperse the soil particles, let stand for 1 hour, measure the supernatant with a pH meter, and record the value after it has stabilized.
The spectra were measured with a Specim IQ handheld hyperspectral imager from the Finnish company Specim. Specim IQ is a complete, portable, handheld hyperspectral camera. Whiteboard calibration is performed before and after each measurement. The spectral range is 400-1000 nm, the wavelength accuracy is 3 nm, the interval is 7 nm, and a total of 204 wavelength points are recorded.
The soil spectral images were taken indoors. Compared with the outdoor environment, the indoor environment and the measured spectra are relatively stable, and the signal-to-noise ratio is higher [9]. The spectra were therefore acquired in a darkroom. The laboratory temperature is 19°C and the humidity is 30hPa. To avoid interference from external stray light, two 125 W halogen sources are used for non-contact measurement. The operation is as follows. The soil is placed in a petri dish with a depth of 1 cm and a diameter of 6 cm. The soil sample is about 30 cm from the camera lens, and the two halogen lamps are about 15 cm from the camera. The angle of the light to the object is about 90 degrees, as shown in Figure 1 (x = 15 cm, α = 45 degrees). After each measurement, the white reference plate is used for correction. Each sample is imaged 3 times, and the average of the 3 spectra is taken as the final spectral value of the soil sample.

Fig.1 Sample pictures collected indoors
The black-and-white corrected image R from the Specim IQ is computed as

$$R = \frac{I - B}{W - B}$$

where W is the all-white calibration image obtained from the standard whiteboard, B is the all-black calibration image of the camera, and I is the original hyperspectral image. The whiteboard is imaged at the same time as the sample during data collection to eliminate the problem of environmental mismatch.
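As an illustration of the black-and-white correction and of averaging the three repeated scans, the following is a minimal sketch using NumPy. The function names `calibrate_reflectance` and `mean_spectrum`, the `eps` guard and the toy values are assumptions made for this example and are not taken from the Specim IQ software.

```python
import numpy as np

def calibrate_reflectance(raw, white, dark, eps=1e-12):
    """Black-and-white correction R = (I - B) / (W - B), element-wise."""
    raw = np.asarray(raw, dtype=float)
    white = np.asarray(white, dtype=float)
    dark = np.asarray(dark, dtype=float)
    denom = white - dark
    # Mark positions where white and dark references coincide as NaN
    # instead of dividing by zero.
    denom = np.where(np.abs(denom) < eps, np.nan, denom)
    return (raw - dark) / denom

def mean_spectrum(scans):
    """Average repeated scans of one sample into a single spectrum."""
    return np.mean(np.asarray(scans, dtype=float), axis=0)

# Toy data: 3 repeated scans of a 4-band spectrum (values are made up).
dark = np.array([10.0, 10.0, 12.0, 11.0])
white = np.array([210.0, 410.0, 412.0, 211.0])
scans = [
    np.array([60.0, 110.0, 212.0, 111.0]),
    np.array([60.0, 110.0, 212.0, 111.0]),
    np.array([60.0, 110.0, 212.0, 111.0]),
]
reflectance = mean_spectrum([calibrate_reflectance(s, white, dark) for s in scans])
print(reflectance)
```

The same functions work on full image cubes, since the arithmetic is element-wise.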

Description of specimens
Although the soil images are acquired under conditions that exclude stray light from the external environment, there is still considerable noise at the beginning and the end of the spectrum, which strongly affects the accuracy of the experimental results. To eliminate the influence of high-frequency noise, we denoise the spectrum with the wavelet transform, exploiting its advantage of local analysis in both the time domain and the frequency domain [3].
The quality of the DWD result depends mainly on three choices: the wavelet base, the number of decomposition levels, and the threshold function together with its estimation method, of which the threshold function is the most important. There are usually 4 threshold estimation methods.
(1) Fixed threshold: $T = \sigma\sqrt{2\ln N}$, where $\sigma$ is the noise standard deviation and N is the number of bands.
(2) Adaptive threshold selection based on Stein's unbiased risk estimate (SURE): for a given threshold m, the unbiased estimate of its risk is computed, and the m that minimizes this risk is selected as the threshold.
(3) Heuristic threshold: a combination of the first two thresholds.
(4) Minimax threshold.
The second and fourth threshold selection methods are more conservative and retain more coefficients, while the other two denoise more thoroughly and set more wavelet coefficients to zero. In summary, the dbN wavelet bases were tested with a four-level wavelet decomposition, and the heuristic threshold selection method was used to perform wavelet denoising with db2 to db6 (see Table 1). Finally, the wavelet base db4 was selected for 4-level decomposition. As shown in Figure 2(b), after wavelet denoising the spectral curve is smoother than the original spectral curve; the effects of high dimensionality, the non-zero varying background baseline and noise are suppressed, so that the characteristic data for analysis are obtained from cleaner spectral data. The DWD spectra are then normalized so that the spectral data fall into a specific interval, which removes the interference of magnitude differences and of the information structure, and the inversion methods for soil AHN content and pH value are compared. The original signals and the corresponding denoised signals are shown in Figure 2. The wavelet denoising process is shown in Figure 3.
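The thresholding and normalization steps can be sketched in plain Python as below. This is only an illustration, not the procedure used in the study: it shows the fixed threshold rather than the heuristic one, estimates the noise level as the median absolute coefficient divided by 0.6745, applies soft thresholding, and uses min-max scaling as one possible normalization. All function names and the toy coefficients are hypothetical, and the db4 decomposition itself is left to a wavelet library.

```python
import math
from statistics import median

def fixed_threshold(detail_coeffs, n_points):
    """Fixed threshold sigma * sqrt(2 ln N) for a signal of n_points samples.

    sigma is estimated as median(|c|) / 0.6745 from the detail coefficients,
    a common robust noise estimate (an assumption of this sketch).
    """
    sigma = median(abs(c) for c in detail_coeffs) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n_points))

def soft_threshold(coeffs, thr):
    """Zero coefficients with |c| <= thr and shrink the others toward zero by thr."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def min_max_normalize(spectrum):
    """Rescale a spectrum to the interval [0, 1]."""
    lo, hi = min(spectrum), max(spectrum)
    if hi == lo:
        return [0.0 for _ in spectrum]
    return [(v - lo) / (hi - lo) for v in spectrum]

# Toy detail coefficients: small noise plus two large values (made up).
detail = [0.02, -0.03, 0.01, 1.50, -0.02, 0.04, -1.20, 0.03]
thr = fixed_threshold(detail, n_points=16)
denoised = soft_threshold(detail, thr)
print(thr, denoised)
print(min_max_normalize([2.0, 4.0, 6.0]))
```

In a full pipeline, the thresholded detail coefficients of each level would be passed back to the inverse wavelet transform to rebuild the denoised spectrum before normalization.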

Regression prediction method and accuracy evaluation index
PLSR is a commonly used multiple linear regression method. The number of principal components n is increased step by step; when $R_n^2$ is largest, the first n principal components are selected for modeling, and the MSE and the coefficient of determination R² are computed.
Ridge regression (Ridge) is also known as Tikhonov regularization. Its main difference from ordinary linear regression is that a regularization term weighted by λ is added to the loss function of multiple linear regression. The derivation is as follows. In the linear model Y = XA, Y is the sample value and XA is the value calculated by the model, that is, the expected value. Because the calculated value and the sample value may not agree exactly, there is an error B = Y - XA. To find the most suitable model, the squared error E must be minimized:

$$E = B^{T}B = (Y - XA)^{T}(Y - XA)$$

Setting the derivative of E with respect to A to zero gives the A that minimizes E:

$$A = (X^{T}X)^{-1}X^{T}Y$$

When the dimension of A exceeds that of Y, this result is very unstable, because $X^{T}X$ is not of full rank and its inverse cannot be obtained. Adding a small perturbation λI to $X^{T}X$ makes the problem stable and solvable:

$$A = (X^{T}X + \lambda I)^{-1}X^{T}Y$$

It can be seen that $X^{T}X + \lambda I$ becomes a full-rank matrix, so a stable inverse can be obtained. In the regression prediction model, λ is set between 0.001 and 100 for modeling, and the λ that gives the highest prediction accuracy is selected.
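The closed-form Ridge estimate and the search over λ can be illustrated with a short NumPy sketch. The text does not state how the best λ was chosen, so scoring each λ by RMSE on a held-out subset is an assumption here, as are the function names, the 20-point logarithmic grid and the synthetic data.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate A = (X^T X + lam * I)^(-1) X^T y."""
    n_features = X.shape[1]
    gram = X.T @ X + lam * np.eye(n_features)
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(gram, X.T @ y)

def select_lambda(X_train, y_train, X_val, y_val, lambdas):
    """Return (best_lam, best_coef, best_rmse), scored by RMSE on the held-out set."""
    best = None
    for lam in lambdas:
        coef = ridge_fit(X_train, y_train, lam)
        rmse = float(np.sqrt(np.mean((y_val - X_val @ coef) ** 2)))
        if best is None or rmse < best[2]:
            best = (float(lam), coef, rmse)
    return best

# Synthetic example with more features than samples (values are made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 50))
true_coef = np.zeros(50)
true_coef[:3] = [1.0, -2.0, 0.5]
y = X @ true_coef + 0.1 * rng.normal(size=30)

lambdas = np.logspace(-3, 2, 20)  # grid from 0.001 to 100
best_lam, best_coef, best_rmse = select_lambda(X[:20], y[:20], X[20:], y[20:], lambdas)
print(best_lam, best_rmse)
```

With 50 features and only 20 training samples, $X^{T}X$ alone is rank deficient, so the example exercises exactly the case in which the λI term is needed.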
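Two indicators, the coefficient of determination (R²) and the root mean square error (RMSE) of prediction, were used to evaluate the two methods, Ridge and PLSR. The closer R² is to 1 and the smaller the RMSE, the better the inversion model. The expressions of R² and RMSE are as follows:

$$R^{2} = 1 - \frac{\sum_{i=1}^{M}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{M}(y_i - \bar{y})^2}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{M}\sum_{i=1}^{M}(y_i - \hat{y}_i)^2}$$

where $y_i$ is the measured value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the measured values, and M is the number of samples.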
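For reference, the two indicators can be computed in a few lines of Python. This sketch assumes the usual residual-based definition of R² (one minus the ratio of the residual to the total sum of squares); the function names and the sample values are illustrative only.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Made-up measured and predicted values.
measured = [7.0, 8.0, 9.0, 10.0]
predicted = [7.5, 7.5, 9.5, 9.5]
print(r_squared(measured, predicted), rmse(measured, predicted))
```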

Soil spectral curve analysis
The spectral curve is obtained in ENVI 5.3 by extracting the average value over a selected region of the soil sample. The spectra of the soil samples after the black-and-white correction of the camera are shown in Figure 2(a). The soil spectra change relatively gently over the visible and near-infrared range. The effect of DWD is shown in Figure 2(b), and the effect of DWD-N is shown in Figure 2(c).

Correlation Analysis of Soil AHN Content, pH Value and Spectral Reflectance
Pearson correlation analysis was performed between the soil AHN content and pH value and the reflectance of the soil samples under the original spectral information (RSI), wavelet denoising (DWD) and wavelet denoising with normalization (DWD-N). The correlation coefficients were tested at significance levels of P<0.01 or P<0.05, and the bands whose correlation coefficients were significant were selected as characteristic bands. The results are shown in Figure 4 and Figure 5. The soil AHN content is negatively correlated with the spectral curve: the correlation coefficient ranges from -0.478 to -0.128 under RSI and from -0.482 to -0.228 under DWD and DWD-N, with the 590-610 nm band showing a strong negative correlation. The soil pH value is also mainly negatively correlated with the spectral curve. Under RSI the correlation coefficient ranges from -0.412 to 0.137, with the strongest negative correlations at 768 nm, 826 nm and 941 nm. Under DWD it ranges from -0.429 to -0.169, with the strongest negative correlations at 720 nm and 807 nm. Under DWD-N it lies between -0.194 and 0.142, and the band at 892 nm has a peak, showing a strong negative correlation. All the characteristic bands
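The band-wise correlation analysis described in this section can be sketched with NumPy as follows. The sketch computes the Pearson coefficient for every band and keeps the bands whose t statistic exceeds a critical value. The critical value `t_crit=2.7` is a placeholder that would in practice be taken from a t table for the chosen significance level and sample size; the function names and the synthetic data are likewise assumptions.

```python
import numpy as np

def band_correlations(spectra, target):
    """Pearson r between each band (column of spectra) and the target."""
    X = np.asarray(spectra, dtype=float)
    y = np.asarray(target, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return (Xc.T @ yc) / denom

def significant_bands(r, n_samples, t_crit):
    """Indices of bands with |t| = |r| * sqrt((n - 2) / (1 - r^2)) above t_crit."""
    one_minus_r2 = np.maximum(1.0 - r ** 2, 1e-12)  # guard against |r| = 1
    t = np.abs(r) * np.sqrt((n_samples - 2) / one_minus_r2)
    return np.flatnonzero(t > t_crit)

# Synthetic data: 40 samples, 5 bands, band 2 built to track the target negatively.
rng = np.random.default_rng(1)
n = 40
target = rng.normal(size=n)
spectra = rng.normal(size=(n, 5))
spectra[:, 2] = -0.8 * target + 0.2 * rng.normal(size=n)

r = band_correlations(spectra, target)
selected = significant_bands(r, n, t_crit=2.7)  # placeholder critical value
print(r, selected)
```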

Division of the sample set
The SPXY algorithm is widely used for dividing sample sets and performs better than random sampling (RS), the Kennard-Stone (KS) algorithm, the duplex algorithm and others [10]. Guo et al. [11] and Zhu et al. [12] achieved good results with it, indicating that this method can effectively improve the accuracy of the model. In this study, the AHN content or pH value y was used as the dependent variable, and the near-infrared spectrum x was used as the independent variable. The 136 soil samples were divided into a modeling set of 102 samples and a prediction set of 34 samples to establish the model. The statistical values of the samples are shown in Table 2.
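A minimal NumPy sketch of an SPXY-style split is given below for illustration. It follows the usual formulation, in which the distances in x and in y are each divided by their maximum and summed, and samples are then picked by the Kennard-Stone max-min rule. The function names and the random data are hypothetical; only the 102/34 split sizes come from the text.

```python
import numpy as np

def _pairwise_dist(A):
    """Euclidean distance matrix between the rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def spxy_split(X, y, n_train):
    """Return (train_idx, test_idx) selected with the joint x-y distance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(len(X), -1)
    dx = _pairwise_dist(X)
    dy = _pairwise_dist(y)
    d = dx / dx.max() + dy / dy.max()  # normalized joint distance

    # Start from the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]

    # Repeatedly add the sample whose nearest selected neighbour is farthest away.
    while len(selected) < n_train:
        min_d = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)

# Random stand-in data: 136 samples, 20 bands (kept small for speed).
rng = np.random.default_rng(2)
X = rng.normal(size=(136, 20))
y = rng.normal(size=136)
train_idx, test_idx = spxy_split(X, y, n_train=102)
print(len(train_idx), len(test_idx))
```

Because the selection is deterministic for a given data set, repeated runs give the same modeling and prediction sets.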

Conclusion
Hyperspectral technology makes rapid and accurate prediction possible, and it is essential to establishing inversion models of soil properties. The DWD method achieves remarkable results in addressing noise and redundancy in soil hyperspectral information. It not only effectively reduces noise and highlights differences in spectral information, but also increases the correlation between the soil spectra and the AHN content and pH value, which provides the conditions for improving the accuracy of soil hyperspectral inversion. From the inversion with the Ridge and PLSR methods, the following conclusions are drawn: (1) In the inversion of soil AHN content, the prediction set performance of PLSR is generally better than that of the Ridge method, but after DWD-N the modeling set of the Ridge method is better, and normalizing the data gives good accuracy for both methods. (2) In the inversion of soil pH value, feature extraction under the Ridge method gives poor accuracy. With DWD-N, the Ridge method is relatively optimal in the full band, with R² of 0.792 and 0.770 and RMSE of 0.235 and 0.242 for the modeling set and prediction set. Under DWD, the PLSR inversion model improves on that under RSI, with R² of 0.826 and 0.875 and RMSE of 0.191 and 0.217 for the modeling set and prediction set. With feature selection, the accuracy difference under DWD-N is small while the least original information is retained, which makes the extraction of spectral features more effective. In summary, both methods are feasible for predicting soil AHN content and pH value.