A New Plant Indicator (Artemisia lavandulaefolia DC.) of Mercury in Soil Developed by Fourier-Transform Near-Infrared Spectroscopy Coupled with Least Squares Support Vector Machine

A rapid indicator of mercury in soil was established using a plant (Artemisia lavandulaefolia DC., ALDC) commonly distributed in mercury mining areas, by coupling Fourier-transform near-infrared (FT-NIR) spectroscopy with least squares support vector machine (LS-SVM). Representative samples of ALDC (stem and leaf) were gathered from areas surrounding and distant from the mercury mines. As a reference method, the total mercury contents in soil and ALDC samples were determined by a direct mercury analyzer incorporating high-temperature decomposition, catalytic adsorption for impurity removal, amalgamation capture, and atomic absorption spectrometry (AAS). Based on the FT-NIR data of ALDC samples, LS-SVM models were established to distinguish mercury-contaminated and ordinary soil. The results of reference analysis showed that the mercury level of the areas surrounding mercury mines (0–3 kilometers, 7.52–88.59 mg/kg) was significantly higher than that of the areas distant from mercury mines (>5 kilometers, 0–0.75 mg/kg). LS-SVM classification models of ALDC samples were established based on the original spectra, smoothed spectra, second-derivative (D2) spectra, and standard normal variate (SNV) spectra, respectively. The prediction accuracy of D2-LS-SVM was the highest (0.950). FT-NIR combined with LS-SVM modeling can quickly and accurately identify the contaminated ALDC. Compared with traditional methods, which rely on naked-eye observation of plants, this method is objective and more sensitive and applicable.


Introduction
Excessive mercury is one of the major heavy metal pollutants in the environment and is highly toxic to the human body [1][2][3]. A series of environmental pollution incidents have drawn worldwide attention to mercury pollution [4]. In recent years, the study of mercury pollution in soil has become one of the hotspots of environmental protection [5]. Mercury exists in soils in different forms, such as Hg⁰, Hg₂²⁺, Hg²⁺, and organic mercury. There is also a complex interaction between different forms of mercury under specific environmental conditions; for example, mercury can be converted into highly toxic methylmercury in the soil [6]. Because of the hysteresis, easy migration, and high toxicity of mercury pollution in soil, the analysis and detection technology of mercury in soil is of great significance to soil monitoring and the effective control of mercury pollution. Therefore, in recent years, the development of trace mercury extraction, enrichment, and detection methods [7] has become one of the hot research fields of analytical chemistry.
At present, most of the analytical methods for total mercury in soil involve cumbersome, time-consuming, and reagent-consuming sample pretreatment and extraction methods, such as wet digestion, dry ashing, microwave digestion, and ultrasound-assisted extraction [8]. The quantitative detection methods include spectrophotometry [9,10], atomic spectrometry [11], chromatography [12], mass spectrometry [13], neutron activation analysis [14], and so on [15][16][17]. Tongren City (Guizhou Province, China) has many large-scale mercury deposits. Mercury deposits play an important role in the local traditional industry, but they also pose a major threat to the ecological environment. The traditional quantitative analysis of mercury requires complex experimental skills and analytical instruments. The number of samples to be routinely analyzed is large, and the cost of analysis and testing is high, so these methods cannot be widely adopted in economically underdeveloped areas.
Indicator plants [18,19] are sensitive to pollutants and show obvious morphological changes in the presence of pollutants, but the distribution of indicator plants is greatly affected by geographical and climatic factors. In the case of long-term soil contamination, the morphological changes of most local species are often not obvious due to tolerance. Therefore, the application of traditional methods of plant indicators by naked eye observation is largely limited. Fourier-transform near-infrared (FT-NIR) spectroscopy, as a fast detection method, has the advantages of convenient sample preparation, rapid analysis, and simultaneous characterization of mixtures [20,21]. Combined with chemometrics, FT-NIR has been widely used in the classification of various samples [22][23][24]. In order to screen the mercury contamination in local soil rapidly and effectively, in this work, Artemisia lavandulaefolia DC. (ALDC), a local plant commonly distributed in mercury mining areas, was investigated as a mercury indicator. FT-NIR spectroscopy was used to characterize the chemical composition changes caused by mercury in ALDC. Finally, chemometrics methods were applied to identify its mercury content level rapidly by developing classification models.

Collection of ALDC Samples and FT-NIR Spectrometry.
ALDC samples, including stems and leaves, were collected from the top of the plants with a length of about 20 cm. The mercury-contaminated group was collected within 3 km of mercury mines (n₁ = 120), and the regular group was gathered from areas farther from mercury mines (>5 km, n₂ = 120). All ALDC samples were washed and stored in a cool, dry, and ventilated place away from direct sunlight to remove moisture. Each sample was crushed by a crusher and then passed through a 200-mesh sieve. Sample powders were stored with integrated packaging. Before FT-NIR analysis and Hg reference analysis, each sample was dried by an ultraviolet lamp for 10 minutes. An Antaris II Fourier-transform near-infrared spectrometer (Thermo Electron Co., USA) was used to analyze the compacted powder in a quartz sample cup under reflection mode. A PbS detector was used to record the spectrum. The measured spectral interval was 4000-10000 cm⁻¹. Each sample was measured three times, with 32 scans per measurement. The resolution of the instrument was 8 cm⁻¹.

Reference Analysis of Mercury.
Reference analysis of mercury was performed using a DMA-80 direct mercury analyzer (Milestone, Italy). Mercury standard reserve solution (100 mg/L) was purchased from the standard sample research center of China Ministry of Environmental Protection. Mercury standard solutions (10.0 mg/L and 1.0 mg/L) were obtained by diluting the standard reserve solution with 1% (w/w) nitric acid and deionized water. The purity of oxygen was over 99.99% (v/v). The working conditions of the DMA-80 direct mercury analyzer were as follows: a low-pressure mercury lamp was used as the light source; the wavelength was kept at 253.7 nm; the drying temperature was kept at 200°C for 60 s; the decomposition temperature was kept at 650°C for 90 s; the oxygen pressure was 60 psi; and the detector was a silicon ultraviolet photodetector. For each ALDC (naturally dried in the sun) or soil sample, about 100 mg was accurately weighed. The standard curve was developed using a series of standard solutions of 0, 0.2, 0.4, 0.6, 0.8, and 1.0 mg/L.
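As a rough illustration of how such a standard curve can be used, the following Python sketch fits a straight line to the standard series and inverts it for an unknown reading. The concentrations are those listed above; the signal values, the function names, and the conversion of a blank standard deviation into a concentration by dividing by the slope are all assumptions made for this example, not values or procedures reported by the authors.

```python
import numpy as np

def fit_calibration(conc, signal):
    """Fit signal = slope * conc + intercept by least squares.

    Returns (slope, intercept, r), where r is the linear correlation coefficient.
    """
    conc = np.asarray(conc, dtype=float)
    signal = np.asarray(signal, dtype=float)
    slope, intercept = np.polyfit(conc, signal, 1)
    r = np.corrcoef(conc, signal)[0, 1]
    return slope, intercept, r

def predict_conc(signal, slope, intercept):
    """Invert the calibration line to turn a reading into a concentration."""
    return (np.asarray(signal, dtype=float) - intercept) / slope

def detection_limit(blank_signals, slope):
    """3-sigma criterion: three times the blank standard deviation,
    converted to concentration units with the calibration slope (assumed)."""
    return 3.0 * np.std(blank_signals, ddof=1) / slope

# Standard series from the text (mg/L); the readings are hypothetical placeholders.
standards = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
signals = [0.002, 0.101, 0.203, 0.298, 0.402, 0.499]
slope, intercept, r = fit_calibration(standards, signals)
```

With real instrument readings in place of the placeholder signals, `r` can be compared against the linearity criterion and `predict_conc` applied to sample readings.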

Chemometrics Analysis.
Data preprocessing and classification modeling were both performed in MATLAB 7.0.1 (MathWorks, USA). To obtain representative training and test sets, the Kennard-Stone (K-S) algorithm [25] was used to divide the original spectral data. In the K-S method, the two samples with the greatest Euclidean distance are first selected as training objects; then, the two samples with the greatest distance among the remaining samples are selected and put into the test set. This process is repeated until enough test objects have been obtained, and all the remaining objects are put into the training set. The training and test sets obtained by the K-S algorithm can cover a wide range of spectral data. The code for the K-S algorithm comes from the free TOMCAT toolbox [26]. Least squares support vector machine (LS-SVM) [27] was used for classification modeling and prediction. The kernel width and regularization parameter of the Gaussian function in the LS-SVM model were estimated by cross-validation. LS-SVM was performed using the LS-SVMlab 1.8 toolbox [28]. The second-order derivative (D2) [29] and standard normal variate (SNV) [30] spectra were calculated using self-compiled MATLAB code.
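The splitting procedure described above can be sketched as follows. This is a literal Python/NumPy reading of the paragraph (farthest pairs assigned alternately to the training and test sets), not the TOMCAT code used by the authors; the function names are hypothetical. Note that the classic Kennard-Stone rule is usually stated differently, adding one sample at a time by maximum minimum distance to the samples already chosen, so the toolbox implementation may behave differently from this sketch.

```python
import numpy as np

def pairwise_sq_dist(X):
    """Squared Euclidean distances between all rows of X."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.maximum(d2, 0.0)  # clip tiny negative rounding errors

def farthest_pair_split(X, n_test):
    """Alternately assign the farthest remaining pair to training, then test.

    Stops once n_test test samples are collected; everything left over
    joins the training set. Returns (train_indices, test_indices).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n_test <= 0 or n_test % 2 or 2 * n_test > n:
        raise ValueError("n_test must be positive, even, and at most n / 2")
    d2 = pairwise_sq_dist(X)
    np.fill_diagonal(d2, -1.0)  # a sample is never paired with itself
    remaining = np.ones(n, dtype=bool)
    train, test = [], []
    to_train = True  # the first pair goes to the training set
    while len(test) < n_test:
        # Ignore distances that involve already-assigned samples.
        masked = np.where(remaining[:, None] & remaining[None, :], d2, -1.0)
        i, j = np.unravel_index(np.argmax(masked), masked.shape)
        (train if to_train else test).extend([int(i), int(j)])
        remaining[[i, j]] = False
        to_train = not to_train
    train.extend(np.flatnonzero(remaining).tolist())
    return sorted(train), sorted(test)
```

Applied to one group of 120 spectra with `n_test=40`, this returns 80 training and 40 test indices for that group.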

Mercury Content in Samples.
According to previous research experience, there is little difference in mercury content between plants near the same sampling location (within 200 meters). Considering also that the rapid spectral identification method is qualitative, the mercury content of 10 batches of samples near mercury mines and 11 batches of samples far away from mercury mines was determined as a reference for the rapid spectral identification method. According to the standard curve, the linear range of the method is 0-1.0 mg/L, and the linear correlation coefficient is greater than 0.999. According to the IUPAC definition, the detection limit was calculated to be 0.2 μg/L as three times the standard deviation of 11 repeated measurements of a blank solution under the same conditions as the sample solution. Quantitative analysis showed that there were significant differences in mercury content between the two groups near and far away from mercury deposits, ranging from 7.52 to 88.59 mg/kg and from 0 to 0.75 mg/kg, respectively.
Figure 1 shows the original FT-NIR spectra of the two groups of samples. In the original spectra, although the relative intensity of some peaks may differ, the absorption bands are basically similar. It is difficult to distinguish the groups by the naked eye, so it is necessary to use chemometrics to model and classify them. In order to further eliminate the scattering effect of sample powders and the possible baseline drift, D2 and SNV spectra were also calculated. As shown in Figure 2, the D2 spectrum can eliminate most of the baseline effects. Near 7250 cm⁻¹, the group A samples (mercury-polluted) have more significant changes in spectral intensity, so the D2 spectrum has stronger peaks nearby. After SNV transformation, the spectral differences within each group were reduced, but the differences between the two groups were still not obvious, so chemometrics is needed for classification modeling.
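The two transformations discussed above can be sketched in a few lines of Python/NumPy. The authors computed them with their own MATLAB code, which is not given, so this is only an illustration. SNV is shown in its usual row-wise form (each spectrum centered and scaled by its own mean and standard deviation). The second derivative is shown as a plain central difference with an optional gap, which is an assumption: the authors' D2 procedure [29] may include smoothing that this sketch omits.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) by
    its own mean and standard deviation."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def second_derivative(spectra, step=1.0, gap=1):
    """Central-difference second derivative along the wavenumber axis.

    step is the spacing between adjacent points and gap the number of
    points between the center and each neighbor. The output is shorter
    than the input by 2 * gap points per spectrum.
    """
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    left = spectra[:, :-2 * gap]
    center = spectra[:, gap:-gap]
    right = spectra[:, 2 * gap:]
    return (left - 2.0 * center + right) / (gap * step) ** 2
```

A constant offset or a linear baseline added to a spectrum vanishes under `second_derivative`, and a positive multiplicative scaling plus offset vanishes under `snv`, which is the behavior the text attributes to these two pretreatments.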

Classification Modeling.
Considering the difference in sample distribution between the two groups, the K-S algorithm was used to divide groups A and B separately, each into 80 training samples and 40 test samples. Therefore, the final training and test sets contained 160 (80 + 80) and 80 (40 + 40) samples, respectively.
The LS-SVM model was constructed based on the original, smoothed, D2, and SNV spectra. In this paper, LS-SVM used the usual Gaussian kernel function as the nonlinear transformation, so two parameters must be optimized simultaneously, namely, the kernel width (σ) of the Gaussian kernel function and the regularization parameter (γ). The simplex method was used to minimize the classification error rate of 10-fold cross-validation. Table 1 lists the model parameters and prediction results using different data preprocessing methods. To illustrate the optimization of the LS-SVM parameters, Figure 3 shows the cross-validation error rates obtained using different pairs of σ² and γ. As shown in Figure 3, there is a plateau where the error rates reach the lowest value. For a stable and accurate model, the optimal pairs of σ² and γ were selected to [...] the 80 test objects were wrongly classified. The prediction results of the 80 test objects also showed that D2 and SNV can eliminate some scattering effects and baseline drift, and the classification accuracy reached 0.950 and 0.925, respectively.
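To make the two tuned parameters concrete, the following Python/NumPy sketch implements a minimal LS-SVM classifier with a Gaussian kernel and scores parameter pairs by 10-fold cross-validation. It is not the LS-SVMlab code used in the paper. The kernel convention exp(-d²/σ²), the function names, and the use of a small grid in place of the simplex search are assumptions made for illustration. Class labels are coded as +1 and -1.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """Gaussian kernel exp(-||a - b||^2 / sigma2); other texts use 2*sigma2."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / sigma2)

def lssvm_train(X, y, gamma, sigma2):
    """Solve the LS-SVM linear system for the bias b and multipliers alpha."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = np.outer(y, y) * rbf_kernel(X, X, sigma2) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]

def lssvm_predict(X_train, y_train, b, alpha, sigma2, X_new):
    """Predict +1 or -1 from the sign of the kernel expansion."""
    score = rbf_kernel(X_new, X_train, sigma2) @ (alpha * y_train) + b
    return np.where(score >= 0.0, 1.0, -1.0)

def cv_error(X, y, gamma, sigma2, n_folds=10, seed=0):
    """Misclassification rate estimated by n_folds-fold cross-validation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    wrong = 0
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        b, alpha = lssvm_train(X[train_idx], y[train_idx], gamma, sigma2)
        pred = lssvm_predict(X[train_idx], y[train_idx], b, alpha, sigma2, X[test_idx])
        wrong += int(np.sum(pred != y[test_idx]))
    return wrong / len(y)

def grid_search(X, y, gammas, sigma2s):
    """Return (error, gamma, sigma2) with the lowest cross-validation error.
    A plain grid is used here as a stand-in for the simplex optimization."""
    return min((cv_error(X, y, g, s2), g, s2) for g in gammas for s2 in sigma2s)
```

Evaluating `cv_error` over a grid of γ and σ² values gives the kind of error surface described for Figure 3; when several pairs tie at the lowest error, `grid_search` simply returns the smallest such pair, whereas choosing a pair inside the low-error plateau may be preferable for stability.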

Conclusions
In this paper, the DMA-80 direct mercury analyzer was used as a reference analysis method to study the stem and leaf samples of ALDC from the areas surrounding and far away from the mercury mine. The results showed that the total mercury contents of the two groups of samples were significantly different. By combining FT-NIR spectroscopy with a pattern recognition method, a classification model of the two groups of ALDC samples was established, and the classification accuracy was satisfactory. The comparison of different data preprocessing methods showed that D2 and SNV can eliminate part of the scattering and baseline drift effects, highlight the spectral differences between the two groups, and improve the classification accuracy to 0.950 and 0.925, respectively. FT-NIR combined with the pattern recognition method can quickly and accurately identify mercury-contaminated ALDC samples. Compared with the traditional method of naked eye observation, this method is more objective and can identify the changes of plant components caused by pollution more effectively.

Data Availability
The data used to support this study are available upon request by interested readers.

Conflicts of Interest
The authors declare no conflicts of interest.