Scientia Agricultura Sinica, 2014, Vol. 47, Issue (12): 2374-2383. doi: 10.3864/j.issn.0578-1752.2014.12.010

• Soil and Fertilizer · Water-Saving Irrigation · Agro-Ecological Environment •

Estimation of Soil Total Nitrogen Content Based on Variable Selection in Visible-Near Infrared Spectra

 杨梅花 (YANG Mei-Hua) 1,2; 赵小敏 (ZHAO Xiao-Min) 1,2,3; 方倩 (FANG Qian) 1,2; 谢碧裕 (XIE Bi-Yu) 1,2

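  1. College of Agronomy, Jiangxi Agricultural University/Key Laboratory of Crop Physiological Ecology and Genetic Breeding, Ministry of Education, Nanchang 330045;
  2. Institute of Soil Science, Chinese Academy of Sciences/State Key Laboratory of Sustainable Soil and Agricultural Development, Nanjing 210008;
  3. Nanchang Teachers College, Nanchang 330103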
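  • Received: 2013-11-24 Published: 2014-06-15 Released: 2014-04-23
  • Corresponding author: ZHAO Xiao-Min, E-mail: zhaoxm889@126.com
  • About the author: YANG Mei-Hua, E-mail: 727198509@qq.com
  • Supported by:

    National Natural Science Foundation of China (41361039); Open Fund of the State Key Laboratory of Sustainable Soil and Agricultural Development (0812201202); Natural Science Foundation of Jiangxi Province (20122BAB204012)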

Study on Soil Total N Estimation by Vis-NIR Spectra with Variable Selection

 YANG Mei-Hua 1,2; ZHAO Xiao-Min 1,2,3; FANG Qian 1,2; XIE Bi-Yu 1,2

  1. College of Agronomy, Jiangxi Agricultural University/Key Laboratory of Crop Physiological Ecology and Genetic Breeding, Ministry of Education, Nanchang 330045;
  2. Institute of Soil Science, Chinese Academy of Sciences/State Key Laboratory of Sustainable Soil and Agricultural Development, Nanjing 210008;
  3. Nanchang Teachers College, Nanchang 330103
  • Received: 2013-11-24 Published: 2014-06-15 Released: 2014-04-23

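Abstract (translated from the Chinese): 【Objective】Variable selection is a critical step in visible-near infrared (Vis-NIR) spectroscopy research. This study compared different feature selection methods for Vis-NIR spectra to identify the wavebands sensitive to soil total nitrogen (TN) and to build the best TN prediction model on those bands, in order to provide a theoretical basis for the rapid quantitative estimation of soil TN. 【Method】A total of 120 representative soil samples were collected in Ji'an County, Jiangxi Province, a typical red soil region. Three variable selection methods were applied to the Vis-NIR spectra: principal component analysis (PCA), uninformative variable elimination (UVE), and UVE followed by the successive projections algorithm (UVE-SPA). Partial least squares regression (PLSR), least squares-support vector machine (LS-SVM), back-propagation neural network (BPNN) and genetic-algorithm-optimized BPNN (GA-BPNN) models were built on each set of selected variables, and the effect of the selection method on each model was assessed from its accuracy on the prediction set. 【Result】UVE reduced the spectral variables from 200 to 59, of which 10 lay in the visible range and the rest in the combination and first-overtone regions of the near-infrared range, which are rich in information. Applying SPA to these variables yielded the 5 effective wavelengths with the least collinearity: 820, 940, 1040, 1060 and 1990 nm. Among the PLSR, BPNN, GA-BPNN and LS-SVM models built on the UVE-selected variables and validated on independent soil TN data, LS-SVM was the most accurate, with a coefficient of determination (R²) of 0.7492, a root mean square error of prediction (RMSEp) of 0.2921 and a residual prediction deviation (RPD) of 1.8904. Among the same four models built on the UVE-SPA wavelengths, LS-SVM again predicted best on the prediction set, with R² of 0.7945, RMSEp of 0.2499 and RPD of 2.0009, and the model was stable. The LS-SVM, BPNN and GA-BPNN models built on the 7 principal components extracted by PCA predicted poorly and could not be used for quantitative estimation of soil TN. For the same input variables, GA-BPNN outperformed BPNN. 【Conclusion】UVE-SPA variable selection combined with an LS-SVM model can be used to estimate soil TN content, and UVE-SPA is an effective variable selection method for soil spectra.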
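The SPA step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a fixed starting column (`start`) and a fixed number of variables (`n_select`), whereas a full SPA usually tries several starting columns and chooses the subset size by validation error. The function name `spa_select` and the synthetic data are hypothetical.

```python
import numpy as np

def spa_select(X, n_select, start=0):
    """Pick n_select column indices of X by successive projections (simplified)."""
    # Mean-centre a float copy so the projections measure variation, not offset.
    P = np.asarray(X, dtype=float)
    P = P - P.mean(axis=0)
    selected = [start]
    for _ in range(n_select - 1):
        v = P[:, selected[-1]]
        # Remove from every column its component along the last selected column.
        # Assumes v is not all zeros, i.e. n_select does not exceed the rank of X.
        P = P - np.outer(v, v @ P) / (v @ v)
        norms = np.linalg.norm(P, axis=0)
        norms[selected] = -1.0  # never re-pick an already selected column
        # The column with the largest residual norm is the least collinear one.
        selected.append(int(np.argmax(norms)))
    return selected

# Synthetic demo (made-up data): column 1 is almost a copy of column 0.
rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 50))
X_demo = np.column_stack([a, a + 1e-6 * rng.normal(size=50), b, c])
picked = spa_select(X_demo, n_select=3, start=0)
print(picked)
```

Because column 1 carries almost no information beyond column 0, its residual norm collapses after the first projection and it is skipped, which is the sense in which SPA returns wavelengths with minimal collinearity.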

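Key words (translated from the Chinese): soil total nitrogen, uninformative variable elimination (UVE), successive projections algorithm (SPA), partial least squares regression (PLSR), least squares-support vector machine (LS-SVM), genetic-algorithm-optimized back-propagation neural network (GA-BPNN)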

Abstract: 【Objective】Variable selection, or feature selection, is a critical step in the analysis of visible-near infrared (Vis-NIR) spectra. The aim of this study was to estimate soil total nitrogen (TN) content by building models on the TN-sensitive wavelengths identified by variable selection methods applied to Vis-NIR spectra, and to provide a basis for the fast estimation of soil TN content. 【Method】A total of 120 representative soil samples were collected from the typical red soil area of Ji’an County, Jiangxi Province. The TN contents and the Vis-NIR spectra were measured in the laboratory. Three variable selection methods were applied to the Vis-NIR data: principal component analysis (PCA), uninformative variable elimination (UVE), and UVE coupled with the successive projections algorithm (UVE-SPA). Partial least squares regression (PLSR) with leave-one-out cross-validation, least squares-support vector machine (LS-SVM), back-propagation neural network (BPNN), and BPNN with thresholds and weights optimized by a genetic algorithm (GA-BPNN) were calibrated on each set of selected variables and validated using independent data sets. 【Result】UVE reduced the number of wavelengths from the original 200 to 59, of which 10 were located in the visible range and the rest in the overtone and combination regions of the near-infrared range. Applying SPA to the wavelengths preselected by UVE further reduced them to only 5 for TN: 820, 940, 1040, 1060 and 1990 nm. Based on the 59 wavelengths, the LS-SVM model outperformed PLSR, BPNN and GA-BPNN, with a coefficient of determination (R²) of 0.7492, a root mean square error of prediction (RMSEp) of 0.2921 and a residual prediction deviation (RPD) of 1.8904 for soil TN. Based on the 5 wavelengths selected by UVE-SPA, the LS-SVM model again performed best among PLSR, BPNN, GA-BPNN and LS-SVM, with R² of 0.7945, RMSEp of 0.2499 and RPD of 2.0009 for soil TN. In contrast, the LS-SVM, BPNN and GA-BPNN models based on 7 principal components predicted poorly and were not valid for estimating soil TN. 【Conclusion】Overall, UVE-SPA was an effective variable selection method, and Vis-NIR spectroscopy combined with UVE-SPA and LS-SVM was successful for the accurate determination of soil TN.
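The three statistics quoted above can be computed as in the following sketch. The abstract does not state the exact formulas used, so two conventions here are assumptions: R² is taken as 1 - SS_res/SS_tot, and RPD as the sample standard deviation of the reference values divided by RMSEp. The function name `prediction_metrics` and the numbers in the demo are made up for illustration and are not data from the study.

```python
import numpy as np

def prediction_metrics(y_true, y_pred):
    """Return (R2, RMSEp, RPD) for reference values and model predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    # Root mean square error of prediction.
    rmsep = np.sqrt(np.mean(resid ** 2))
    # Coefficient of determination, assumed here to be 1 - SS_res / SS_tot.
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    # Residual prediction deviation, assumed here to be SD (ddof=1) / RMSEp.
    rpd = np.std(y_true, ddof=1) / rmsep
    return r2, rmsep, rpd

# Made-up reference and predicted values, for illustration only.
r2, rmsep, rpd = prediction_metrics([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0])
print(r2, rmsep, rpd)
```

Under these conventions, RPD grows as the prediction error shrinks relative to the spread of the reference values, which is why it is reported alongside R² and RMSEp.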

Key words: soil total nitrogen, uninformative variable elimination (UVE), successive projections algorithm (SPA), PLSR, LS-SVM, GA-BPNN
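For readers unfamiliar with LS-SVM, the regression form named in the abstract and key words reduces to solving one linear system. The sketch below uses the standard textbook formulation with an RBF kernel. The kernel choice, the values of `gamma` and `sigma`, the function names and the toy data are all assumptions, since the abstract does not report the settings used in the study.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b, alpha]^T = [0, y]^T."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, dual weights alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    """Predict with f(x) = sum_i alpha_i * k(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Toy one-dimensional demo (made-up data, not soil spectra).
X_toy = np.linspace(0.0, 3.0, 30).reshape(-1, 1)
y_toy = np.sin(X_toy[:, 0])
b_toy, alpha_toy = lssvm_fit(X_toy, y_toy, gamma=1000.0, sigma=0.5)
fit_toy = lssvm_predict(X_toy, b_toy, alpha_toy, X_toy, sigma=0.5)
print(np.max(np.abs(fit_toy - y_toy)))
```

In practice the inputs would be the reflectance values at the selected wavelengths, and `gamma` and `sigma` would be tuned, for example by cross-validation.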