Hyperspectral remote sensing for foliar nutrient detection in forestry: A near-infrared perspective

https://doi.org/10.1016/j.rsase.2021.100676

Highlights

  • Near-infrared spectroscopy provides a rapid approach for foliar nutrient detection.

  • Spectral noise removal improves detection and prediction accuracies.

  • Selection of statistical models and pre-processing tools is application specific.

  • Sample sizes, the number of latent variables, and leaf water content were the main factors determining successful outcomes.

  • Epicuticle wax and trichomes constrain leaf optical properties.

Abstract

Over the past decade, hyperspectral remote sensing as a rapid, non-destructive technique for vegetation assessment has contributed considerably to the efficacy of remote sensing. This review paper provides a synopsis of the application of hyperspectral remote sensing for detecting foliar nutrients. We also concentrate on spectral noise, pre-processing data methods and statistical models using near-infrared technology. We used an integrative approach to critically analyze a decade (2010–2020) of research. The primary outcomes suggest that near-infrared technology provides reasonably accurate results by utilizing strategically selected pre-processing data methods and statistical models that reduce spectral noise. Sample sizes, latent variables, and leaf water content were the main factors determining successful outcomes. The constraints presented motivate future research to understand the effects of epicuticle wax and trichomes on leaf optical properties.

Introduction

The intensive management of forest plantations has evolved significantly to meet the global supply and demand for forest products (Rubilar et al., 2018). Advances in our understanding of silviculture practices combined with the progression of information technologies have revolutionized the forestry industry (Rubilar et al., 2018). Trees are given a strategic supply of nutrients at the nursery level to build up nutrient reserves for subsequent in-field planting, maximizing rooting and shoot production (Timmer, 1997). However, traditional nutrient assessment methods become insufficient when sample sizes are large. Operationally, the additional labor needed to meet this demand further increases costs, making the process impractical (Payn et al., 1999). An alternative is to reduce sample sizes and extrapolate nutrient assessments, which saves initial costs. However, trees are not homogeneous; hence nutrient concentrations are irregular. An ineffective nutrient regime could leave plants exposed to pests and diseases, while overfertilization can lead to toxicity (Payn et al., 1999).

Nutrient deficiencies plague plant production, significantly reducing industrial plantation output and productivity. Accurately diagnosing nutrient deficiencies at early stages of growth will substantially increase in-field survival on highly valued plantations (Mee et al., 2017). The timely assessment of plant nutrient status during the early stages of growth is critical for maintaining optimal nutrient levels to maximize shoot production and rooting (Turner and Lambert, 2017). Furthermore, 'hidden' nutrient deficiencies cannot be observed and interpreted with the human eye alone, which could flaw diagnosis and disrupt remedial action for affected plants (Mee et al., 2017). Current advances in analytical techniques make the detection of hidden nutrient deficiencies more feasible. The introduction of remotely sensed data, specifically near-infrared spectroscopy (NIRS), can provide almost instantaneous results for large-scale precision silviculture practice (Rubilar et al., 2018; Watt et al., 2019). NIRS is a type of high-energy vibrational spectroscopy that senses in the wavelength range of 750 nm–2500 nm (Pasquini, 2018). However, the impact of spectral noise in highly dimensional data continues to complicate the development of efficient nutrient assessments. A recent review paper by Watt et al. (2019) concludes that early research has demonstrated the potential of remotely sensed data, particularly the NIR region, for diagnosing nutrient deficiencies; however, little research has used these models for this purpose.

Three main objectives outline this review. They provide the reader with (1) hyperspectral remote sensing (NIRS) and its relation to plant physiological traits, (2) the impact of spectral noise in high dimensional data, and (3) strategic data pre-processing and statistical modeling techniques. We also present the variable selection methods and accuracy assessments used by most studies in this review. To accomplish these objectives, we critically examine a decade of research that used hyperspectral remote sensing (NIRS) to detect foliar nutrient deficiencies in a forestry environment. We examined peer-reviewed scientific research articles published between 2010 and 2020 using Google Scholar's advanced search configurations in 'incognito mode'. Using 'incognito mode' prevents previously searched articles outside the scope of this article from skewing the results. We used the following search configurations: (1) 'with all the words' = "near-infrared" & "forestry" & "tree" & "foliar" & "spectroscopy" & "nutrient" & "deficiency" & "remote sensing"; (2) 'where my words occur' = 'anywhere in the article'; and (3) 'return articles dated between' = '2010'-'2020'. The search returned approximately 61 journal articles on March 31, 2020. The structure of this review takes an integrative form similar to that of Varhola et al. (2010).

In the 1700s, Pierre Bouguer formulated the basis of the Beer-Lambert law, which provided researchers with a physiological basis to understand the physical link between chemical compounds and electromagnetic radiation. In the 1800s, William Herschel, a musician and amateur astronomer, discovered "infrared radiation", which laid the foundation for modern NIRS (Ring, 2000). Herschel studied the relationship between visible and invisible light rays and found simple ways of examining absorption and reflection (Ring, 2000). This relationship shaped the way scientists experimented with infrared radiation to discover a physical link to chemical compounds in the late 1900s (Curran, 1989; Dixit and Ram, 1985; Elvidge, 1990; Peterson et al., 1988; Sasaki et al., 1984; Wessman et al., 1989; Weyer, 1985). More specifically, the works of Karl Norris and Lois Weyer played a crucial role in developing a NIRS application for organic (molecular) sampling (Weyer, 1985; Williams and Norris, 1987).

Karl Norris and Lois Weyer explored detecting organic compounds in a molecule during energy transitions (thermal radiation). These energy transitions occur during molecular bonding from the electrostatic force of attraction between oppositely charged ions (Weyer, 1985; Williams and Norris, 1987). The authors used the magnitude of these energy transitions (excitation), known as vibrational states, to distinguish between different organic compounds. Weyer (1985) described three bond vibrations, namely: (1) carbon-hydrogen (C–H); (2) nitrogen-hydrogen (N–H); and (3) oxygen-hydrogen (O–H) (Table 1). For example, the excitation of the C–H bond links to aliphatic, aromatic, olefinic, and oxygenated compounds. Table 1 provides a list of chemical compounds associated with each bond vibration. Researchers then aimed to understand which regions of the electromagnetic spectrum had known absorption features for detection of specific chemical compounds, for example, nitrogen (N), phosphorus (P), and potassium (K) (Cael et al., 1975; Curran, 1989; Elvidge, 1990; Sasaki et al., 1984). Their findings underpin many studies using NIRS today. The introduction of NIR spectrometer devices revolutionized many commercial industries, scientific institutions, and organizations, mainly because of their capability to assess large quantities of data rapidly (Weyer, 1985).

Significant research by Sasaki et al. (1984) developed methods for uniquely determining spectral curves to estimate xylene isomers and dyes in the visible and infrared absorption regions, respectively. Dixit and Ram (1985) discovered that derivatives enhanced smaller peaks encapsulated by larger peaks and helped separate overlapping wavebands. Huete (1986) suggested using a factor-analytical inversion model, which encouraged the decomposition of spectral mixtures into Eigenspectral and Eigenvector matrices. Peterson et al. (1988), Card et al. (1988), and Curran (1989) similarly investigated the use of absorption wavebands in the visible and near-infrared region to predict foliar chemical concentrations in plant material.

Peterson et al. (1988) investigated the use of remote sensing for estimating biochemical content of forest leaves and canopies using a PerkinElmer Model 360 laboratory spectrophotometer (400 nm–2400 nm) and airborne imaging spectrometer (AIS) imagery (1100 nm–2400 nm) on three heterogeneous sites in Alaska, Wisconsin, and Oregon in North America. The authors found significant correlations between the shortwave infrared region and biochemical content, specifically N and lignin, with standard prediction errors comparable to wet chemical laboratory assessments. Similarly, Card et al. (1988) predicted leaf chemistry using visible and near-infrared reflectance spectroscopy of dry, ground leaf material from deciduous and conifer tree species in Alaska, Wisconsin, and California in North America. The authors analyzed seven chemical compounds (sugar, starch, protein, cellulose, total chlorophyll (Chl), lignin, and total N) using reflectance spectra acquired from a PerkinElmer Model 330 laboratory spectrophotometer (400 nm–2446 nm) and stepwise regression. The authors found the visible and near-infrared regions to have high correlations with the chemical compounds analysed in their study. However, insufficient sample sizes did not allow prediction for all chemicals, and they suggest implementing techniques for reducing instrument error.

Curran (1989) provided forty-two absorption features in the visible (380 nm–700 nm) and near-infrared (800 nm–2500 nm) spectral regions that are related to foliar chemical concentrations (lignin, cellulose, sugar, starch, and water). The author located foliar wavebands using computer models such as stepwise multiple regression and deconvolution processes, together with AIS I and II equipped with 124 NIR wavebands; the airborne visible/infrared imaging spectrometer (AVIRIS) equipped with 209 visible and NIR wavebands; and the high-resolution imaging spectrometer (HIRIS) equipped with 192 visible and NIR wavebands. Nonetheless, this study raised three criticisms of using stepwise regression: (1) overfitting of wavebands during modeling; (2) multi-collinearity of chemicals; and (3) waveband omissions. Subsequently, the need to use more strategic portions of the electromagnetic spectrum became apparent. Hence, NIR spectrometers and various remote sensing devices are continually being developed to enhance their functionality, utility, and capability.

Presently, there is a plethora of research that investigated the use of remote sensing for canopy chemistry and plant functional traits (Asner and Martin, 2009; Asner et al., 2011; Au et al., 2020; Girard et al., 2020; Knyazikhin et al., 2013; Lepine et al., 2016; Martin et al., 2018; Rodrigues et al., 2020; Shi et al., 2019; Stein et al., 2014; Ustin, 2013; van der Meer, 2018; van der Tol et al., 2019; Watt et al., 2019; Windley and Foley, 2015; Zeng et al., 2019; Zhang et al., 2020). These studies consider remote sensing an alternative to wet chemistry assessments as it effectively reduces the time for adaptive management practices (Fig. 1). Many studies used hyperspectral data as the basis for their investigations. For example, earlier studies such as Zhao et al. (2005) explored the capabilities of hyperspectral reflectance (350 nm–2500 nm) properties to determine the effects of nitrogen (N) deficiency on sorghum growth. The authors found linear correlations with reflectance ratios of R405/R715 (R2 = 0.68) and R1075/R735 (R2 = 0.64) for Leaf N and Chl concentrations, respectively.
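As a minimal sketch of how a reflectance ratio such as R405/R715 or R1075/R735 can be computed from a sampled spectrum, the following plain-Python example picks the band nearest to each requested wavelength and divides the two reflectance values. The function names and the four-band spectrum are hypothetical and purely illustrative; they are not taken from Zhao et al. (2005).

```python
def nearest_band(wavelengths, target_nm):
    """Return the index of the band whose centre is closest to target_nm."""
    return min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - target_nm))


def reflectance_ratio(wavelengths, reflectance, numerator_nm, denominator_nm):
    """Ratio of reflectance at two wavelengths, e.g. R405/R715."""
    num = reflectance[nearest_band(wavelengths, numerator_nm)]
    den = reflectance[nearest_band(wavelengths, denominator_nm)]
    if den == 0:
        raise ValueError("denominator band has zero reflectance")
    return num / den


# Hypothetical spectrum sampled at four bands (values are illustrative only).
wavelengths = [405.0, 715.0, 735.0, 1075.0]
reflectance = [0.05, 0.25, 0.40, 0.50]

leaf_n_index = reflectance_ratio(wavelengths, reflectance, 405, 715)   # R405/R715
chl_index = reflectance_ratio(wavelengths, reflectance, 1075, 735)     # R1075/R735
```

In a calibration study, ratios like these would then be regressed against laboratory measurements of leaf N or Chl to obtain the reported R2 values.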

Zhang et al. (2013) investigated the potential of visible and NIR hyperspectral imaging systems (380 nm–1030 nm) for determining N, P, and K in oilseed rape leaves using partial least squares regression (PLSR) and least-squares support vector machines (LS-SVM). The authors revealed that hyperspectral imaging is a promising technique for detecting macronutrients, with both the PLSR and LS-SVM models achieving R2 values above 0.70. Axelsson et al. (2013) explored the possibilities of retrieving N, P, K, calcium (Ca), magnesium (Mg), and sodium (Na) in mangroves of the Berau Delta, Indonesia, using hyperspectral data (450 nm–2490 nm). The results revealed that N could be successfully modeled with an R2 of 0.67; however, P, K, Ca, Mg, and Na gave less encouraging results. Similarly, Mahajan et al. (2014) detected N, P, K, and sulfur (S) in wheat (Triticum aestivum L.) using hyperspectral imaging (350 nm–2500 nm) and eight vegetation indices (VIs). Their study reported lower R2s of <0.42 using VIs; however, a combination of the shortwave infrared (SWIR), NIR, and the visible region was more effective in monitoring plant nutrient status.

A later study by Osco et al. (2020) presented a framework based on a host of machine learning algorithms (k-Nearest Neighbor (kNN), Lasso Regression, Ridge Regression, SVM, Artificial Neural Network (ANN), Decision Tree (DT), and Random Forest (RF)) to predict a full range of macro- and micronutrients (N, P, K, Mg, S, Cu, Fe, Mn, and Zn) using a handheld hyperspectral spectrometer device (380 nm–1020 nm). The authors assessed the training data using Cross-Validation and Leave-One-Out and used the Relief-F metric of the algorithms for the prediction. Using a host of algorithms, their study revealed higher R2 predictions of >0.72 for all macro- and micronutrients compared to Mahajan et al. (2014) and Zhang et al. (2013). Finally, Eshkabilov et al. (2021) successfully found optimal waveband regions between 506 nm–601 nm and 634 nm–701 nm for detecting discrete nutrient content variables (nitrate (NO3–), calcium (Ca2+), potassium (K+), soluble solid content (SSC), pH, and total Chl) using hyperspectral images (400 nm–1000 nm) of freshly cut lettuce leaves. The authors produced R2s between 0.78 and 0.99 using PLSR and principal component analysis (PCA) techniques. With improvements in remote sensing technology and research knowledge, more studies have found correlations with specific regions of the electromagnetic spectrum.

Many studies found the NIR region of the electromagnetic spectrum capable and reliable for further investigation. For example, Windley and Foley (2015) measured foliar concentrations of total N, in vitro dry matter digestibility, and available N of a multi-species dataset of New Zealand trees. The authors found NIRS robust for measuring nutritional traits with R2's ranging from 0.83 to 0.99 using modified-PLSR. Zeng et al. (2019) found the NIR spectral region resilient against soil background contamination, allowing for the robust calculation of Solar-induced Chl fluorescence (SIF). The authors estimated the fraction of total emitted near-infrared SIF (760 nm) photons that escape the canopy by combining the near-infrared reflectance of vegetation (NIRV) and the fraction of absorbed photosynthetically active radiation (fPAR) using a Soil Canopy Observation, Photochemistry, and Energy (SCOPE) model. Their NIRV based approach could explain variations in the escape ratio with an R2 of 0.91 and an RMSE of 1.48% across various simulations where canopy structure, soil brightness, and sun-sensor-canopy geometry are varied.

Au et al. (2020) used partial least squares regression to model the relationship between NIR spectra and the foliar concentration of two ecologically critical chemical traits, available N and total formylated phloroglucinol compounds, using a FOSS-NIR System 6500 (400 nm–2498 nm) on Eucalyptus leaves. However, their study proposed using different cross-validation techniques for model fitting and selection to test the variation in large chemical and spectral datasets of 80 Eucalyptus species in eastern and southern Australia. The authors' main findings were: (1) geographic location influenced the predictability of N, (2) prediction error increased when assessing samples from different locations in Australia, (3) prediction accuracy of the available N model differed little whether 300 or up to 987 calibration samples were used, and (4) merely relying on spectral variation (assessed by Mahalanobis distance) may misinform researchers about how many reference values are required.

Furthermore, Rodrigues et al. (2020) evaluated the use of visible–near-infrared (Vis-NIR) spectroscopy for predicting the production of dry leaf mass (LDM), as well as the macronutrient and micronutrient contents of soybean leaves grown under limestone-mining coproducts, using an Analytical Spectral Devices (ASD) FieldSpec 3 spectroradiometer (350 nm–2500 nm) in Tietê, São Paulo, Brazil. As a result, the authors obtained R2p > 0.50 and RPDp > 1.50 for the variables LDM, P, K, Mg, S, and Zn using PLSR. The authors also found the waveband regions 380 nm–400 nm, 500 nm–530 nm, 600 nm–690 nm, and 700 nm–750 nm important in their prediction model. The latest NIRS technologies provide forestry stakeholders with a rapid and non-destructive method to assess tree health (Cipullo et al., 2019). Consequently, NIRS technology in large forestry nurseries can provide a significant competitive advantage.
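The accuracy statistics quoted throughout this review (R2, RMSE, and RPD) can be sketched as follows. This is an illustrative example only: it assumes R2 is the coefficient of determination (1 minus the ratio of residual to total sum of squares) and RPD is the sample standard deviation of the reference values divided by the RMSE of prediction. Individual studies may define these slightly differently, and the observed and predicted values below are hypothetical.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean square error between reference and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def rpd(observed, predicted):
    """Ratio of performance to deviation: SD of reference values / RMSE."""
    return statistics.stdev(observed) / rmse(observed, predicted)


# Hypothetical wet-chemistry reference values and model predictions.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.5, 3.0, 4.5, 4.5]

r2_p = r_squared(observed, predicted)
rmse_p = rmse(observed, predicted)
rpd_p = rpd(observed, predicted)
```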

Early research has shown that trees relocate nutrients throughout the canopy leaves as a conservation mechanism; therefore, the position from which a representative sample is taken is integral to accurately determining nutrient content (Gara et al., 2018). Furthermore, studies have reported inconsistent spectral values when sampling the adaxial (top) surface compared to the abaxial (bottom) surface of the same leaf (Lu and Lu, 2015; Warburton et al., 2014). For example, Warburton et al. (2014) measured relative water content, leaf water potential, and stomatal conductance in Eucalyptus grandis leaves using a Thermo Scientific microPhazir NIR spectrometer (1600 nm–2400 nm) and PLSR in a controlled environment facility in Australia. The authors acquired spectral reflectance data from the adaxial and abaxial leaf surfaces and from the upper and lower leaves on the stem. As a result, coefficients of determination using cross-validation were R2CV = 0.85 for relative water content, R2CV = 0.74 for leaf water potential, and R2CV = 0.80 for stomatal conductance. Similarly, Lu and Lu (2015) estimated leaf Chl content using an ASD FieldSpec 3 portable spectrophotometer (350 nm–2500 nm) and vegetation indices in northeast China. The authors acquired reflectance spectra from both adaxial and abaxial leaf surfaces of white poplar (Populus alba) and Siberian elm (Ulmus pumila var. pendula). As a result, spectral reflectance values were higher on the abaxial surface than on the adaxial surface in the visible wavelengths (400 nm–700 nm). In contrast, the authors found the opposite for the NIR wavelengths (700 nm–1000 nm) for both plant species.

Additionally, "Spectranomics" is a newly developing concept that explores the relationship between plant canopy species and their functional traits to their spectral-optical properties (Asner and Martin, 2009). Asner and Martin (2009) combined chemical (N, P, Chl-a, Chl-b) and spectral remote sensing (400 nm–2500 nm) perspectives to facilitate canopy diversity mapping. Asner et al. (2011) further developed this concept by examining leaf hemispherical reflectance and transmittance spectra, along with a 21-chemical portfolio, in 6136 humid tropical forest canopies. They developed up-scaling methods using a combination of canopy radiative transfer, PLSR, and high-frequency noise modeling techniques using a spectral range of 400 nm–2500 nm. Similarly, Stein et al. (2014) aimed to determine the relationship between spectral reflectance (350 nm–2500 nm) and foliar nutrient concentration (e.g., N, P, K, Ca, Mg) in loblolly pine, and to investigate the role of geographic scale in model accuracy. The authors found that localized loblolly pine nutrient studies, even with fertilization treatments, are less likely to produce successful remote-sensing models than studies across a large geographic region. McManus et al. (2016) discovered the link between foliar reflectance spectra (350 nm–2500 nm) and the phylogenetic composition of tropical canopy tree communities using nine biochemical traits that relate to a wide range of leaf functions. Martin et al. (2018), in turn, tested the retrieval of a foliar trait (leaf mass per area (LMA)) and chemical data (P, Ca, K, Mg, B, Fe) using imaging spectroscopy (Carnegie Airborne Observatory (CAO)) data (350 nm–2510 nm) constrained with simultaneous light detection and ranging (LiDAR) measurements.

This section delves into (a) the challenges of dealing with spectral noise in hyperspectral NIR data and (b) strategic methodologies for reducing spectral noise. Also, (c) we explore the effects of moisture content and epicuticle wax on extracting a representative sample.

Demetriades-Shah et al. (1990) define spectral noise as a signal of interest accompanied by background noise and other unwanted signals. We express spectral noise as the 'Signal-to-Noise Ratio' (SNR) between the wanted signal and the unwanted background noise as:

$$\mathrm{SNR} = \frac{P_{\mathrm{signal}}}{P_{\mathrm{noise}}}$$
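The signal-to-noise ratio defined above can be illustrated with a short sketch. It assumes that the P terms denote mean power (the mean of the squared amplitudes) and that the signal and noise components are available separately, which in practice usually requires repeated scans or a reference measurement. The function names and values are hypothetical.

```python
import math


def mean_power(values):
    """Mean power of a sampled signal: average of the squared amplitudes."""
    return sum(v * v for v in values) / len(values)


def snr(signal, noise):
    """Signal-to-noise ratio as P_signal / P_noise."""
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise power is zero; SNR is undefined")
    return mean_power(signal) / p_noise


def snr_db(signal, noise):
    """The same ratio expressed in decibels."""
    return 10 * math.log10(snr(signal, noise))


# Hypothetical reflectance signal and additive noise (illustrative values only).
signal = [0.4, 0.4, 0.4, 0.4]
noise = [0.04, -0.04, 0.04, -0.04]

ratio = snr(signal, noise)
ratio_db = snr_db(signal, noise)
```

With these values the signal amplitude is ten times the noise amplitude, so the power ratio is about 100, or about 20 dB.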

The suspended particles cause scattering and increase absorption by lengthening the path of the analytical beam through the sample (Demetriades-Shah et al., 1990). In addition, spectral noise may occur when the sensor malfunctions or when the sensor is affected by environmental constituents or the Bidirectional Reflectance Distribution Function (BRDF). A study by Knyazikhin et al. (2013) further shows that the retrieval of any biochemical information from remote sensing data is subject to the leaf and canopy bidirectional reflectance factor (BRF). A conference paper by Ustin (2013) agrees with Knyazikhin et al. (2013) and states that future research should address these problems by quantifying the physical interactions. Hyperspectral sensors produce higher spectral noise than multispectral sensors because they acquire highly discrete spectral information (Agjee et al., 2018). As a result, large continuums of data become damaged or lost (Peerbhay et al., 2013).

In practice, Lepine et al. (2016) tested the influence of spectral resolution, spatial resolution, and sensor fidelity on relationships between observed patterns of foliar %N and canopy reflectance. Their study revealed virtually no reduction in the strength of relationships between %N and reflectance when using coarser bandwidths from Airborne Visible/InfraRed Imaging Spectrometer (AVIRIS) imagery, but instead saw declines with increasing spatial resolution and loss of sensor fidelity. Signal processing is a continuously developing field, with new signal denoising techniques available across many fields such as photogrammetry, bioinformatics, and remote sensing (Koziol et al., 2018). The most standard denoising techniques are Savitzky-Golay and Fourier filtering, and the more advanced approaches are Principal Component Analysis (PCA) and the Minimum Noise Fraction (MNF) (Koziol et al., 2018).
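To make the Savitzky-Golay idea concrete, the sketch below applies the classic 5-point quadratic smoothing kernel, with coefficients (-3, 12, 17, 12, -3)/35, to a one-dimensional spectrum. It is a dependency-free illustration rather than a production filter: the two points at each edge are left unsmoothed, and the window length and polynomial order are fixed. In practice a library routine with configurable window and order would normally be used. The sample spectrum is hypothetical.

```python
# 5-point, second-order Savitzky-Golay smoothing coefficients (sum to 35).
SG_5PT_QUADRATIC = (-3, 12, 17, 12, -3)
SG_NORMALISER = 35


def savgol_5pt(spectrum):
    """Smooth a spectrum with a 5-point quadratic Savitzky-Golay kernel.

    Interior points are replaced by the value of a least-squares quadratic
    fitted to the surrounding 5-point window; the first and last two points
    are copied unchanged.
    """
    smoothed = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        window = spectrum[i - 2:i + 3]
        weighted = sum(c * v for c, v in zip(SG_5PT_QUADRATIC, window))
        smoothed[i] = weighted / SG_NORMALISER
    return smoothed


# Hypothetical noisy reflectance values (illustrative only).
noisy = [0.30, 0.34, 0.31, 0.36, 0.33, 0.38, 0.35]
smoothed = savgol_5pt(noisy)
```

Because the kernel comes from a local quadratic fit, it passes any quadratic trend through unchanged while damping point-to-point fluctuations, which is why it tends to distort absorption features less than a plain moving average.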

As remote sensing scientists, we are encouraged to follow best-practice data acquisition methods to reduce noise whilst acquiring a representative sample. Hence, most of the studies in this review have used data pre-processing techniques to reduce spectral noise and normalize spectral reflectance values. For example, Asner et al. (2011) investigated the impact of high-frequency noise (sensor noise and residual artifacts following atmospheric correction) on PLSR predictions. They used data from AVIRIS imagery taken over tropical forests as a source of noise for visible-to-shortwave-infrared (VSWIR) simulations. The authors found that noise negatively affects PLSR results to varying degrees depending on wavelength range and chemical constituent. Zhai et al. (2013) estimated N, P, and K contents in the leaves of different plants using laboratory-based visible and near-infrared (Vis-NIR) reflectance spectroscopy with a FieldSpec Pro FR portable spectroradiometer (350 nm–2500 nm) in Jiangsu Province, China. The authors compared PLSR and SVM regression (SVMR) methods for estimating the nitrogen (CN), phosphorus (CP), and potassium (CK) content present in leaves of diverse plants. As a result, the SVMR method accounted for more than 90% of CN, CP, and CK variation compared to PLSR, which accounted for 59.1%, 50.9%, and 50.6% of the variation, respectively.

Similarly, Amirruddin et al. (2017) quantified N status across various ages (maturity classes) of Tenera oil palm stands using a Geophysical and Environmental Research Corporation 1500 model spectroradiometer (350 nm–1050 nm) in Malacca, Malaysia. The authors compared two machine learning approaches, Discriminative Analysis (DA) feature selection and Support Vector Machine-Recursive Feature Elimination (SVM-RFE), to determine the best spectral wavebands needed for quantifying N status. As a result, their study found that DA outperformed SVM-RFE in all maturity classes of Tenera oil palms. Furthermore, the authors developed spectral signatures that illustrate 'deficient N' and 'optimum N' levels using the electromagnetic spectrum (Amirruddin et al., 2017).

Koziol et al. (2018) investigated the denoising efficiency and signal distortion properties of a series of spatial noise removal techniques such as Fourier transform, Mean Filter, Weighted Mean Filter, Gauss Filter, Median Filter, spatial Wavelets and Deep Neural Networks. The authors also tested spectral noise removal techniques such as Savitzky-Golay, Fourier transforms, Principal Component Analysis, Minimum Noise Fraction, and spectral Wavelets, using high-definition Fourier transform infrared (FT-IR) data (3900 cm⁻¹ to 900 cm⁻¹) as an input. As a result, their study showed that the multivariate-based techniques of PCA and MNF outperformed any other spatial and spectral denoising method (Koziol et al., 2018). Agjee et al. (2018) evaluated the influence of simulated spectral noise on RF and oblique random forest (oRF) classification performance using two node-splitting models (ridge regression (RR) and support vector machines (SVM)) to discriminate healthy and infested vegetation using hyperspectral data (350 nm–2500 nm).

Advancements in remote sensing technology will produce more accurate sensors, enabling more precise acquisitions of remotely sensed data. Currently, there is no coherent framework for noise removal. Moreover, the viability of using Deep Neural Networks to remove noise still needs further testing and assessment across many different IR spectrometers and materials (Koziol et al., 2018). Hence, future research should emphasize comparing numerous case studies and scenarios to provide suitable noise removal frameworks for analysis.

Another common problem when acquiring spectra is the influence of moisture content (aquaphotomics) within the leaf. Previous research has shown that water is ubiquitous in biological samples. Its effect on chemical compounds changes the intensity and shifts the position of absorption wavebands, which has been a long-term challenge (Kokaly and Clark, 1999; Pasquini, 2018). Kokaly and Clark (1999) illustrate the effect of moisture content on spectra obtained from a leaf when dry and when exposed to 10% moisture, with 25% soil background effects and 50 m residual atmosphere. As a result, a lower spectral curve is produced, which is an inaccurate account of the actual chemical concentration of the plant. The presence of leaf glaucousness (epicuticle wax) and trichomes (hairs) has a considerable impact on leaf reflectance values (Holmes and Keiller, 2002; Vanderbilt and Grant, 1985). The two studies have hypothesized that the amount of light specularly reflected by a leaf depends on plant species and is related to the canopy's physiological status and development stage (Vanderbilt and Grant, 1985). Most studies have used the UV and visible spectral regions; an opportunity still exists to understand more closely the effects of cuticle wax and trichomes when sensed using the NIR region of the electromagnetic spectrum.

We divided this section into four parts: (a) sample strategy, (b) choosing a pre-processing data method, (c) choosing an appropriate statistical model, and (d) variable selection. A plethora of remote sensing research has explored innovative data pre-processing methods to analyze reflectance data (Zhai et al., 2013). The utility of spectral data pre-processing methods is an essential component for deriving a representative spectral sample. Essentially, applying a pre-processing data method has many benefits, such as reducing spectral data dimensionality, spectral noise, data redundancy, and impurity, especially when employing high dimensional and multivariate data.

Firstly, deriving a suitable sampling strategy is the primary step to acquiring accurate results using NIR scanning systems. Sampling strategies should assimilate steps that reduce background noise and enhance the integrity of data for modeling (Atkinson and Curran, 1995; Zhu et al., 2019). Samples obtained for modeling should contain high variability of the target site, providing a more stable model less vulnerable to outlier scenarios (Atkinson and Curran, 1995; Au et al., 2020; Zhu et al., 2019). Table 2 below shows a variety of studies with different sampling strategies derived based on their application. Furthermore, using a reliable wet chemistry assessment method is integral to validating the data before modeling. Therefore, researchers should emphasize the accurate execution of wet chemistry assessments.

Secondly, an important step is choosing the most appropriate spectral data pre-processing method. Section 3. b of this review highlights some essential strategies to eliminate spectral noise from hyperspectral data. The statistical model's success depends on the pre-processing data method (Schmitt et al., 2014). There are many variations in spectral data pre-processing methods such as signal derivatives, vector normalization (VN), or multiplicative scatter correction (MSC). Table 3 below shows a list of studies with different scanning windows, pre-processing data methods, and data splitting methods used on different leaf material and scanning systems.
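Two of the pre-processing methods named above can be sketched in a few lines. The example assumes that vector normalization (VN) means scaling each spectrum to unit Euclidean length (some software also mean-centres first) and that multiplicative scatter correction (MSC) regresses each spectrum on a reference spectrum, here the mean of the set, and then removes the fitted offset and slope. The function names and spectra are hypothetical.

```python
import math


def vector_normalize(spectrum):
    """Scale a spectrum to unit Euclidean length (one common form of VN)."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    return [v / norm for v in spectrum]


def mean_spectrum(spectra):
    """Band-wise mean of a list of equally sampled spectra."""
    return [sum(band) / len(spectra) for band in zip(*spectra)]


def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x; returns (a, b)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)  # zero only for a flat reference
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum."""
    ref = reference if reference is not None else mean_spectrum(spectra)
    corrected = []
    for spectrum in spectra:
        offset, slope = fit_line(ref, spectrum)
        corrected.append([(v - offset) / slope for v in spectrum])
    return corrected


# Hypothetical spectra: one underlying shape with different offsets and gains.
base = [0.2, 0.4, 0.3, 0.5]
spectra = [[0.10 + 1.0 * v for v in base],
           [-0.05 + 0.8 * v for v in base],
           [0.05 + 1.2 * v for v in base]]

corrected = msc(spectra)
unit = vector_normalize([3.0, 4.0])
```

In this constructed case the three spectra differ only by additive and multiplicative scatter, so MSC maps all of them onto the mean spectrum; real spectra also differ chemically, and that residual variation is what the subsequent model uses.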

Thirdly, a statistical model should be selected based primarily on the application of the research, that is, whether it is used for classification or prediction of the data in the study. Table 4 provides a detailed description of studies that have used mainly prediction statistical models for foliar analysis. An important step is to calibrate and test the models. Splitting data into training and test data is essential for model calibration and validation. The following studies demonstrate the application of different splits (Curran, 1989; Curran et al., 2001; Donkin et al., 1993; Guo et al., 2010; Mutowo et al., 2018; Pasquini, 2018). Earlier studies used standard multivariate statistical algorithms such as Partial Least Squares (PLS) (Menesatti et al., 2010; Ulissi et al., 2011) and PLSR (Zhai et al., 2013). For instance, Menesatti et al. (2010) used PLS to make chemical determinations on citrus tree leaves, detecting N, P, K, Ca, Mg, Fe, Zn, and Mn using the visible–near-infrared (Vis-NIR) region (310 nm–1100 nm). Their study found relatively high correlations, with R2's ranging from 0.48 for P to 0.88 for Mg. PLSR is also an effective method of estimating the nutrient content of plants (Zhai et al., 2013). However, when Zhai et al. (2013) compared the performance of PLSR and SVMR in detecting N, P, and K using Vis-NIR (1000 nm–2500 nm), they found that the PLSR model produced satisfactory results, with R2's of 0.59, 0.51, and 0.51 for N, P, and K, respectively, whereas the SVMR model outperformed PLSR and accounted for more than 90% of the variation. Most of the studies in Table 4 have focused on measuring N as it relates to the general health of the plant species. However, not much research has measured the entire range of macronutrients and micronutrients.
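The calibrate-then-validate workflow described above can be sketched as follows. To stay short and dependency-free, the example replaces PLSR or SVMR with a single-band least-squares line, which is a deliberate simplification and not the method used by any study cited here. The split fraction, seed, function names, and data are all hypothetical.

```python
import math
import random


def split_calibration_validation(samples, validation_fraction=0.3, seed=0):
    """Randomly split samples into calibration and validation sets."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_val = max(1, round(len(samples) * validation_fraction))
    val_idx = set(indices[:n_val])
    calibration = [s for i, s in enumerate(samples) if i not in val_idx]
    validation = [s for i, s in enumerate(samples) if i in val_idx]
    return calibration, validation


def fit_line(pairs):
    """Least-squares fit y = a + b*x to (x, y) pairs; returns (a, b)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in pairs)
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


def rmse(model, pairs):
    """Root mean square error of the fitted line on a set of (x, y) pairs."""
    a, b = model
    return math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in pairs) / len(pairs))


# Hypothetical (reflectance, foliar N %) pairs; values are illustrative only.
samples = [(0.10 * i, 1.0 + 2.0 * (0.10 * i) + (0.05 if i % 2 else -0.05))
           for i in range(1, 11)]

calibration, validation = split_calibration_validation(samples)
model = fit_line(calibration)                 # fitted on calibration data only
rmse_validation = rmse(model, validation)     # assessed on held-out data
```

The point of the design is that the validation samples never influence the fitted coefficients, so the validation error is an honest estimate of how the model behaves on unseen leaves.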

Lastly, both past and recent studies have successfully implemented variable selection. Variable selection enables scientists to test the ability of a spectral waveband to detect a feature accurately. For example, variable selection algorithms frequently detect water absorption features. As the extreme case of spectral data pre-processing, variable selection aims to eliminate variables that do not improve the model's overall performance (Pasquini, 2018). Pasquini (2018) reviews and lists many different variable selection methods for improving model performance. Studies have generally shown moderately to highly accurate results when using variable selection to predict chemical properties of foliar material. Overall, variable selection seeks to simplify models for more straightforward interpretation, shorten training times, and avoid problems of dimensionality and overfitting. For example, Mutanga et al. (2004) applied 'continuum removal on absorption features' to predict the macronutrients N, P, K, Ca, and Mg using a GER 3700 spectroradiometer (350 nm–2500 nm) in a savanna grassland in Kruger National Park (KNP), South Africa. Using stepwise linear regression, the authors tested four variables for estimating canopy concentrations of N, P, K, Ca, and Mg: (i) continuum-removed derivative reflectance (CRDR), (ii) band depth (BD), (iii) band depth ratio (BDR), and (iv) normalised band depth index (NBDI). Their study produced the highest accuracies using CRDR data, which yielded R² values of 0.70, 0.80, 0.64, 0.50, and 0.68 with root mean square errors (RMSE) of 0.01, 0.004, 0.03, 0.01, and 0.004 for N, P, K, Ca, and Mg, respectively. The results of their study justify the use of data pre-processing methods for successfully estimating nutrient content in dry foliar samples (Zhai et al., 2013). Furthermore, the differences in prediction accuracy between PLSR and SVMR reported by Zhai et al. (2013) show the importance of selecting a suitable algorithm.
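The four continuum-removal variables named above can be sketched for a single absorption feature as follows. This is a simplified illustration rather than the procedure of Mutanga et al. (2004): the straight-line continuum between the feature shoulders, the index formulas as commonly stated (BD = 1 - R', BDR = BD / Dc, NBDI = (BD - Dc) / (BD + Dc), with R' the continuum-removed reflectance and Dc the maximum band depth), and the synthetic spectrum are all assumptions that should be checked against the original paper.

```python
import numpy as np

def band_depth_indices(wavelengths, reflectance, start, end):
    """Continuum-remove one absorption feature and derive CRDR, BD, BDR, NBDI.

    The continuum is approximated by a straight line joining the reflectance
    at the two feature shoulders (start and end), an assumption of this sketch.
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    reflectance = np.asarray(reflectance, dtype=float)
    inside = (wavelengths >= start) & (wavelengths <= end)
    wl, refl = wavelengths[inside], reflectance[inside]

    continuum = np.interp(wl, [wl[0], wl[-1]], [refl[0], refl[-1]])
    cr = refl / continuum          # continuum-removed reflectance R'
    bd = 1.0 - cr                  # band depth
    dc = bd.max()                  # depth at the band centre
    return {
        "wavelength": wl,
        "CR": cr,
        "CRDR": np.gradient(cr, wl),      # first derivative of R'
        "BD": bd,
        "BDR": bd / dc,
        "NBDI": (bd - dc) / (bd + dc),
    }

# Hypothetical absorption feature centred at 2100 nm.
wavelengths = np.arange(2000, 2201, 10)
reflectance = 0.5 - 0.2 * np.exp(-((wavelengths - 2100) / 40.0) ** 2)

features = band_depth_indices(wavelengths, reflectance, start=2000, end=2200)
```

Each returned array has one value per waveband inside the feature, so the wavebands can be offered individually to a stepwise regression or another variable selection routine.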

In this review, we examined a decade (2010–2020) of research to provide a synopsis of the past and present techniques for detecting foliar nutrients using NIRS. We believe that best-practice use of this technology will provide high-throughput commercial industries with a rapid and cost-effective alternative for assessing the nutrient status of their plants. Hence, future research should support implementing a NIRS system with a standardized approach to sample preparation, data pre-processing, and statistical modeling.

An essential part of this review was to list and compare the data pre-processing methods used in the latest research studies. It is important to note that NIR spectrometers generally produce a large amount of noise towards the end of the spectrum. We found no standard data pre-processing method among the studies presented in this review for dealing with such noise. However, most studies preferred Savitzky-Golay (SG) smoothing as the primary data pre-processing method. Furthermore, most studies used the first derivative transformation to reduce background signal noise in NIR data (Lequeue et al., 2016; Masemola and Cho, 2019; Murguzur et al., 2019; Zhai et al., 2013). To overcome the poor signal-to-noise ratio (SNR) caused by light scattering, the studies reviewed here mainly employed two methods: (i) multiplicative scatter correction (MSC) and (ii) standard normal variate (SNV).
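A minimal sketch of SG smoothing and the SG first derivative is given below. It fits a low-order polynomial in a moving window by least squares and reads the smoothed value or derivative at the window centre. The window length, polynomial order, and edge padding are assumptions chosen for illustration, not settings reported by the reviewed studies; in practice a library routine such as SciPy's `scipy.signal.savgol_filter` would normally be used.

```python
import numpy as np
from math import factorial

def savgol_coefficients(window_length, polyorder, deriv=0, delta=1.0):
    """Weights that give the centre-point SG estimate for one window."""
    if window_length % 2 != 1 or polyorder >= window_length:
        raise ValueError("window_length must be odd and larger than polyorder")
    half = window_length // 2
    offsets = np.arange(-half, half + 1)
    # Design matrix with columns offsets**0, offsets**1, ..., offsets**polyorder.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse yields the fitted coefficient of that
    # order; scaling by deriv! / delta**deriv turns it into a derivative.
    return np.linalg.pinv(design)[deriv] * factorial(deriv) / delta ** deriv

def savgol(spectrum, window_length=11, polyorder=2, deriv=0, delta=1.0):
    """Apply the SG filter to a 1-D spectrum (edges padded by repetition)."""
    spectrum = np.asarray(spectrum, dtype=float)
    coeffs = savgol_coefficients(window_length, polyorder, deriv, delta)
    padded = np.pad(spectrum, window_length // 2, mode="edge")
    # np.convolve flips its kernel, so reverse the weights to correlate.
    return np.convolve(padded, coeffs[::-1], mode="valid")

# Hypothetical check signals sampled every 0.5 units.
x = np.arange(0, 20, 0.5)
line = 2.0 * x + 1.0
quadratic = x ** 2

first_derivative = savgol(line, window_length=11, polyorder=2, deriv=1, delta=0.5)
smoothed = savgol(quadratic, window_length=11, polyorder=2)
```

Away from the padded edges, the filter reproduces any polynomial up to the chosen order exactly, which is why the derivative of the straight line is constant and the quadratic is returned unchanged. Noise, which does not follow a low-order polynomial within the window, is attenuated.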

Furthermore, applying pre-processors such as MSC, SNV, and SG to high-dimensional hyperspectral data has improved prediction accuracy compared to untransformed data (Ustin and Jacquemoud, 2020). For example, Zhai et al. (2013) successfully employed MSC and SNV combined with the wavelet detrending method to correct light scattering variation and baseline shifts when estimating the N, P, and K content of leaves of diverse plants using laboratory-based visible and near-infrared (Vis-NIR) reflectance spectroscopy. In the study by Masemola and Cho (2019), SNV predicted leaf nitrogen (LN) with the highest accuracy for all the leaf spectral datasets. Most studies listed in this review used wavelet detrending as a successful method for reducing spectral noise. Most studies also employed 'mean centering' and 'auto-scaling'; however, these two methods are typically embedded and automated in the software used. We have stressed the importance of reducing spectral noise, which significantly improved results in the studies by Agjee et al. (2018), Koziol et al. (2018), and Peerbhay et al. (2013).
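The difference between SNV, which operates on each spectrum, and mean centering and auto-scaling, which operate on each waveband across samples, can be sketched as follows. The function names and synthetic spectra are assumptions for illustration, and the use of the sample standard deviation (`ddof=1`) is one common convention among several.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row)
    by its own mean and standard deviation."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def mean_center(spectra):
    """Subtract the mean of each waveband (column) across samples."""
    spectra = np.asarray(spectra, dtype=float)
    return spectra - spectra.mean(axis=0)

def auto_scale(spectra):
    """Mean-centre each waveband and divide by its standard deviation."""
    spectra = np.asarray(spectra, dtype=float)
    return mean_center(spectra) / spectra.std(axis=0, ddof=1)

# Hypothetical data: one underlying spectrum with scatter-like offsets and gains.
wavelengths = np.linspace(1000, 2500, 151)
base = 0.4 + 0.1 * np.sin(wavelengths / 200.0)
raw = np.array([0.05 + 1.2 * base, -0.02 + 0.8 * base, 0.00 + 1.0 * base])

snv_spectra = snv(raw)
scaled = auto_scale(raw)
```

Because SNV needs no reference spectrum, it can be applied to a single new sample, whereas mean centering and auto-scaling depend on statistics of the calibration set and must reuse those statistics when new samples are predicted.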

Finally, once the data have been cleaned of noise and obscurities using data pre-processing methods, statistical models can be produced for either prediction or classification. An important part is selecting the most suitable algorithm (statistical model). For regression, most studies have used the partial least squares regression (PLSR) algorithm for predicting foliar nutrients in vascular plants, tomato leaves, grasses, and various other shrubs (Cho et al., 2007; Meuret et al., 1993; Oliveira and Santana, 2020; Peng et al., 2019). Many studies in this review found much higher correlations when using PLSR than when using other algorithms for predicting foliar nutrients (Abdel-Rahman et al., 2017; Murguzur et al., 2019; Singh et al., 2015). For example, Murguzur et al. (2019) and Ulissi et al. (2011) successfully predicted N levels (R² ≥ 0.90) using the PLSR algorithm. However, some studies found that support vector machine regression (SVMR) performed better in estimating N, P, and K, as SVMR has built-in mechanisms against noise and overfitting (Amirruddin et al., 2017; Zhai et al., 2013).
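To make the role of latent variables in PLSR concrete, the sketch below implements single-response PLS (PLS1) with the NIPALS algorithm. It is an illustrative implementation written for this example, not the software used in the reviewed studies, and the synthetic data, function names, and number of components are assumptions. SVMR is not shown.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; returns regression coefficients and intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean        # mean-centred working copies

    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                      # direction of maximum covariance
        w /= np.linalg.norm(w)
        t = Xk @ w                         # scores of this latent variable
        tt = t @ t
        p = Xk.T @ t / tt                  # X loadings
        q = yk @ t / tt                    # y loading
        Xk = Xk - np.outer(t, p)           # deflate X
        yk = yk - q * t                    # deflate y
        weights.append(w)
        loadings.append(p)
        y_loadings.append(q)

    W = np.array(weights).T
    P = np.array(loadings).T
    q = np.array(y_loadings)
    beta = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ beta
    return beta, intercept

def pls1_predict(X, beta, intercept):
    return np.asarray(X, dtype=float) @ beta + intercept

# Hypothetical noise-free data: 30 samples, 5 predictor variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
true_beta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_beta + 4.0

beta, intercept = pls1_fit(X, y, n_components=5)
y_hat = pls1_predict(X, beta, intercept)
```

With as many latent variables as predictors, PLS1 coincides with ordinary least squares, so the known coefficients are recovered here. On real spectra, which have far more wavebands than samples, a much smaller number of latent variables is usually chosen by cross-validation to limit overfitting.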

For accuracy assessment, most studies used the coefficient of determination (R²) and root mean square error (RMSE) to test the predictive ability of the models (Cho et al., 2007; Mutowo et al., 2018; Pasquini, 2018; Zhai et al., 2013). Most studies also preferred to report the root mean square error of cross-validation (RMSECV), the ratio of performance to deviation (RPD) for goodness of fit, and the standard error of calibration (SEC) for calibration error. Afandi et al. (2016), Murguzur et al. (2019), and Zhai et al. (2013) used the K-fold cross-validation technique as a resampling procedure for further calibration of their models.
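The accuracy measures and the K-fold procedure mentioned above can be sketched as follows. The definitions are the commonly used ones (RPD taken here as the standard deviation of the reference values divided by the RMSE; some authors divide by the standard error of prediction instead). The ordinary least squares stand-in model, the example values, and all function names are assumptions for illustration.

```python
import numpy as np

def r_squared(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def rpd(y_true, y_pred):
    """Standard deviation of the reference values divided by the RMSE."""
    return np.std(np.asarray(y_true, float), ddof=1) / rmse(y_true, y_pred)

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index arrays for K-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def rmsecv(X, y, fit, predict, k=5, seed=0):
    """RMSE of cross-validation; fit and predict are user-supplied callables."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    residuals = np.empty(len(y))
    for train, test in kfold_indices(len(y), k, seed):
        model = fit(X[train], y[train])
        residuals[test] = y[test] - predict(model, X[test])
    return np.sqrt(np.mean(residuals ** 2))

# Stand-in model for the demonstration: ordinary least squares with intercept.
def fit_ols(X, y):
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

# Hypothetical reference and predicted nutrient values.
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8, 5.0])
scores = (r_squared(y_true, y_pred), rmse(y_true, y_pred), rpd(y_true, y_pred))

# Hypothetical noise-free data for the cross-validation demonstration.
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + 1.0
cv_error = rmsecv(X, y, fit_ols, predict_ols, k=5)
```

Any calibration routine can be passed as `fit` and `predict`, so the same loop can be used to compare candidate models or to choose the number of latent variables in PLSR by the lowest RMSECV.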

The latest research (2010–2020) shows a trend towards NIR technology across various applications and strategies. In this review, most studies used NIR-configured spectrometer devices, while a few studies strategically selected the NIR region from full-spectrum hyperspectral data (350 nm–2500 nm) for foliar analysis (Lequeue et al., 2016; Masemola and Cho, 2019; Murguzur et al., 2019). N is the most commonly monitored chemical parameter of plant health (Ustin, 2013; Windley and Foley, 2015). Hence, most of the examples in this review investigated estimating N levels within plant leaf material. In contrast, very few studies investigated other macronutrients (P, K, Ca, Mg, sodium) and micronutrients (manganese, iron, copper, zinc, boron). The sample sizes differed considerably, from 15 to >1000 samples per study. It is important to note that these samples represented the total number of reference samples and not the spectral samples. However, researchers found that prediction results from studies with smaller sample sizes did not differ significantly from those of studies with bigger sample sizes. The NIR instrumentation used by most studies was a mix of handheld and bench devices. The wet chemistry analysis performed differed across laboratories; as a result, no consistent reference method emerged for most nutrients, with the exception of N, for which the conventional Kjeldahl method was preferred.

Future research

The findings of this review focused specifically on the latest data pre-processing methods and statistical models for forest foliar nutrient assessment. This review highlighted the challenges and opportunities that arise before model development. Leaf reflectance values are affected by spectral noise, moisture content, epicuticle wax, and adaxial versus abaxial sampling. That said, selecting the best data pre-processing method and statistical model is application-specific. It is vital to remain relevant with

Credit author statement

Leeth Singh: Methodology, Formal analysis, Investigation, Writing - Original draft preparation. Kabir Peerbhay: Conceptualization, Validation, Supervision. Onisimo Mutanga: Project administration, Supervision. Paramu Mafongoya: Funding acquisition, Supervision. Jacob Crous: Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this paper.

Acknowledgements

This work is based on the research supported wholly by the DST/National Research Foundation of South Africa (Grant Numbers: 86893 & 114898). We would like to thank the reviewers for their comments that enhanced this study.

References (78)

  • A.R. Huete

    Separation of soil-plant spectral mixtures by factor analysis

    Rem. Sens. Environ.

    (1986)
  • R.F. Kokaly et al.

    Spectroscopic determination of leaf biochemistry using band-depth analysis of absorption features and stepwise multiple linear regression

    Rem. Sens. Environ.

    (1999)
  • L.C. Lepine et al.

    Examining spectral reflectance features related to foliar nitrogen in forests: implications for broad-scale nitrogen mapping

    Rem. Sens. Environ.

    (2016)
  • P. Menesatti et al.

    Estimation of plant nutritional status by Vis–NIR spectrophotometric analysis on orange leaves [Citrus sinensis (L) Osbeck cv Tarocco]

    Biosyst. Eng.

    (2010)
  • O. Mutanga et al.

    Predicting in situ pasture quality in the Kruger National Park, South Africa, using continuum-removed absorption features

    Rem. Sens. Environ.

    (2004)
  • T.W. Payn et al.

    Potential for the use of GIS and spatial analysis techniques as tools for monitoring changes in forest productivity and nutrition, a New Zealand example

    For. Ecol. Manag.

    (1999)
  • K.Y. Peerbhay et al.

    Commercial tree species discrimination using airborne AISA Eagle hyperspectral imagery and partial least squares discriminant analysis (PLS-DA) in KwaZulu–Natal, South Africa

    ISPRS J. Photogrammetry Remote Sens.

    (2013)
  • D.L. Peterson et al.

    Remote sensing of forest canopy and leaf biochemical contents

    Rem. Sens. Environ.

    (1988)
  • H. Shi et al.

    Evaluation of near-infrared (NIR) and Fourier transform mid-infrared (ATR-FT/MIR) spectroscopy techniques combined with chemometrics for the determination of crude protein and intestinal protein digestibility of wheat

    Food Chem.

    (2019)
  • C. van der Tol et al.

    The scattering and re-absorption of red and near-infrared chlorophyll fluorescence in the models Fluspect and SCOPE

    Rem. Sens. Environ.

    (2019)
  • A. Varhola et al.

    Forest canopy effects on snow accumulation and ablation: an integrative review of empirical results

    J. Hydrol.

    (2010)
  • M.S. Watt et al.

    Application of remote sensing technologies to identify impacts of nutritional deficiencies on forests

    ISPRS J. Photogrammetry Remote Sens.

    (2019)
  • H.R. Windley et al.

    Landscape-scale analysis of nutritional traits of New Zealand tree foliage using near-infrared spectroscopy

    For. Ecol. Manag.

    (2015)
  • Y. Zeng et al.

    A practical approach for estimating the escape ratio of near-infrared solar-induced chlorophyll fluorescence

    Rem. Sens. Environ.

    (2019)
  • S. Zhang et al.

    Rapid identification and prediction of cadmium–lead cross-stress of different stress levels in rice canopy based on visible and near-infrared spectroscopy

    Rem. Sens.

    (2020)
  • X. Zhang et al.

    Detecting macronutrients content and distribution in oilseed rape leaves based on hyperspectral imaging

    Biosyst. Eng.

    (2013)
  • D. Zhao et al.

    Nitrogen deficiency effects on plant growth, leaf photosynthesis, and hyperspectral reflectance properties of sorghum

    Eur. J. Agron.

    (2005)
  • N.e.H. Agjee et al.

    The impact of simulated spectral noise on random forest and oblique random forest classification performance

    J. Spectrosc.

    (2018)
  • A.D. Amirruddin et al.

    Assessing leaf scale measurement for nitrogen content of oil palm: performance of discriminant analysis and Support Vector Machine classifiers

    Int. J. Rem. Sens.

    (2017)
  • G.P. Asner et al.

    Airborne spectranomics: mapping canopy chemical and taxonomic diversity in tropical forests

    Front. Ecol. Environ.

    (2009)
  • P.M. Atkinson et al.

    Defining an optimal size of support for remote sensing investigations

    IEEE Trans. Geosci. Rem. Sens.

    (1995)
  • J. Au et al.

    Sample selection, calibration and validation of models developed from a large dataset of near infrared spectra of tree leaves

    J. Near Infrared Spectrosc.

    (2020)
  • C. Axelsson et al.

    Hyperspectral analysis of mangrove foliar chemistry using PLSR and support vector regression

    Int. J. Rem. Sens.

    (2013)
  • J. Cael et al.

    Infrared and Raman spectroscopy of carbohydrates. Paper V. Normal coordinate analysis of cellulose I

    J. Chem. Phys.

    (1975)
  • S. Cipullo et al.

    Predicting bioavailability change of complex chemical mixtures in contaminated soils using visible and near-infrared spectroscopy and random forest regression

    Sci. Rep.

    (2019)
  • L. Dixit et al.

    Quantitative analysis by derivative electronic spectroscopy

    Appl. Spectrosc. Rev.

    (1985)
  • M. Donkin et al.

    Methods for Routine Plant Analysis in the ICFR Laboratories

    (1993)
  • C.D. Elvidge

    Visible and near infrared reflectance characteristics of dry plant materials

    Rem. Sens.

    (1990)
  • T.W. Gara et al.

    Impact of vertical canopy position on leaf spectral properties and traits across multiple species

    Rem. Sens.

    (2018)