Performance of two handheld NIR spectrometers to quantify crude protein of composite animal forage and feedstuff

Two handheld near infrared (NIR) spectrometers were used to quantify crude protein (CP) content of mixed forage and feedstuff composed of Sweet Bran, distiller’s grains, corn silage, and corn stalk. The first (H1) was a transportable spectrometer, which measured in the visible and NIR ranges (350–2500 nm) with a spectral interval of 1 nm. The second (H2) was a smartphone spectrometer, which measured from 900–1700 nm with a spectral interval of 4 nm. Spectral data of 147 forage and feed samples were collected by both handheld instruments and split into calibration (n = 120) and validation (n = 27) sets. For H1, only absorbances in the NIR region (780–2500 nm) were used in the multivariate analyses, while for H2, absorbances in the second and third overtone regions (940–1660 nm) were used. Principal component analysis (PCA) and partial least squares (PLS) regression models were developed using mean-centered data that had been preprocessed using the standard normal variate (SNV), Savitzky-Golay first derivative (SG1), or Savitzky-Golay second derivative (SG2) algorithm. PCA models showed two major groups, one with Sweet Bran


Introduction
In forages and feedstuffs, crude protein (CP) is one of the most regularly assessed constituents [1]. Laboratory-based chemical tests are often expensive and time-consuming, and they require chemical reagents, some of which are potentially dangerous. Near infrared (NIR) spectroscopy has been used as an alternative method to predict forage CP, giving quick and reliable results with minimal sample preparation and no requirement for any reagents [2][3][4]. Typically, NIR analysis is conducted with benchtop spectrometers that exhibit very high performance [5], giving low errors and accurate predictions of nutrient concentrations. However, these instruments are too large and costly to be widely distributed and transported and are generally used in controlled environments. Advancements in optics and electronics have enabled the development of portable, handheld NIR spectrometers, which are relatively easy to operate and have reduced space and energy requirements [6] compared to benchtop spectrometers. These handheld units vary in cost, size, weight, type of power needed, robustness, user-friendliness, durability, accuracy of measurement, and performance reliability [7,8]. Given this large variation in properties and specifications, there is no one-size-fits-all spectrometer for different applications. There is a need for continued evaluation of different handheld NIR spectrometers to identify applications in which a given type can be reliably employed [9]. This study contributes to the ongoing process of validating the level of performance of miniaturized NIR instruments for utilization in the food and agriculture sector.
One of the major downsides of using NIR spectroscopy is the investment required in calibration development [10], more so in the case of different types of forage and feedstuff. The ability to develop one calibration that encompasses a wide range of available animal feed would significantly reduce the cost and time involved in developing a calibration per feed type. Using a handheld or portable spectrometer that can be easily transported to where the samples are located provides an extra benefit. The objective of this study was to evaluate and compare the accuracy of estimating crude protein (CP) of composite animal forage and feedstuff using two handheld NIR spectrometers. The first handheld NIR spectrometer (H1) was a transportable spectrometer, which measured in the visible and NIR ranges (350-2500 nm), had a spectral interval of 1 nm, and weighed 2.5 kg. Absorbance measurements from only the NIR region (780-2500 nm) were used in the calibration and prediction of CP. The second handheld NIR spectrometer (H2) was a smartphone spectrometer, which measured from 900-1700 nm, had a spectral interval of 4 nm, and weighed 136 g. This instrument covered the second overtone region with limited absorption windows in the first and third overtone regions of the NIR range.
Over the years, NIR has been used as a helpful tool for routine monitoring of quality control, including CP composition, in animal forage and feedstuff [11][12][13][14][15][16], with the coefficient of determination of prediction (R²) ranging from 0.53-0.99. To obtain an accurate calibration, reference samples should have a sufficient working range of CP content [12], in addition to covering as much of the variability in predicted samples as possible [14]. However, it is not always possible to obtain same-type samples with a CP range wide enough to allow for the development of a reliable calibration model. For example, Monrroy et al. [17] reported R² = 0.53 for a CP calibration model for Brachiaria spp. with a narrow CP range of 5.6-11.1%, utilizing NIR absorbance spectra in the second and third overtone regions. Likewise, a model with corn samples with a CP range of 6.95-8.05% had R² = 0.61 [18], even when the first overtone and combinations spectral regions were utilized in the calibration. Similar observations were made with corn stalk and dried distillers grain with solubles (DDGS) with narrow CP ranges, such that the resulting prediction had R² ≤ 0.85. Pooling different feed and forage types together increases the working range of CP content, the number of samples, and the variability of samples. Therefore, for this study, it was hypothesized that forage and feedstuff CP could be quantified using two handheld spectrometers (H1 and H2) with R² > 0.85 because of the wide CP range of the calibration samples. Most reported studies with high R² values that used NIR to predict CP in forage and feedstuff were based on absorbances that cover the entire NIR range, focusing on the 1100-2500 nm region [4,19-25].
Therefore, it was postulated that H1 would have higher R² and ratio of performance to deviation (RPD) values than those of H2, indicating better prediction performance, because absorbance bands tend to be stronger in the first overtone and combinations regions (1700-2500 nm) than in the second and third overtone regions (900-1700 nm). With suitable accuracy and rapid data collection and analysis, handheld NIR units may be used for in situ monitoring of feed at different stages of production, whether at the farm, market, feed mill, silo, or a packaging facility. Miniaturized NIR spectrometers that cost less enable a wide distribution and application of the technology to areas where benchtop systems would not ordinarily be useful.

Samples
Forage and feed samples and their corresponding CP data (measured via laboratory analysis) were obtained from the Ruminant Nutrition Laboratory, Animal Science Department, at the University of Nebraska-Lincoln. The samples included Sweet Bran, corn silage, corn stalks, and three kinds of corn distillers grains: wet distillers grain with solubles (WDGS), modified distillers grain with solubles (MDGS), and dry distillers grain with solubles (DDGS). Samples were collected weekly and composited monthly from the University of Nebraska Eastern Nebraska Research and Extension Center beef cattle feedlot. All samples were dried in a forced air oven at 60°C (model LBB2-21-1; Despatch Industries, Minneapolis, MN) for 48 hours (AOAC, 1965; method 935.29) [26], and ground through a 1-mm screen using a Wiley mill (number 4; Thomas Scientific, Swedesboro, NJ).

Spectrometers
Absorbance measurements were collected using two handheld NIR spectrometers representing two of the several kinds of portable spectrometers available on the market that vary in spectral range, cost, and potential applications [7]. The first handheld NIR spectrometer (H1) was a transportable NIR spectrometer (ASD QualitySpec® Trek, Malvern Panalytical, Cambridge, UK), which measured in the visible and NIR ranges from 350-2500 nm, had a spectral interval of 1 nm, and weighed 2.5 kg. Only the NIR range (780-2500 nm) was used in the multivariate analyses. The second handheld NIR spectrometer (H2) was a smartphone NIR spectrometer (Enterprise Scanner, Tellspec Inc., Toronto, Ontario, Canada), which measured from 900-1700 nm, had a spectral interval of 4 nm, weighed 136 g, and could be classified as a Hadamard transform-based palm-sized spectrometer [7]. The spectrometer's 940-1660 nm range was used in the multivariate analyses. The cost of H2 was approximately 40 times lower than that of H1. Table 2 shows the differences between the two handheld spectrometers.

Spectral data collection
Spectral data of 147 dried and ground forage and feed samples, contained in 0.08-mm-thick (3 mil = 3/1000th inch thick) polypropylene (PP) bags (Uline, Pleasant Prairie, WI, USA), were collected using both handheld spectrometers (H1 and H2). For each spectrometer, a background spectrum was collected by encasing a white reference (Spectralon®) disk in an empty 0.08-mm-thick PP bag. This background spectrum was subtracted from subsequent sample spectra to reduce the effect of PP absorption, because collecting NIR spectra of forage samples through a transparent PP film can reduce the accuracy of predicting constituents [28]. A sample spectrum was collected by placing the window of the handheld spectrometer directly on top of the packaged sample and pulling the trigger of H1 or pressing the scan button on H2 to start scanning through the plastic film tightly held to the sample. The sample bag was flipped over to collect a second scan on the other side. For both instruments, each scan was an average of 50 spectral measurements across the entire spectral range. The two scans obtained per sample, per spectrometer, were averaged in Excel (Microsoft Office Suite, Version 2016, Microsoft Corporation, Redmond, WA, USA) before preprocessing and multivariate analyses. The averaged spectra collected with H1 were then truncated to include only the NIR range (780-2500 nm), and, for H2, the spectra were truncated to 940-1660 nm.
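The averaging and truncation steps described above can be sketched in a few lines of Python. This is only an illustration: the study performed the averaging in Excel, and the function names `average_scans` and `truncate_spectrum` are hypothetical.

```python
def average_scans(scan_a, scan_b):
    """Average two scans of the same sample, wavelength by wavelength."""
    if len(scan_a) != len(scan_b):
        raise ValueError("both scans must cover the same wavelengths")
    return [(a + b) / 2 for a, b in zip(scan_a, scan_b)]


def truncate_spectrum(wavelengths, absorbances, low_nm, high_nm):
    """Keep only the points with low_nm <= wavelength <= high_nm."""
    kept = [(w, a) for w, a in zip(wavelengths, absorbances)
            if low_nm <= w <= high_nm]
    return [w for w, _ in kept], [a for _, a in kept]


# Toy example on an H2-like grid (values are made up for illustration).
wavelengths = [900, 940, 1300, 1660, 1700]
side_one = [0.0, 0.5, 1.0, 1.5, 2.0]
side_two = [1.0, 1.5, 2.0, 2.5, 3.0]
mean_scan = average_scans(side_one, side_two)
kept_wl, kept_abs = truncate_spectrum(wavelengths, mean_scan, 940, 1660)
```

For H1, the same call would use the 780-2500 nm limits instead.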

Spectra data preprocessing
For each spectrometer, the mean spectra of all forage and feed samples (n = 147) were exported to The Unscrambler® X software (Version 10.5, Camo Analytics, Magnolia, TX, USA) for further processing and analysis. The spectral data were split into a calibration set (n = 120) to build PLS regression models, and an independent validation set (n = 27) to test the performance of the models developed. Calibration and validation sets were selected such that they had a similar distribution based on the laboratory values of the parameters under test, ensuring that the validation range was covered by the calibration range (Table 3). Note for Table 3: the table reports the number of samples (n), mean, standard deviation (SD), range (∆), maximum, and minimum.
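The study does not state the exact procedure used to select the two sets. One common way to obtain sets with similar distributions, shown here purely as an assumption, is to rank samples by their reference value and draw the validation samples at evenly spaced interior ranks, so that the calibration set always retains the lowest and highest values. The function name `split_by_reference` is hypothetical.

```python
def split_by_reference(reference_cp, n_validation):
    """Return (calibration_idx, validation_idx) as lists of sample indices.

    Validation samples are taken at evenly spaced ranks of the sorted
    reference values, skipping the lowest and highest ranked samples so
    the calibration range covers the validation range.
    """
    n = len(reference_cp)
    if not 0 < n_validation <= n - 2:
        raise ValueError("n_validation must be between 1 and n - 2")
    order = sorted(range(n), key=lambda i: reference_cp[i])
    interior = n - 2  # ranks 1 .. n-2 are eligible for validation
    picks = {1 + int((k + 0.5) * interior / n_validation)
             for k in range(n_validation)}
    validation = [order[rank] for rank in sorted(picks)]
    calibration = [order[rank] for rank in range(n) if rank not in picks]
    return calibration, validation


# Synthetic reference values (not the study's data) for 147 samples.
reference = [3.4 + 0.25 * ((i * 37) % 147) for i in range(147)]
cal_idx, val_idx = split_by_reference(reference, 27)
```

With 147 samples and 27 validation samples, this yields the same 120/27 split sizes as the study, although the samples chosen would not necessarily match those in Table 3.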

Principal component analysis and partial least squares regression
The spectral data from each spectrometer were analyzed using principal component analysis (PCA) and partial least squares (PLS) regression. PCA was performed on the whole forage and feed data set to observe which samples clustered based on their scores. PLS regression models were built using the calibration data and validated with the prediction data set. For both H1 and H2, PCA and PLS regression models were developed using mean-centered spectral data that had been preprocessed using the standard normal variate (SNV), Savitzky-Golay first derivative (SG1), or Savitzky-Golay second derivative (SG2) algorithm with 11-61 smoothing points. Each model was built with random cross-validation using 20 segments with four samples in each segment and removing one segment of observations from the calibration set at a time. For PLS regression, the full NIR spectral range of H1 (780-2500 nm) was used, while for H2 the spectral range of 940-1660 nm was used. During cross-validation, Martens' uncertainty test, a significance testing method based on jack-knifing [29], was enabled to identify, along with loading weights, the important wavelengths on which the PLS regression model is based. This allowed for a set of model parameters (e.g., regression coefficients, scores, loadings, and loading weights) to be calculated for every sub-model created based on samples that were not held out of the cross-validation segment. Differences between the regression coefficients of all the sub-models and those of the full calibration model were calculated and used to estimate the uncertainty limits of each regression coefficient. Wavelengths whose regression coefficients had a relatively large uncertainty limit and whose loading weights also had relatively large uncertainties were deemed not important by The Unscrambler® software.
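For readers unfamiliar with the preprocessing steps, the following Python sketch illustrates SNV and a Savitzky-Golay first derivative. It is not the implementation in The Unscrambler®. It assumes a second-order polynomial (the polynomial order is not stated in the text), for which the first-derivative weights in a symmetric window reduce to i / Σi², and it simply drops the edge points that lack a full window.

```python
import math


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own
    mean and (sample) standard deviation."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in spectrum) / (n - 1))
    if sd == 0:
        raise ValueError("flat spectrum cannot be SNV-scaled")
    return [(a - mean) / sd for a in spectrum]


def sg_first_derivative(spectrum, window, spacing_nm=1.0):
    """Savitzky-Golay first derivative for an odd window length.

    Assumes a second-order polynomial fit; edge points without a full
    window are dropped, so the output is shorter than the input.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd number >= 3")
    if len(spectrum) < window:
        raise ValueError("spectrum is shorter than the window")
    half = window // 2
    denom = sum(i * i for i in range(-half, half + 1)) * spacing_nm
    return [
        sum(i * spectrum[j + i] for i in range(-half, half + 1)) / denom
        for j in range(half, len(spectrum) - half)
    ]


# Check on a known curve: y = x^2 sampled every 4 nm has derivative 2x.
x_nm = list(range(0, 40, 4))
y = [x * x for x in x_nm]
dy = sg_first_derivative(y, window=5, spacing_nm=4.0)
scaled = snv([1.0, 2.0, 3.0])
```

In practice the window would be one of the 11-61 smoothing points mentioned above, and the spacing would be 1 nm for H1 or 4 nm for H2.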
PCA models were assessed using sample scores and spectral loadings on the principal components (PCs) that captured most of the variation in the samples (>70%). PLS models were assessed for performance based on the optimal number of latent variables or factors (LVs), coefficient of determination of calibration (R²), and root-mean-square errors of calibration (RMSEC) and cross-validation (RMSECV) [30]. The Unscrambler® software suggests an optimum number of LVs based on the lowest RMSECV. Good-fit models typically have similar or close values for RMSEC and RMSECV. Prediction performance of the models was assessed on having a high coefficient of determination of validation (R²), low root-mean-square error of prediction (RMSEP) and standard error of prediction (SEP), and a bias close to zero. Bias is the mean difference between the NIR-predicted and reference values. Two additional validation performance parameters were calculated: the ratio of the standard deviation (SD) of the reference values in the validation set to the SEP, called the RPD, and the ratio of the range of reference values in the validation set to the SEP, called the RER [31]. Ideally, the SEP should be much lower than the standard deviation and range of reference values in the validation set, resulting in high RPD and RER values. In forages, feed, and soils, models with RPD > 4.1 are considered excellent for any application, including screening, quality control, and process control [32].
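The validation statistics defined above can be computed directly from paired reference and predicted values. The sketch below follows the definitions in the text (bias as the mean difference, RPD as SD/SEP, RER as range/SEP); the bias-corrected form of SEP with n - 1 in the denominator is the conventional one and is an assumption here, as are the function name and the toy numbers.

```python
import math


def validation_metrics(reference, predicted):
    """Return bias, RMSEP, SEP, RPD and RER for a validation set."""
    n = len(reference)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long lists with >= 2 values")
    errors = [p - r for p, r in zip(predicted, reference)]
    bias = sum(errors) / n
    rmsep = math.sqrt(sum(e * e for e in errors) / n)
    # SEP: standard deviation of the errors after removing the bias.
    sep = math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))
    mean_ref = sum(reference) / n
    sd_ref = math.sqrt(sum((r - mean_ref) ** 2 for r in reference) / (n - 1))
    ref_range = max(reference) - min(reference)
    return {
        "bias": bias,
        "RMSEP": rmsep,
        "SEP": sep,
        "RPD": sd_ref / sep,    # SD of reference values / SEP
        "RER": ref_range / sep,  # range of reference values / SEP
    }


# Toy values in % CP, chosen only to make the arithmetic easy to follow.
metrics = validation_metrics([10.0, 20.0, 30.0, 40.0],
                             [11.0, 19.0, 31.0, 41.0])
```

Because RPD and RER both divide by the SEP, a validation set with a wider spread of reference values yields higher ratios for the same prediction error, which is why the composition of the validation set matters when interpreting them.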

Principal components analysis
The scores plots of the PCA of the raw spectra collected from H1 and H2 showed a similar pattern, with corn stalk and corn silage samples close to each other, while Sweet Bran, wet distillers grain (WDG), and modified distillers grain (MDG) overlapped with each other (Figures 1a,b). Dried distillers grain (DDG) clustered on its own, with minor overlap with MDG. For H1, the first principal component (PC1) accounted for 89% of the variability in the raw NIR spectra, while the second principal component (PC2) accounted for 9% (Figure 1a). Similarly, for H2, PC1 accounted for 91% of the variability in the raw NIR spectra, while PC2 accounted for 8% (Figure 1b), suggesting that, despite the difference in the range of NIR absorbances measured, calibration and validation performance of both instruments would be similar. To check this, the raw spectra from both instruments were subjected to PLS regression. The resulting scores plots (Figures 1c,d) were mirror images across the x-axes (i.e., PC2 = 0) of the PCA scores plots of the raw spectra (Figures 1a,b) for both instruments. For H1, PLS Factors 1 and 2 together accounted for 98% of the variability in the NIR absorbance spectra, but only 81% of the variability in CP (Figure 1c). In a similar vein, PLS Factors 1 and 2 together accounted for 100% and 77% of the variability in the NIR absorbance spectra of H2 and in CP, respectively. Because a variety of feedstuffs were being pooled in the calibration, it was better to have more variation accounted for in the NIR absorbance spectra than in the CP data. To achieve this, the NIR absorbance spectra from both instruments needed to be preprocessed using the Savitzky-Golay derivative algorithm prior to the PLS regression. Doing so resulted in 61% of the variability in the NIR absorbance spectra of H1 being accounted for by Factors 1 and 2 in the regression, while accounting for 94% of the variability in CP (Figure 1e). The same trend was observed for H2, with Factors 1 and 2 accounting for 78% and 92% of the variability in the NIR absorbance spectra and CP, respectively (Figure 1f).
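The percentage of spectral variability attributed to a principal component, as quoted above, is the variance of the sample scores on that component divided by the total variance of the mean-centered spectra. The Python sketch below illustrates this for the first component using power iteration. It is a didactic stand-in, not the algorithm used by The Unscrambler®, and the function name and toy data are hypothetical.

```python
import math


def first_pc_variance_fraction(rows, n_iter=200):
    """Fraction of total variance captured by PC1 of mean-centered data.

    Uses power iteration on X^T X without forming the matrix. The fixed
    start vector (1, 2, ..., p) is an arbitrary choice; the method fails
    in the unlikely case that it is orthogonal to PC1.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    total = sum(x * x for r in centered for x in r)
    if total == 0:
        return 0.0
    norm = math.sqrt(sum((j + 1) ** 2 for j in range(p)))
    loading = [(j + 1) / norm for j in range(p)]
    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(r, loading)) for r in centered]
        update = [sum(scores[i] * centered[i][j] for i in range(n))
                  for j in range(p)]
        norm = math.sqrt(sum(x * x for x in update))
        if norm == 0:
            break
        loading = [x / norm for x in update]
    scores = [sum(a * b for a, b in zip(r, loading)) for r in centered]
    return sum(s * s for s in scores) / total


# Points on a straight line: PC1 captures all of the variance.
line_fraction = first_pc_variance_fraction([[1, 2], [2, 4], [3, 6]])
# Variance split 2.0 : 0.5 between two axes: PC1 captures 80%.
cross_fraction = first_pc_variance_fraction(
    [[1, 0], [-1, 0], [0, 0.5], [0, -0.5]])
```

A value of 0.89 from such a calculation would correspond to the 89% reported for PC1 of the raw H1 spectra.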

Partial least squares regression
Raw and preprocessed absorbance measurements from both handheld spectrometers were calibrated to CP using PLS regression. Once again, applying SNV preprocessing did not improve model prediction performance. Models based on second derivative spectra had better performance than those based on first derivative spectra. Pretreating the spectra with the Savitzky-Golay second derivative and 13 smoothing points gave the model with the best prediction parameters (Model H2.6), with calibration parameters of R² = 0.97, RMSEC = 2.12%, and RMSECV = 2.43%, and validation performance parameters of R² = 0.97, RMSEP = 2.05%, SEP = 2.04%, RPD = 5.74, and RER = 17.14. Models H1.5 and H2.6 performed in agreement with the hypothesis (R² > 0.85). The prediction performance of Model H2.6 was slightly better than that of Model H1.5, contrary to the hypothesis that H1 models would have higher R² and RPD values than H2 models. Nonetheless, both models had high validation performance.
A direct comparison of the predicted CP to the reference CP showed no systematic deviation from linearity or significant offset for the best performing models (Models H1.5 and H2.6) for each instrument (Figure 2). A closer look showed that all samples had a predicted-to-reference CP ratio of unity, except Sweet Bran, for which the ratio was 0.93 ≠ 1 (p = 0.03) when Model H1.5 was used. Overall, these results demonstrated that there was no bias introduced by each forage or feedstuff type to predictions by Models H1.5 and H2.6. However, it should be noted that the prediction confidence intervals are wider at higher CP values than at lower CP values. For instance, the precision for predicting CP of WDG would be lower than that for predicting CP of corn silage. The standard errors (SE) for measuring CP using the reference method were 0.44% (Sweet Bran), 0.57% (DDG), 1.07% (MDG), 1.09% (WDG), 0.35% (corn silage), and 0.54% (corn stalk). These values were, in all cases, lower than the SEP for both Models H1.5 (2.25%) and H2.6 (2.04%).

Figure 2.
Comparison of predicted to reference crude protein (CP) of forage and feedstuff using partial least squares regression models developed for a transportable (Model H1.5) and a smartphone (Model H2.6) near infrared spectrometer.

Discussion
It was not surprising that corn stalk and corn silage were close to each other in the PCA scores plots (Figures 1a,b). Both feed types had relatively low CP compared to the other samples and were similar in material or botanical fractions of the plant. Corn silage is made from ensiling the entire corn plant [33,34], while corn stalk is composed of stover (stalks, leaves, husks, cobs, and some grain) left in the field after harvesting corn [35]. The samples did not overlap in the scores plot, with PC2 differentiating them according to CP, i.e., corn silage had higher CP (6.2-10.8%) than that of corn stalk (3.4-6.0%). Sweet Bran and distillers grain had CP between 20.0 and 41.0%, and these samples overlapped in the scores plots. Sweet Bran is a commercial product of the corn wet milling process, which produces high fructose corn syrup and starch. This co-product, known as corn gluten feed, is relatively high in protein [36] and is widely used in ruminant animal diets [37]. Distillers grain is a co-product of bioethanol production, with a high feed value based on its CP content [38]. Following the ethanol production process, most of the starch-filled endosperm has been extracted from distillers grain, and the remaining components would be similar to those of Sweet Bran, explaining their proximity in the PCA scores plot. The observed within-group score differences could be ascribed to further differences in CP content. Sweet Bran CP ranged from 20.1-26.9%, while that of distillers grain ranged from 28.5-50.3%.
The PLS regression models obtained using H1 and H2 performed better than hypothesized. To evaluate the utility of PLS regression models in food and agriculture, Williams proposed that models with R² ≥ 0.92 and RPD ≥ 4.1 can be used for rough screening, screening, research, quality control, and process control [31,32]. Models H1.5 (R² = 0.96 and RPD = 5.24) and H2.6 (R² = 0.97 and RPD = 5.74) could be used for any of these applications to predict CP content of the forage and feedstuff samples represented in the calibration. The RER parameter can also be used to assess PLS regression models; however, it is sensitive to the range of samples in the validation set [39]. Its interpretation based on Williams' scale should be made with caution.
When predicting forage and feedstuff CP using NIR, it would be ideal to have a calibration for each type or species of forage and feedstuff with samples covering much of the expected variability in future samples. However, this is not always possible since there may not be enough available samples for each feed type. Even with a large number of samples, if there is an insufficient CP range in calibration samples, it may not be possible to develop a reliable calibration model [12]. Developing broad-based calibrations provides room for increasing the working range, allowing for a more reliable model for quantitative prediction [40]. For instance, Daniel et al. [24] [25]. The performance of their models was similar to that obtained in this study. Even with a single type of feedstuff, with a CP range of 3.76-29.4%, Vokers et al. [22] reported R² = 0.96 for a single variety of forage maize. Similar results were obtained for corn silage (R² = 0.94) and hay crop silage (R² = 0.95) when samples of the same feed type with a high CP range were used in calibration development [41]. On the contrary, lower R² values were obtained for corn (R² = 0.61), DDGS (R² = 0.71), and corn stalk (R² = 0.85) when same-type samples with a narrow CP range were used to develop calibration models [23,42]. These observations and the results obtained from this study underscore the need to have samples with a sufficient range of composition when developing calibration models. In situations where CP of different forage and feedstuff is routinely measured, for example at a feed mill, farm, or processing plant, an NIR calibration model that encompasses all feed types allows for efficiency, saving the time and cost involved in developing a calibration for each feed type.
The ability to determine CP content in forage is based on energy absorbed by organic bonds among carbon (C), hydrogen (H), oxygen (O), and nitrogen (N) in the sample. The amount of energy absorbed by these bonds in the NIR region is relative to the amount of constituents in the sample [43]. The spectral data can be calibrated to CP concentration using a set of reference samples measured by a standard method. Strong N-H absorptions are primarily responsible for the good relationships seen between chemical and spectral data during calibration development [44]. Another contributing factor to the observed predictions based on NIR spectra is the fairly high and wide range of CP concentrations in forage and feedstuff [1]. The calibration samples used in this study had CP ranging from 3.39 to 40.34%. An accurate calibration allows for reliable determination of CP in future forage samples using their NIR spectra, without the need for chemical analysis. Nonetheless, the limitations of such a global NIR model cannot be ignored. It is worth noting that, for each feed type, the SE of the reference method was always lower than the SEP for the selected models for both H1 and H2. This limitation is inherent if one must build a calibration encompassing different types of forage and feedstuff. While the model errors may be large compared to the reference method, the final ratios of NIR-predicted to reference values showed no significant difference. This is of great utility to the end-user.
Shenk and Westerhaus [46] reported the most important NIR wavebands for measuring CP to include: 2060 nm (the carbonyl stretch of the primary amide), 2168 to 2180 nm (combination band consisting of N-H bend second overtone, C-H stretch/C=O stretch combination, and C=O stretch/N-H in-plane bend/C-N stretch combination bands), 2050 to 2060 nm (N-H stretching vibrations), 1640 to 1680 nm (C-H stretch), and 1500 to 1530 nm (N-H stretch). These bands were mostly in the first overtone and combinations regions of the NIR spectrum. Later on, Williams [31] reported a more extensive range of wavelengths of weak, fair, and strong principal absorption bands for protein that covered the entire NIR range (Figure 3). It follows that strong absorption bands are mostly concentrated in the first overtone and combinations regions, while the fair and weak bands are dispersed across the entire NIR spectrum. The important wavebands along the first factor identified by Martens' uncertainty test during PLS regression modeling with H1 spectra covered the whole NIR range, overlapping many of the bands identified by Williams [31] (Figure 3) and all those reported by Shenk and Westerhaus [46] in the first overtone and combinations regions. This allowed calibration of absorbance measurements to reference CP and yielded a model with good prediction performance. The smartphone spectrometer, H2, only covered the NIR absorbances in the region of 900-1700 nm, and PLS models were built over the 940-1660 nm range, covering mostly the second and third overtone regions. During PLS regression modeling using these absorbances, Martens' uncertainty test identified several important spectral bands along the first factor (Figure 3). The bands overlapped the weak and fair wavebands reported by Williams [31] and some of the principal wavelengths identified by Shenk and Westerhaus [46].
While H2 absorbances had a limited NIR window, the information obtained therein was sufficient to build a PLS model whose prediction performance was even slightly better than that of H1. Since NIR absorption becomes weaker moving from the combinations to the first, second, and third overtone regions [47], it is reasonable to expect that the second and third overtone regions would have weaker absorption that translates to reduced prediction performance of a regression model. However, with adequate principal spectral bands, the limitation of a reduced NIR window did not seem to affect the performance of Model H2.6 negatively. A reliable prediction model could be obtained even with absorbance measurements covering only the second and third overtone regions. Most reported studies predicting composition of forage and feedstuff using NIR use absorbance measurements in the entire NIR range (780-2500 nm), with a focus on the 1100-2500 nm range. However, similar to the observations of this work, Modrono et al. [48] reported no difference in prediction performance when two handheld NIR spectrometers were used to predict CP content of a combined variety of feed for cattle, pigs, hens, sheep, and other animals. The first spectrometer recorded absorbance measurements in the 1600-2400 nm region, while the second only covered the 950-1650 nm region. Many of the low-cost handheld NIR spectrometers have only a limited spectral window, and it is useful to know they can be used to predict CP with sufficient utility.

Conclusions
Two handheld NIR spectrometers were used to predict CP content of mixed forage and feedstuff. The first was a costly, full-range NIR spectrometer, while the second was a low-cost, limited-range NIR spectrometer. PLS regression models based on spectral data from both spectrometers had similar prediction performance in terms of possible applications such as rough screening, screening, research, quality control, and process control. The quantity of CP in forage and feedstuff is one of the most important quality parameters. If a low-cost handheld NIR unit, covering mostly the second overtone region, can be used to predict forage and feedstuff CP, much of the time and cost required for routine wet chemistry analysis would be saved, provided a reliable calibration model exists. Additionally, the ability to build a single calibration model encompassing different types of forage and feedstuff eliminates the need to have one for each feed type. Miniaturized, high-performing NIR instruments will enable easy distribution and utilization of the technology, especially in developing countries where feed testing remains a challenge due to limitations in cost, access to laboratories, and analytical skills. The adoption of low-cost, easy-to-use, fast NIR spectrometers could improve the control and management of animal feeding programs, as long as there is a reliable calibration. However, it should be noted that this study was performed using dry and ground samples, which controlled for sample moisture and particle size. Future work will evaluate the sensitivity of the portable NIR spectrometers and the chemometric models developed here to moisture content and particle size, and will provide recommendations on how to overcome the effects of these interferents chemometrically.