Predicting Forage Quality of Warm-Season Legumes by Near Infrared Spectroscopy Coupled with Machine Learning Techniques

Warm-season legumes have been receiving increased attention as forage resources in the southern United States and other countries. However, the near infrared spectroscopy (NIRS) technique has not been widely explored for predicting the forage quality of many of these legumes. The objective of this research was to assess the performance of NIRS in predicting the forage quality parameters of five warm-season legumes—guar (Cyamopsis tetragonoloba), tepary bean (Phaseolus acutifolius), pigeon pea (Cajanus cajan), soybean (Glycine max), and mothbean (Vigna aconitifolia)—using three machine learning techniques: partial least square (PLS), support vector machine (SVM), and Gaussian processes (GP). Additionally, the efficacy of global models in predicting forage quality was investigated. A set of 70 forage samples was used to develop species-based models for concentrations of crude protein (CP), acid detergent fiber (ADF), neutral detergent fiber (NDF), and in vitro true digestibility (IVTD) of guar and tepary bean forages, and CP and IVTD in pigeon pea and soybean. All species-based models were tested through 10-fold cross-validations, followed by external validations using 20 samples of each species. The global models for CP and IVTD of warm-season legumes were developed using a set of 150 random samples, including 30 samples for each of the five species. The global models were tested through 10-fold cross-validation, and external validation using five individual sets of 20 samples each for different legume species. Among techniques, PLS consistently performed best at calibrating (R2c = 0.94–0.98) all forage quality parameters in both species-based and global models. The SVM provided the most accurate predictions for guar and soybean crops, and global models, and both SVM and PLS performed better for tepary bean and pigeon pea forages. 
The global modeling approach that developed a single model for all five crops yielded sufficient accuracy (R2cv/R2v = 0.92–0.99) in predicting CP of the different legumes. However, the accuracy of predictions of in vitro true digestibility (IVTD) for the different legumes was variable (R2cv/R2v = 0.42–0.98). Machine learning algorithms like SVM could help develop robust NIRS-based models for predicting forage quality with a relatively small number of samples, and thus warrant further attention in different NIRS-based applications.


Introduction
Perennial warm-season grasses, such as bermudagrass (Cynodon dactylon), old world bluestems (Bothriochloa spp.), and bahiagrass (Paspalum notatum), serve as major summer forage resources for grazing stocker cattle in the southern United States (US). While capable of producing large amounts of biomass, these perennial grasses often show a decline in forage quality with their maturation towards the mid-late summer growing season and do not meet the nutritional needs of grazing stocker cattle for the entire season [1,2]. Legumes, being high-quality forages, can be adopted to offset the summer slump in forage quality, and enhance the efficiency of forage-based beef production systems. Further, the continued increase in the cost of nitrogen fertilizers has added to the interest of producers in utilizing legume crops as forage in many regions across the US. In response, extensive research in the southern US over the last decade has focused on evaluating warm-season annual legumes as summer forage resources that can be grown in rotation with winter-wheat (Triticum aestivum L.) [3][4][5][6]. In more recent years, several legumes have received increased attention due to their capabilities of generating high amounts of biomass under the limited moisture conditions that prevail in the southern US [7].
Quantifying the quality of forage in pastures is crucial for both agriculture research and forage management, including cattle grazing and harvesting. However, the determination of the different parameters of forage quality, such as crude protein (CP), neutral detergent fiber (NDF), acid detergent fiber (ADF), and in vitro true digestibility (IVTD), by classical analytical techniques is time-consuming and expensive, especially when numerous samples are required. The vast evolution of computers and multivariate statistical techniques has enabled the use of near infrared spectroscopy (NIRS) in assessing the quality parameters of many forages. The NIRS method is quick, inexpensive, and facilitates timely decision-making related to grazing periods. The technique is based on interactions between light reflectance in the wavelength ranging between 750-2500 nm and organic compounds in the plant biomass [8]. The method of applying NIRS to predict forage quality involves analyzing a particular forage with both traditional lab analysis and NIRS, and then developing a predictive equation by pairing the information in a calibration dataset ( Figure 1). The NIRS has been widely used in forage quality predictions of crops including alfalfa (Medicago sativa) [9], maize (Zea mays) [10], ryegrass (Lolium multiflorum) [11], tall fescue (Festuca arundinacea) [12], and other species. However, the technique has been underutilized to provide predictions of forage quality for many warm-season legumes.

As developed for other forage crops, well-calibrated NIRS species-based models for warm-season legumes could be useful tools to quickly assess the forage quality of different legume species grown under a range of environmental or management settings, or harvested at different stages of growth, or cutting or grazing height. Therefore, it is necessary to examine the effectiveness of NIRS in predicting forage characteristics of some important warm-season legumes. This work includes species such as guar (Cyamopsis tetragonoloba), tepary bean (Phaseolus acutifolius), soybean (Glycine max), and pigeon pea (Cajanus cajan), given past research, and the potential for expansion of use of these species across the southern US and other similar environments. There are also other warm-season legumes that may be capable of providing high-quality forage for summer grazing [3,13,14]. However, developing NIR calibration equations for every species can become challenging for public or private laboratories that test forage quality. Generally, accurate chemical analyses of a large number of samples are not readily available or feasible for developing calibrations, especially when novel legume species are involved. In response to challenges related to developing species-based calibrations, global models developed from samples of a range of different warm-season legumes can prove useful, if such calibrations provide sufficiently accurate predictions.
Several calibration techniques are known to perform well in the application of NIRS in estimating forage quality and are generally available in most chemometric packages [15]. Partial least squares (PLS) is among the most commonly used methods, where least square algorithms are used to compute regressions [16]. In contrast, a comparatively novel and robust machine learning algorithm, support vector machine (SVM), has been gaining attention for NIRS calibrations [15]. Further, the Gaussian processes (GP) have provided better calibration results than PLS and SVM, in some cases [17,18]. However, tests of these calibration techniques on wide ranges of common and more novel legumes are required to define their function.
The combination of NIRS and machine learning calibration techniques could serve as an effective tool to streamline the monitoring efforts in warm-season legumes by eliminating the need for classical forage analytical methods. Therefore, the objective of this research was to evaluate the performance of NIRS in predicting the forage quality of four warm-season legumes (guar, tepary bean, pigeon pea, and soybean), using three different calibration techniques (PLS, SVM, and GP) on an individual species basis. Additionally, the efficacy of global calibrations of these techniques, developed by combining datasets of all four species and mothbean (Vigna aconitifolia), was tested using independent datasets for each of the five species.

Materials
Forage samples used in the study (n = 410) were collected as parts of two different field experiments conducted at the USDA-ARS Grazinglands Research Laboratory near El Reno, Oklahoma, US (35.57° N, 98.03° W, elevation 414 m). Ninety samples each for guar and tepary bean, and 50 mothbean samples were collected from field experiments conducted during the summers of 2017 and 2018. An additional 90 samples of both soybean and pigeon pea were obtained from two long-term experiments (2001-2008) conducted in the same location [3,19]. In all three experiments, aboveground biomass was collected from randomly clipped 0.5 m row lengths from experimental plots at 15-day intervals, starting at 45 days after planting. Apart from whole plant samples, a major proportion of collected biomass samples in these experiments were separated into leaf, stem, and pod fractions before laboratory analysis.

Laboratory and NIRS Analysis
All leaf, stem, pod, and whole plant samples were oven-dried at 60 °C until a constant weight. Dry samples were ground to pass a 2-mm filter using a Wiley grinding mill. Total nitrogen concentration in each sample was determined by the flash combustion method (Model Vario Macro, Elementar Americas, Inc., Mt. Laurel, NJ, USA) and then converted into CP by multiplying by a factor of 6.25 (Table 1). The IVTD was obtained for each sample by following the Daisy Digester procedures (ANKOM Technology, Macedon, NY, USA). The NDF and ADF concentrations were only determined in samples of guar and tepary bean, in accordance with the batch fiber analyzer techniques (ANKOM Technology, Macedon, NY, USA).

Table 1. Summary statistics of lab datasets used for calibration, cross-validation and external validation of crude protein (CP), neutral detergent fiber (NDF), acid detergent fiber (ADF), and in vitro true digestibility (IVTD) of four warm season legumes.

Aliquots of ground samples were filled into ring cups to eliminate voids. Spectral reflectance (R) of monochromatic light, averaged over 10 spectra per sample, was collected with a scanning spectrophotometer (Model SpectraStar 2600 XT-R, Unity Scientific, Columbia, MD, USA). Spectral data were obtained as the logarithm of the inverse of reflectance [log(1/R)] at 1-nm intervals over the range of 680-2600 nm.
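The two conversions described in this section (total nitrogen to CP, and reflectance to log(1/R)) can be sketched as below. This is only an illustrative sketch: the function names are hypothetical, and the base-10 logarithm is an assumption, since the text does not state the base used by the instrument software.

```python
import math

N_TO_CP_FACTOR = 6.25  # nitrogen-to-crude-protein factor stated in the text


def crude_protein(total_n_percent):
    """Convert total nitrogen (% of dry matter) to crude protein (%)."""
    return total_n_percent * N_TO_CP_FACTOR


def log_inverse_reflectance(reflectance):
    """Convert reflectance values (0 < R <= 1) to log(1/R).

    Base 10 is assumed here; the source does not specify the base.
    """
    return [math.log10(1.0 / r) for r in reflectance]


def wavelength_grid(start_nm=680, stop_nm=2600, step_nm=1):
    """Wavelengths (nm) for the stated 680-2600 nm range at 1-nm intervals."""
    return list(range(start_nm, stop_nm + 1, step_nm))
```

With the stated range and interval, each spectrum would hold 1921 values, one per wavelength on the grid.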

Calibration Techniques
Partial least squares (PLS) is an extensively used class of statistical methods, which includes regression, classification, and dimension reduction techniques. It uses latent variables, also called score vectors, to model the relationship between input and response variables. In the case of regression problems, PLS first generates the latent variables from the given data and uses them as new predictor variables. There are different types of PLS, based on the techniques employed to extract the latent variables. Two approaches are used to extend PLS for modeling non-linear relations among data. The first approach is to reformulate the linear relationship between the score vectors, u and v, as a non-linear model, $u = g(v) + h = g(X, w) + h$, where g is a continuous function that models the existing non-linear relation. Generally, g is modeled using artificial neural networks, smoothing splines, polynomials, or radial basis functions. The remaining variables h and w denote a residual vector and a weight vector, respectively. The second approach is to apply kernel-based learning. The kernel PLS method transforms the input space data to a higher dimensional feature space and estimates a linear PLS model in that space. To avoid explicitly computing the mapping function Φ that projects data to the feature space, kernel PLS applies the kernel trick, which uses the fact that the inner product of two vectors x and y in feature space can be calculated using a kernel function k(x, y) [20]: $k(x, y) = \Phi(x)^{T} \Phi(y)$. By using the kernel function, the score vectors (u and v) can be identified and used to define the non-linear relationship. The kernel PLS approach makes it possible to model complex non-linear relations with modest implementation and computational effort.
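The kernel trick described above can be illustrated with a small numerical check. The sketch below assumes a homogeneous degree-2 polynomial kernel on two-dimensional inputs, chosen only because its explicit feature map is short enough to write out; it is not the kernel configuration used in the study, and all names are hypothetical.

```python
import math


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: k(x, y) = (x . y)^2."""
    return dot(x, y) ** 2


def phi(x):
    """Explicit feature map for 2-D inputs matching poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


x, y = (1.0, 2.0), (3.0, -1.0)
k_direct = poly2_kernel(x, y)      # computed in the input space
k_feature = dot(phi(x), phi(y))    # computed in the feature space
```

Both values agree up to floating-point rounding, which is the point of the trick: the inner product in feature space is obtained without ever forming Φ(x) for the model itself.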
Gaussian processes (GP) are kernel-based, probabilistic, non-parametric regression models. A Gaussian process involves a set of random variables such that every finite number of those variables possesses a joint Gaussian distribution. A Gaussian process, f(x), can be described using a mean function m(x) and a covariance function k(x, x'). The covariance function defines the smoothness of responses, and the basis function Φ projects the input space vector x to a higher dimension feature space vector Φ(x). A Gaussian process regression (GPR) model describes the response by using latent variables from a Gaussian process. A GPR model is represented as $y = \Phi(x)^{T} w + f(x)$, where $f(x) \sim GP(0, k(x, x'))$, that is, f(x) are from a zero-mean GP having a covariance function k(x, x') [21]. The covariance is specified by kernel parameters, which are also known as hyperparameters.
GPR is a probabilistic model, and an instance of response y is $P(y_i \mid f(x_i), x_i) \sim N(y_i \mid \Phi(x_i)^{T} w + f(x_i), \sigma^{2})$. GPR is non-parametric as there is a latent variable f(x_i) for each observation x_i. The noise variance σ^2, the basis function coefficients w, and the hyperparameters of the kernel can be estimated from the data while training the GPR model.
Support vector machine (SVM) is a popular machine-learning algorithm used for identifying linear as well as non-linear dependency between input vectors and outputs. SVMs are non-parametric models, which means parameters are selected, estimated, and tuned in such a way that the model capacity matches the data complexity [21]. Generally, SVM starts by observing the multivariate inputs X and outputs Y, estimates its parameters w, and then learns the mapping function y = f(x, w), which approximates the underlying dependency between inputs and responses. The obtained function, also known as a hyperplane, must have a maximal margin (for classification) or a minimal error of approximation (for regression) to predict new data. In the case of SVM regression, Vapnik's error (loss) function is used with ε-insensitivity. It finds a regression function f(x) that deviates from the actual responses (y) by values no more than ε and is as flat as possible at the same time.
For non-linear regression problems, SVM maps the input space to feature space (a higher dimension space) using a mapping Φ(x) to find a linear regression hyperplane in that space. However, there is no need to know the mapping Φ, as the kernel function k(x_i, x_j), which is the inner product of the vectors Φ(x_i) and Φ(x_j), can be used to find the optimal regression hyperplane in the extended space. There are many kernel functions available to describe non-linear regressions, such as the polynomial kernel, RBF kernel, Gaussian kernel, normalized polynomial kernel, etc. The learning problem in classification, as well as in regression, leads to solving a quadratic programming (QP) problem. Sequential minimal optimization (SMO) is considered the most popular optimizer for solving SVM problems [22]. It divides the large QP problem into a set of small QP problems and solves them analytically.
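The ε-insensitive loss used in SVM regression can be written out in a few lines. This is a minimal sketch of the loss function only, not of the SMO optimizer or of the Weka implementation; the function name and the example values are hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Vapnik's epsilon-insensitive loss for each sample.

    Errors no larger than epsilon cost nothing; larger errors are
    penalized linearly by the amount they exceed epsilon.
    """
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


# Hypothetical CP values (%): the first prediction is inside the
# epsilon tube (error 0.5 <= 1.0), the second is outside it (error 2.0).
losses = epsilon_insensitive_loss([10.0, 10.0], [10.5, 12.0], epsilon=1.0)
```

Predictions that fall within ε of the laboratory value contribute no loss, which is why only samples on or outside the tube (the support vectors) shape the fitted function.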

Performance Evaluation
Apart from calibration, 10-fold cross-validations and external validations were conducted to assess the performance of the calibration techniques. The 10-fold cross-validation is a widely used statistical procedure for evaluating machine learning models, in which ten repeated hold-out executions are obtained and averaged. In each execution, the model is trained with 90% of the data points and tested with the remaining 10%; thus, every data point is used nine times for training and once for testing the model. Coefficients of determination, being upper-bounded by 1.0, are often adopted for meaningful comparisons across different models and were therefore used here as estimates of prediction accuracy. Specifically, the coefficients of determination in calibration (R2c), cross-validation (R2cv), and external validation (R2v) were used to quantify the variance in the data captured by each model at calibration, cross-validation, and external validation, respectively. Additionally, root mean squared errors were presented for comparing models, termed RMSEc, RMSEcv, and RMSEv for calibration, cross-validation, and external validation, respectively.
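The fold structure and the two accuracy metrics can be sketched as follows. Several assumptions apply: folds are formed by simple interleaving without shuffling, and R2 is computed as 1 - SSres/SStot, which is one common definition; the text does not state which formula or fold assignment the software used. All function names are hypothetical.

```python
import math


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    folds = [indices[i::k] for i in range(k)]  # interleaved, unshuffled folds
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
```

For a calibration set of 70 samples, `k_fold_indices(70)` produces ten splits of 63 training and 7 test samples, matching the 90%/10% description above.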

Software
Regression models were calibrated, cross-validated, and externally validated using the Weka software, version 3.8 [23]. Weka is a suite of machine learning algorithms and is widely used for data mining. For implementing PLS, we used the PLS classifier package in Weka, which uses the prediction capabilities of PLSFilter. The PLSFilter runs the PLS regression on the given set of data and computes the beta matrix for prediction. By default, missing values are replaced, and the data are centered. For GP implementation, the Gaussian classifier for regression without hyperparameter tuning was used. The kernel for the Gaussian classifier was configured as a polynomial. By default, missing values were replaced by the global mean. The SMOReg classifier was used to implement SVM in Weka. The classifier used the polynomial kernel and the RegSMOImproved optimizer to learn SVM for regression. All remaining parameters, such as batch size, debugging, filter type, do-not-check-capabilities, and noise, were kept at their defaults.

Results and Discussion
The prediction accuracy of calibrated models is discussed by comparing their cross-validation (R2cv) and external validation (R2v) results to a scale proposed for NIRS calibrations [24]. According to the scale, the performance of a model is considered excellent if the R2 of validations is greater than 0.95, and the resultant model can be used in any application. A model is assumed satisfactory with R2 ranging from 0.9-0.95 and would be usable for most applications involving quality assurance. Models with R2 ranging between 0.8-0.9 are considered moderately successful and can be used with caution for most applications, including research.
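The rating scale above maps directly onto a small lookup function. This is a sketch only: the function name is hypothetical, the handling of the exact boundary values (0.95, 0.90, 0.80) is an assumption, and values below 0.8 are returned as "unrated" because the text does not describe that part of the scale.

```python
def rate_calibration(r2):
    """Classify a validation R2 using the three bands described in the text.

    Boundary handling is an assumption: 0.95 itself is treated as
    satisfactory, and 0.90 and 0.80 fall in the higher of their two bands.
    """
    if r2 > 0.95:
        return "excellent"
    if r2 >= 0.90:
        return "satisfactory"
    if r2 >= 0.80:
        return "moderately successful"
    return "unrated"  # the text gives no label for R2 below 0.8


# Example with hypothetical validation values.
ratings = [rate_calibration(v) for v in (0.97, 0.92, 0.86, 0.42)]
```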

Guar
The chemical analysis of guar samples showed wide variability in the parameters that define forage quality for different components (leaf, stem, or pod) of plants sampled at different growth stages (Table 1). The CP content for all 90 (70 + 20) guar samples ranged from 3.7% to 34.9%, while NDF concentrations ranged from 16.8% to 75.8%, ADF concentrations ranged from 8.9% to 62.9%, and IVTD from 40.3% to 95.2%.
Among the three techniques, PLS performed best at calibrating each of the four forage quality parameters in guar with R2c of 0.98-0.99, though the calibration results of SVM (R2c = 0.94-0.98) were comparable (Table 2). In contrast, GP had comparatively lower calibration accuracy, with R2c ranging between 0.88-0.91 for IVTD, NDF, and ADF, and R2c of 0.95 for CP of guar samples. Although PLS provided the best calibrations of the three, SVM gave better prediction accuracy in both cross-validation and external validation of all four indices of forage quality for guar. The GP approach generated the lowest R2cv for all four parameters and the lowest R2v for NDF and ADF. Among forage quality parameters, the greatest prediction accuracy was recorded for CP by all three techniques, with R2cv of 0.93-0.97 and R2v of 0.93-0.98 (Table 2). In comparison, only the SVM technique resulted in satisfactory prediction accuracy (R2cv = 0.92; R2v = 0.94) for NDF, based on the proposed scale [24]. Both the SVM and PLS techniques showed excellent accuracy at predicting ADF with R2cv and R2v between 0.94-0.96, while GP produced R2cv of 0.86. All three techniques resulted in relatively low prediction accuracy for IVTD, with R2cv ranging from 0.81-0.83. Overall, the performance of SVM was the most satisfactory among the three calibration methods, and it can be employed in NIRS-based prediction of CP, ADF, and NDF of guar. In contrast, use of IVTD predictions of guar would require caution, based on the type of application.
While currently a minor crop in the southern US, guar has a proven potential to serve as a multi-purpose legume and has potential for expansion in use. Guar is a common crop in regions of the Indian subcontinent, Africa, North and South America, and Australia [25]. Guar has been gaining attention as a forage resource in the southern US due to its capability of producing high N biomass under limited water conditions [3,5]. Therefore, this first report investigating the application of NIRS in guar would encourage the utilization of the technique in its research and forage management.

Tepary Bean
Results from the laboratory analysis of tepary bean samples showed high variability in all four of the quality indices, though the observed ranges were narrower than for guar (Table 1). The concentration of CP varied from 4.5-31.1%, while NDF ranged from 22.9% to 71.6%. In contrast, ADF and IVTD ranged between 15.3-59.2% and 55.9-93.2%, respectively. The best calibration results for tepary bean were recorded using the PLS technique, with R2c of 0.98-0.99 (Table 3). However, neither SVM nor PLS clearly resulted in better predictions for all quality indices when cross-validated and externally validated. Among the forage quality characteristics, all calibration techniques showed the best results at predicting CP in tepary bean samples, with R2cv or R2v above 0.90 (Table 3). The SVM technique resulted in the lowest RMSEcv value (1.74) for cross-validation of CP, whereas PLS had the lowest RMSEv of 1.35 for external validation among the three techniques. In contrast, PLS showed the lowest RMSEcv values of 5.09 and 3.97, and SVM had the lowest RMSEv of 4.03 and 2.23, for NDF and ADF, respectively. Both PLS and SVM produced satisfactory results at predicting ADF concentration in tepary bean with R2cv of 0.86-0.89 and R2v of 0.92-0.95 compared to GP, while all three techniques had comparatively low performance at predicting NDF in tepary bean with R2cv and R2v of 0.72-0.84 and 0.75-0.84, respectively.
In comparison to ADF, the NDF concentration of tepary bean samples was less accurately predicted by all three techniques (Table 3). Similar differences between the prediction accuracy of ADF and NDF were also noticed for guar in this study, and were reported earlier in NIRS studies involving Brassica napus [26], Lolium multiflorum [11], and Oryza sativa [27]. Though PLS performed better at predicting IVTD in tepary bean compared to the other two, all three techniques resulted in relatively low prediction accuracy with R2cv and R2v ranging between 0.75-0.79 and 0.75-0.88, respectively. Overall, both PLS and SVM performed well among the three tested techniques and hence can be employed for satisfactory predictions of CP and ADF in tepary bean. Prediction results for NDF and IVTD, however, would need some caution if calibrations are developed with sample sizes similar to those used in this study (n = 70).
Tepary bean is a vining, warm-season legume species that originated in the southwestern United States and northwestern Mexico and may have value for multiple uses in dryland agricultural systems. Due to its spreading growth habit, and the ability to generate high N biomass with limited soil moisture, tepary bean could be an ideal summer forage for the Southern Great Plains [14]. This first study investigating the application of NIRS to attributes of forage quality in tepary bean showed that the technique could aid in quantifying its role in meeting animal nutrition needs.

Soybean
All three techniques (PLS, SVM, and GP) gave excellent accuracies at calibrating CP and IVTD of soybean samples, with PLS again performing the best of the three with R2c greater than 0.98 (Table 4). Among the three techniques, SVM performed best at predicting CP with RMSEcv and RMSEv of 1.85 and 1.78, respectively, followed closely by PLS. All three calibration techniques produced better predictions of IVTD in soybean (R2cv > 0.84 and R2v > 0.89), compared to the prediction accuracies obtained for guar and tepary bean. As observed for CP, SVM performed better than the other techniques in cross-validation (R2cv = 0.89) of IVTD, while the other two techniques performed better in external validation (R2v of 0.92-0.93). All three techniques can be employed for rapid NIR-based predictions of CP and IVTD in soybean forage samples, with SVM being the best choice.
Abbreviations: GP, Gaussian processes; PLS, partial least squares; SVM, support vector machine; R2c, determination coefficient in calibration; RMSEc, root mean square error in calibration; R2cv, determination coefficient in cross-validation; RMSEcv, root mean square error in cross-validation; R2v, determination coefficient in external validation; RMSEv, root mean square error in external validation.
Soybean was initially introduced as a forage into the US in the 19th century, but is now one of the most widely grown grain legumes in the Southern Great Plains [14]. In the last two decades, there has been increased interest from researchers in utilizing soybean as a summer forage in the US [28][29][30], hence the need for rapid and low-cost techniques for estimating forage quality. The NIRS technique has rarely been exploited for forage quality predictions in soybean. A single report investigated modified PLS and multiple scatter correction methods for NIR predictions of CP, NDF, and ADF concentrations, using 353 soybean samples collected at one (R6) growth stage [31]. In comparison, the calibrations developed in the present study used data on IVTD and CP from just 70 soybean samples collected across a range of different growth stages. Thus, our observed ranges for CP (4.1-39.7%) and IVTD (42.4-99.3%) were more diverse (Table 1). The accuracies (R2cv or R2v > 0.92) obtained in predicting CP in soybean forage by all three techniques were higher than the values reported [31], despite large differences in the sample sizes (N = 70 vs. 353) used for developing calibrations. Therefore, this study showed that machine learning algorithms could develop robust NIRS calibrations for precise analysis of forage quality of soybean with small sample sizes.

Pigeon Pea
Laboratory analyses for the current study showed wide variability in both CP (4.5-32.5%) and IVTD (30.7-91.1%) for forage samples of pigeon pea (Table 1). The CP concentration in pigeon pea was accurately calibrated (R2c > 0.95) by each of the three techniques (Table 5). All three techniques resulted in CP predictions with R2cv and R2v greater than 0.96. Both PLS and SVM also showed high accuracies in predicting IVTD of pigeon pea with R2cv and R2v ranges of 0.91-0.92 and 0.96-0.97, respectively. Although lower than PLS and SVM, the performance (R2cv = 0.86) of GP-based calibrations was moderately satisfactory in IVTD predictions, following the proposed scale [24]. Overall, both PLS and SVM would provide excellent options for NIR predictions of CP and IVTD in pigeon pea. Pigeon pea is another legume species that has seen the development of a range of cultivars for different uses in its home range, and areas of greater cultivation. This includes research on the value of cultivars of pigeon pea in the US for forage, grain, and pasture productivity [4,32]. Pigeon pea has a high degree of heat and drought tolerance, and the capacity for high levels of forage production in the US and other tropical and sub-tropical regions.
While pigeon pea is a broadly grown crop in much of the world, there has been only one preliminary report that discussed the possible use of NIRS techniques to predict forage quality of pigeon pea [33]. That report used a limited number of samples (n = 48), involving leaves and branches, that were mostly collected at one growth stage for calibrations of CP, NDF, and ADF concentrations; however, no validations were performed [33]. In contrast, the present study undertook both calibrations and validations using 90 (70 + 20) pigeon pea samples, involving leaves, stems, or seed pods, collected at different growth stages during a long-term experiment. Further, we investigated the NIR-based predictions of IVTD, which is considered an important quality parameter in pigeon pea forage [4]. Therefore, this study confirms that NIRS techniques could be effective tools for predicting forage quality of pigeon pea.

Global Calibrations
Global calibrations for CP and IVTD of warm-season legumes were developed with 150 samples, which included 30 samples each of guar, tepary bean, soybean, pigeon pea, and mothbean (Figure 2). As observed with the species-based calibrations for the four different legumes, the PLS technique performed best of the three techniques in global calibrations of both CP and IVTD (R2c of 0.97 and 0.94, respectively), while the GP technique was the least accurate (Table 6). In comparison, cross-validation of global models showed that the SVM approach provided the greatest prediction accuracy for both CP (R2cv = 0.94) and IVTD (R2cv = 0.86), followed closely by PLS. Therefore, based on cross-validation results, the performance of global calibrations developed using SVM and PLS was satisfactory at predicting CP, and moderately satisfactory for IVTD.
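The construction of the global calibration set and its 10-fold split can be sketched as follows. This is a minimal illustration only, not the procedure used in this study: the sample pool (70 hypothetical sample IDs per species), the function names `draw_global_set` and `k_fold_indices`, and the random seed are all assumptions introduced for the example, and the model-fitting step is left as a comment.

```python
import random

SPECIES = ["guar", "tepary bean", "soybean", "pigeon pea", "mothbean"]


def draw_global_set(pool, per_species=30, seed=0):
    """Randomly draw the same number of samples from each species.

    `pool` maps a species name to a list of sample IDs (hypothetical layout).
    """
    rng = random.Random(seed)
    selected = []
    for species in SPECIES:
        ids = rng.sample(pool[species], per_species)
        selected.extend((species, sample_id) for sample_id in ids)
    return selected


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k folds."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


# Hypothetical pool of sample IDs; the pool size per species is an assumption.
pool = {s: [f"{s}-{i:03d}" for i in range(70)] for s in SPECIES}

global_set = draw_global_set(pool)        # 5 species x 30 samples
folds = k_fold_indices(len(global_set))   # 10 held-out index lists

for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(global_set)) if i not in held_out]
    # A PLS, SVM, or GP model would be fitted on train_idx here and used
    # to predict the held-out samples in test_idx.
```

Pooling the held-out predictions from all ten folds would then give one cross-validated prediction per sample, from which statistics such as R2cv and RMSEcv could be computed.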
When the globally calibrated models were validated using separate external datasets for each of the five legume species, CP predictions by all three techniques were sufficiently accurate, with R2v ranging from 0.91 to 0.97 (Table 6). The SVM technique predicted CP more accurately than the other techniques, with the exception of guar, where the PLS approach provided slight improvements. Among species, the best CP predictions were noted for pigeon pea (R2v of 0.98-0.99) for all three techniques. In contrast, IVTD predictions were not consistently accurate across all five species. The greatest accuracy was observed for IVTD predictions in pigeon pea, with R2v of 0.97-0.98 under SVM and PLS. The lowest accuracy in predicting IVTD was noted for mothbean (R2v of 0.65-0.69 by SVM and PLS; 0.42 by GP). The best prediction accuracies for IVTD of soybean (R2v = 0.82-0.86 for all three techniques) and guar (PLS; R2v = 0.81) were moderately satisfactory. However, the performance of all three techniques was satisfactory at predicting IVTD in tepary bean (R2v of 0.91-0.92), which was better than that of the species-specific models developed for tepary bean (Table 3).
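The per-species external validation described above amounts to grouping reference and predicted values by species and computing the statistics separately for each group. The sketch below is illustrative only. It assumes the determination coefficient is computed as 1 - SSres/SStot (some NIRS software instead reports the squared correlation between reference and predicted values), the function names are hypothetical, and the numeric records are made-up values, not data from this study.

```python
from math import sqrt


def r_squared(observed, predicted):
    """Determination coefficient, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error of prediction."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


def validate_by_species(records):
    """Group (species, observed, predicted) records and score each species."""
    grouped = {}
    for species, obs, pred in records:
        obs_list, pred_list = grouped.setdefault(species, ([], []))
        obs_list.append(obs)
        pred_list.append(pred)
    return {
        species: {"R2v": r_squared(o, p), "RMSEv": rmse(o, p)}
        for species, (o, p) in grouped.items()
    }


# Made-up (species, reference value, predicted value) records for illustration.
records = [
    ("pigeon pea", 10.0, 11.0),
    ("pigeon pea", 20.0, 19.0),
    ("pigeon pea", 30.0, 31.0),
    ("mothbean", 60.0, 66.0),
    ("mothbean", 70.0, 68.0),
    ("mothbean", 80.0, 76.0),
]

stats = validate_by_species(records)
for species, values in stats.items():
    print(species, round(values["R2v"], 3), round(values["RMSEv"], 2))
```

Scoring each species separately, rather than pooling all external samples, is what exposes differences such as those reported here between pigeon pea and mothbean for IVTD.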
Overall, global-calibrated models for CP have the potential to offer sufficient prediction accuracies that are comparable to species-based calibration models. Diverse calibration sets that contain different legume species may allow the creation of robust, generalized models that provide predictions similar to species-based models. In some cases, global models may be capable of providing more accurate predictions, as was observed for IVTD predictions of tepary bean in this study.
The application of accurate globally calibrated models would be extremely useful for a broad range of end-users. Such models would reduce or eliminate the large amounts of time and other resources required to perform chemical analyses, or to develop and use separate calibration sets for every species. However, adopting the global calibration approach for IVTD may not provide satisfactory predictions for all species. The low performance of calibrations for IVTD may stem in part from variability associated with laboratory analyses that rely on rumen fluids [34]. Therefore, further investigations are required to compare the performance of global calibrations for IVTD of warm-season legumes derived using both rumen fluid and cellulose degradation methods.
Table 6. Calibration, cross-validation, and external (species) validation statistics of global models obtained for crude protein (CP) and in vitro true digestibility (IVTD) in warm-season legumes using three calibration techniques. GP, Gaussian processes; PLS, partial least square; SVM, support vector machine; R2c, determination coefficient in calibration; RMSEc, root mean square error in calibration; R2cv, determination coefficient in cross-validation; RMSEcv, root mean square error in cross-validation; R2v, determination coefficient in external validation; RMSEv, root mean square error in external validation.

Conclusions
The statistics obtained for calibration, cross-validation, and external validation in this study demonstrated that NIRS techniques could be effective for supplying rapid and accurate predictions of most attributes of forage quality (cell wall fractions, crude protein) for different warm-season legumes. Further, the applications of the NIRS technique to guar, tepary bean, and mothbean represent the first reports of such tools providing estimates of forage quality for these species. Though its results were similar to those of PLS, the SVM technique performed consistently well in predicting quality parameters of the five warm-season legumes under both species-based and global calibration strategies. The global calibration approach can be useful for predicting CP in warm-season legumes, and can reduce the time and resources required for traditional chemical analysis or for developing and using separate calibration equations for each species. However, the global model for IVTD was not accurate for all species. Further model development based on other analytical procedures may improve the consistency and reliability of the global approach. Machine learning algorithms like SVM could also allow the development of robust models with a relatively small number of samples. Additional research is required to refine the SVM approach for different NIRS applications.