Integration of handheld NIR and machine learning to “Measure & Monitor” chicken meat authenticity

By combining portable, handheld near-infrared (NIR) spectroscopy with state-of-the-art classification algorithms, we developed a powerful method to test chicken meat authenticity. The research presented shows that it is possible both to discriminate fresh from thawed meat based on NIR spectra and to classify chicken fillets according to the growth conditions of the chickens with good accuracy. In all cases, the random subspace discriminant ensemble (RSDE) method significantly outperformed other common classification methods such as partial least squares-discriminant analysis (PLS-DA), artificial neural network (ANN) and support vector machine (SVM), reaching a classification accuracy of > 95%. This study shows that handheld NIR coupled with machine learning algorithms is a useful, fast, non-destructive tool to identify the authenticity of chicken meat. By comparing and combining different protocols to measure the NIR spectra (i.e., through packaging and directly on meat), we show the possibilities for both consumers and food inspection authorities to check the authenticity and origin of packaged chicken fillet.


Introduction
The supply of sufficient healthy, safe, and authentic food to a growing world population is one of the most important challenges for the present and the future (Pischetsrieder, 2018). Detection of food adulteration such as unlabelled replacement of food components may be hindered because of the targeted focus of analytical techniques (Reid, O'Donnell, & Downey, 2006; Sentandreu & Sentandreu, 2014). From an analytical standpoint, successful detection of food adulteration faces two major challenges (Reid et al., 2006). The first challenge comprises untargeted determination of undeclared ingredients or unknown (hazardous) naturally present substances. The second, and analytically more challenging, concerns claims such as animal welfare, fair trade, or eco-friendly production. While these "soft claims" are generally beyond the scope of analytical chemistry, the effects on the chemical composition of the product may still be found and quantified.
Meat authenticity (and traceability) are of particular importance in modern society (Sentandreu & Sentandreu, 2014; Vlachos, Arvanitoyannis, & Tserkezou, 2016). Recent events of meat adulteration with non-declared species such as horse meat illustrate the global need for clear and reliable checks for consumer products, but even intact fresh meat is often indistinguishable between brands or price ranges. Nowadays, price and lifestyle, together with religion and health concerns, determine an individual's choice for particular food products (Reid et al., 2006; Sentandreu & Sentandreu, 2014).
Detection technologies applied for food authenticity are mainly based on spectroscopic and chromatographic techniques (Gallo & Ferranti, 2016). Spectroscopic techniques have great potential for discrimination of food materials. One promising and widely used technique in this context is near infrared (NIR) spectroscopy, a rapid and non-destructive technique. NIR enables preliminary monitoring of different types of food and as an analytical technique is able to give qualitative and quantitative information about complex samples (Abasi, Minaei, Jamshidi, & Fathi, 2018; Lohumi, Lee, Lee, & Cho, 2015; Prieto, Roehe, Lavín, Batten, & Andrés, 2009).
Developments in instrumentation technology have led to the availability of portable spectroscopic devices. Modern handheld NIR instruments that have been developed for food and drug quality control are fast, lightweight and relatively inexpensive. The trade-off for using these devices is that the spectral region and resolution are limited compared to benchtop technologies (Modroño, Soldado, Martínez-Fernández, & de la Roza-Delgado, 2017; Pasquini, 2018; Zamora-Rojas, Pérez-Marín, De Pedro-Sanz, Guerrero-Ginel, & Garrido-Varo, 2012). Additionally, scattering effects and instrumental and ambient noise make robust chemometric and machine learning methods crucial to extract the relevant information from the spectra (Arvanitoyannis & Van Houwelingen-Koukaliaroglou, 2003; Curran et al., 2018).
https://doi.org/10.1016/j.foodcont.2020.107149
Received 4 December 2019; Received in revised form 27 January 2020; Accepted 28 January 2020
In the present contribution, a powerful machine learning algorithm is used based on ensemble learning (Merkwirth et al., 2004; Rokach, 2010). This method splits the data into multiple parts and combines the best models for the different parts (of the NIR spectra) to come to a majority vote classification. Random subspace discriminant ensemble (RSDE) (Ho, 1998) is proposed here as a fast and reliable method to use handheld NIR devices for food authenticity. The simplicity of the different components of our methodology will allow for "Measure & Monitor" technology to evaluate food authenticity. The goals of the presented research were (1) discrimination of fresh (Fr) and thawed (Th) samples and (2) discrimination of growth systems based on handheld NIR spectra from three recording modes: on meat (OM), through the top of the package (TP) and through the package held bottom up (TB), such that the meat touched the covering foil.

Sampling and data collection
Fresh chicken breast fillet samples were kindly provided by Albert Heijn B.V. (The Netherlands) and Musgraves Group Ltd. (Ireland) in their standard supermarket packages in June 2015. The animal welfare classification system differs between the countries of origin.
Albert Heijn B.V. has provided a set of 70 fresh chicken fillet samples from different production systems and batches, divided over a time span of 3 weeks. Animal welfare was expressed on the packaging by "no star" representing the lowest level of welfare and three stars representing the highest level of welfare.
Thawed samples (133 in total) were obtained by freezing at −18 °C for 48 h and thawing for 24 h at +4 °C. Twenty fresh samples were used for β-hydroxyacyl-CoA-dehydrogenase (HADH) reference measurements (13% of the total sample set) to assess their storage history, i.e. whether the samples had been chilled or frozen (Boerrigter-Eenling, Alewijn, Weesepoel, & van Ruth, 2017). For the Dutch set, three samples of each class were used for HADH, whilst for the Irish set two samples were subjected to HADH per category. No deviations were found in the freshness of the samples. Samples which were subjected to HADH measurements were not subjected to NIR measurements for the thawed category. No reference methods were available for confirmation of the growing system of the chicken fillet samples. Providers have confirmed that the indicated growing system is correct. Note that growth conditions may be similar across countries (e.g., CONV and STD), but that different labels have been attached in order to classify between-country variation.
NIR data was acquired using a MicroNIR Pro NIR (Viavi Solutions, Milpitas, CA, USA), powered by the MicroNIR Pro software (version 2.2, Viavi Solutions) in diffuse reflectance mode in a wavelength range of approximately 908-1676 nm with an evenly distributed spectral resolution, resulting in 125 variables/measurement. A 99% white diffuse reflectance standard was used for calibration followed by a dark measurement. This calibration was repeated in 10 min cycles. The 153 chicken fillet samples were subjected to non-destructive NIR measurements by applying the NIR with standard collar in three different ways: on meat (OM), through package (TP) and through packaging bottom up (TB). First, TP measurements were acquired by placing the package on a flat surface and applying the NIR on the transparent top foil without pressure above the fillet sample. In most cases there was an air pocket between the foil and the sample. Secondly, the TB measurements were performed by flipping the package bottom up, letting the fillet sample lean on the top transparent foil, followed by NIR measurements through this transparent foil. Finally, the transparent top foil was removed and NIR measurements were taken directly on the fillet sample without applying considerable pressure. Prior to freezing, the fillet package was covered with a new layer of identical transparent top foil. Five replicates were taken per OM/TP/TB, with a total of 4590 raw NIR measurements. Scheme A1 of the appendix illustrates how the samples were collected. The raw data used for this study is available as supplementary material (Parastar et al., 2020).

Data handling and preprocessing
Spectral data was labelled to ensure replicate measurements and measurements from different modes of the same fillet could be connected. Training and test sets were created using the Duplex method (Daszykowski, Walczak, & Massart, 2002; Puzyn, Mostrag-Szlichtyng, Gajewicz, Skrzyński, & Worth, 2011; Snee, 1977) on the entire data set in order to ensure a representative test set including boundary cases (Reitermanova, 2010; Westad & Marini, 2015). All classes were represented in the test set. Importantly, all measurements (i.e., spectra) of a sample were assigned to either the training set (70%) or the test set (30%) in order to ensure that the test set did not include data from the same sample that the model was trained on.
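The sample-level assignment described above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: samples are drawn at random rather than with the Duplex method, class representation in the test set is not enforced, and the function name `split_by_sample` and the fillet identifiers are hypothetical.

```python
import random
from collections import defaultdict

def split_by_sample(sample_ids, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) so that all spectra of one sample share a set.

    Assumption: random selection of samples stands in for the Duplex method.
    """
    by_sample = defaultdict(list)
    for idx, sid in enumerate(sample_ids):
        by_sample[sid].append(idx)
    samples = sorted(by_sample)
    random.Random(seed).shuffle(samples)
    n_test = max(1, round(test_fraction * len(samples)))
    test_samples = set(samples[:n_test])
    train_idx = [i for s in samples if s not in test_samples for i in by_sample[s]]
    test_idx = [i for s in test_samples for i in by_sample[s]]
    return sorted(train_idx), sorted(test_idx)

# Hypothetical example: 10 fillets with 5 replicate spectra each.
sample_ids = [f"fillet_{k}" for k in range(10) for _ in range(5)]
train_idx, test_idx = split_by_sample(sample_ids)
```

Because the split is made on fillet identifiers and not on individual spectra, replicate spectra of one fillet can never appear on both sides of the split.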
When looking for the optimum pre-processing strategy, a design of experiments was used (Gerretzen et al., 2015). The predictive classification models (PLS-DA, CP-ANN, SVM and RSDE) were built and validated using cross validation (CV). In every CV round, all spectra belonging to the same chicken sample were removed from the training set (leave-chicken-out). After training, tuning and evaluation of the model, the test set was used for final performance estimation. The data analysis pipeline of the presented work is shown in Scheme 1.
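A minimal Python sketch of the leave-chicken-out idea follows. It assumes one fold per fillet, which is our reading of the description and not a confirmed detail of the original implementation; the name `leave_sample_out_folds` is hypothetical.

```python
def leave_sample_out_folds(sample_ids):
    """Yield (held_out_sample, train_idx, val_idx), one fold per distinct sample.

    Assumption: each fold holds out exactly one fillet with all of its spectra.
    """
    for held_out in sorted(set(sample_ids)):
        val_idx = [i for i, s in enumerate(sample_ids) if s == held_out]
        train_idx = [i for i, s in enumerate(sample_ids) if s != held_out]
        yield held_out, train_idx, val_idx

# Hypothetical example: three fillets with 2, 3 and 1 spectra.
sample_ids = ["a", "a", "b", "b", "b", "c"]
folds = list(leave_sample_out_folds(sample_ids))
```

Each fold trains on the spectra of all other fillets and validates on the held-out fillet only, so replicate spectra cannot leak into the training data of their own fold.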

Random subspace discriminant ensemble
Classification of the NIR spectra was done using Random Subspace Discriminant Ensemble. This method divides the spectra into a number of random subspaces (30 random subspaces as standard in this case), selected from the spectral domain (e.g., a random subset of 60 wavelengths is the default in Matlab). Discriminant analysis (DA) was used to classify the spectra in each subspace (Ho, 1998; Tan, Li, & Qin, 2008). Each subspace may result in different classification probabilities. These probabilities are combined by taking their average across all subspaces to come to a single classification model of the full spectra. Fig. 1 shows the general architecture of the RSDE algorithm. The potential of RSDE in high dimensional data comes from the fact that each model requires only a limited number of variables.
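To make the averaging over subspaces concrete, the following Python sketch builds a small random subspace ensemble. It is not the MATLAB implementation used in this work: the base learner is a simplified discriminant (nearest class mean with identity covariance and equal priors, turned into probabilities with a softmax), and the names `fit_rsde`, `predict_proba` and `predict` as well as the toy spectra are hypothetical.

```python
import math
import random

def fit_rsde(X, y, n_learners=30, subspace_size=60, seed=0):
    """Fit one simplified discriminant per random subset of wavelengths."""
    rng = random.Random(seed)
    n_features = len(X[0])
    size = min(subspace_size, n_features)
    classes = sorted(set(y))
    learners = []
    for _ in range(n_learners):
        cols = rng.sample(range(n_features), size)  # random subspace
        means = {}
        for c in classes:
            rows = [x for x, label in zip(X, y) if label == c]
            means[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
        learners.append((cols, means))
    return classes, learners

def predict_proba(model, x):
    """Average the per-subspace class probabilities for one spectrum x."""
    classes, learners = model
    total = {c: 0.0 for c in classes}
    for cols, means in learners:
        # Simplified discriminant score: -0.5 * squared distance to the class mean.
        scores = {c: -0.5 * sum((x[j] - m) ** 2 for j, m in zip(cols, means[c]))
                  for c in classes}
        top = max(scores.values())
        weights = {c: math.exp(s - top) for c, s in scores.items()}
        norm = sum(weights.values())
        for c in classes:
            total[c] += weights[c] / norm
    return {c: total[c] / len(learners) for c in classes}

def predict(model, x):
    """Return the class with the highest averaged probability."""
    proba = predict_proba(model, x)
    return max(proba, key=proba.get)

# Toy data: four 4-variable "spectra" in two classes (hypothetical values).
X = [[0.0, 0.1, 0.0, 0.2], [0.1, 0.0, 0.1, 0.0],
     [1.0, 0.9, 1.1, 1.0], [0.9, 1.0, 1.0, 1.1]]
y = ["Fr", "Fr", "Th", "Th"]
model = fit_rsde(X, y, n_learners=5, subspace_size=2)
label = predict(model, [0.05, 0.05, 0.0, 0.1])
```

The defaults of 30 learners and 60 variables per subspace mirror the settings quoted above; a full discriminant analysis would additionally estimate a covariance matrix within each subspace.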

Software
Chemometric data analysis was performed in MATLAB environment R2016a, with the exception of the leave-class-out validation (Section 3.5), which was done in R2019b (Mathworks, MA, USA). The PLS-Toolbox v7.8 (Eigenvector, WA, USA) was used for PLS-DA modelling, the pre-processing toolbox (Gerretzen et al., 2015) was used to choose the best preprocessing strategy (based on an experimental design), the CP-ANN toolbox (Milano Chemometrics and QSAR Research Group) was used for optimization of the Kohonen network and supervised classification, and the Classification Learner toolbox of MATLAB was used for SVM and RSDE modelling.

Fresh vs. thawed
The RSDE algorithm was first used to discriminate Fr and Th samples for each of the three different spectral recording modes. For the preprocessing of the data, an experimental design was used to find the best strategy with minimal classification error (Gerretzen et al., 2015). Classification performance was evaluated using accuracy (Acc), precision (Pre), sensitivity (Sen), specificity (Spe) and error rate (ER) (Ballabio & Consonni, 2013).
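For a two-class problem such as Fr vs. Th, these measures follow directly from the confusion counts, as the Python sketch below illustrates. The labels are made up, the function name `binary_metrics` is hypothetical, and ER is taken here as 1 − Acc, which is an assumption: some chemometrics texts define the error rate from the average class sensitivity instead.

```python
def binary_metrics(y_true, y_pred, positive):
    """Compute Acc, Pre, Sen, Spe and ER for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    acc = ratio(tp + tn, len(pairs))
    return {
        "Acc": acc,
        "Pre": ratio(tp, tp + fp),  # how reliable a positive prediction is
        "Sen": ratio(tp, tp + fn),  # share of true positives that were found
        "Spe": ratio(tn, tn + fp),  # share of true negatives that were found
        "ER": 1 - acc,              # assumption: error rate defined as 1 - Acc
    }

# Hypothetical predictions for eight spectra.
y_true = ["Fr", "Fr", "Fr", "Fr", "Th", "Th", "Th", "Th"]
y_pred = ["Fr", "Fr", "Fr", "Th", "Th", "Th", "Th", "Th"]
metrics = binary_metrics(y_true, y_pred, positive="Fr")
```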
Fig. 2 shows the NIR spectra of Fr and Th samples in the three spectral recording modes OM/TP/TB. Coloring the spectra by recording mode shows that there are clear differences in absorbance related to how the spectra were obtained. The differences are similar for Fr and Th samples. Because Fr and Th samples have similar spectra, the first challenge in this study was to discriminate Fr and Th samples based on NIR spectra.
The RSDE performed well in discriminating individual spectra of Fr and Th samples; Acc values were 90.2% for the training set, 87.6% for cross validation (CV) and 85.2% for the test set of OM samples. For TP samples, the values were 96.4%, 95.4% and 92.0% for train, CV and test sets, respectively. The Acc values for TB samples were 95.4%, 93.3% and 91.0% for train, CV and test sets, respectively. Details on the classification power can be found in Table A1 in the appendix.
Scheme 1. Data analysis pipeline for each presented study.

Classification in growth conditions
The ability of the RSDE method to classify individual NIR spectra was promising. The next objective was to evaluate whether the RSDE method could also be used to discriminate between the growth systems of the chickens. The RSDE algorithm was used for classification of seven growth conditions (1*/2*/ORG/CONV/STD/FR/CF) as well as MAR samples in Fr and Th conditions in OM, TP and TB modes (details in section 2.1). As an example, Fig. 3 depicts the discrimination performance of RSDE for classification of chickens in different growth conditions in terms of Acc in OM, TP and TB modes. As can be seen, the values of Acc for training, validation and test sets are between 80 and 90% for OM (Fig. 3a), TP (Fig. 3b) and TB (Fig. 3c). These Acc values are relatively low because of the complexity of the samples and the similarity in the NIR spectra of samples in different conditions.
To compare the results of the RSDE model with common classification methods in chemometrics, NIR data of chickens in different growth conditions were classified by partial least squares-discriminant analysis (PLS-DA) (Ballabio & Consonni, 2013; Gromski et al., 2015), counter propagation-artificial neural network (CP-ANN) (Ballabio et al., 2009; Ballabio & Vasighi, 2012) and support vector machine with quadratic kernel function (Q-SVM) (Brereton & Lloyd, 2010). Model performance of the RSDE was better than that of the other methods. In Fig. 3, the classification results for PLS-DA, CP-ANN and Q-SVM for training, validation and test sets in terms of Acc are shown in comparison with RSDE. Due to the type of subspace selection, the RSDE is only slightly affected by noise and is less prone to overfitting (shown by similar Acc values for train, validation and test sets).
For PLS-DA, the best preprocessing strategy was chosen according to the experimental design approach (lowest classification error) (Gerretzen et al., 2015). In this regard, mean-centering and pareto scaling were the best pre-processing strategies. Other attempts such as outlier detection using Q-residuals/Hotelling's T² (Ballabio & Consonni, 2013) and variable selection using variable importance in projection (VIP) with the "greater than one" rule (Andersen & Bro, 2010) were performed to improve PLS-DA classification. These methods slightly improved the models but not to acceptable levels (see Table A2 for more details).
For CP-ANN, the genetic algorithm (GA) (Ballabio, Vasighi, Consonni, & Kompany-Zareh, 2011) was first used to optimize the network topology, including the number of neurons and the number of epochs, resulting in 40 neurons and 150 epochs. As shown in Fig. 3, the performance of CP-ANN is poor in the CV and test sets.
In SVM, the quadratic kernel gave the best accuracy (among linear, quadratic, cubic and radial basis function) (Brereton & Lloyd, 2010). The performance of Q-SVM was better than that of PLS-DA and CP-ANN in terms of Acc (see Table A3 for more details of Q-SVM performance), but was still far from ideal (accuracy values were below 77.7%). In summary, going from linear PLS-DA to non-linear CP-ANN and Q-SVM improved classification performance, but results were deemed insufficient.
The RSDE outperformed other classification methods for discrimination of growth conditions. To obtain a more detailed view of the classification power of this method, the classification performance of RSDE in terms of Acc, Sen and Pre for Fr samples in OM mode is presented here; the Acc value for eight classes was 79.4%, the Sen values ranged from 55.8 to 95.4% and the Pre values ranged from 63.6 to 90.5% for the test set (467 spectra). Though the Acc value of RSDE was significantly higher than the closest Acc, that of Q-SVM (79.4% vs. 71.1%, z = 2.9389, p = .00164), the classification performance clearly showed room for improvement. Table A3 shows more details of the classification performance of RSDE in terms of Acc, Sen and Pre for Fr samples in OM mode.
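The reported z-value is consistent with a pooled two-proportion z-test applied to the two test set accuracies with 467 spectra each. The Python sketch below reproduces that calculation; that the test was computed exactly in this way is an assumption on our part, and the function name `two_proportion_z` is hypothetical.

```python
import math

def two_proportion_z(acc_a, acc_b, n_a, n_b):
    """Pooled two-proportion z-test; returns (z, one-sided p-value for acc_a > acc_b)."""
    pooled = (acc_a * n_a + acc_b * n_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (acc_a - acc_b) / se
    # Upper-tail probability of the standard normal distribution.
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_one_sided

# RSDE (79.4%) vs. Q-SVM (71.1%) on the 467 test spectra.
z, p = two_proportion_z(0.794, 0.711, 467, 467)
# z comes out at about 2.94 and p at about 0.0016, in line with the reported values.
```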
One of the surprising aspects of the RSDE algorithm is its insensitivity to preprocessing. In other words, conventional chemometric spectral preprocessing does not affect the performance of this algorithm and therefore, raw data can be used as input for this algorithm (Figure A1 and Table A4) (Ho, 1998; Tan et al., 2008; Zheng, Hu, Tong, & Du, 2014). Additionally, the detector of the MicroNIR is especially sensitive in the region of approximately 1425-1575 nm. In the raw spectra in Fig. 2 it could be observed that absorbance units of 3-3.5 were recorded. The detector operated at its limits in this region and noise is visible with some large spikes. Still, there were no issues in classifying the samples, including in the external model validation, indicating the power of RSDE in NIR spectroscopy.

Combining modes
In the previous analyses, we classified single NIR measurements. Of course, it is also possible to take multiple NIR scans of a sample through multiple sample handling protocols (i.e., OM, TP and TB) and to combine the spectra (Borràs et al., 2015). This is cost-insensitive as multiple measurements are easy to obtain with handheld technologies. By simply concatenating the measurements of OM, TP and TB NIR spectra, we can boost the performance of RSDE. In this manner, the spectral dimension is increased and the RSDE has more flexibility to select random subspaces and as a result classification may be improved.
Two different options for data combination were tested: (i) different measurement modes (i.e., TP/TB for consumers and OM/TP/TB for food administration); and (ii) multiple spectra from the same mode (i.e., OM/OM/OM, TP/TP/TP and TB/TB/TB). To combine different measurement modes, we randomly selected one of the replicate spectra from each mode TP and TB (and OM) of a sample to simulate 'uncontrolled consumer measurements'. Table 1 gives detailed classification Acc values of the RSDE when classifying the individual or combined NIR spectra. The results in Table 1 confirmed that the data combination resulted in improvement of the classification over individual modes.
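The combination of measurement modes can be sketched in Python as follows: one replicate per mode is picked at random and the selected spectra are concatenated into a single, longer input vector. The function name `combine_modes`, the dictionary layout and the toy spectra (3 variables instead of 125) are assumptions for illustration only.

```python
import random

def combine_modes(replicates_by_mode, modes=("TP", "TB"), seed=0):
    """Pick one replicate spectrum per mode at random and concatenate them."""
    rng = random.Random(seed)
    combined = []
    for mode in modes:
        combined.extend(rng.choice(replicates_by_mode[mode]))
    return combined

# Hypothetical fillet with 2 replicates per mode and 3 variables per spectrum.
fillet = {
    "OM": [[0.10, 0.20, 0.30], [0.11, 0.21, 0.31]],
    "TP": [[0.40, 0.50, 0.60], [0.41, 0.51, 0.61]],
    "TB": [[0.70, 0.80, 0.90], [0.71, 0.81, 0.91]],
}
consumer = combine_modes(fillet, modes=("TP", "TB"))         # through-package modes only
inspector = combine_modes(fillet, modes=("OM", "TP", "TB"))  # all three modes
```

With the 125 variables per spectrum recorded here, the TP/TB combination would yield 250 variables and the OM/TP/TB combination 375 variables from which the RSDE can draw its random subspaces.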
For the second combination method (multiple measurements of the same mode) there was no significant improvement of the classification performance compared to the individual measurements. Single measurements closer to the meat lead to better performance of the RSDE, as can be seen from the increasing values going from column 3 to column 5. Apparently, a single measurement is already highly representative of the sample, and combining data will improve the classification performance only if new aspects of the sample are added to the data (e.g., measurements in OM, TP and TB). For a fair comparison with other classification methods, the TP/TB and OM/TP/TB combined data sets were also analysed with Q-SVM, CP-ANN and PLS-DA. In summary, the RSDE again strongly outperformed the other methods. The classification accuracy of 99.4% for the OM/TP/TB test set was significantly higher than the 82% of the Q-SVM (z = 5.3586, p < .00001), the 65.4% of the CP-ANN and the 63.5% of the PLS-DA. Also, the classification accuracy of 96.9% for the TP/TB test set was significantly higher than the 81.3% of the Q-SVM (z = 4.4773, p < .00001), the 63.6% of the CP-ANN and the 62.4% of the PLS-DA. More details on the differences between models for the classification of TP/TB and OM/TP/TB test set (160 spectra) are provided in Table A5.
Efforts to validate the developed RSDE models were made by using two shuffling methods (y-randomization and the permutation test) (Rücker, Rücker, & Meringer, 2007). After permutation, classification accuracy of the RSDE deteriorated. As an example, in the classification of growth conditions of Fr samples in OM mode, CV classification accuracy was reduced from 99.0% (non-shuffled) to an average of 17.4% for permuted data. For the combined data, it is noteworthy that the RSDE is so powerful that it can get > 80% accuracy in a training set. However, cross validation and test set reveal that no structure was present in the permuted data, as the accuracies drop to values which are no better than random assignment (see Table A6 for more details).
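The logic of such a shuffling check can be sketched in Python: the class labels are permuted many times, the model is re-validated on each permuted label vector, and the resulting accuracies are compared with the accuracy on the true labels. In this sketch a leave-one-out 1-nearest-neighbour rule on made-up one-dimensional data stands in for the RSDE with cross validation; all names and values are hypothetical.

```python
import random

def permutation_accuracies(y, evaluate, n_permutations=100, seed=0):
    """Re-run `evaluate` on shuffled copies of the labels y and collect the accuracies."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_permutations):
        shuffled = list(y)
        rng.shuffle(shuffled)
        accuracies.append(evaluate(shuffled))
    return accuracies

def loo_1nn_accuracy(x, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour rule (toy stand-in for RSDE + CV)."""
    hits = 0
    for i, xi in enumerate(x):
        others = (k for k in range(len(x)) if k != i)
        nearest = min(others, key=lambda k: abs(x[k] - xi))
        hits += labels[nearest] == labels[i]
    return hits / len(x)

# Two well-separated toy classes.
x = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.3, 5.4]
y = ["A"] * 5 + ["B"] * 5
true_acc = loo_1nn_accuracy(x, y)
perm_accs = permutation_accuracies(y, lambda labels: loo_1nn_accuracy(x, labels),
                                   n_permutations=200)
mean_perm = sum(perm_accs) / len(perm_accs)
```

A large gap between `true_acc` and `mean_perm` indicates that the model relies on a real relation between spectra and labels rather than on chance structure.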

Leave-class-out analysis
In the previous analyses, the RSDE model was trained on data from all growth conditions. But what if spectra from a previously unseen growth condition were classified by the RSDE? To evaluate this, a final study was done based on a leave-class-out (LCO) methodology. Several RSDE models were trained (and cross-validated) using 8 − 1 = 7 classes, while the left-out class was used entirely as a test set. This procedure was repeated 8 times, such that every class was left out, and classified, once.
It is important to note that RSDE classifications are done according to the highest classification probability, regardless of the absolute value of that probability. Because there is no 'correct classification' in the LCO situation, cut-offs were imposed on the classification probability before a classification would be accepted. This can protect a researcher from classifying highly deviating spectra. The results of the LCO analysis with varying cut-offs are shown in Table A7. The classification accuracy of each 7-class model was > 98.5% (see column 2). One of the a priori expectations was that CONV and STD (having similar growth conditions) would be classified as the other in most situations. By increasing the minimum classification probability, we expected to see this pattern more clearly.
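The effect of such a cut-off can be illustrated with a few lines of Python: a spectrum is assigned to the most probable class only when that probability reaches the chosen minimum, and is left unclassified otherwise. The function name `classify_with_cutoff` and the probability values are hypothetical.

```python
def classify_with_cutoff(proba, cutoff=0.9):
    """Return the most probable class, or None when its probability is below the cut-off."""
    label = max(proba, key=proba.get)
    return label if proba[label] >= cutoff else None

# Hypothetical averaged class probabilities for two spectra.
confident = classify_with_cutoff({"FR": 0.95, "CF": 0.03, "STD": 0.02})
rejected = classify_with_cutoff({"FR": 0.55, "CF": 0.40, "STD": 0.05})
```

Raising the cut-off reduces the number of spectra that receive a label, but the labels that remain are backed by a higher classification probability.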
Even when no cut-off was used for the classification probability, the results defied expectations. The CONV spectra were classified as either 1* (63.3%) or ORG (36.7%), while the majority (54.4%) of STD spectra were classified as FR. Furthermore, the majority (66.7%) of the 1* and the majority (83.3%) of ORG spectra were classified as CONV. Somewhat in line with expectations was that CF spectra were mainly classified as FR (74.7%) and FR spectra as CF (62.7%).
Though the total number of spectra that could be classified decreased when we increased the minimum classification probability, the results became more distinct (see sub-tables of Table A7). For example, with a cut-off of 0.90, an unexpected pattern became apparent. Namely, the classifications reveal that Dutch fillets (1*, 2*, ORG and CONV) are always classified as other Dutch fillets, and Irish (FR, STD, CF and MAR) fillets are always classified as other Irish fillets. We expect that this pattern is related to the difference in lifetime of the chickens (Irish chickens on average live longer than Dutch chickens). Interestingly, the control fillets of the MAR condition work well in the sense that they are mostly classified as STD, and much less as the more premium fillets FR and CF.

Conclusion
An RSDE was used as a fast and reliable machine learning algorithm for authentication of the growth condition of chicken fillets and their freshness within a thoroughly validated chemometric workflow with several specific practical implementations. The RSDE considerably outperformed other common classification models such as PLS-DA, CP-ANN and SVM. Also, combining spectra improved the classification performance of this method even further. We demonstrated that a relatively inexpensive portable device can provide very fast results in the application of NIR spectroscopy to food authenticity. Considering the measurement time of approximately 8 s (~3.0 s per NIR measurement and a few seconds to flip the package), a complete analysis (measure and monitor) would require approximately 20 s, including data analysis. The combination of handheld NIR with the RSDE algorithm may offer a very interesting and reliable tool for monitoring meat authenticity (and quality) directly in the field.
The RSDE algorithm was so powerful that it could not only clearly discriminate between NIR spectra based on the growth conditions of the chickens; the Leave-Class-Out validation also provided the authors with new insights about the differences between country of origin and the differences in meat. Our analyses do indicate that some adjustments to the existing implementation are needed before the methodology can be applied in a real-world setting. Imposing minimal classification probabilities can protect from classifying meat of known origin (i.e., chicken) into just any class. However, it is not advised to use this method for classification of meats from unknown origin (i.e., other animals). Therefore, future work could implement a pre-screening based on, for example, the Mahalanobis distance of a new spectrum to the spectra of the known classes. After these adaptations, the combined approach presented in this work is very fast and, if applied throughout the supply chain, it could improve the quality of meat that reaches consumers' tables in everyday life.
Scheme A1. Details of the data collection.

Fig. A1. Effect of preprocessing on data scattering in PCA space. Raw data (upper left), preprocessed data by mean centering (MC) and standard normal variate (SNV) (upper right), original data without outliers (bottom left) and preprocessed data after outlier removal using MC and SNV (bottom right). Red circles show extreme values which were removed in the bottom figures according to Q-residuals and Hotelling's T² tests.

Table 1
Accuracy of RSDE (in %) for individual and combined spectra.

Table A1
Classification results (in %) for freshness classification of chicken fillets based on Fr and Th. Sensitivity (Sen) and precision (Pre) are reported. Precision (Pre) indicates how confident one can be about the given classification.

Classification performance (in %) of RSDE and Q-SVM models for classification of eight growth conditions. Precision (Pre) indicates how confident one can be about the given classification. The test set contained 467 spectra. The Q-SVM had the closest overall test set Acc to RSDE. Still, the RSDE significantly outperformed Q-SVM (z = 2.9389, p = .00164).

RSDE classification performance (in %) for raw and preprocessed Fr data in OM mode for classification of growth conditions.

Accuracy of RSDE for the test set was significantly higher than the accuracies of other methods, with all p-values < .00001 based on one-sided z-tests.

Table A7
RSDE classification performance for Fr combined data (OM/TP/TB) in leave-class-out validation. Acc values (column 2) are based on 10-fold cross validation of the remaining 7-class data. Percentages sum to 100 (to 1 decimal place) over rows.