Evaluation of portable near-infrared spectroscopy for organic milk authentication

Organic products are vulnerable to fraud because of their premium price. Analytical methodology helps to manage the risk of fraud and, owing to the miniaturization of equipment, tests can nowadays even be applied rapidly onsite. The current study aimed to evaluate portable near-infrared spectroscopy (NIRS) in combination with chemometrics to distinguish organic milk from other types of milk, and to compare its performance with benchtop NIRS and fatty acid profiling by gas chromatography. The sample set included 37 organic retail milks and 50 non-organic retail milks (36 conventional and 14 green ‘pasture’ milks). Partial least squares discriminant analysis was performed to build classification models, and kernel density estimation (KDE) functions were calculated to generate non-parametric distributions of the samples’ class probabilities. These distributions showed that portable NIRS successfully distinguished organic milks from conventional milks, as did benchtop NIRS and fatty acid profiling. However, NIRS was less successful when ‘pasture’ milks were considered too, since their patterns occasionally resembled those of the organic milk group. Fatty acid profiling, in contrast, distinguished organic milks from both types of non-organic milk, including the ‘pasture’ milks. This comparative study revealed that the classification performance of the portable NIRS for this application was similar to that of the benchtop NIRS.


Introduction
The appeal of organic milk has led to a growing market share in recent years. In response to this demand, many agricultural regions in the world have experienced an organic revolution. In 2015, up to 12% of all dairy sales in the EU belonged to the organic dairy market [1]. The production of organic milk was 4.4 million metric tons in 2015, which is almost double the volume of 2007. However, supply is still insufficient because of the limited production in organic systems. Meanwhile, organic milk retails at a premium price due to the higher production costs [2]. These two aspects make organic milk susceptible to fraud. Vulnerability studies of the liquid milk supply chain showed that there is limited implementation of fraud control measures in this chain in general [3]. Fraud monitoring systems are among these potential measures. These systems require adequate methods, both in the laboratory and beyond.
Different approaches have been developed to detect potential biomarkers for organic milk authentication, such as iodine [4] and carbon and nitrogen isotopes [5]. In addition, other studies have focused on untargeted fingerprints based on fatty acid (FA) profiles to assure the authenticity of organic products in the dairy sector [6,7]. These approaches have demonstrated accurate results, but they require a series of complex sample preparation steps and professional instrument operation. The market still demands faster and cheaper methods which can be performed (preferably onsite) by different tiers in the supply chain, including farmers, processors, retailers and possibly even consumers [8]. From this point of view, portability and operability are important aspects to consider too.
Near-infrared spectroscopy (NIRS), as a fast, non-destructive method, may be an interesting solution. This technique observes the characteristic reflection and absorption spectra in the NIR region (780-2500 nm). The valuable information in these spectra relates to overtones and combinations of vibrations of some characteristic bonds, such as C-H, N-H, O-H and S-H, which typically exist in all organic molecules. NIRS has been widely accepted and applied in food analysis [9-11]. Furthermore, advanced techniques allow miniaturization of optical components without excessive loss of performance. These developments have significantly improved the portability of NIRS systems. Several studies have applied portable NIRS in food composition analysis, including fruit ripening evaluation [9,12], palm oil adulteration [13], and feed safety [14]. Promising results were obtained in these studies by combining portable NIRS with suitable chemometrics.
When applying this methodology to distinguish organic milk, it is important to realize that in the Netherlands there is so-called green milk ('pasture milk'), which promotes the idea of being more natural through regular grazing of the cattle. In this system, cows have to be outside at least 120 days per year, for 6 h per day. It is relevant to consider pasture milk when comparing milk from the organic and conventional systems, because its composition may be somewhat similar to that of organic milk [7,15].
The aim of the current study was to evaluate portable NIRS in combination with chemometrics to distinguish organic milk from other types of milk (conventional and pasture milks), and compare its performance with benchtop NIRS and fatty acid profiling by gas chromatography.

Milk samples
A total of 87 cartons of full-fat, pasteurized retail milk were collected from supermarkets in the Wageningen area, Gelderland region, in the Netherlands during a period of eight weeks between May and June of 2016. The sample set included 37 organic retail milks (OM) from five brands and 50 non-organic retail milks (NOM). The latter comprised 36 conventional retail milks (CM) from six brands and 14 pasture retail milks (PM) from two brands. Samples were analysed by portable and benchtop NIRS on the day of purchase or on the following day, and an aliquot of each sample was stored at −18 °C for later fatty acid analysis.

Portable NIRS: Micro-NIRS
An ultra-compact spectrometer, Micro-NIR 1700 (JDSU, Milpitas, CA, USA), with a spectral working range of 908-1676 nm and a 6 nm sampling step was selected as the portable NIRS instrument to collect the spectrometric data. The reflectance mode was selected according to the default settings. The spectrometer employs a linear variable filter (LVF) as the light dispersing element and is powered by USB (500 mA, 5 V). Three ml of each milk sample was transferred to a 4 ml vial (Sun Sri, Wilmington, NC, USA) and analyses were carried out in triplicate. Triplicate readings were averaged for further data analysis.

Benchtop NIRS: FT-NIRS
A NIRFlex N-500 benchtop instrument (Buchi AG, Flawil, Switzerland) was used to generate the FT-NIR spectral data. The spectrometer was equipped with six glass cuvettes (light path 2 mm) (QX 2.0 mm, Hellma Analytics, Müllheim, Germany). Each sample was scanned in the range of 1000-2500 nm in transmission mode, according to the default settings. A reference standard was measured before each series to calibrate the spectrometer. Each sample was analysed in triplicate and placed randomly in different cuvettes during each series. Triplicate readings were averaged for further data analysis.

FA by gas chromatography (GC)
The FA compositions of the milk samples were determined as fatty acid methyl esters (FAMEs) by a GC16958 (Agilent 7890A, Agilent Technologies, Palo Alto, CA, US) according to NEN-ISO 1740:2004 | IDF 6 [16]. The GC was equipped with a 100 m × 0.25 mm × 0.2 µm film thickness fused silica capillary column (Varian, Palo Alto, CA) coupled to a flame ionization detector (column temperature 275 °C). All the chemicals were ACS grade and purchased from Sigma-Aldrich (St. Louis, MO, USA). A volume of 2 ml milk was weighed in a 30 ml sterile, screw top plastic bottle and mixed with 5 ml internal standard solution (500 mg of C13:0 triglyceride and 500 mg of C11:0 FAME in 250 ml tert-butyl methyl ether). To start the transesterification, 5 ml methanolic sodium methoxide solution (5%, m/v) was added, and 2 ml hexane and 10 ml neutralization solution were added after 180 s and 210 s, respectively. The mixture was vortexed for 30 s and then centrifuged for 5 min, and 1 ml of supernatant was transferred with a pipette into amber glass GC vials. Each sample was weighed and measured in duplicate. Since spectrometry correlates better with FA concentrations in milk than with concentrations in milk fat [17], the concentrations of FAs in this research were expressed as µg/100 g liquid milk. Average values of the duplicates were used for further data analysis.

Statistical analysis
Univariate analysis was applied to the FA dataset. FA concentrations were first tested for normality using the Shapiro-Wilk test. As the data did not always show a normal distribution, non-parametric Kruskal-Wallis tests were applied for group comparison, followed by Mann-Whitney U-tests for pairwise comparison [18]. FAs with P < 0.01 after Benjamini-Hochberg (BH) adjustment were indicated as statistically significant. Principal component analysis (PCA) was carried out to explore the three multivariate datasets acquired by Micro-NIRS, FT-NIRS, and FAs by GC separately. In order to eliminate the effects of noise and to balance the weights of different variables, all three datasets were pre-processed in various ways, including auto-scaling, mean-centering, smoothing, 1st derivative, log 10 transformation and multiplicative scatter correction (MSC). For each dataset, the pre-processing combination that gave the best separated PCA scores distribution was chosen. The relationship between NIR spectra and FA profiles was determined by computing the correlation coefficient between FA concentrations and wavelength absorbances.
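The Benjamini-Hochberg adjustment mentioned above can be illustrated with a minimal sketch. It is not the authors' R code: the function name `benjamini_hochberg` and the five raw P values are hypothetical and serve only to show how adjusted values are compared with the 0.01 cut-off.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five fatty acids (illustrative only).
raw_p = {"FA_1": 0.001, "FA_2": 0.008, "FA_3": 0.039, "FA_4": 0.041, "FA_5": 0.60}
names = list(raw_p)
adj = benjamini_hochberg([raw_p[name] for name in names])
significant = [name for name, q in zip(names, adj) if q < 0.01]
print(significant)
```

Note that a raw P value below 0.01 (here `FA_2`) may no longer pass the cut-off after adjustment.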
Considering that nonlinear prediction algorithms carry a higher risk of overfitting, classification models were estimated by partial least squares discriminant analysis (PLS-DA) to discriminate (A) OM against CM and (B) OM against the non-organic milks (CM and PM). As a linear discrimination model, PLS-DA is suitable for multi-collinear data [19] and more robust than non-linear models [20]. The data sets were randomly divided into two sub-sets, a training set (70% of the samples of each class) and an external validation set (the remaining 30% of the samples). The training set was used to build the models and to validate them internally by 500 times repeated leave-20%-out cross-validation. The external validation set was used to validate the models externally after the internal validation. The performance of the classification models was measured according to several parameters: accuracy, the overall rate of correct classification; sensitivity, the rate of correct identification; and specificity, the rate of correct rejection [21]. In our research, correct identification refers to organic milk that is correctly classified, while correct rejection refers to non-organic milk that is correctly classified. Besides a binary classification, all the samples were also scored by a class probability ranging from 0 (OM identified) to 1 (OM rejected), and kernel density estimation (KDE) functions were applied to generate a non-parametric distribution of the samples' class probabilities [22]. Compared with binary results, the quantitative class probability scores are more informative and could solve the problem of resolution caused by smaller sample sets. To compare the discrimination capacity of the models built from the three datasets (Micro-NIRS vs FT-NIRS, Micro-NIRS vs FAs and FT-NIRS vs FAs), Passing-Bablok linear regression models [23] were built. A joint test was performed to investigate whether slope = 1 and intercept = 0 at a 95% confidence level. Acceptance of the null hypothesis (H0) meant there was no difference between the two investigated approaches [9]. All analyses were conducted in Pirouette 4.5 (Infometrix, WA, USA) and R 3.2.3 (R Foundation for Statistical Computing, Vienna, Austria).
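The performance parameters defined above can be written out as a short sketch, with organic milk as the class to be identified and a class probability below the threshold read as "OM identified". The function name, the handling of a score exactly at the threshold, and the seven example scores are assumptions made for illustration; they are not taken from the study.

```python
def classification_metrics(true_is_organic, class_probability, threshold=0.5):
    """Accuracy, sensitivity and specificity with organic milk (OM) as the target class.

    Scores run from 0 (OM identified) to 1 (OM rejected); a score below the
    threshold is read as 'organic'. Treating a score exactly at the threshold
    as 'rejected' is an assumption of this sketch.
    """
    tp = fn = tn = fp = 0
    for is_om, score in zip(true_is_organic, class_probability):
        predicted_om = score < threshold
        if is_om and predicted_om:
            tp += 1  # organic milk correctly identified
        elif is_om:
            fn += 1  # organic milk wrongly rejected
        elif predicted_om:
            fp += 1  # non-organic milk wrongly identified as organic
        else:
            tn += 1  # non-organic milk correctly rejected
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical validation samples: three organic, four non-organic.
truth = [True, True, True, False, False, False, False]
scores = [0.1, 0.3, 0.6, 0.9, 0.8, 0.4, 0.7]
metrics = classification_metrics(truth, scores)
print(metrics)
```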

Spectral features: Micro-NIRS and FT-NIRS
All samples were subjected to spectroscopic analysis by Micro-NIRS and FT-NIRS. The spectra obtained by Micro-NIRS and FT-NIRS are presented in Fig. 1a and b, respectively. The spectra differ due to instrument-specific traits (e.g. different optical path length [24]) and mode of application (reflectance/transmission). In the spectra acquired by Micro-NIRS, as shown in Fig. 1(a), the wavelength range 1220-1390 nm shows the largest separation between the groups of samples. The peak around 1340 nm relates to the presence of the combination of methyl (-CH3) and methylene (-CH2) groups [25]. The peak at around 1510 nm is caused by the stretching of methyl (C-H). These bonds are likely to be strongly related to the concentrations of different FAs. In the spectra acquired by FT-NIRS, as shown in Fig. 1(b), there are two main ranges where samples show separation, i.e. the 1492-1887 nm and 2083-2381 nm ranges. The peak in the 2240-2360 nm range originates from the stretching of the methyl and methylene groups, while the peaks near 1725 nm and 1760 nm are the first overtone (vibration) of methyl (-CH3), methylene (-CH2) and ethenyl (-CH=CH-) groups [25]. The ethenyl group expresses the degree of unsaturation of the fatty acids. Monounsaturated fatty acids (MUFAs), such as oleic acid (C18:1), tend to show a peak around 1725 nm [26].
To obtain an overview of the differences between the types of milk, the Micro-NIR and FT-NIR spectra were subjected to PCA after pre-processing. The optimized pre-processing methods for the Micro-NIR and FT-NIR spectra are as follows: (1) Micro-NIR spectral data are subjected to log 10 transformation, mean-centering, MSC and 1st derivative; (2) FT-NIR spectral data are subjected to mean-centering, smoothing, MSC, and 1st derivative. The scores distribution of the samples is presented in Fig. 2, which shows that OM and CM are relatively well separated, whereas PM is more widely spread, overlapping with the two other groups. This phenomenon can be explained by the more diverse management of PM. According to the rules, cows producing PM should be outside at least 6 h per day for at least 120 days per year, which may lead to large variation in fresh grass consumption and thereby milk composition. In Fig. 2a (Micro-NIRS), the first two principal components (PCs) explain 92% of total variance, whereas in Fig. 2b (FT-NIRS), the first two PCs only explain 35% of total variance. This is most likely due to the larger wavelength range of the FT-NIRS, which comprises more multidirectional variance between samples. Apparently, this larger variance cannot be reflected well by only two principal components.
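Three of the pre-processing steps named above (MSC, 1st derivative and mean-centering) can be sketched as follows. This is an illustration under stated assumptions, not the Pirouette implementation: MSC uses the mean spectrum as reference, the derivative is a plain first difference without smoothing, the order in which the steps are chained at the end is assumed, and the three short spectra are synthetic. The log 10 transformation and smoothing steps are omitted.

```python
def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum is regressed on the reference (least squares) and then
    corrected as (x - offset) / slope. Assumes a non-zero fitted slope.
    """
    n_points = len(spectra[0])
    reference = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_points)]
    ref_mean = sum(reference) / n_points
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_points
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(reference, s)) / ref_ss
        offset = s_mean - slope * ref_mean
        corrected.append([(x - offset) / slope for x in s])
    return corrected


def first_derivative(spectrum):
    """First difference between neighbouring wavelength points."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def mean_center(spectra):
    """Subtract the column (wavelength) mean from every spectrum."""
    n_points = len(spectra[0])
    means = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_points)]
    return [[x - m for x, m in zip(s, means)] for s in spectra]


# Synthetic spectra: the 2nd and 3rd are offset/scaled copies of the 1st,
# mimicking pure scatter effects.
spectra = [
    [0.10, 0.20, 0.40, 0.30],
    [0.25, 0.45, 0.85, 0.65],
    [0.05, 0.10, 0.20, 0.15],
]
corrected = msc(spectra)
processed = mean_center([first_derivative(s) for s in corrected])
```

Because the synthetic spectra differ only by an offset and a scale factor, MSC maps all three onto the same curve, which is the behaviour the correction is meant to provide.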

FAs profiles by GC
All milks were analysed for their FA compositions. Since the concentrations of FAs were not normally distributed, non-parametric statistics were applied. Twenty-six FAs showed significantly different (P < 0.01) concentrations between milk types (Kruskal-Wallis test, Table 1). Among these 26 FAs, the three most abundant were C16:0 (palmitic acid), C18:1n9c (oleic acid) and C14:0 (myristic acid), together accounting for more than 50% of total FAs. Similar dominant FAs were also found by Capuano et al. [7]. CM had significantly higher concentrations of these three FAs than the other two types of milk. According to the pairwise comparison by Mann-Whitney U-test, OM had significantly different concentrations of 18 FAs, but only six of them had concentrations higher than 10 µg/100 g. This result suggests that if the focus is only on the predominant compounds, OM could hardly be distinguished from the other types of milk. Because of the nutritional expectations of consumers, polyunsaturated fatty acids (PUFAs) have drawn public attention, especially long chain PUFAs such as eicosapentaenoic acid (EPA; 20:5n3) and docosahexaenoic acid (DHA; 22:6n3) [27] and their precursor, alpha-linolenic acid (ALA) [28]. In our results, these FAs, as well as the total amount of PUFAs, were significantly higher in OM, which is in line with previous observations [6,29-33]. However, differences of this size are thought to have limited impact on human health [34]. On the other hand, CM and PM only had 9 and 6 discriminating FAs, respectively. This means they showed fewer unique features than organic milk, according to the post hoc test in Table 1. This is due to the flexible rules for PM, which make it more difficult to distinguish between CM and PM [15].

PCA and correlation of NIR spectral data and individual FAs
To obtain an overview of the characteristics of the different types of milk, the FA concentrations were subjected to PCA after auto-scaling, which was the optimized pre-processing for this dataset. The distribution of PCA scores is shown in Fig. 2c. The first two principal components (PCs) explain 59% of total variance. The scores plot shows the distinction between OM and the other two types of milk. CM and PM are mixed with each other, matching the results of the Mann-Whitney U-tests. Compared with Micro-NIRS and FT-NIRS, FA profiles contained information that allowed better separation of OM. Although the peak regions in the NIR spectral data refer to C-C, C-O and C-H bonds, which are the major structural elements of FAs, the resolution is lower, because no individual FAs can be identified. Comparing the PCA scores plots from these three techniques showed distinct differences, with FA profiles showing the best separation between groups.
Table 1. Average composition of FAs in organic (OM), pasture (PM) and conventional (CM) whole milks (µg/100 g liquid milk): mean concentrations, standard deviations in brackets, and statistical relevance of differences between milk types (P).
Fig. 3 presents the correlation between the spectral data obtained by Micro-NIRS and FT-NIRS on the one hand (horizontal), and FA profiles obtained by GC on the other hand (vertical). The Micro-NIR spectra show a stronger correlation with the concentrations of the FAs C14:0, C14:1n5, C16:0, C18:1n9c and C20:3n6. Combined with the results in Table 1, it was found that C14:0 and C16:0 are two highly abundant long chain saturated fatty acids, whereas C18:1n9c is the most dominant unsaturated fatty acid in the milk. Wavelength ranges with higher correlation coefficients with the FAs appear in the range from 900 to 1470 nm. Similar results are observed for FT-NIRS (Fig. 3b).
For the FT-NIR spectral data, FAs showing higher correlation coefficient values with longer wavelength ranges (1700-2500 nm) show similar patterns for shorter wavelength ranges (1000-1700 nm). This implies that signals in longer wavelength ranges may not provide extra information in addition to the signals in shorter wavelength ranges.
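The kind of correlation map discussed here (one coefficient per FA and wavelength) can be sketched for a single FA as follows. The use of the Pearson coefficient is an assumption, since the text only speaks of "the correlation coefficient", and the absorbance values and concentrations are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def correlation_profile(spectra, concentrations):
    """Correlate one FA's concentrations with the absorbance at each wavelength."""
    n_points = len(spectra[0])
    return [pearson([s[j] for s in spectra], concentrations) for j in range(n_points)]


# Hypothetical data: four samples, three wavelength points, one FA.
concentrations = [1.0, 2.0, 3.0, 4.0]
spectra = [
    [0.10, 0.90, 0.50],
    [0.20, 0.80, 0.10],
    [0.30, 0.70, 0.40],
    [0.40, 0.60, 0.20],
]
profile = correlation_profile(spectra, concentrations)
print([round(r, 2) for r in profile])
```

Repeating `correlation_profile` for every FA gives one row per FA, which corresponds to the layout described for Fig. 3 (wavelengths horizontal, FAs vertical).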

Classification models
PLS-DA models were developed for the three datasets for two comparisons, (a) OM versus NOM (CM+PM); (b) OM versus CM. The probability distributions of the two comparisons are presented in Figs. 4 and 5, respectively. Compared with binary models, KDE distribution plots provide more information than a single value [22]. Traditional binary models classify samples according to a threshold value. Samples with probability scores lower than the threshold value are classified in one group, while samples with probability scores higher than the threshold value are classified as the other group. Usually, the number of samples classified correctly will be presented. However, KDE distribution plots also show the difference between sample probability scores to the threshold value. The smaller the difference between probability scores and the threshold value, the higher the risk of misclassification. In this study, the threshold value was set as 0.5 by default, but it could be modified according to specific needs for future applications.
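How a KDE curve is obtained from class probability scores, and how far each score lies from the threshold, can be sketched as follows. The Gaussian kernel, the normal-reference bandwidth rule and the six scores are assumptions of this sketch; the kernel and bandwidth used for Figs. 4 and 5 are not stated in the text.

```python
from math import exp, pi, sqrt
from statistics import stdev


def gaussian_kde(scores, bandwidth=None):
    """Return a density function estimated from the scores with a Gaussian kernel."""
    n = len(scores)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption, not the paper's setting).
        bandwidth = 1.06 * stdev(scores) * n ** (-1 / 5)

    def density(x):
        kernel_sum = sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in scores)
        return kernel_sum / (n * bandwidth * sqrt(2 * pi))

    return density


# Hypothetical class probability scores of organic samples (0 = OM identified).
om_scores = [0.05, 0.10, 0.12, 0.20, 0.35, 0.55]
density = gaussian_kde(om_scores)

# Evaluate the curve on a grid, as one would for plotting.
grid = [i / 100 for i in range(0, 101)]
curve = [density(x) for x in grid]

# Distance of each score to the 0.5 threshold: small values flag samples
# with a higher risk of misclassification.
threshold = 0.5
margins = [abs(s - threshold) for s in om_scores]
```

A plain Gaussian kernel places some density outside the 0 to 1 range; for a plot this is usually acceptable, but it is worth keeping in mind when reading the tails.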
The Micro-NIRS dataset (Fig. 4a) shows two sub-groups in NOM, a larger sub-group on the right hand side of 0.5 and a smaller sub-group on the left hand side. Apart from the smaller sub-group on the left, the larger sub-group seems well distributed. The tails at both sides are light and the location of the peak is far from the threshold value of 0.5. The sample information shows that the smaller sub-group on the left side is caused by PM samples. As regards the distribution of OM, there are tails at both the right and left hand sides, and the average score is close to the threshold value. With the removal of PM (Fig. 5a), the distribution of CM improves because the left hand side sub-group, represented by PM samples, vanishes. Therefore, OM and CM can be distinguished efficiently by Micro-NIRS, but PM blurs the separation. Similar results were obtained from the FT-NIRS data (Figs. 4b and 5b). With the removal of PM, the tails of the distribution of CM become lighter and the scores of most CM samples are higher than the threshold. For the FA by GC data, however, nearly perfect separation is observed for OM and NOM as well as for OM and CM. Thus, the less abundant FAs may play an important role in the separation of OM and NOM, or alternatively other characteristics affect the NIRS results.
Classification results for Micro-NIRS, FT-NIRS and FA by GC are summarized in Table 2. They confirm the KDE plots, showing that both Micro-NIRS and FT-NIRS are sufficiently successful for OM versus CM classification, but less successful when PM is considered too (OM versus NOM). On the other hand, FA profiling by GC is very suitable for distinguishing OM from NOM as well as OM from CM. The five FAs with the highest absolute loading scores are ALA, EPA, C22:0, C18:2n9c11t and C24:0 in the model of OM versus NOM, and C14:1n5, C20:3n6, C16:0, C12:0 and C18:1n9c in the model of OM versus CM. Combined with the correlation results in Fig. 3, this reveals that the FAs with higher contribution to the model of OM versus CM also show higher correlation values with the NIR spectra than the FAs with higher contribution to the model of OM versus NOM.
To determine whether the prediction capability differs statistically among the models based on Micro-NIRS, FT-NIRS and FAs by GC data, Passing-Bablok regression was applied [9]. The results are shown in Table 3. The test shows that the models based on Micro-NIRS data and FT-NIRS data have an equivalent ability to predict the identity of OM samples versus NOM samples. However, for the same type of prediction there is no equivalence between the results of the NIRS methods and FAs by GC. The three approaches do have the same ability to distinguish between OM and CM (without PM present).
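The point estimates of a Passing-Bablok regression between two sets of class probabilities can be sketched as below. This is a simplified illustration, not the routine used for Table 3: it returns only the slope and intercept, it does not compute the confidence intervals or the joint test of slope = 1 and intercept = 0 described in the methods, and the two score lists are hypothetical.

```python
from statistics import median


def passing_bablok(x, y):
    """Passing-Bablok slope and intercept (point estimates only).

    Assumes the two methods are positively related, as the procedure requires.
    """
    n = len(x)
    slopes = []
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0 and dy == 0:
                continue  # identical points carry no slope information
            if dx == 0:
                slope = float("inf") if dy > 0 else float("-inf")
            else:
                slope = dy / dx
            if slope == -1:
                continue  # slopes of exactly -1 are excluded by the procedure
            slopes.append(slope)
    slopes.sort()
    n_slopes = len(slopes)
    k = sum(1 for s in slopes if s < -1)  # offset that shifts the median
    if n_slopes % 2 == 1:
        b = slopes[(n_slopes - 1) // 2 + k]
    else:
        b = 0.5 * (slopes[n_slopes // 2 - 1 + k] + slopes[n_slopes // 2 + k])
    a = median([yi - b * xi for xi, yi in zip(x, y)])
    return b, a


# Hypothetical class probabilities for the same samples from two methods.
micro_nirs = [0.10, 0.20, 0.35, 0.60, 0.80, 0.90]
ft_nirs = [0.12, 0.18, 0.40, 0.55, 0.85, 0.88]
slope, intercept = passing_bablok(micro_nirs, ft_nirs)
print(slope, intercept)
```

Two methods that score the samples identically give a slope of 1 and an intercept of 0, which is the null hypothesis tested in the study.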
The main difference between Micro-NIRS and FT-NIRS is the optical device. The Micro-NIRS instrument is equipped with a linear variable filter (LVF), whereas FT-NIRS is equipped with a Michelson interferometer. Compared with the Michelson interferometer, the LVF is tiny and can be easily interpreted, but the limitations of this optical device are its low resolution and the wavelength shifts that may occur [35-37]. Furthermore, the LVF applied in our research had a narrow wavelength range (908-1676 nm). Despite these differences, classification results were similar. In other words, higher resolution and a wider wavelength range did not significantly improve the prediction ability. This could be due to several reasons. Firstly, higher resolution and a wider wavelength range do not make any difference in detecting low concentration compounds, such as OM markers. Secondly, a certain correlation among the signals at different wavelengths may exist [38], which is also shown in Fig. 3. In this case, a wider wavelength range or more data points do not guarantee more information that can help to distinguish one group from another.

Conclusions
Portable NIRS (Micro-NIRS) was shown to be able to distinguish between organic and conventional milks, but class assignment was less successful for pasture milk samples. Benchtop NIRS (FT-NIRS) showed a similar ability to Micro-NIRS to differentiate between milks. FA analysis by GC distinguished organic milks well from both conventional and pasture milks. Although not perfect, the portable NIRS shows potential as a first, onsite check of the identity of organic milks, being non-inferior to benchtop NIRS for this application.