Fast Verification of Buffalo’s Milk Authenticity by Mid-Infrared Spectroscopy, Analytical Measurements and Multivariate Calibration

Mid-infrared spectroscopy (MID), chemical composition and physicochemical characteristics associated with chemometrics were used to rapidly detect and quantify the amount of cow’s milk added in buffalo’s milk. A total of 165 samples, divided into buffalo’s milk, buffalo’s milk added with cow’s milk (10 to 90%) and cow’s milk were evaluated to obtain fat, protein, lactose, total and defatted solids, urea, pH, acidity, cryoscopic index and band absorbances in the spectra associated with principal component analysis (PCA), multiple linear regression (MLR) and partial least squares regression (PLS). The treatments were separated into groups by PCA, allowing the classification of samples. MLR and PLS models were able to predict cow’s milk contents in buffalo’s milk. MID and results of the analytical measures studied when associated with chemometrics are efficient in the rapid quantitative detection of adulteration in buffalo’s milk.


Introduction
The reproductive seasonality of the buffalo species associated with the high price of buffalo's milk when compared to cow's milk are the main factors responsible for the common occurrence of adulteration by partial addition of buffalo's milk. 1 For that, ensuring the authenticity of buffalo's milk in order to minimize this economic fraud is a concern for regulatory agencies, research laboratories and industries, in which researchers [2][3][4][5][6][7][8][9][10][11][12] have demonstrated the importance of development and/or adaptation of analytical methods for detections and quantifications of the type of milk present in the samples evaluated.
Verification of buffalo's milk authenticity is generally based on the identification of single markers or groups of markers, such as proteins and their fractions, peptides and amino acids by electrophoretic, chromatographic and immunological techniques. [13][14][15][16][17][18][19][20] Although these techniques have good sensitivity and generate accurate and reliable results, they are expensive, time consuming, need specialists trained in conducting laboratory activities, use chemical reagents harmful to handlers and the environment, making it difficult to implement them in monitoring programs of industries. 15 Thus, it is necessary to adapt techniques, especially nondestructive, through appropriate control methods, to be used in association with traditional/official methodologies to identify the type of milk used by industries. 16 Mid-infrared spectroscopy (MID) from numerical analytical data of constituents and their functional groups in the spectra may be a viable alternative. 5 The MID stands out for being a fast technique, easy to perform, which requires little preparation of the samples to be evaluated, besides presenting good sensitivity and low operating cost. In the study of milk, characteristic spectra are generated from the vibration of the functional groups that are part of the structure of its chemical constituents. The absorbance data of the bands in the spectra and the amount of the milk constituents, when associated with multivariate statistical analyses, produce accurate answers in the detection of adulteration, 12,17,18 enabling the development of mathematical models capable of predicting with reliable results the levels of tampering in the material under study. 19 A recent work 9 used MID study to verify the authenticity of buffalo's milk, experiments were performed with lyophilized milk samples and it was not possible to obtain a model capable of predicting the concentration of cow's milk in buffalo's milk, serving only as a screening technique. 9,12 For fluid milk, MID has not been observed so far to quantify the presence of cow's milk in buffalo's milk samples.
A relatively simple alternative, that has not yet been performed to verify the authenticity of buffalo's milk, is the association of physicochemical characteristics and chemical composition that quantitatively differentiate buffalo's milk from cow's milk with multivariate analysis, allowing application of these data, routinely obtained by laboratories of dairy industries that process this type of milk, to the statistical models generated.
In this regard, the objective was to use the MID, chemical composition and physicochemical characteristics associated with multivariate statistical methods for rapid detection and quantification of cow's milk in buffalo's milk.

Sampling and formulation
Buffalo and cow's milks were collected from May to August 2017, during mornings. Bovine milk was obtained by mechanical milking of crossbred cows (Dutch × Zebu) fed on pasture (Brachiaria decumbens), while buffalo's milk was acquired from manual milking of crossbred buffalo (Jafarabadi × Murrah) females fed on pasture (Brachiaria decumbens), both in adequate hygienicsanitary conditions. After the collections, the samples were made in 15 repetitions, totaling 165 experimental units, by adding variable and increasing amounts of cow's milk (10,20,30,40,50,60,70,80 and 90%) to the buffalo's, as well as formulations made exclusively with buffalo and cow's milk.

Chemical composition, physicochemical characteristics and MID
A 40 mL aliquot of each milk formulation was heated in water bath of 40 ± 2 °C and analyzed with MID (DairySpec FT, Bentley Instruments, Inc., Chaska, Minnesota, USA) from 1000 to 3000 cm −1 , obtaining fat, protein, lactose, total solids (TS), defatted solids (DS), urea and cryoscopic index, as well as acidity measurements were made using 10 mL of samples and 1 mL of phenolphthalein indicator and titrating with NaOH (0.1 mol L −1 ) and pH determinations were made using the glass electrode method, by a direct reading pH meter calibrated with solutions of pH 7.00 and 4.01. 20 The results of the analytical measurements and the absorbance data of the obtained spectra were used to calibrate the content of cow's milk added to the buffalo's milk.

Experimental design
The sample data set was organized in matrices formed by m × n elements (m rows corresponding to the treatments and n columns corresponding to the variables), constructed for the measurements of chemical composition and physicochemical characteristics (matrix A 1 ), absorbance data of specific bands of MID (matrix A 2 ) and absorbance of the full MID spectra (matrix A 3 ). The Statistical Analysis System software 21 was used to conduct multivariate statistical analyses and to remove outliers from the Cook's distance, in which samples with a distance ≥ 2 were discarded individually for each matrix.

Principal component analysis (PCA)
PCA was performed with the transformation of the original data matrices (matrices A 1 and A 2 ) into covariance matrices, which express the individual variances and linear combinations (covariances) between two variables. The eigenvalues and their respective normalized eigenvectors were used to construct the principal components (PC). To set the number of PC's, the interpretability criteria was considered and the eigenvalue diagram, therefore the PC that had the highest proportions of variance of the original attributes (above 70% of variance) were selected along with the PC's related to eigenvalues that caused significant changes in variance.

Multiple linear regression (MLR)
The original data from the matrix A 1 were standardized (mean value = 0 and standard deviation = 1), eliminating the differences between the measurement units of the variables. Multicollinearity analysis was performed between the variables, so that low multicollinearity values did not influence the estimation of β values of the regression equations, minimizing the errors. 22 Data were randomized and separated into two sets (70% for training and 30% for validation) by the Kennard-Stone algorithm. 23 Three forms of model selection were tested: backward, forward, and stepwise, 24 and the model was chosen based on the highest coefficient of determination (R 2 ), root mean square error of training (RMSET) and root mean square error of validation (RMSEV), along with the evaluation of the number of parameters, the performance/ deviation ratio (RPD) and the error rate (RER). 25

Partial least squares regression (PLS)
Data from matrix A 3 were randomized and divided into training (70%) and validation (30%), 23 evaluated by PLS in order to develop a multivariate calibration model. The number of latent variables in the model was tested according to the method proposed in the software, 21 in which the T 2 test was used to determine the significant differences of two predicted residual sum of squares (PRESS) values. If there were no significant differences, the models were chosen and only those having fewer factors than the minimum PRESS model were compared. After selecting the number of latent variables, the PLS model was used to predict the concentration of cow's milk in buffalo's milk. The values of R 2 , RMSET, RMSEV, correlation coefficient, RPD and RER were obtained to explain the prediction capacity of the model. 25

Chemical composition and physicochemical characteristics
The results of chemical composition and physicochemical characteristics of the evaluated milk samples presented variations when compared to the studied treatments ( Table 1).
The milk consists mainly of water, where solid components are dissolved or dispersed, constituting the TS formed by proteins, lipids, carbohydrates, vitamins and minerals. From the total solids, the lipid content is subtracted, obtaining the defatted solids (DS). 26 Due to the peculiar characteristics of the buffalo species, associated with non-genetic factors such as lactation stage, feeding strategies and udder health, among others, buffalo's milk, when compared to cow's milk, usually presents higher contents of fat, lactose, protein, TS, DS, vitamins, minerals and acidity due to the amount of proteins with acid characteristics and lower values for water content, cryoscopic index, 27,28 facilitating the differentiation between the present treatments study.  9,17,18,28 The milk spectra of the two species did not present distinct bands, but differences were observed regarding the absorption intensity. The cow's milk samples presented bands with lower intensity than the buffalo's milk samples. This was due to the higher percentages of water in bovine milk samples, thus, lower levels of lipids, proteins and carbohydrates, for the intensity of the vibrational modes of the functional groups is related to the contents of the present constituents, 28 since the intensity of the vibrational modes is proportional to the concentration of the constituents. The spectra of buffalo and bovine treatments were overlapped on 9 levels of tampering (10 to 90%) ( Figure 2). Spectra with the lowest levels of adulteration showed similar behavior to buffalo's and, as cow's milk was added to buffalo's milk, the treatments had characteristics similar to bovine milk. The peaks are caused due to the vibration of functional groups present mainly in the structure of milk fat, protein and lactose, so that any changes in the contents of these components caused variations in the behavior of the milk spectrum bands, due to peculiar characteristics in the milk of the different species.

Chemometrics Principal component analysis (PCA)
The classification of buffalo and bovine treatments by PCA (Figure 3) showed differences between the samples of the two species, as well as the relationships between the evaluated milk formulations and the analytical measures. By the interpretability criteria, associated with the eigenvalue diagram, 2 PC's with the highest proportions of variance of the original attributes were selected.
For the matrix A 1 , the principal component 1 (PC1) explained 95.51% and the principal component 2 (PC2) 3.62% of the total data variance, sufficing to discriminate the treatments regarding the chemical composition and  physicochemical characteristics studied. The most relevant data information is contained in PC1, which behaved as a combination of fat, protein, lactose, TS, DS, urea and acidity (positive measures) against cryoscopic index and pH (negative). Thus, buffalo's milk samples had superior results of positively correlated measurements when compared to cow's milk samples, hence the spatial separation into two distinct groups by PCA.
For the matrix A 2 , PC1 explained 87.72% of the total data variance, while PC2 6.54%, thus the samples also separated themselves in relation to PC1. All variables (13 bands) were significantly correlated with PC1. As cow's milk was added to buffalo's milk, the absorbance values of the bands decreased, causing changes in the location of samples with higher percentages of bovine milk to negative regions of PC1, since there were lower vibration intensities of the functional groups present in the sample constituents, due to lower presence of constituents in cow's milk when compared to buffalo's. Thus, buffalo's and cow's milk samples presented distinct characteristics. For the differentiaton between the treatments, the 8 fat-related bands, 3 protein-related bands and the 2 lactose-related bands influenced.
The PC's, both in relation to composition ( Figure S1, Supplementary Information (SI) section) and absorbance data ( Figure S2, SI section), were able to explain most of the total variance of the original data, with the highest explanation percentages being attributed to PC1 (significant positive correlation between fat, protein, lactose, TS, DS, urea and acidity levels for matrix A 1 and for all 13 absorption peaks of matrix A 2 ). As adulteration rates increased in buffalo's milk, samples moved toward the negative PC1 scale, which displays a decrease in the constituent values (except pH and cryoscopic index) and, consequently, in the absorbance of the 13 bands of the spectra, allowing the separation/classification into distinct groups. By PCA ( Figures S1 and S2, SI section), it can be stated that from 20% of adulteration it is possible to observe the separation of buffalo's milk samples from adulterated samples and cow's milk samples.

Multiple linear regressions (MLR)
The MLR model was chosen by the backward technique (Figure 4a). The multicollinearity between the variables was low (≤ 100), hence there was no high degree of correlation between the independent variables, thus the parameters were estimated with precision. The model was able to predict the amount of bovine milk in buffalo's milk (Figure 4b) from the chemical composition and physicochemical characteristics data (matrix A 1 ), since there were no high errors in the comparisons between the values predicted by the models and the experimental values.
Among the composition variables and physicochemical characteristics only cryoscopic index, protein and TS were not significant, therefore they are not relevant for the quantification of cow's milk content in buffalo's milk. The model presented R 2 = 0.9844, RMSET = 4.73% and RMSEV = 4.04%, correlation = 0.9922 (for the validation data), besides low multicollinearity, RPD = 7.89 and RER = 24.78, thus, the model is able to predict a constituent with good practical applicability. This demonstrates that variables that are commonly obtained on a daily basis or are part of the routine of milk quality control laboratories can be used to verify the authenticity of buffalo's milk. As the percentages of cow's milk increased, the components of buffalo's milk were diluted, allowing the quantification of cow's milk in the formulations, enabling the verification of the authenticity of buffalo's milk by the dairy industries that use buffalo's milk as feedstock.

Partial least squares regression (PLS)
Spectral data (matrix A 3 ) were used to predict the levels of cow's milk in buffalo's milk by PLS, a statistical technique that allows the adjustment of highly correlated variables, a common fact in MID data. The adulteration levels studied were quantified due to the decrease in peak intensities of the MID spectra as cow's milk was added to the formulations (Figure 2).
The R 2 of the training model was 0.9877, requiring 7 latent variables to explain the variance of the data with greater precision. The correlation between the predicted values by the model and the experimental values for the validation data was 0.9938 ( Figure 5), with RMSET and RMSEV of 3.484 and 3.481%, respectively, and RPD = 9 and RER = 28.72, indicating good ability to detect and quantify the presence of cow's milk in buffalo's milk with good practical applicability.
Some studies used fast methods associated with multivariate statistics to verify the authenticity of buffalo's milk. Silva et al., 9 in a study with lyophilized milk samples and ATR-FTIR (attenuated total reflectance-Fourier transform infrared spectroscopy) associated with PCA and artificial neural network (ANN), found differences between the milk of the species only from 40% of adulteration, and could not assess the content of cow's milk added to the buffalo's milk, only its presence. On the other hand, Velioglu et al., 15 when researching the authenticity of buffalo's milk by fluorescence spectroscopy associated with PCA and PLS found models with R 2 of 0.98 and RMSET and RMSEV of 2 and 4%, respectively, results similar to the ones found in this search.
Regarding the effectiveness of the tested models, it was found that both models (MLR and PLS) presented compatible results. The PLS modeling for the spectra data and MLR from the results of chemical composition and physicochemical characteristics proved to be efficient to verify the authenticity of buffalo's milk, with the capacity of quantification when cow's milk is present in the samples assessed, since they have good RMSE (≤ 4.73%), R 2 (≥ 0.9844) and correlation index (≥ 0.9922). The  MID technique, in addition to being a reliable, fast and accurate technique, enabling its use in dairy industries and surveillance laboratories, can be applied directly to milk without the need for sample preparation and the use of reagents harmful to handlers and the environment. Data on chemical composition and physicochemical characteristics can be used by research laboratories that routinely analyze cow's and buffalo's milk.
Compared to cow's milk, buffalo's milk has higher contents and is marketed at high prices. Associated with this, there is the issue of decreased availability of buffalo's milk at certain times of the year, usually in the months with spring-summer seasons. The use of chemometric combined with physicochemical characteristics and MID data can detect the adulteration in buffalo's milk starting at 10% of addition of cow's milk, offering alternative strategies that may provide greater power of inspection to the dairy industry that perform the processing of buffalo's milk and the supervisory authorities. The authenticity assessment of buffalo's milk in the samples collection places using portable devices can provide a fast and cheap option for the results obtention, as long as these devices can correctly quantify the value os the chemical composition and physicochemical characeristics and/or MID used in this study.

Conclusions
Techniques have been developed to verify the authenticity of buffalo's milk, based on analyses of chemical composition and physicochemical characteristics performed routinely and from MID spectra. Both techniques, when associated with chemometrics, showed compatible results in the detection of tampering.
PCA allowed the differentiation between the treatments in different regions in the two-dimensional graphs, allowing the classification of the samples according to the levels of tampering. The MLR and PLS models applied to the chemical composition, physicochemical characteristics and MID data were effective in quantifying cow's milk levels in buffalo's milk.
Due to the accuracy and speed of MID analyses, the use of the technique becomes a viable alternative to verify the authenticity of buffalo's milk.