Adulterated stingless bee honey identification using VIS-NIR spectroscopy technique

The objective of this study was to evaluate the ability of VIS-NIR spectroscopy to classify pure and adulterated stingless bee honey across the wavelength range of 450–969 nm using an optical spectrometer. Physicochemical properties, namely soluble solid content (SSC) and moisture content (measured as refractive index, RI), of pure and adulterated honey were also investigated using a refractometer. The results showed that pure stingless bee honey had the highest transmittance rate, SSC and RI values compared to adulterated honey. There were significant differences (P < 0.0001) in the transmittance rate, SSC and RI of stingless bee honey across the five treatments. The results also showed that the VIS-NIR data classified the samples into the different treatments with a 99.33% accuracy rate. Thirty-four wavelengths were found to be the most significant for discriminating the treatments using principal component analysis (PCA) and linear discriminant analysis (LDA).


Introduction
Stingless bees (Kelulut) are social bees found on almost every continent, in tropical and subtropical areas around the world (Kek et al., 2014; Vijayakumar and Jeyaraaj, 2014). There are currently more than 600 known species in 56 named genera (Cortopassi-Laurino et al., 2006). It was not until the 1860s that researchers found that another type of honey can be collected from a different group of bees called stingless bees (Ya'akob et al., 2019). Honey produced by stingless bees is known by several names, such as Meliponine honey, pot-honey and kelulut honey (Amin et al., 2018).
The farming of stingless bees for honey is known as meliponiculture and is continuously growing as farmers recognize the expanding future in this area. The main contributing reasons are that stingless bees do not sting, which makes them easier to manage, and that the price of stingless bee honey is comparatively higher than that of normal honey (Ávila et al., 2018). The price difference is attributed to the lower honey production of stingless bees compared to normal bees (Souza et al., 2006). Stingless bees produce only 1–5 kg of honey per year, while honey bees produce an estimated 20 kg of honey per year (Biluca et al., 2016; Chuttong et al., 2016).
Stingless bee honey from all over the world has been studied thoroughly in the past and demonstrates a rich and variable composition (Ávila et al., 2018). This is explained by the fact that the composition of stingless bee honey differs according to its floral source and origin (Amin et al., 2018). According to the Codex Alimentarius Commission (CODEX STAN 12-1981), honey sold as such shall not have any food ingredient, including food additives, added to it. While the popularity of stingless bee honey among consumers continues to grow, it has also been targeted by some irresponsible retailers and processors who adulterate the honey with cheaper sweeteners, resulting in a loss of quality of the pure honey (Chen et al., 2011). The industrial introduction of high-fructose corn syrup (HFCS) in the 1970s greatly increased the adulteration of honey (Mehryar and Esmaiili, 2011), and some honey manufacturers adulterate pure honey with chemicals and sweeteners such as corn syrups (CS), invert syrups (IS) or high fructose inulin syrups (HFIS) (Mehryar and Esmaiili, 2011; Naila et al., 2018). A significant number of studies have been published over the years to explore methods for detecting honey adulteration. Physicochemical measurements such as sugar content, moisture content, pH value, hydroxymethylfurfural (HMF) content and ash content have been the traditional way to detect honey adulteration (White, 1979; Codex Alimentarius, 2001; Naila et al., 2018). These methods require tedious sample preparation, are relatively time-consuming and require complex analytical equipment (Zábrodská and Vorlová, 2015).
The Visible-Near-Infrared (VIS-NIR) spectroscopy technique has been used widely to assess food quality, as it is a prominent technique for the non-destructive analysis of various biological and biomedical materials (Li and Yang, 2012). It is inexpensive, rapid and non-destructive. VIS-NIR spectroscopy in combination with multivariate statistical techniques (chemometrics) is a direct, reliable and rapid way to obtain information on multiple parameters simultaneously. It has been used to measure the quantity of adulterant and was able to correctly classify the origin of honey (Gallardo-Velázquez et al., 2009; Li and Yang, 2012). For these reasons, VIS-NIR spectroscopy combined with statistical methods was chosen for this study.
In honey adulteration, added water and other sweeteners are known to affect the moisture content and soluble solid content (SSC) of the honey. The relationship between the water content of honey and NIR spectra, analysed with aquaphotomics, has been studied by other researchers, and previous results showed that NIR spectroscopy is applicable to detecting honey adulteration (Bázár et al., 2016). The SSC of honey, which is also studied in this research, has likewise been found to be a reliable index of adulteration (Terrab et al., 2004).
Spectral data usually require computational intelligence to analyse, particularly data reduction methods, because these methods optimise data processing: they reduce processing time, reduce the dimensionality of the data and improve generalisation by lowering overfitting (Khaled et al., 2020). For these reasons, data reduction methods were specifically considered for this study.
Principal component analysis (PCA), as a dimensionality reduction method, often unveils unsuspected relationships between variables that lead to new interpretations of the data. PCA is typically a preliminary step, since it is frequently applied before larger tasks such as classification. Classification with Linear Discriminant Analysis (LDA) aims to minimize the total probability of misclassification, usually by finding the projection that maximizes the ratio of the scatter among data of different classes to the scatter within data of the same class; this depends on the group probabilities and the multivariate distributions of the predictors (Welch, 1939; Masnan et al., 2012). An important trait of LDA in pattern classification is that it brings data of the same class closer together and moves data of different classes farther apart (Kim et al., 2003; Masnan et al., 2012). Quadratic Discriminant Analysis (QDA), in contrast, models the likelihood of each class as a Gaussian distribution and then uses the posterior distributions to estimate the class of a given test point (Friedman et al., 2001). Support vector machines (SVM) (Vapnik, 1982; Noble, 2006) map the data into a higher-dimensional space and construct an optimal separating hyperplane in that space. The SVM formulation of (binary) pattern recognition problems offers considerable advantages over other approaches (Mavroforakis and Theodoridis, 2006; Noble, 2006).
The objective of this study was to evaluate the ability of VIS-NIR spectroscopy to classify pure and adulterated stingless bee honey across the wavelength range of 450–969 nm. This research also investigates the relationship between the VIS-NIR spectral properties, the adulteration percentage and the physicochemical properties.

Honey samples
Stingless bee honey samples were collected from the Trigona itama species in January 2020 at Ladang 10, Universiti Putra Malaysia. The samples were then sealed and stored in a low-temperature freezer at -20°C until further analysis. Before the spectral measurement, the samples were placed in a water bath (Memmert WNB 14, Germany) at 45°C until the soluble substances had fully dissolved. The samples were then divided into five treatments: pure stingless bee honey, and honey adulterated with a 60% sucrose solution at four different percentages (5%, 10%, 20% and 30%). The samples were kept in the water bath for another 3 h to ensure that the mixtures were well blended and that any crystals present had dissolved (Yanniotis et al., 2006).

eISSN: 2550-2166 © 2020 The Authors. Published by Rynnye Lyan Resources

VIS-NIR spectroscopy
VIS-NIR spectroscopy was performed at room temperature (25°C to 27°C). Spectral scanning was conducted with an optical spectrometer (Ocean Optics HR4000CG-UV-NIR, USA) (Figure 1) in transmission mode over the wavelength range of 450 nm to 969 nm, recording 2058 wavelength values, with the samples held in disposable plastic cuvettes. Before sample measurement, the readings were calibrated with a light reference (light source open and an empty cuvette in the cuvette holder) and a dark reference (light source off or blocked) to obtain accurate results. The transmittance rate of each sample was read three times and averaged. At the end of the experiment, the transmittance rate values were saved as Microsoft Excel files for statistical analysis.
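The light and dark reference step described above is usually applied as a per-wavelength correction. The following is a minimal sketch of that standard correction and of the three-scan averaging, not the spectrometer software's actual routine. The function names and the raw count values are hypothetical and chosen only for illustration.

```python
def percent_transmittance(sample, light_ref, dark_ref):
    """Per-wavelength transmittance (%) from raw detector counts.

    Uses the common correction T = 100 * (S - D) / (R - D), where S is the
    sample scan, R the light reference and D the dark reference.
    """
    result = []
    for s, r, d in zip(sample, light_ref, dark_ref):
        span = r - d
        if span <= 0:
            raise ValueError("light reference must exceed dark reference")
        result.append(100.0 * (s - d) / span)
    return result


def average_scans(scans):
    """Average repeated scans wavelength by wavelength."""
    n = len(scans)
    return [sum(values) / n for values in zip(*scans)]


# Hypothetical raw counts at three wavelengths (illustrative only).
dark = [100.0, 100.0, 100.0]
light = [1100.0, 2100.0, 4100.0]
scans = [
    [600.0, 1100.0, 2100.0],
    [610.0, 1090.0, 2110.0],
    [590.0, 1110.0, 2090.0],
]

# Three repeated readings per sample, corrected and then averaged.
mean_transmittance = average_scans(
    [percent_transmittance(scan, light, dark) for scan in scans]
)
```

Averaging after the correction, as done here, gives the same result as averaging the raw scans first, because the correction is linear in the sample counts.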

Soluble solid content (SSC)
SSC of stingless bee honey was determined in °Brix using an Abbe refractometer (KRUSS Digital Abbe refractometer AR2008, Germany) with an accuracy of ±0.1%. About 0.15 mL (three drops) of each honey sample was needed to carry out the test. The analysis was performed five times and averaged.

Moisture content
Moisture content was determined by the refractive index (RI) method. The RI was measured using an Abbe refractometer (KRUSS Digital Abbe refractometer AR2008, Germany), and the samples were kept at a constant temperature of 20°C before each reading was taken. The refractometer was cleaned and dried before the next sample was measured. The RI determination was repeated three times and averaged (Department of Standards Malaysia, 2017; Codex Alimentarius, 2001).

Statistical analysis

Principal component analysis (PCA)
PCA was applied for dimensionality reduction prior to classification and to resolve the singularity issue in LDA. Reducing the data by eliminating redundant information is important for limiting computational cost, especially when handling high-dimensional spectral data. PCA was performed using the MASS package in R version 3.6.1 (Lucent Technologies, New Jersey, USA). PCA was used in this study to reduce the dimensionality of the VIS-NIR spectral data to a few significant components without changing the underlying value of the overall data (Khaled et al., 2020). Based on Figure 2, an elbow is observed at the third principal component (PC3); hence, only three principal components (PCs) were selected based on the total variance they explain. The selected components were then used as inputs for three classifiers: LDA, QDA and SVM.
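To illustrate what a principal component and its share of the total variance are, the sketch below extracts the leading component of a small data set by power iteration on the covariance matrix. It is a simplified stand-in written in plain Python, not the R workflow used in the study, and the `spectra` values are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of rows (samples) by columns (variables)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def leading_component(cov, iterations=200):
    """Largest eigenvalue and unit eigenvector of cov by power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    return eigenvalue, v


def explained_ratio(eigenvalue, cov):
    """Share of total variance (trace of cov) carried by one component."""
    return eigenvalue / sum(cov[i][i] for i in range(len(cov)))


# Hypothetical transmittance values: 4 samples x 3 wavelengths.
spectra = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 5.9, 0.6],
    [4.0, 8.0, 0.5],
]
cov = covariance_matrix(spectra)
pc1_variance, pc1_direction = leading_component(cov)
pc1_ratio = explained_ratio(pc1_variance, cov)
```

A scree plot such as Figure 2 is a plot of these per-component variance shares. The elbow marks the point after which further components add little.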

Linear discriminant analysis (LDA)
LDA is used to find the linear combination of features that best separates two or more classes of objects or events. It achieves class discrimination by using linear combinations of the original variables that maximize the ratio of between-class to within-class variance (Borràs et al., 2015). LDA requires a reduction in the number of variables, and the use of PCA as a variable reduction tool before LDA requires a careful evaluation of the number of components to be used (Skrobot et al., 2007).
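The between-class to within-class ratio that LDA maximizes can be illustrated for a single variable. The sketch below computes that ratio for grouped values. It is a one-dimensional illustration only, with hypothetical data, and does not search for the optimal linear combination as a full LDA would.

```python
def fisher_ratio(groups):
    """Between-class over within-class sum of squares for one variable.

    Assumes at least one group has some spread, so the within-class
    sum of squares is not zero.
    """
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    between = 0.0
    within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        between += len(group) * (group_mean - grand_mean) ** 2
        within += sum((x - group_mean) ** 2 for x in group)
    return between / within


# Hypothetical scores for two treatments along two candidate projections.
well_separated = fisher_ratio([[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]])
overlapping = fisher_ratio([[1.0, 5.0, 9.0], [2.0, 6.0, 10.0]])
```

A projection that yields a larger ratio places the treatment means farther apart relative to the spread within each treatment, which is the property LDA looks for.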

Quadratic discriminant analysis (QDA)
The QDA classifier was used to maximise the ratio of the determinant of the between-level scatter matrix. QDA is closely related to LDA, but it does not assume that the covariance of each class is identical (Balabin et al., 2010).
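The effect of allowing each class its own covariance can be seen in a one-variable sketch, where each class is modelled as a Gaussian with its own mean and variance and a point is assigned to the class with the highest quadratic discriminant score. The training values, class labels and equal priors below are assumptions made for illustration.

```python
import math


def fit_gaussian(values):
    """Sample mean and variance of one class."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return mean, variance


def quadratic_score(x, mean, variance, prior):
    """Log of prior times Gaussian density, dropping constant terms."""
    return (
        -0.5 * math.log(variance)
        - (x - mean) ** 2 / (2.0 * variance)
        + math.log(prior)
    )


def classify(x, params):
    """Label whose class-specific score is largest."""
    return max(params, key=lambda label: quadratic_score(x, *params[label]))


# Hypothetical one-variable training data with unequal class spread.
training = {
    "pure": [10.0, 10.2, 9.8, 10.0],
    "adulterated": [8.0, 9.0, 7.0, 8.0],
}
params = {}
for label, values in training.items():
    mean, variance = fit_gaussian(values)
    params[label] = (mean, variance, 1.0 / len(training))  # equal priors assumed

label_near_pure = classify(9.9, params)
label_near_adulterated = classify(8.5, params)
```

If both classes were forced to share one pooled variance, the score would become linear in x and the rule would reduce to LDA.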

Support Vector Machine (SVM)
The SVM maximises the perpendicular margin around the separating hyperplane in order to distinguish the different levels within the training data set. Two parallel hyperplanes are constructed, one on each side of the hyperplane that separates the data, and the separating hyperplane is chosen to maximize the distance between them. A larger margin between the parallel hyperplanes generally gives a lower generalization error of the classifier (Balabin et al., 2010).
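For a linear SVM with separating hyperplane w·x + b = 0 and parallel hyperplanes w·x + b = ±1, the distance between the two parallel hyperplanes is 2/||w||. The sketch below evaluates that width and checks the margin constraints for two candidate hyperplanes on hypothetical points. It does not perform the optimisation itself.

```python
import math


def margin_width(w):
    """Distance between the hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(c * c for c in w))


def satisfies_margin(w, b, points, labels, tol=1e-9):
    """True if every point lies on or outside its class's margin hyperplane."""
    return all(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) >= 1.0 - tol
        for x, y in zip(points, labels)
    )


# Hypothetical two-class data in two dimensions.
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
labels = [-1, -1, 1, 1]

# Two candidate hyperplanes that both separate the data.
narrow = ((1.0, 0.0), -1.5)
wide = ((2.0 / 3.0, 0.0), -1.0)

narrow_ok = satisfies_margin(narrow[0], narrow[1], points, labels)
wide_ok = satisfies_margin(wide[0], wide[1], points, labels)
narrow_margin = margin_width(narrow[0])
wide_margin = margin_width(wide[0])
```

Both candidates classify the points correctly, but the second has the smaller ||w|| and therefore the wider margin, which is the solution an SVM would prefer.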

VIS-NIR distribution of stingless bee honey at 450-969 nm wavelength between treatments.
The plot of the VIS-NIR property (transmittance rate) of the five treatments (pure honey, 5% adulteration, 10% adulteration, 20% adulteration and 30% adulteration) against wavelength at room temperature (25±2°C) is presented in Figure 3. As can be seen, the transmittance changes across the wavelength range of 450-969 nm. The shapes of the spectra of all samples were quite homogeneous, and the highest peak of the graph belonged to the pure stingless bee honey samples.
The wavelength of 787.677 nm was chosen to compare the means between treatments for SSC and RI because it has the maximum transmittance value among all wavelengths and gives the best signal-to-noise ratio, which improves the precision of measurement. The results showed highly significant differences (P < 0.0001; Table 1) in the transmittance rate of stingless bee honey across the five treatments. Overall, the mean transmittance rate (Table 2) of pure stingless bee honey was higher than that of any adulterated sample.
The result shows that the addition of sucrose solution to stingless bee honey lowers the transmittance of the samples. This is assumed to occur because the sucrose solution has a higher moisture content, which changes the viscosity of the stingless bee honey and in turn affects the transmittance. The decrease in the concentration of the stingless bee honey is believed to affect the VIS-NIR properties as expressed by the Lambert-Beer law, more commonly known as Beer's law (Equation 1). Beer's law states that the absorbance of a light-absorbing material is proportional to its concentration in solution:

$$A = \varepsilon L c \qquad (1)$$

where A = the absorbance of the material, ε = the extinction coefficient of the substance, L = the sample path length and c = the molar concentration of the solution. This relation is further related to the VIS-NIR transmittance properties that have been measured. Absorbance is related logarithmically to transmission:

$$A = -\log_{10} T \qquad (2)$$

where T = the transmittance expressed as a fraction. A similar observation was reported by Liu et al. (2020), where it was found that the viscosity may influence the lightness and the transmittance of the tested liquid.
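The two relations of Beer's law, absorbance proportional to concentration and absorbance as the negative base-10 logarithm of fractional transmittance, can be combined numerically. The sketch below converts between percent transmittance and absorbance. The function names and the parameter values in the example are hypothetical, and real honey spectra may deviate from this idealised law.

```python
import math


def absorbance_from_transmittance(percent_t):
    """A = -log10(T), with T given in percent (0 < T <= 100)."""
    if not 0.0 < percent_t <= 100.0:
        raise ValueError("percent transmittance must be in (0, 100]")
    return -math.log10(percent_t / 100.0)


def transmittance_from_beer(epsilon, path_length, concentration):
    """Percent transmittance predicted by A = epsilon * L * c."""
    absorbance = epsilon * path_length * concentration
    return 100.0 * 10.0 ** (-absorbance)


# Illustrative values only: a sample transmitting 10% has absorbance 1.
example_absorbance = absorbance_from_transmittance(10.0)

# Round trip: absorbance 0.5 (epsilon=0.5, L=1, c=1) back to absorbance.
round_trip = absorbance_from_transmittance(transmittance_from_beer(0.5, 1.0, 1.0))
```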

Physicochemical properties of stingless bee honey samples between treatments.
The physicochemical results show highly significant differences in the mean SSC and RI across the five treatments (P < 0.0001) (Table 3). The mean SSC and RI decreased as the adulteration percentage increased: the mean SSC decreased from 69.98% to 67.47%, while the mean RI decreased from 1.4652 to 1.4594, as the adulteration percentage increased from 0% (unadulterated samples) to 30% (Table 4). The decrease in the physicochemical values was expected and may be due to the addition of the sucrose solution to the stingless bee honey samples, which increased their moisture content. Figure 4 presents the two-dimensional PC score plot based on the first two components (PC1 and PC2) acquired from the raw VIS-NIR spectral data.
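The direction of the SSC decrease reported above can be checked against a simple mass-weighted mixing estimate. This sketch assumes that the adulteration percentages are mass fractions and that the 60% sucrose solution reads 60 °Brix. Neither assumption is stated in the study, so the numbers are a rough plausibility check rather than a model of the measurements.

```python
def mixed_ssc(honey_ssc, adulterant_ssc, adulterant_fraction):
    """Mass-weighted SSC of a honey and adulterant mixture (linear mixing)."""
    if not 0.0 <= adulterant_fraction <= 1.0:
        raise ValueError("adulterant_fraction must be between 0 and 1")
    return (
        (1.0 - adulterant_fraction) * honey_ssc
        + adulterant_fraction * adulterant_ssc
    )


PURE_HONEY_SSC = 69.98  # mean SSC of pure honey reported in the text
SUCROSE_SOLUTION_SSC = 60.0  # ASSUMPTION: 60% sucrose solution reads 60 degrees Brix

# ASSUMPTION: treatment percentages are treated as mass fractions.
predicted_ssc = {
    percent: mixed_ssc(PURE_HONEY_SSC, SUCROSE_SOLUTION_SSC, percent / 100.0)
    for percent in (0, 5, 10, 20, 30)
}
```

Under these assumptions the estimate for the 30% treatment is close to 67 °Brix, the same direction and a similar size of decrease as the reported mean of 67.47%.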

Dimensional Reduction using Principal Component Analysis (PCA)
The PC score plot in Figure 4 shows that all samples could be clustered well graphically according to treatment based on the first two principal components. This indicates that the transmittance rate across the wavelength range of 450 nm to 969 nm can be used to demonstrate a distinct difference between pure and adulterated stingless bee honey.

Classification by Linear Discriminant Analysis based on Principal Components (PCA-LDA)
Table 5 shows the proportion of trace of all possible linear discriminants of the PCA-LDA method. Based on Table 5, the first linear discriminant (LD1) could explain 85% of the variation in the principal components used, while the second (LD2) and third (LD3) linear discriminants could explain the remaining 14.81% and 0.18%, respectively. Next, the linear discriminant model was assessed in terms of its accuracy rate. Based on Table 6, one sample from the third treatment (T3: 10% adulteration) was misclassified into the first treatment (T1: Pure honey), yielding an accuracy rate of 99.33%. The description of the misclassified sample is presented in Table 7. Table 7 shows that the misclassified sample, the 16th sample, is closest to the first treatment with a squared distance of 3.888, followed by the third treatment with a squared distance of 104.446. This explains the misclassification of the sample.
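The misclassification follows from the usual distance rule: a sample is assigned to the treatment whose centre is nearest in squared distance. The sketch below applies that rule to the two squared distances reported above. The helper `squared_distance` uses a diagonal pooled variance, which is a simplification assumed for illustration and not necessarily how the distances in the tables were computed.

```python
def squared_distance(x, center, pooled_variances):
    """Squared distance scaled by per-variable pooled variance.

    ASSUMPTION: diagonal covariance, a simplification of the full
    Mahalanobis distance based on a pooled covariance matrix.
    """
    return sum(
        (xi - ci) ** 2 / vi for xi, ci, vi in zip(x, center, pooled_variances)
    )


def nearest_treatment(distances):
    """Treatment label with the smallest squared distance."""
    return min(distances, key=distances.get)


# Squared distances of the 16th sample as reported in the text.
reported = {"T1": 3.888, "T3": 104.446}
assigned = nearest_treatment(reported)

# Hypothetical scores, only to show the distance helper in use.
example_distance = squared_distance([1.0, 2.0], [0.0, 0.0], [1.0, 4.0])
```

Because the sample lies far closer to the centre of T1 than to that of its true treatment T3, any rule based on these distances places it in T1.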

Classification by Quadratic Discriminant Analysis based on Principal Components (PCA-QDA)
Similar to the result from PCA-LDA, Table 8 also shows that one sample from the third treatment (T3: 10% adulteration) was misclassified into the first treatment (T1: Pure honey), yielding an accuracy rate of 99.33%. The description of the misclassified sample is presented in Table 9. Table 9 shows that the misclassified sample, the 16th sample, is closest to the first treatment with a squared distance of 15.04, followed by the third treatment with a squared distance of 41.07.

Support Vector Machine (SVM) based on principal components (PCA-SVM)
Similar to the result from PCA-LDA and PCA-QDA, Table 10 shows that one sample from the third treatment (T3: 10% adulteration) is misclassified into the first treatment (T1: Pure honey) yielding an accuracy rate of 99.33%.
Overall, the three methods, namely PCA-LDA, PCA-QDA and PCA-SVM, are equally good at classifying the samples into the different treatments based on the visible-near-infrared (VIS-NIR) data, as all three classification methods have the same accuracy rate.
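The shared accuracy rate can be reproduced from a confusion matrix with a single misclassified sample. The sketch below assumes 30 samples per treatment (150 in total). That sample count is not stated in the text and is chosen here only because one error in 150 gives 99.33%.

```python
def accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Rows are true treatments T1..T5, columns are predicted treatments T1..T5.
# ASSUMPTION: 30 samples per treatment; the counts are illustrative only.
confusion = [
    [30, 0, 0, 0, 0],
    [0, 30, 0, 0, 0],
    [1, 0, 29, 0, 0],  # one T3 sample predicted as T1
    [0, 0, 0, 30, 0],
    [0, 0, 0, 0, 30],
]
accuracy_percent = round(100.0 * accuracy(confusion), 2)
```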

Significant wavelength to discriminate the different treatments
The significant wavelengths were determined by first conducting PCA and selecting the top 100 wavelengths with the highest loadings on the first principal component. LDA was then performed on the same wavelengths to identify the first two linear discriminants. Lastly, the wavelengths that were in the top 50 of both linear discriminants were selected, and 34 wavelengths were found to be the most significant for discriminating the different treatments. This procedure was adapted from Wang and Sousa (2009). The significant wavelengths, selected based on their high discriminatory power, are listed in Table 11.
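The selection procedure described above amounts to two ranking steps followed by an intersection. The sketch below shows that logic on a toy example. Ranking by absolute value, the function names and the loading values are assumptions made for illustration. In the study, the cut-offs were 100 wavelengths for PC1 and 50 for each linear discriminant.

```python
def top_k(scores, k):
    """Set of the k keys with the largest absolute score."""
    ranked = sorted(scores, key=lambda key: abs(scores[key]), reverse=True)
    return set(ranked[:k])


def select_wavelengths(pc1_loadings, ld1_weights, ld2_weights, k_pca=100, k_lda=50):
    """Wavelengths ranked highly on PC1 and on both linear discriminants."""
    candidates = top_k(pc1_loadings, k_pca)
    in_ld1 = top_k({w: ld1_weights[w] for w in candidates}, k_lda)
    in_ld2 = top_k({w: ld2_weights[w] for w in candidates}, k_lda)
    return sorted(in_ld1 & in_ld2)


# Hypothetical loadings and weights keyed by wavelength in nm.
pc1 = {450: 0.9, 500: -0.8, 550: 0.7, 600: 0.1}
ld1 = {450: 0.5, 500: 0.1, 550: -0.6, 600: 9.0}
ld2 = {450: -0.2, 500: 0.9, 550: 0.4, 600: 9.0}

selected = select_wavelengths(pc1, ld1, ld2, k_pca=3, k_lda=2)
```

In this toy case the wavelength at 600 nm is dropped at the PCA step despite its large discriminant weights, and only 550 nm survives both discriminant rankings.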

Conclusion
In this study, the transmittance rate of pure stingless bee honey showed the highest mean value among the treatments at 787.677 nm. Pure stingless bee honey also showed the highest SSC and RI values compared to the adulterated samples. There was a significant difference between pure and adulterated stingless bee honey for both SSC and RI. The results of PCA-LDA, PCA-QDA and PCA-SVM as data reduction and classification methods indicated that all these methods are equally good at classifying the samples into the different treatments based on the VIS-NIR data, with 99.33% accuracy. The PCA and linear discriminant techniques revealed 34 wavelengths that were the most significant for discriminating the different treatments. More detailed work is planned to acquire more accurate data that can be used to develop a rapid prediction of adulteration in stingless bee honey in the future.