Hyperspectral identification of oil adulteration using machine learning techniques

Food adulteration is a global concern, drawing attention from safety authorities due to its potential health risks. Detecting and categorizing oil adulteration is crucial for consumer safety and food industry integrity. This research explores hyperspectral imaging (HSI) analysis to identify substandard oil adulteration at different stages. Using the non-destructive HSI Specim Fx 10 system, a method for precise and easy imaging-based fraud detection and classification was proposed. The 670 oil samples, including pure (Almond, Mustard, Coconut, Olive) and adulterated (Sunflower, Castor, Liquid Paraffin), were analyzed. The Savitzky-Golay filter preprocessed the images to remove noise and smooth spectral signatures. The oils were identified using various machine learning approaches, including Support Vector Machines, Logistic Regression, Linear Discriminant Analysis, Random Forests, Decision Trees, K-Nearest Neighbors, and Naïve Bayes with Linear Discriminant Analysis excelling in identification. Performance parameters, including precision, recall, F1-score, and overall accuracy, were calculated. The proposed method achieved a validation accuracy of 100%, outperforming numerous state-of-the-art approaches. This study introduces a robust pipeline for effective oil adulteration detection, offering a significant advancement in food safety and quality control.


Introduction
Manual detection and classification of oil adulteration in the food industry are acknowledged as time-consuming, prone to inaccuracies, and challenging (Johnson, 2014).This concern stems from the significant risks to public health and the erosion of consumer trust caused by the pervasive issue of oil adulteration (Chaudhari et al., 2021).The growing global challenge of adulterated cooking oil has adverse health effects, making individuals more susceptible to diseases such as cancer and cardiac conditions.An alarming case in Spain, where rapeseed oil adulteration led to 600 deaths and affected 300 individuals, underscores the severity of this issue (Posada De La Paz et al., 2001).To address these challenges, an intelligent system for automatic oil adulteration detection and classification is imperative (Samuel and Geetha, 2014).Despite the critical role edible oils play in human nutrition (Lima and Block, 2019;Gonzalez-Ortega et al., 2024), challenges in their production and quality control persist (Review, 2023), with oil adulteration detection being a significant hurdle (Wei et al., 2023;Wahrburg et al., 2002).Adulteration involves the addition of cheaper oils or impurities to pure oil to enhance volume, shelf life, and appearance (Pal, 2011).Various detection methods, including physical and chemical tests (Heuscher et al., 2005), gas chromatography (Abed and Khairy, 2023), and high performance liquid chromatography (Abed et al., 2023), have been explored but are often time-consuming, costly (Portarena et al., 2019), and not universally effective.In response to these challenges, hyperspectral imaging (HSI) emerges as a promising solution, combining spectral and spatial information analysis with machine learning (ML) algorithms for rapid and accurate detection of various oil adulterants (Lozano-Garzón et al., 2023;Choudhary et al., 2020).
The proposed intelligent system employs HSI technology to capture spectral and spatial data from oil samples, which is then analyzed using ML algorithms like Linear Regression (LR) (Mohammadinia et al., 2023), Random Forest (RF) (Tie et al., 2024), Decision Tree (DT) (Raj et al., 2023), Support Vector Machine (SVM) (Lu et al., 2023), K-nearest Neighbor (KNN) (Saifullah et al., 2023), and Linear Discriminant Analysis (LDA) (da Cruz Souza et al., 2023).HSI's non-destructive nature and high accuracy offer the potential to detect a wide range of adulterants (Wang et al., 2023).The system aims to revolutionize oil adulteration detection by providing a more comprehensive understanding of the chemical characteristics of oil samples.The study involves four pure oils-Almond, Mustard, Coconut, and Olive-adulterated with Sunflower, Castor, and liquid Paraffin.Machine-learning algorithms, including SVM, LR, LDA, RF, DT, KNN, and Naïve Bayes, are applied for oil adulteration classification, with the ultimate goal of designing an intelligent, fast, accurate, and cost-effective system for the food industry.
The research methodology includes the acquisition of oil samples, HSI data capture using a Specim Fx 10 push broom/line scanner system, determination of reflectance and absorption, and estimation of mean absorption spectra signatures for Regions of Interest (ROI).Preprocessing involves noise removal and smoothing using the Savitzky-Golay filter, followed by conversion into data text or CSV files for further analysis.A novel algorithm for oil adulteration detection and classification using ML models is proposed, with LDA identified as the optimal model.Performance parameters are calculated for model evaluation, highlighting the system's effectiveness in detecting and preventing oil adulteration in the food industry.
In conclusion, the integration of hyperspectral imaging and ML algorithms presents a promising and innovative approach to revolutionize oil adulteration detection and classification in the food industry, ensuring the quality and safety of edible oils.The proposed intelligent system addresses the limitations of traditional methods and offers a comprehensive solution to the complex and global challenge of oil adulteration.This paper makes the following contributions.
The study conducted a thorough analysis of various commercial oils, including paraffin, coconut, almond, mustard, castor, olive, and sunflower oils.Utilized a push broom/line scanner HSI system (Specim Fx 10) to obtain high-resolution spectral data (224 bands) with spatial information for each oil sample.Implemented the Lumo Scanner software for data processing, calculating reflectance, and generating absorption spectra signatures for each oil sample.Employed noise removal techniques such as median, Savitzky-Golay, steady, and moving average filters to enhance the quality of the spectral data.Trained machine learning models (SVM, LR, LDA, RF, DT, KNN, and Naïve Bayes) using preprocessed spectral data to distinguish between pure and adulterated oil classes.Computed various performance metrics, including recall, kappa, F1-score, overall accuracy, and precision, to assess the reliability and efficiency of the models.

Materials and methods
Although adulteration in edible oil is a major problem that negatively impacts human health worldwide, edible oil plays a vital role in preserving human health.The adulteration of expensive and mostly useable cooking oil in Pakistan is the subject of this study.The following procedures have been used in conducting the research.
• Samples of paraffin, coconut, almond, mustard, castor, olive, and sunflower oils were bought from the commercial market.Pure Almond, Coconut, Olive, and Mustard oils are tainted with Sunflower, Liquid Paraffin, and Castor oils.In the second phase, oil samples were obtained utilizing a push broom/line scanner HSI system (Specim Fx 10).Using Lumo Scanner software, 224 spectral bands (λ) with 100 × 512 spatial pixels were attained for the acquired sample.• Reflectance was calculated using the oil data that was collected.This work allows the use of a complete hyperspectral imaging (HSI), however large datasets are required for machine learning model training.As such, we choose to extract several Regions of Interest (ROIs) from a single image.Selected pixel values are used to create these ROIs, which contain spatial information.Since every pixel contains a spectrum of spectral data, we calculate the average of these spectra.For every ROI that was chosen, the mean reflectance spectra signature was determined.But because oil has the ability to absorb light, the reflectance spectra were used to calculate the absorption spectra signature for each oil sample.From each mean ROI, 224 features in total were extracted.In order to facilitate smooth preprocessing, noise and other undesired effects were removed from the spectral signature of the raw, captured oil samples (pure and tampered with).The noise in the oil samples was removed using median, Savitzky-Golay, steady, and moving average filters; the spectral signature was smoothed using smoothing filters.The Savitzky-Golay filter was found to be the most effective among other filters at removing noise and smoothing the samples' spectral signatures after a comparison of all the resulting spectral signatures.• The entire dataset was divided into training (80%) and testing (20%) files after the Savitzky-Golay spectral signatures were converted to data text or a CSV file.ML models were trained using preprocessed spectral data.The model is a computer programme that is capable of learning to distinguish between various pure and adulterated classes based on specific spectral signatures.The pure and adulterated classes were predicted by the ML models (SVM, LR, LDA, RF, DT, KNN, and Naïve Bayes) using a spectral signature.• In addition, it has been determined that LDA is the most effective model that has been suggested for the detection and classification of oil adulteration after comparing these ML models.Ultimately, performance metrics such as recall, kappa, F1-score, overall accuracy, and precision were computed to assess the model.The suggested model's general workflow is shown in Fig. 1.

Dataset collection and acquisition
An HSI camera (FX-10, Specim, Spectral Imaging Ltd., and Oulu, Finland) was used in this experiment.The camera came with a unique Scheiner lens (Cinegon 1.4/8 mm) preinstalled.The camera was placed on a lab scanner to scan the entire sample because it operates on the line scan principle, which involves scanning one thin line of the object with its complete spectrum at a time.The scanner is equipped with three movable 40 × 20 cm platforms, a mounting plate for the camera, and three halogen lights for illumination.The camera mounting plate can be adjusted in height so that it always sits 6 cm above the sample.After a few cycles, the height was adjusted to ensure that the field of vision closely matched the sample.The camera's spatial resolution was enhanced.
Samples of mustard, castor, sunflower, almond, olive, coconut, and paraffin oil were bought from a local commercial market in Pakistan.In order to represent the contaminated oil, which ranges in volume from 1 to 25 ml, 10 samples of each pure oils were first taken.During the adulteration process, varying concentrations of different oils were added in increments from 1 ml to 15 ml, with 10 samples taken for each concentration.The all oils were treated in the same way: pure almond oil was tainted with sunflower oil; pure mustard oil was tainted with castor oil; pure olive oil was tainted with sunflower oil and last, pure coconut oil was used as pure oil and tainted with liquid paraffin.We were able to produce 70 pure samples and 600 contaminated samples using this method.Table .1 contains comprehensive details about every oil sample.Purified castor oil (class "1″), pure mustard oil (class "2″), pure olive oil (class "3″), pure sunflower oil (class "4″), pure coconut oil (class "5″), pure liquid Paraffin oil (class "6″), pure almond oil (class "7″), pure mustard + castor oil adulterated (class "8″), pure olive + sunflower oil adulterated (class "9″), pure coconut + liquid Paraffin oil adulterated (class "10″), and pure almond + sunflower oil adulterated (class "11").There are 5500 rows in the CSV file, with 224 columns.
Each row in the dataset represents a specific observation by each selected ROI, which could correspond to an HSI captured from a particular oil sample.The 224 columns may correspond to different spectral bands or features extracted from each pixel within the hyperspectral recorded image.These spectral bands capture information across various wavelengths, allowing for a detailed characterization of the chemical composition of the oil.The dataset contains spectral data for both pure and adulterated oil samples.Each row would represent a different selected ROI spectral information of the oil sample, while the columns would represent the spectral features extracted from the hyperspectral image of that sample.
• A Petri dish with the prepared samples in it is seen in Fig. 2. Samples were levelled using a surface leveler to avoid shadowing and blurring.The levelled samples were put on the scanner's moving bed.As a point of comparison, a white tile with a 99.9% reflectance rating was set up on the moving platform next to the samples.The reference item was placed in the same spot as each sample for the purpose of calculating reflectance.Next, the platform was configured to move at a speed of 20 m/s, and the frame rate of the camera was set to 50 Hz.
The aperture gathering samples of the lens was manually adjusted for each example in order to adjust the exposure.The sample data was gathered using the methodology described below (see Fig. 3).
• With the camera's shutter closed, 100 dark frames were captured.Many frames were recorded in order to capture the sensor's response in its entirety.• The camera captured 100 frames of the white tile, with the white reference placed where the lab scanner translational platform had been relocated.• The translational platform was moved to the sample's location and line-by-line scanned.
The Lumo Scanner Software was updated to alter the white reference, target start, and target stop locations in addition to the ratio of

Table 1
Samples prepared for each pure and adulterant.dark to white frames in order to control the position of the global platform and sample collection.Three raw files in total-a dark reference, a white reference, and the sample-were retained for every sample.Using the object's spectral responses at 430, 510, and 670 nm, a false-color RGB image of the scanned sample was produced in addition to these files.The sample and reference data were loaded into MATLAB as a 3D cube using the multiband-read command.

Spectral reflectance
The spectral radiance that the hyperspectral sensor records is influenced by various factors such as atmospheric effects, illumination, object geometry, and sensor properties, which can lead to inaccurate results.To account for those effects, the imperial line approach was used to determine the object's true reflectance.In order to eliminate atmospheric effects from reflectance calculations, a dark and white reference is required.An opaque black cap was used to completely cover the camera lens in order to obtain the dark reference B, and a white tile was scanned in order to record the white reference W. All of the samples were prepared in accordance with equation (1).
Where Rad is the spectral radiance recorded by the HSI-system, with 0% reflectance for B and 99.9% for W, and R is the normalized accurate reflectance cube.Additionally, by utilizing the oil reflectance, it is able to compute each oil absorption spectral signature because of its transparency.Equation (2) provides a formula for computing the absorption values of oil samples.where A stands for absorption and R for the cube values of reflectance.
The chemical bond composition is represented by the computed absorption spectrum.

Classification methods
Hyperspectral imaging offers a sophisticated approach to detect oil adulteration by analyzing spectral signatures across a broad range of wavelengths.This method leverages the unique spectral properties of different oil types, allowing for precise classification and identification of adulterants.By capturing a multitude of spectral bands, hyperspectral imaging provides detailed information about the chemical composition of oils, enabling the detection of even subtle alterations.Advanced classification algorithms (SVM, LR, LDA, RF, DT, KNN, and Naïve Bayes) are employed to interpret the spectral data and distinguish between genuine oils and adulterants, facilitating rapid and accurate detection.This technology holds significant promise for ensuring the authenticity and quality of oils in various industries, including food, pharmaceuticals, and cosmetics, ultimately safeguarding consumer health and trust.

Support Vector Machine
SVM can be used in the context of HSI data-based oil adulteration detection and classification to precisely distinguish between pure and adulterated oil.The best hyperplane in a high-dimensional feature space for separating data points into different classes is found by the SVM model.In this work, pure and diluted oils at varying concentrations make up the classes, and the spectral signatures taken from the HSI data define the feature space.Preprocessed data are separated into two sets: testing and training.The model is trained on the training set, and its performance is assessed using a 10-fold test on the testing set for cross validation.The typical metrics are used to assess the model's performance in accurately classifying pure and adulterated oils at various stages, including accuracy, precision, recall, F1-score, and kappa.Optimizing the hyperparameters kernel = 'linear', C = 10, decision_-function_shape = 'ovo', and random_state = 10 is necessary to attain optimal performance in SVM.

Logistic Regression
A potent ML algorithm called LR Model for HSI Data was created to correlate target variables with spectral information that was taken from hyperspectral images.Because the features and the target have a linear relationship, it is perfect for qualitative analysis and estimation of pure and adulterated oils at various stages.To predict the target adulterated and pure class for newly discovered, unseen hyperspectral images, the LR model is trained on labelled hyperspectral data.Table .2shows how 10-fold metrics like recall, accuracy, precision, F1-score, and kappa value are used to assess the performance of the model.

Linear Discriminant Analysis
Oil adulteration in hyperspectral imaging data is detected and classified using LDA, a supervised ML technique.To maximize the separation between various classes of oil samples, such as pure oil and adulterated oil, a linear combination of spectral bands is used.Following hyperspectral data preprocessing and the extraction of pertinent spectral signatures from ROI, LDA is applied.It assists in finding trends in the data that set pure oil samples apart from tainted ones.Table .2illustrates how feature selection, appropriate model evaluation, and data quality all affect how effective LDA is.

Random Forest
In order to identify and categorize oil adulteration in HSI data, RF is a potent technique.To create precise classifications, it builds several decision trees during training and merges their forecasts.Highdimensional data is a good fit for RF, which can also automatically choose key spectral characteristics to differentiate between pure and tainted oils samples.It can handle noisy data, is resilient, and is less likely to overfit.RF is a useful tool for tasks involving the detection and classification of oil adulteration because of its capacity to provide feature importance rankings, which aid in the understanding of the key discriminative features in the data.A summary of each class's classification is shown in Table .2.

Decision Tree
The DT is used to identify and categorize oil adulteration in HSI data.It divides the data recursively according to various spectral features, resulting in a structure resembling a tree.Every leaf node represents the class (pure or adulterated oil), and every internal node represents a choice made in response to a particular feature.The use of multiple DTs combined to increase classification accuracy and robustness.

K-Nearest Neighbors
KNN works by using a similarity metric, like Euclidean distance, to find the two-nearest data points (samples) to a new data point (test sample).The majority of the k-nearest neighbors are used to categorize samples of pure and contaminated oil.KNN is simple to use and understand because it is non-parametric and doesn't require any training when creating the model.For the detection and classification of oil adulteration, KNN is a helpful model, particularly in cases where the dataset is small.The classification report for each class is displayed in Table .2.

Naïve Bayes
In HSI data, oil adulteration is detected and classified using the Naïve Bayes model.The method, which is based on the Bayes theorem, determines which class a sample belongs to base on the observed features and computes the probability that the sample belongs to a pure and adulterated class.It functions effectively with high-dimensional data and is computationally efficient.The classification summary reports for every class are displayed in Table .2.

Results and discussion
This study presents the most inventive and dependable approach to oil adulteration detection and classification using hyperspectral imaging and ML.The experimental results show that the various ML techniques combined in this study provide an accurate, sensitive, and effective way to guarantee the quality and authenticity of edible oils.Improved consumer safety and increased trust among various stakeholders are two potential advantages of the suggested model for the food industry, which rely on its importance in preserving the integrity of the global food supply chain.To preserve human health and reduce health risks that could ultimately endanger lives, we proposed a method to detect adulteration in cooking oils.By gathering an enormous amount of spectral information for every pixel in an image over a broad range of wavelengths, HSI provides a state-of-the-art solution.A thorough understanding of the physical and chemical makeup of the substances being investigated is possible thanks to this high-dimensional data.HSI can be used to distinguish between genuine oils and those that have been tainted or combined with inferior oils when analyzing edible oils.Furthermore, the suggested hyperspectral (400-1000) imaging-based model's non-destructive and instantaneous checking properties make it a perfect tool for quality assurance and control in the food processing sector.The model provides both spatial and spectral data for pure and contaminated oil.Certain costly and widely used cooking oils are utilized in developing countries like Pakistan.Samples of locally accessible liquid paraffin oil, castor, sunflower, and coconut oils, as well as pure mustard, almond, olive, and coconut oils and their adulterants, were used in this investigation.Following that, spectral signature was computed using each oil's reflectance to determine the absorption spectral signature for each oil sample due to oil transparency.The suggested Savitzky-Golay filter increases the accuracy of the built model by eliminating noise from the spectral signature.The Naïve Bayes, SVM, LR, LDA, RF, DT, and KNN models are used for classification.

Preprocessing of spectral analysis
Although the spectral signature of oil sample absorption contains extremely sensitive data, it also contains systematic noise (scattering and ambient light).These noises have the potential to introduce errors into the obtained spectral data and compromise the ML model's accuracy.Numerous mathematical methods are available to eliminate the noise from the obtained spectral data.Certain mathematical strategies or approaches that work best eliminate the impact of errors from spectral data based on the hyper-cube's nature.In order to remove noise from the obtained spectral data, we employed the following filters: moving average, median, Savitzky-Golay, and steady.We then compared the outcomes to determine which filter performed best.Fig. 4 illustrates the results of comparing all the resulting spectral signatures and finds that the Savitzky-Golay filter works best for removing noise and smoothing the spectral signature of oil samples.

Spectral analysis of pure and adulterated oil samples
Using the spectral characteristics of various oils, spectral analysis offers a quick and non-destructive way to identify oil adulteration.In order to identify adulterants, ensure the integrity and purity of oil products, and monitor adulteration, it can be especially helpful in quality control and monitoring.Through the use of quantity level changes, the spectral characteristics of various adulterants are examined in order to identify distinct signatures that allow for their differentiation.
This process is known as spectral analysis.Every pixel in an image has its spectral information recorded by HSI over a wavelength range of (400-1000 nm).Fig. 5 displays the signatures of pure and contaminated oil derived from an analysis of the distinct spectral signatures of various oils.
In Fig. 6, the nutritional makeup of contaminated olive and sunflower oils can be compromised.The nutrients found in sunflower and olive oils can vary depending on the number of antioxidants, vitamins, and fatty acids they contain.Buying oils from reliable sources is essential Fig. 4. Compressed spectral signatures of oils (a-d) using absorption, steady, median, average, and Savitzky-Golay smoothing filters.
to ensuring their authenticity and quality, as adulteration is both illegal and unethical.Examining the spectral properties of pure and adulterated oil samples allows for the identification of differences and the detection of adulterants.Fig. 6 shows that the majority of the samples' spectral signatures overlap at various spectral line locations.As seen in Fig. 6 (a to d), spectra clearly illustrate all similarities and changes.Each spectral signature displays a sharp curve with varying wavelengths (in nanometers).

Pipeline models comparison
With HSI, a broad range of electromagnetic wavelengths are captured, resulting in full spectral signature information for every pixel in an image of an oil sample.This allows the spectral signatures of pure and adulterated oils to be distinguished.The objective of oil adulteration detection is to locate and measure any adulterant oil substances present in pure oil.The chemical compositions and spectral signatures of various oils differ.ML models are able to distinguish between the spectral signatures of pure and contaminated oil.ML-based HSI systems identify various degrees of adulteration to guarantee the authenticity and quality of oils.The evaluation parameters for pure and adulterated stage classification are precision, Recall, F1-score, kappa, and overall accuracy.Table 2 illustrates  Several statistical tests have been conducted (overall accuracy, precision, recall, F1-score, and Kappa) for experimental evaluation.The assessment metrics are derived through the subsequent mathematical expressions.
where FP and TP are false positive and true positive respectively, and FN and TN are false negative and true negative respectively, computed from the performance matrix.
x i is the performance score for fold i, x is the mean performance score across all folds, k is the number of folds in the cross-validation.The stability of the model's performance over the course of the ten folds and the corresponding training time are also shown in the table.Out of all the tested samples, 100% of the samples are correctly classified in the accuracy column.The precision and recall columns show that 100% of all tested oil samples have true positive predictions.The F1-Score, which balances the two metrics by calculating the harmonic mean of recall and precision, has a range of up to 100%.After comparison, LDA is found to be the first most time-efficient and most accurate model.With eleven classes considered, the average accuracy ranges for each model are 95%-100% overall, and the ranges for precision, recall, F1-score, and kappa are (77%-100%, 76%-100%, 75%-100%, and 73%-100%) respectively.
Table .4provides a thorough post-spectral data processing comparison of the performance demonstrated by various research approaches (SVM, LR, LDA, RF, DT, KNN, and Naïve Bayes) used in the published study.The evaluation uses a 10-fold cross-validation method and includes multiple classification metrics such as recall, precision, F1-score, kappa, and overall accuracy.Utilizing 10-fold cross-validation to assess the robustness and generalization ability of the classifiers.Crossvalidation involves splitting the dataset into 10 subsets, training the model on different subsets, and evaluating its performance on the remaining data (see Table 5).
The dataset is divided into ten folds of the equal size.The model is then trained repeatedly ten times, using one fold for validation and nine folds for training in each iteration.This methodical methodology ensures that every data point is used exactly once for validation.This is followed by the averaging of performance metrics over all 10 folds, including kappa, accuracy, precision, recall, F1-score, to approximate the model's generalization performance.This comprehensive analysis provides a broad assessment of the model's effectiveness at every level.This process helps to mitigate the risk of overfitting and provides more reliable estimates of performance.K-Fold cross-validation is a widely used technique for evaluating the performance of ML models.The standard deviation can be used to determine how widely the performance metric is distributed among the k folds and how much the metric deviates from the K-Fold mean performance.Fig. 7 summarizes the performance metrics of various applied ML models on different classes spectral signature.The SVM consistently achieves perfect scores across all classes, indicating flawless performance.LR demonstrates high performance overall, with slightly lower scores in some classes compared to SVM.LDA exhibits perfect scores across all classes, showcasing robust performance.RF achieves high scores overall, with some variation in performance across different classes.DT maintains strong performance, albeit with slightly lower scores in certain classes.KNN shows high performance, with some variability across different classes.Naïve Bayes achieves varying levels of performance across each class, with lower scores observed in certain cases.Overall, it demonstrates LDA is decent performance as compared to the other models, indicating their robustness in classifying the pure and adulterated oil data.Evaluation summaries provide valuable information for decision-making processes related to the deployment of classification models in oil adulteration applications.Stakeholders can use these summaries to assess the risks and benefits associated with using a particular model for oil adulteration detection and classification.
A confusion matrix is a performance evaluation tool used in classification tasks to visualize the performance of a machine learning model are shown in Fig. 8.It provides a comprehensive summary of the model's predictions by comparing them with the actual ground truth labels across different classes.The matrix is square and has dimensions equal to the number of classes in the classification problem.Each row of the matrix represents the instances in a predicted class, while each column

Comparison with state-of-the-art techniques
Table 4 provides an extensive overview of different approaches and how well they work to identify different kinds of oil adulteration.These methods range from conventional analytical methods to sophisticated spectroscopic and imaging methods.Every technique has different accuracy levels and uses different technologies for analysis.The identification, categorization, and adulteration of camellia oil were examined by Wang et al. (2019).NIR methods in conjunction with chemometrics were employed by Gurtz et al. (Gertz et al., 2020) as a reliable analytical and statistical tool for evaluating virgin olive oil (VOO).In order to investigate sugar adulteration, Phillips et al. (Phillips and Abdulla, 2023) developed a unique method for honey fraud detection using hyperspectral imaging and machine learning.Among the notable additions is a brand-new feature smoothing method designed specifically for classification.In order to quantify the amount of starch present in adulterated fresh cheese, Barreto et al. (2018) proposed a study that used Hyperspectral Imaging (HSI) and partial least squares regression (PLSR) methodology to model the starch content.The integrity of vegetable oil was discussed by Wu et al. (2023), particularly in areas where concerns about authenticity are raised by economic considerations.They carried out a qualitative and quantitative investigation of adulteration in different vegetable oils using 3D fluorescence spectroscopy.In their study, Tachie et al. (2024) assessed how well machine learning (ML) techniques combined with Attenuated Total Reflectance Fourier Transform Infrared (ATR-FTIR) spectroscopy could distinguish and classify different oils and margarines, including those that were adulterated with canola-flaxseed oil and sunflower oil.
The adulteration of Camellia oil (CAO) was discussed by Chen et al. (2023), who concentrated on creating intelligent analysis methods for determining the categories and amounts of adulterated oil in CAO.
Visible/Near Infrared Spectroscopy (Vis/NIRS) in conjunction with multivariate chemometrics was employed by Feng et al. (Srata et al., 2019) to assess superior engine oil samples that had been tampered with.Kaufmann et al. (2022) focused on verifying the authenticity of coriander oil and identifying adulteration with palm olein, canola oil, and soybean oil.PCA effectively distinguished between pure oils, while LDA and KNN algorithms were used for classification.Partial Least Squares (PLS) regression models demonstrated strong coefficients of determination (R2) ranging from 0.98 to 0.99 for detecting adulterated coriander oils, indicating suitability for quality control during processing.Du et al. (2021) introduces a novel real-time method for quantitatively detecting adulteration in coriander essential oil (CAO) using in situ near-infrared (NIR) spectroscopy and chemometrics.Discriminant analysis (DA) achieved a 96.7% accuracy in identifying adulterated oils, while partial least squares (PLS) regression, optimized with various pretreatment methods, demonstrated excellent predictive performance for adulteration levels, with determination coefficients (R2) exceeding 0.995 and low root mean square errors of calibration (RMSEC) and prediction (RMSEP).Vieira et al. (2021) were proposed a portable NIR spectrometer was utilized to verify the authenticity of pure sacha inchi oil.One-class models were employed for the untargeted detection of sacha inchi oil adulteration.DD-SIMCA and OC-PLS models achieved sensitivities exceeding 87%.

Conclusions
This paper aims to evaluate the feasibility of identifying the pure and adulterated oil using HSI with a spectral range of 400-1000 nm.HSI series of 670 oil samples were acquired under both UV and halogen illumination settings.HSI has emerged as a promising technique for oil adulteration detection and classification.By leveraging the unique spectral signature of different oil types and their mixtures, HSI provides a non-destructive and rapid means of distinguishing pure oils from adulterated ones.The ability to detect adulteration at various stages of the supply chain can have significant implications for ensuring the authenticity and quality of oils, safeguarding consumer's health, and protecting the reputation of producers.Machine learning is a powerful technique for oil adulteration detection and classification that combines multiple base classification (SVM, LR, RF, DT, LDA, ANN, KNN and Naïve Bayes) models to make more accurate and robust predictions.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 2 .
Fig. 2. HSI experimental setup and spectral signature of the oil.

Fig. 3 .
Fig. 3. Visual difference of the pure oil classes (a-g) and adulterated oil (h).

Fig. 5 .
Fig. 5.The visualization (a to d) of pixel level HSI spectral signature of different oil samples.

K−
Fold cross validation accuracy = OA + Precision + Recall + F1 − score + kappa 5 (13) represents the instances in an actual class.The diagonal elements of the matrix represent the number of correctly classified instances for each class, while off-diagonal elements indicate misclassifications.By analyzing the confusion matrix, one can gain insights into the model's strengths and weaknesses, such as its ability to distinguish between different classes and the types of errors it makes.In the classification experiments conducted using various machine learning algorithms, the confusion matrices provide valuable insights into the performance of

Table 2
Proposed model evaluation performance parameter comparisons.

Table 3
Proposed models confidence prediction for unseen data.

Table 4
Self-comparison of the proposed pipeline ML models.

Table 5
Comparison of the proposed and reported oil adulteration identification techniques.
Fig. 7. Comparison of performance evaluations of each model.each model.Across all classifiers, a remarkable level of accuracy is evident, with most showing near-perfect classification across multiple classes.Notably, DT, LDA, LR, RF, and SVM demonstrate flawless classification accuracy, each achieving 100% correct predictions across all classes.KNN exhibits high overall accuracy, albeit with a slight misclassification in a subset of samples.However, Naive Bayes showcases more variability, with noticeable misclassifications observed, particularly in instances of overlapping features.