Classification and authentication of spices and aromatic herbs by means of HPLC-UV and chemometrics

The recent increase in the adulteration of spices and aromatic herbs in the food industry constitutes a problem that requires exhaustive quality control. As every spice has a different composition with characteristic biomarkers, chromatographic profiles are especially valuable to authenticate these products. Thus, in this work a new high performance liquid chromatography (HPLC) method with UV-vis detection was developed for the characterization, identification and authentication of cinnamon, oregano, thyme, sesame, bay leaf, clove, cumin


Introduction
Food authentication is defined as the analytical process that verifies the label description of food products, and it is a field of ongoing concern mainly due to society's increasing attention to food quality and safety (Danezis et al., 2016). In particular, there is a special interest in food products containing bioactive compounds with health-promoting properties, which may also play a key role in the sensorial and functional properties of food. In this sense, spices and herbs, a group of products used to add flavour and enhance the organoleptic properties of food, are a rich source of phytochemicals (i.e., bioactive plant compounds with positive effects on health) such as phenolic compounds, terpenoids, carotenoids, phytosterols, alkaloids, sulfur-containing compounds, and organic acids such as citric acid (a natural antioxidant) or ascorbic acid (vitamin C) (Opara & Chohan, 2014; Rubió et al., 2013; Yashin et al., 2017). In particular, phenolic compounds are a large class of plant secondary metabolites whose structure comprises an aromatic ring bearing one or more hydroxyl groups, including functional derivatives. Phenolic compounds are extensively present in spices and herbs, and much attention has been devoted to them not only due to their influence on organoleptic parameters such as taste or colour, but also owing to their antioxidant, antimicrobial and anti-carcinogenic potential (Pandey & Rizvi, 2009; Parthasarathy et al., 2008; Quideau et al., 2011). Moreover, from an analytical perspective, phenolic compounds are recognized as meaningful tools in the study of food authenticity and fraud detection, since their profiles or the relative amounts of particular compounds are frequently characteristic of a certain plant, and any deviation from the profile of authentic samples suggests fraudulent manipulation (Escarpa & Gonzalez, 2001; Ignat et al., 2011; Kartsova & Alekseeva, 2008; Khoddami et al., 2013).
This is of particular importance considering that the market of spices and herbs, which involves substantial amounts of money, is under constant threat from fraudsters. Some condiments such as oregano, vanilla, turmeric, cinnamon, saffron, and paprika are especially susceptible to adulteration for economic gain at the expense of the consumer. This fraud entails a potential threat to public health, since many adulterants are considered carcinogenic or lethal upon long-term exposure. Moreover, it should also be considered that food labels may not fully describe the food content, which could lead to severe allergic reactions (Galvin-King et al., 2018; Srirama et al., 2017).
Therefore, in view of the foregoing, quality control and screening techniques are needed to provide reliable control for both the stakeholders of the supply chain and final consumers. In this sense, the most commonly used analytical technique for the characterization, identification and authentication of spices and herbs is liquid chromatography coupled to different detectors such as mass spectrometry (MS), ultraviolet-visible (UV-vis), electrochemical detection (EC) and fluorimetric detection (FD), since it can typically be applied to detect phytochemicals such as phenolic compounds (Serrano & Díaz-Cruz, 2022).
Taking into account that every spice and herb has a different composition with distinct characteristic biomarkers (e.g., eugenol is one of the major constituents of clove and cinnamon, carvacrol of oregano, thymol of thyme, sesamol of sesame seeds, and vanillin of vanilla (Parthasarathy et al., 2008)), the chromatographic profiles obtained after performing the corresponding analysis can be exploited as a source of analytical data to characterize, identify and authenticate spices and herbs.
In this direction, the current work investigates the possibilities of combining liquid chromatography, to acquire profiles of the characteristic phenolic content of cinnamon, oregano, thyme, sesame, bay leaf, clove, cumin, and vanilla, with chemometric techniques for the extraction of characteristic profiles that allow the characterization, identification and authentication of the spice and herb samples considered. For this purpose, a reliable and simple HPLC method with UV-vis detection was developed for the determination of characteristic phenolic profiles in spice and herb samples. A total of six phenolic compounds (sesamol, vanillin, salicylaldehyde, eugenol, carvacrol and thymol) characteristic of the studied spices and herbs were considered for the optimization of the chromatographic separation. A simple and low-cost sample treatment, based on extraction by sonication with methanol and subsequent stirring, was implemented for the analysis of different types of cinnamon, oregano, thyme, sesame, bay leaf, clove, cumin, and vanilla. Chromatographic data were submitted to chemometric methods such as unsupervised principal component analysis (PCA) for exploratory data analysis, and supervised soft independent modelling by class analogy (SIMCA) and partial least squares discriminant analysis (PLS-DA) to evaluate sample discrimination and classification.

Chemicals and instrumentation
Methanol HPLC Gradient grade (≥ 99.9%, Fisher Scientific, Geel, Belgium), formic acid (98%, PanReac AppliChem, Barcelona, Spain), and water obtained from a Milli-Q Reference A+ water purification system (Millipore, France) were used for mobile phase preparation. Phenolic compounds, including sesamol, eugenol, thymol, carvacrol, salicylaldehyde and vanillin, were supplied by Acros Organics (Geel, Belgium). Standard stock solutions of each phenolic compound at 1000 mg L⁻¹ were prepared in methanol and stored at 4 °C. Milli-Q water was used for the preparation of diluted working solutions from the standard stock solutions.
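The dilution of a stock solution into a working solution follows the usual mass balance, C_stock × V_stock = C_working × V_final. The short Python sketch below only illustrates that arithmetic; the function name and the 10 mL final volume are hypothetical and are not taken from the procedure, while the 1000 mg L⁻¹ stock and the 15 mg L⁻¹ standard mixture are the concentrations mentioned in the text.

```python
def stock_volume_ml(stock_mg_per_l, target_mg_per_l, final_volume_ml):
    """Volume of stock (mL) such that C_stock * V_stock = C_target * V_final."""
    if target_mg_per_l > stock_mg_per_l:
        raise ValueError("target concentration exceeds the stock concentration")
    return target_mg_per_l * final_volume_ml / stock_mg_per_l


# 15 mg/L working standard from a 1000 mg/L stock.
# The 10 mL final volume is a hypothetical value chosen for illustration.
volume_ml = stock_volume_ml(1000.0, 15.0, 10.0)
print(f"{volume_ml * 1000:.0f} uL of stock, made up to 10 mL with Milli-Q water")
```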
HPLC-UV analyses were carried out by means of an Agilent Series instrument (Palo Alto, CA, USA) comprising a quaternary pump (G1311A), an ultraviolet-visible detector (G1314B), an autosampler (G1329A) and a vacuum degasser (G1322A). The data were acquired and processed using Agilent ChemStation software. Reverse-phase separation in a Kinetex® C18 column (5 μm C18 100 Å, 100 × 4.6 mm) supplied by Phenomenex (Torrance, CA, USA) under gradient elution mode using 0.1% formic acid in Milli-Q water and methanol was proposed for recording the HPLC-UV chromatograms. HPLC-UV chromatograms were recorded at 280 nm, keeping the chromatographic column at room temperature and using an injection volume of 10 μL. The mobile phase flow rate was 1 mL min⁻¹.

Samples and sample preparation
A total of 87 samples acquired in local supermarkets and corresponding to different spices and herbs were analyzed in triplicate by HPLC-UV: cinnamon (16 samples), oregano (13 samples), thyme (12 samples), sesame (12 samples), bay leaf (8 samples), clove (8 samples), cumin (9 samples), and vanilla (9 samples). The samples considered were chosen as eight representative examples of spices and herbs used for seasoning meals or as condiments.
The different samples were treated as follows: 0.25 g of sample was weighed and 1 mL of methanol was added. The sample was sonicated for 15 min and then stirred for 45 min at 1000 rpm. The supernatant extracts were filtered through 0.22 μm nylon filters and methanol was added up to 1 g. The obtained extracts were stored at −18 °C until further analysis.

Data treatment
Samples were analyzed by HPLC-UV in triplicate and in random order to minimize systematic error (and to avoid introducing any trends in the subsequent chemometric analysis), generating a total of 261 chromatographic profiles. Initially, six blanks of Milli-Q water were injected to stabilize the system and, as a control, two more blanks were injected every ten analyzed samples. Chromatographic data were extracted from the instrument using the Agilent ChemStation software and processed in a Matlab® environment (Matlab Version R2021b Ed., 2021). Prior to the construction of chemometric models, HPLC-UV chromatograms were pre-treated to prevent any possible artefact derived from small time shifts or baseline drifts. Thus, firstly, baselines were adjusted using the baseline estimation and denoising with sparsity (BEADS) algorithm (Navarro-Huerta et al., 2017; Ning et al., 2014) with the following parameters: cut-off frequency = 0.003 cycles/sample, asymmetry ratio = 17, filter order = 1, λ0 = 0.1, λ1 = 1, λ2 = 10, and amplitude = 0.1. Once the baseline was adjusted, peak alignment was performed with the function Variable Alignment using the correlation optimized warping (COW) algorithm, included in PLS_Toolbox (Eigenvector Research (PLS_Toolbox, Version 8.9.2, 2021)), with a section length of 50 and a slack of 5. Finally, the edges of the chromatograms (before 50 s and after min) were discarded since they did not provide valuable information to the chromatographic profile.
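The randomized injection order with interleaved blanks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the injection labels and the random seed are hypothetical, and "every ten analyzed samples" is interpreted here as every ten injections, which the text does not specify. The class sizes, the triplicate analysis, the six initial blanks and the two control blanks come from the description above.

```python
import random

# Number of samples per class, as listed in the sample description.
CLASS_SIZES = {"cinnamon": 16, "oregano": 13, "thyme": 12, "sesame": 12,
               "bay leaf": 8, "clove": 8, "cumin": 9, "vanilla": 9}


def build_sequence(class_sizes, replicates=3, initial_blanks=6,
                   block_size=10, control_blanks=2, seed=0):
    """Randomized injection list with blanks at the start and between blocks."""
    injections = [f"{name}-{i + 1}-r{r + 1}"
                  for name, n in class_sizes.items()
                  for i in range(n)
                  for r in range(replicates)]
    random.Random(seed).shuffle(injections)  # random run order

    sequence = ["blank"] * initial_blanks
    for position, injection in enumerate(injections, start=1):
        sequence.append(injection)
        # Assumption: control blanks follow every block of ten injections.
        if position % block_size == 0 and position < len(injections):
            sequence.extend(["blank"] * control_blanks)
    return sequence


sequence = build_sequence(CLASS_SIZES)
n_profiles = sum(1 for item in sequence if item != "blank")
print(n_profiles, "sample injections,", len(sequence) - n_profiles, "blanks")
```

With 87 samples in triplicate, the list contains the 261 sample injections mentioned in the text; the number of blanks depends on the block interpretation assumed here.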
Principal component analysis (PCA), soft independent modelling of class analogies (SIMCA) and partial least squares-discriminant analysis (PLS-DA) models were built using PLS_Toolbox, which is implemented in Matlab. For PLS-DA and SIMCA, samples of each class were randomly distributed between a training and a validation set following approximately a 60:40 ratio. The final training and validation sets contained 56 and 31 samples, respectively, corresponding to 64% and 36% of the total samples. The optimal PLS-DA model consisted of 7 latent variables (LV), which were chosen according to the first minimum in the average classification error obtained in cross-validation performed using the venetian blinds sample split (Fig. S1, see supplementary material). The SIMCA model was built using the following number of principal components (PC) in each individual PCA model: cinnamon (2 PCs), oregano (4 PCs), thyme (3 PCs), sesame (2 PCs), bay leaf (1 PC), clove (4 PCs), cumin (4 PCs), and vanilla (2 PCs). This number of PCs was selected according to the percentage of variance explained.
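A class-wise random split of this kind can be sketched as follows. The rounding rule (rounding the training share of each class up) is an assumption made for illustration, since the text only states an approximate 60:40 ratio; the function name, sample identifiers and seed are likewise hypothetical. Splitting by sample rather than by chromatogram keeps the three replicates of a sample in the same set, which is also an assumption about how the split was done.

```python
import math
import random

# Number of samples per class, as listed in the sample description.
CLASS_SIZES = {"cinnamon": 16, "oregano": 13, "thyme": 12, "sesame": 12,
               "bay leaf": 8, "clove": 8, "cumin": 9, "vanilla": 9}


def split_by_class(class_sizes, train_fraction=0.6, seed=0):
    """Randomly assign the samples of each class to training or validation."""
    rng = random.Random(seed)
    train, validation = [], []
    for name, n in class_sizes.items():
        sample_ids = [f"{name}-{i + 1}" for i in range(n)]
        rng.shuffle(sample_ids)
        # Assumed rule: round the training share of each class up.
        n_train = math.ceil(train_fraction * n)
        train.extend(sample_ids[:n_train])
        validation.extend(sample_ids[n_train:])
    return train, validation


train, validation = split_by_class(CLASS_SIZES)
print(len(train), "training samples,", len(validation), "validation samples")
```

With the class sizes reported in this work, this assumed rule yields a validation set of 31 samples, the size stated in the text, although the actual assignment rule used by the authors is not described.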

HPLC-UV optimization
Firstly, in order to develop the chromatographic method, the chromatographic separation was optimized based on six phenolic compounds (sesamol, eugenol, thymol, carvacrol, salicylaldehyde and vanillin), which are important and characteristic constituents of the spices and aromatic herbs studied (Parthasarathy et al., 2008). This optimization sought to achieve the best separation in the shortest possible time as well as to obtain a chromatographic profile rich in phenolic compounds that would allow discrimination among the different types of spices. Taking into account that the six phenolic compounds are structurally similar, the use of an elution gradient to perform the separation was considered.
The best separation was achieved with the following elution gradient program between 0.1% formic acid in Milli-Q water (solvent A) and methanol (solvent B): 0-2 min, linear gradient from 5 to 20% solvent B; 2-6 min, isocratic elution at 20% solvent B; 6-9 min, from 20 to 50% solvent B; 9-26 min, at 50% solvent B; 26-28 min, from 50 to 95% solvent B; 28-32 min, at 95% solvent B; and 32-35 min, from 95 to 5% solvent B. Between injections, an isocratic elution at 5% solvent B for 5 min was used for column re-equilibration. Fig. 1a displays the HPLC-UV chromatogram obtained under the optimized gradient conditions for a standard mixture of the studied phenolic compounds at a concentration of 15 mg L⁻¹ each. As can be seen, an acceptable separation of the mixture was attained in 35 min.
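The gradient program can be written as a table of breakpoints and evaluated at any time by piecewise-linear interpolation. The Python sketch below assumes that every ramp is linear (the text states this explicitly only for the first segment); the names GRADIENT and percent_b are hypothetical.

```python
# (time in min, % solvent B) breakpoints of the gradient program described above.
GRADIENT = [(0, 5), (2, 20), (6, 20), (9, 50), (26, 50), (28, 95), (32, 95), (35, 5)]


def percent_b(t_min, program=GRADIENT):
    """Percentage of solvent B at time t_min, assuming linear ramps."""
    if not program[0][0] <= t_min <= program[-1][0]:
        raise ValueError("time outside the gradient program")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


for t in (1, 7.5, 20, 30):
    b = percent_b(t)
    print(f"t = {t} min: {b:.1f}% B, {100 - b:.1f}% A")
```

For instance, halfway through the first ramp (1 min) the mobile phase contains 12.5% solvent B, and during the long isocratic segment (9-26 min) it stays at 50% solvent B.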
In the optimization of the chromatographic method it was also important to consider the detection step. The optimal working wavelength for the UV detection of phenolic compounds was studied in the range from 240 to 360 nm (Fig. 1b). The wavelength chosen as optimal was 280 nm since, as can be seen in Fig. 1b, at this wavelength the peaks corresponding to the six studied compounds could be identified and most of them were more intense than those obtained at other wavelengths. This optimal wavelength is in agreement with that reported in the literature for direct UV-absorption detection of polyphenols and phenolic acids (Cetó et al., 2018; Pardo-Mates et al., 2017).

Sample analysis
The 87 samples of spices and herbs, previously treated by the procedure described in section 2.2, were analyzed in triplicate by the optimized HPLC-UV method. Fig. 2 shows the characteristic chromatographic profiles registered for each type of sample.
As shown in Fig. 2, the chromatographic profiles obtained for the different spices and aromatic herbs studied are, in general terms, clearly different from each other, and some of the characteristic biomarkers of each studied spice and herb can be identified, such as eugenol in clove, cinnamon and bay leaf; salicylaldehyde in cinnamon; or vanillin in vanilla. Nevertheless, the considerable similarity between the chromatographic profiles of oregano and thyme should be noted, which is attributed to the fact that both herbs have a very similar composition (Gavaric et al., 2015; Parthasarathy et al., 2008). The compounds common to both herbs include, for example, γ-terpinene, linalool, borneol, thymol methyl ether, carvacrol methyl ether, trans-caryophyllene and caryophyllene oxide, all with a similar weight in the composition of both oregano and thyme, although none of them is a major component. Apart from these compounds, both herbs also contain carvacrol and thymol, which are the two major compounds in both oregano and thyme. However, the herbs differ in the ratio of these compounds: oregano contains more carvacrol than thymol, whereas thyme has a higher content of thymol than carvacrol.
The above-mentioned differences in the chromatographic profiles obtained by the developed HPLC-UV method for the different spices and herbs considered suggest that chromatographic profiling combined with appropriate chemometric techniques could be fully suitable for the characterization, identification and authentication of the spice and herb samples considered.

Sample classification by means of SIMCA and PLS-DA
Prior to the development of classification models, data preprocessing was optimized in order to avoid variability related to instrumental artifacts such as peak shifting and baseline irregularity. This optimization was based on an objective criterion that employs the Silhouette index (Kaufman & Rousseeuw, 1990) to measure the clustering obtained after the addition or removal of each preprocessing step. The optimized data preprocessing, which is summarized in Fig. 3, included three steps: i) baseline correction based on the BEADS algorithm; ii) peak alignment by means of COW; and iii) data selection. Detailed information about this optimization can be found in the Supplementary Material.
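For reference, the Silhouette index of a point compares its mean distance to the other members of its own class (a) with its mean distance to the nearest other class (b), as s = (b − a) / max(a, b), and the index of a data set is the average of s over all points. The following Python sketch implements this definition with Euclidean distances. The function name and the two-dimensional toy coordinates are hypothetical and merely stand in for PCA scores; they are not data from this work.

```python
import math


def silhouette_index(points, labels):
    """Mean silhouette width of labelled points (Euclidean distance)."""
    widths = []
    for i, (p, label) in enumerate(zip(points, labels)):
        own = [math.dist(p, q)
               for j, (q, other) in enumerate(zip(points, labels))
               if other == label and j != i]
        if not own:
            widths.append(0.0)  # usual convention for a single-member class
            continue
        a = sum(own) / len(own)  # mean distance within the own class
        b = min(                 # mean distance to the nearest other class
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != label
        )
        denominator = max(a, b)
        widths.append(0.0 if denominator == 0 else (b - a) / denominator)
    return sum(widths) / len(widths)


labels = ["A", "A", "B", "B"]
well_separated = [(0, 0), (0, 1), (5, 0), (5, 1)]  # hypothetical scores
overlapping = [(0, 0), (0, 1), (1, 0), (1, 1)]     # hypothetical scores
print(round(silhouette_index(well_separated, labels), 3))
print(round(silhouette_index(overlapping, labels), 3))
```

Values close to 1 indicate compact, well-separated classes, so a preprocessing step that raises the index can be regarded as improving the grouping of the samples.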
Quantitative separation among the different types of spices considered was first assessed by means of SIMCA, a linear method based on PCA able to discriminate between a high number of classes. For this purpose, the 87 samples were divided into training and validation sets and individual identification models were built for each class as described in Section 2.3. As displayed in Fig. 4, relatively good results were obtained for the training set using this method, although a few samples of sesame and thyme were misclassified as oregano and bay leaf, respectively. This confusion can be attributed to the closeness of these types of spices in the scores obtained in the PCA model (see Fig. 3, processed data). However, much poorer results were attained in the validation set, where not only sesame and thyme samples were misclassified but also a few replicates of clove, oregano and vanilla were assigned to cinnamon (Fig. 4), leading to a global classification error of 3.67% in the external validation. This classification error could be attributed to the close proximity of some classes observed in the PCA scores, as the SIMCA model is assembled from individual PCA models constructed along the direction of highest variation for each class, which is not necessarily the same direction as that of maximum separation between classes (Bylesjö et al., 2006).
Fig. 3. Data processing applied to chromatographic profiles and its effect on the separation observed in the scores of the PCA models generated. Effect of each processing step is exemplified with the chromatographic profile of a cinnamon sample. SI: Silhouette index.
Aiming for a better classification, a PLS-DA model was considered. PLS-DA is a linear method based on PLS in which the sample class is employed as the response matrix, thus maximizing class separation. Fig. 5a shows the scores diagram obtained with the PLS-DA model built using 7 LVs, which is quite different from the analogous PCA plot but still reveals cinnamon, clove and cumin as the three best-resolved spices. The classification obtained for the training set is slightly better than that attained by SIMCA, with only one replicate of bay leaf, the three replicates of one thyme sample and one replicate of another thyme sample incorrectly classified by the model (Fig. 5b). Nevertheless, the major improvement as compared to SIMCA is the classification achieved for the validation set, in which a global classification error of 0.14% was achieved and only one bay leaf replicate was misclassified as sesame.
PLS-DA and SIMCA were quantitatively compared in terms of sensitivity (ability to detect true positives), specificity (ability to detect true negatives) and classification error (model ability to perform a correct classification, considering both true positives and true negatives). As can be observed in Table 1, both methods provided similar results for the training set, with only a few values of sensitivity and specificity below 1 and a total global error of 0.75% and 0.82% for PLS-DA and SIMCA, respectively. Nevertheless, PLS-DA clearly outperformed SIMCA for the validation set, demonstrating higher sensitivity and specificity as well as lower global error. The better performance of PLS-DA is likely due to its ability to maximize class separation, which is particularly important in this case as the PCA scores show low within-class variability but close proximity between some of the considered classes (Bylesjö et al., 2006).
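The three figures of merit can be computed per class from the true and predicted labels, treating each class in turn as the positive class. In the Python sketch below, the classification error is taken as 1 − (sensitivity + specificity) / 2, i.e. the mean of the false negative and false positive rates; this definition is an assumption, since the text does not give the formula. The labels are hypothetical toy data, not results from this work.

```python
def class_metrics(true_labels, predicted_labels, target):
    """Sensitivity, specificity and classification error for one class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == target and p == target for t, p in pairs)  # true positives
    fn = sum(t == target and p != target for t, p in pairs)  # false negatives
    tn = sum(t != target and p != target for t, p in pairs)  # true negatives
    fp = sum(t != target and p == target for t, p in pairs)  # false positives
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Assumed definition: mean of false negative and false positive rates.
    error = 1 - (sensitivity + specificity) / 2
    return sensitivity, specificity, error


# Hypothetical toy labels: one bay leaf replicate predicted as sesame.
true_labels = ["bay leaf"] * 3 + ["sesame"] * 3 + ["thyme"] * 3
predicted_labels = ["bay leaf", "bay leaf", "sesame"] + ["sesame"] * 3 + ["thyme"] * 3

for name in ("bay leaf", "sesame", "thyme"):
    sens, spec, err = class_metrics(true_labels, predicted_labels, name)
    print(f"{name}: sensitivity={sens:.3f}, specificity={spec:.3f}, error={err:.3f}")
```

In this toy case the single misclassified replicate lowers the sensitivity of its true class and the specificity of the class it was assigned to, while the remaining class is unaffected.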
An interesting aspect of PLS-DA is the information provided by the loadings, which are most frequently studied through the VIP scores. A close inspection of the VIP scores for each class (Fig. 5c) revealed that the relevant variables for the classification (those above 1, red dashed line in Fig. 5c) are located in the time regions where the chromatographic peaks of the characteristic substances considered in the optimization of the chromatographic separation appear, but they also cover other parts of the chromatogram. It is observed that the vanillin peak region (retention times: 9.00-9.58 min) contributes significantly to the classification of all classes of samples. The same happens with sesamol (retention times: 8.40-8.75 min), which contributes to the classification of all spices with the exception of cinnamon and thyme. Although vanillin and sesamol should not be present in all the spices considered, other relevant compounds with similar characteristics could elute in this region. This aspect should be studied more thoroughly with techniques that provide more qualitative information, such as mass spectrometry. The eugenol region (retention times: 15.25-16.37 min) is only important for clove and contributes slightly to the classification of thyme. Finally, the regions of carvacrol (retention times: 24.87-25.60 min) and thymol (retention times: 26.80-28.00 min) are only important in the classification of oregano and thyme, respectively, which is in accordance with these two chemical species being the main components of oregano and thyme.
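The assignment of relevant variables to compound regions can be sketched as a simple lookup: time points whose VIP score exceeds the threshold of 1 are matched against the retention-time windows given above. In the Python sketch below, the function name and the time points and VIP values of the example are hypothetical; only the windows are taken from the text, which gives none for salicylaldehyde.

```python
# Retention-time windows (min) quoted in the text for each compound region.
REGIONS = {
    "sesamol": (8.40, 8.75),
    "vanillin": (9.00, 9.58),
    "eugenol": (15.25, 16.37),
    "carvacrol": (24.87, 25.60),
    "thymol": (26.80, 28.00),
}


def relevant_regions(times_min, vip_scores, threshold=1.0, regions=REGIONS):
    """Group time points with VIP above the threshold by compound region."""
    hits = {}
    for t, vip in zip(times_min, vip_scores):
        if vip <= threshold:
            continue  # variable not relevant for the classification
        name = next((n for n, (low, high) in regions.items() if low <= t <= high),
                    "unassigned")
        hits.setdefault(name, []).append(t)
    return hits


# Hypothetical time points and VIP scores for one class.
times = [8.5, 9.2, 12.0, 15.8, 25.0]
vips = [1.4, 2.1, 1.2, 0.6, 1.8]
print(relevant_regions(times, vips))
```

Time points labelled "unassigned" correspond to the other parts of the chromatogram that the VIP scores also flag as relevant.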

Conclusions
The combination of HPLC-UV with chemometric methods has been demonstrated to be a satisfactory approach for the characterization, identification and authentication of cinnamon, oregano, thyme, sesame, bay leaf, clove, cumin, and vanilla samples, providing a quality control and screening tool to support the authenticity assurance of the spices and herbs studied.
Firstly, the HPLC-UV conditions were optimized for the determination of six characteristic biomarkers (sesamol, eugenol, thymol, carvacrol, salicylaldehyde and vanillin), achieving a good chromatographic separation with an analysis time shorter than 35 min using UV-vis detection at 280 nm.
The exploratory study by PCA showed the usefulness of the proposed three-step data pretreatment based on baseline removal, peak shifting correction and edge removal for the discrimination of the spices and herbs studied.
The developed SIMCA and PLS-DA models were able to discriminate between the eight classes of spices and aromatic herbs considered. However, it should be noted that although the analysis by SIMCA provided a reasonable classification, the model obtained has lower sensitivity and specificity, with a higher overall prediction error. Thus, it can be concluded that PLS-DA is the most effective chemometric method for the characterization, identification and authentication of the spice and herb samples studied.

Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1. a) Chromatographic profile (blue line) obtained with the optimized gradient elution (red line) for the separation of sesamol (1), vanillin (2), salicylaldehyde (3), eugenol (4), carvacrol (5), and thymol (6), all of them at 15 mg L⁻¹ and performing the detection at 280 nm. b) Optimization of the working wavelength for UV detection, using the same conditions as in (a). (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

Fig. 4. Most probable prediction plot for SIMCA model constructed using data from pretreated chromatographic profiles.

Fig. 5. Scores diagram (a), most probable prediction (b), and VIP scores (c) plots for PLS-DA model constructed using data from pretreated chromatographic profiles. For VIP scores, the dashed red line represents the threshold value of 1. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)