The HPLC Fingerprint Analysis of Selected Cirsium Species with Aid of Chemometrics

Twelve methanolic extracts of Cirsium sp. were analyzed by high-performance liquid chromatography (HPLC) using a gradient elution method with a run time of 45 min. Four Kinetex (150 × 4.6 mm) chromatographic columns (C18 5 μm, C18 2.6 μm, pentafluorophenyl 5 μm and phenyl-hexyl 5 μm) and a mobile phase consisting of methanol/water/formic acid (1%) were used. Eight standards (naringin, vanillic acid, chlorogenic acid, caffeic acid, rutin, luteolin, apigenin and p-coumaric acid) were analyzed under the same conditions to check for their presence in the Cirsium methanolic extracts. The obtained chromatograms were compared, and their similarity was evaluated using similarity indices (Pearson's correlation coefficient, determination coefficient and congruence coefficient), distance indices (Euclidean, Manhattan and Chebyshev distances) and multi-scale structural similarity (MS-SSIM). The results were confirmed by principal component analysis (PCA). An attempt was made to identify two unknown Cirsium species using the similarity and distance indices and PCA.


Introduction
Cirsium species (Asteraceae) are common plants growing in the meadows of Europe, North Africa, Siberia, Central Asia and America. In Poland, thistles are very common and widespread. The most common species are C. vulgare, C. rivulare, C. oleraceum, C. canum, C. eriophorum, C. decussatum, C. pannonicum, C. acaule, C. helenoides and C. erisithales. They grow in pastures, fallow land and river valleys and prefer calcareous soils. These plants are used in traditional and conventional medicine and in cosmetology, and some species are used as food additives because of their nutritional value. The main constituents of Cirsium are flavonoids, phenolic acids, sterols, alkaloids, polyacetylenes, acetylenes, triterpenes, sesquiterpene lactones, lignans, hydrocarbons and minerals. Cirsium extracts exhibit many biological activities, such as antimicrobial, 1,2 anticancer, 3 antioxidant, 4,5 hepatoprotective, 6 antifungal 7 and antibacterial. 8,12-14 Chemometric methods, i.e. the application of mathematical and statistical techniques, can be important tools for retrieving more information from chromatographic data and for evaluating the similarity between herbal species. 15
In our work, fingerprint analysis of twelve thistles (ten known and two unknown) was performed using an HPLC gradient elution technique. The similarity between the studied Cirsium species was evaluated using similarity indices (Pearson's correlation coefficient, R; determination coefficient, R²; and congruence coefficient, cosine), distance indices (Euclidean, Manhattan and Chebyshev distances) and multi-scale structural similarity (MS-SSIM). Principal component analysis (PCA) was also used in the attempt to identify the two unknown species.
Before analysis, all raw extracts were filtered through filter paper. Ten microliters of each 0.1% standard solution were injected onto the C18 (5 µm) column to identify compounds in the individual extracts.
The chromatograms of the extracts and standards were processed using Agilent EZChrom Elite software (Santa Clara, CA, USA).

Extraction procedure
The plant raw material (ten known Cirsium species) was obtained from the Botanical Garden of Maria Curie-Skłodowska University (Lublin, Poland), and two unknown plants of the same species were harvested in a meadow in Turka, near Lublin (Poland). The names of the plants are presented in Table 1.
The identity of the individual plants was confirmed by the Botanical Garden staff, and voucher specimens are deposited in the Botanical Garden. The aerial parts of the plants were dried in the shade and wind at ambient temperature. The mass of raw material was 10 g for samples 1, 2, 3, 7, 9 and 10; 20 g for samples 4, 5 and 8; and 5 g for sample 6. The mass of the two unknown species was 3 g. The dried aerial parts were ground in a hand mill, placed in a paper thimble and extracted in a Soxhlet apparatus for 12 h with dichloromethane and then for a further 12 h with methanol. The extracts were evaporated using a rotary vacuum evaporator. The dried extracts were dissolved in methanol and transferred to 25 mL volumetric flasks. The extracts were stored in a refrigerator.

Preparation of standards
Ten milligrams of each of the eight standards (naringin, vanillic acid, chlorogenic acid, caffeic acid, rutin, luteolin, apigenin and p-coumaric acid) were dissolved in 1 mL of methanol to obtain approximately 0.1% solutions.

Chemometric analysis
Chromatograms of the ten known and two unknown Cirsium species were exported to text files (American Standard Code for Information Interchange, ASCII) and opened in Excel. Further processing used the retention times and the absorbance values recorded at the analytical wavelength of 320 nm. A matrix of 12 columns (the number of studied extracts) and 6750 rows (45 min = 2700 s at a sampling frequency of 2.5 Hz) was created and saved in .csv format. The file was opened in SpecAlign (version 2.4.1), 16 a program often used for aligning chromatographic data, which also provides smoothing, denoising and background subtraction.
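As a check on the matrix dimensions, a 45 min run sampled at 2.5 Hz gives 2700 s × 2.5 Hz = 6750 points per chromatogram. The sketch below (Python with NumPy) stacks twelve such traces into a 6750 × 12 matrix and writes it as CSV. The random traces and the in-memory buffer are hypothetical stand-ins for the exported ASCII files and the real .csv file; the helper name `build_matrix` is ours.

```python
import io
import numpy as np

RUN_TIME_S = 45 * 60                        # 45 min gradient run = 2700 s
SAMPLING_HZ = 2.5                           # detector sampling frequency
N_POINTS = int(RUN_TIME_S * SAMPLING_HZ)    # 6750 absorbance values per chromatogram

def build_matrix(chromatograms):
    """Stack equal-length 320 nm traces as columns: rows = time points, columns = extracts."""
    traces = [np.asarray(c, dtype=float) for c in chromatograms]
    if any(len(t) != N_POINTS for t in traces):
        raise ValueError(f"each chromatogram must have {N_POINTS} points")
    return np.column_stack(traces)

# Hypothetical stand-in data: 12 random traces instead of the exported ASCII files.
rng = np.random.default_rng(0)
time_s = np.arange(N_POINTS) / SAMPLING_HZ  # 0.0, 0.4, ..., 2699.6 s
X = build_matrix([rng.random(N_POINTS) for _ in range(12)])

# Write CSV to an in-memory buffer; pass a file name instead to write to disk.
buf = io.StringIO()
np.savetxt(buf, X, delimiter=",", fmt="%.6f")
```

Each CSV row holds one time point and each column one extract, which is the layout described above.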
First, the chromatograms were smoothed using the Savitzky-Golay filter. Noise was then reduced using the discrete wavelet transform with the Symmlet-8 wavelet, followed by soft thresholding with a threshold parameter of 0.5. Next, the background was subtracted. The baseline was estimated using the limited moving average method with a window width equal to 20% of the chromatogram length.
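The preprocessing steps can be sketched with NumPy as below. This is not SpecAlign's code: the Savitzky-Golay window (11 points) and polynomial order (3) are assumed values, and the "limited moving average" baseline is interpreted here, as an assumption, as repeatedly clipping the signal to its own moving average (window = 20% of the chromatogram length). Only the soft-threshold rule of the denoising step is shown; the Symmlet-8 decomposition itself is omitted. With the PyWavelets library it would typically use `pywt.wavedec(y, "sym8")`, `pywt.threshold(..., mode="soft")` on the detail coefficients and `pywt.waverec`, and how SpecAlign scales its threshold of 0.5 is not stated here.

```python
import numpy as np

def savgol_smooth(y, window=11, order=3):
    """Savitzky-Golay smoothing: least-squares polynomial fit in a sliding window, value at the centre."""
    half = window // 2
    x = np.arange(-half, half + 1)
    A = np.vander(x, order + 1, increasing=True)   # columns 1, x, x^2, x^3
    coeffs = np.linalg.pinv(A)[0]                  # weights giving the fitted value at x = 0
    padded = np.pad(np.asarray(y, float), half, mode="edge")
    return np.convolve(padded, coeffs[::-1], mode="valid")

def soft_threshold(coeffs, t):
    """Soft thresholding: shrink wavelet coefficients towards zero by t; |c| <= t becomes 0."""
    c = np.asarray(coeffs, float)
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def moving_average(y, width):
    """Centred moving average with edge padding; output has the same length as y."""
    half = width // 2
    padded = np.pad(y, (half, width - 1 - half), mode="edge")
    return np.convolve(padded, np.ones(width) / width, mode="valid")

def limited_moving_average_baseline(y, frac=0.20, n_iter=20):
    """ASSUMED interpretation of 'limited moving average': repeatedly clip the signal to its
    own moving average so that peaks stop pulling the baseline upwards."""
    width = max(3, int(frac * len(y)))
    b = np.asarray(y, float).copy()
    for _ in range(n_iter):
        b = np.minimum(b, moving_average(b, width))
    return b

def preprocess(y):
    """Smoothing followed by baseline subtraction (wavelet denoising step omitted)."""
    smoothed = savgol_smooth(y)
    return smoothed - limited_moving_average_baseline(smoothed)
```

Because Savitzky-Golay weights come from a local polynomial fit, any polynomial of degree up to `order` passes through the filter unchanged away from the edges, which is why the filter preserves peak shape better than a plain moving average.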
Following Jiang et al., 17 the recursive alignment by fast Fourier transform (RAFFT) algorithm was selected for chromatogram alignment. This algorithm is efficient and does not affect peak shape. Each chromatogram was divided into segments, and every segment was synchronized with the corresponding segment of the target. The target chromatogram was the one with the highest average correlation coefficient with all the others (in this case C. decussatum). 18 In all cases, synchronization was performed with a maximum shift of ten.
Subsequently, the similarity and distance indices were calculated and PCA was performed.
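The alignment step can be illustrated with a simplified, NumPy-only sketch of recursive FFT-based alignment in the spirit of RAFFT. The lag that maximizes the FFT cross-correlation with the target (restricted to ±`max_shift` points) is applied, then the chromatogram is split in half and each half is aligned recursively. The midpoint splitting, the minimum segment length `min_len`, the edge padding and reading "a maximum shift of ten" as ten data points are assumptions; the RAFFT implementation in SpecAlign may choose segment boundaries and stopping rules differently. `pick_target` chooses the target as the chromatogram with the highest mean Pearson correlation with all others. All function names are hypothetical.

```python
import numpy as np

def pick_target(X):
    """Row index of the chromatogram with the highest mean Pearson correlation to all other rows."""
    R = np.corrcoef(X)                                   # rows of X = chromatograms
    mean_r = (R.sum(axis=1) - 1.0) / (R.shape[0] - 1)    # exclude the self-correlation of 1
    return int(np.argmax(mean_r))

def best_lag(target, seg, max_shift):
    """Lag (in points) maximizing the linear cross-correlation, restricted to [-max_shift, max_shift]."""
    size = 2 * len(target)                               # zero-padding avoids circular wrap-around
    cc = np.fft.irfft(np.fft.rfft(target, size) * np.conj(np.fft.rfft(seg, size)), size)
    lags = np.arange(-max_shift, max_shift + 1)
    return int(lags[np.argmax(cc[lags % size])])         # negative lags sit at the end of cc

def shift(x, lag):
    """Shift x by `lag` points (positive = later), padding with edge values instead of wrapping."""
    out = np.empty_like(x)
    if lag > 0:
        out[lag:], out[:lag] = x[:-lag], x[0]
    elif lag < 0:
        out[:lag], out[lag:] = x[-lag:], x[-1]
    else:
        out[:] = x
    return out

def recursive_fft_align(target, sample, max_shift=10, min_len=50):
    """Align `sample` to `target`: apply the best global shift, then recurse on the two halves."""
    target = np.asarray(target, float)
    sample = np.asarray(sample, float)
    aligned = shift(sample, best_lag(target, sample, max_shift))
    if len(aligned) >= 2 * min_len:
        mid = len(aligned) // 2
        aligned = np.concatenate([
            recursive_fft_align(target[:mid], aligned[:mid], max_shift, min_len),
            recursive_fft_align(target[mid:], aligned[mid:], max_shift, min_len),
        ])
    return aligned
```

Computing the cross-correlation through the FFT costs O(n log n) per segment rather than O(n · shifts), which is the main source of the speed attributed to FFT-based alignment.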

Similarity and distance indices
In our work, the following similarity and distance indices were used: (i) Pearson's correlation coefficient (R), which measures the degree of linear dependence between two variables and takes values from -1 to 1; a high absolute value of R indicates a strong relationship, and R = 0 indicates no linear correlation; (ii) the determination coefficient (R²), which expresses the proportion of the variability of one variable explained by the other and takes values from 0 to 1, where 0 indicates no correlation and 1 indicates great similarity; (iii) the congruence coefficient (cosine measure), i.e. the cosine of the angle between the two vectors in n-dimensional space, where a value of 1 indicates great similarity between samples; (iv) the Euclidean distance, i.e. the length of the segment connecting two points in n-dimensional space, which is close to zero for similar vectors; (v) the Manhattan (city block) distance, i.e. the sum of the absolute differences between corresponding coordinates of the two vectors; (vi) the Chebyshev distance, i.e. the greatest absolute difference between corresponding coordinates; and (vii) MS-SSIM, calculated with a plugin for the ImageJ program.
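The six numerical indices above can be computed for a pair of aligned chromatograms as in this NumPy sketch (function names are ours). R² is taken as the square of Pearson's R. Note that cosine differs from R only in that the vectors are not mean-centred first.

```python
import numpy as np

def similarity_indices(x, y):
    """Pearson's R, determination coefficient R^2 and congruence coefficient (cosine)."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    r = float(np.corrcoef(x, y)[0, 1])
    cosine = float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
    return {"R": r, "R2": r * r, "cosine": cosine}

def distance_indices(x, y):
    """Euclidean, Manhattan (city block) and Chebyshev distances between two vectors."""
    d = np.asarray(x, float) - np.asarray(y, float)
    return {
        "euclidean": float(np.sqrt(np.sum(d ** 2))),
        "manhattan": float(np.sum(np.abs(d))),
        "chebyshev": float(np.max(np.abs(d))),
    }
```

For x = [1, 2, 3, 4] and y = [2, 4, 6, 8], R, R² and cosine all equal 1 because y is a scaled copy of x, while the Euclidean, Manhattan and Chebyshev distances are √30, 10 and 4. Profiles of the same shape but different intensity therefore look identical to the similarity indices yet distinct to the distance indices.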
MS-SSIM 19,20 is a multi-scale structural similarity parameter that has been used as a quantitative measure of recognition quality in optical character recognition (OCR). It is based on analysis of the image at several scales. It is defined as:
$$\mathrm{MS\text{-}SSIM}(x,y) = \left[l_M(x,y)\right]^{\alpha_M} \prod_{j=1}^{M} \left[c_j(x,y)\right]^{\beta_j} \left[s_j(x,y)\right]^{\gamma_j} \qquad (1)$$
where M is the coarsest scale, obtained after M - 1 iterations (each consisting of low-pass filtering and downsampling by a factor of 2); l is the luminance comparison, c the contrast comparison and s the structure comparison; c and s are evaluated separately at every scale j, l only at scale M; and α, β and γ are exponents weighting the three components. The similarity and distance measures were calculated to determine the similarity of the analyzed samples and to identify the two unknown thistles.
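To make the MS-SSIM definition concrete, the sketch below evaluates it for two one-dimensional signals (for example, two aligned chromatograms) rather than images. It is a simplification, not the ImageJ plugin: l, c and s are computed from global statistics at each scale instead of local sliding windows, each coarser scale is produced by averaging neighbouring pairs of points (a crude low-pass filter followed by downsampling by 2), and the exponents α_j = β_j = γ_j are set to the five-scale weights commonly used with MS-SSIM. The constants C1 = (0.01 L)², C2 = (0.03 L)² and C3 = C2/2, with L the data range, follow the usual SSIM convention. Negative l and s values are clipped to 0 so that the fractional powers stay real; this clipping is also an assumption.

```python
import numpy as np

# Five-scale exponents commonly used with MS-SSIM; here alpha_j = beta_j = gamma_j = w_j.
WEIGHTS = np.array([0.0448, 0.2856, 0.3001, 0.2363, 0.1333])

def _components(x, y, c1, c2):
    """Global luminance (l), contrast (c) and structure (s) comparisons of two signals."""
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = ((x - mx) * (y - my)).mean()
    c3 = c2 / 2
    l = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    c = (2 * sx * sy + c2) / (sx ** 2 + sy ** 2 + c2)
    s = (sxy + c3) / (sx * sy + c3)
    return l, c, s

def ms_ssim_1d(x, y, weights=WEIGHTS, data_range=None):
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    if data_range is None:
        data_range = max(x.max(), y.max()) - min(x.min(), y.min())
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    M = len(weights)
    result = 1.0
    for j, w in enumerate(weights):
        l, c, s = _components(x, y, c1, c2)
        result *= (c ** w) * (max(s, 0.0) ** w)      # contrast and structure at every scale
        if j == M - 1:
            result *= max(l, 0.0) ** w               # luminance only at the coarsest scale M
        else:
            n = len(x) // 2 * 2
            x = 0.5 * (x[:n:2] + x[1:n:2])           # pair average (low-pass) + downsample by 2
            y = 0.5 * (y[:n:2] + y[1:n:2])
    return result
```

Identical signals give a value of 1, and any loss of contrast, shift in mean level or change in structure lowers it towards 0.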

Results and Discussion
Chromatographic analyses of some Cirsium extracts were performed by Kozyra and Skalicka-Woźniak 21 and by Kozyra and Głowniak, 22 who confirmed the presence of some flavonoids and phenolic acids in the extracts.
In our work, an attempt was made to identify eight standards (naringin, vanillic acid, chlorogenic acid, caffeic acid, rutin, luteolin, apigenin and p-coumaric acid) in the studied Cirsium species (Table 1). The retention times of the standards obtained on the C18 (5 µm) column are presented in Table 2, and the presence of the standards in the individual Cirsium extracts is presented in Table 3.

Measures of similarity and distance
Similarity measures were calculated for four chromatographic columns (C18 5 µm, C18 2.6 µm, PFP 5 µm and phenyl-hexyl 5 µm), and the results are summarized in Table 4. These calculations were performed to compare the chemical profiles of the ten analyzed Cirsium species and to attempt identification of the two unknown thistles (samples 11 and 12).
Identification of the unknown Cirsium species using these chromatographic and chemometric methods is ambiguous. Our aim was a preliminary estimation of the similarity of the studied Cirsium species and an attempt to identify the two unknown species. The similarity between Cirsium decussatum Janka (sample 4) and Cirsium erisithales (Jacq.) Scop. (sample 6) was observed for the PFP and phenyl-hexyl columns using Pearson's correlation coefficient and the congruence coefficient (values greater than 0.9) and the determination coefficient (values higher than 0.8; Table 4). The similarity between samples 4 and 5 was confirmed for the same two hydrophobic columns, with R and cosine of 0.8857 and 0.8895, respectively, for the phenyl-hexyl column, and 0.8298 and 0.8354, respectively, for the PFP column. The Euclidean, Manhattan and MS-SSIM parameters also indicate the greatest similarity between Cirsium decussatum Janka and Cirsium eriophorum (L.) Scop. (sample 5).
For both C18 columns, the similarity between samples 4 (Cirsium decussatum Janka) and 5 (Cirsium eriophorum (L.) Scop.) was observed using all similarity and distance indices as well as MS-SSIM. The values of R and cosine are in the range 0.8652-0.8773, and MS-SSIM equals 0.3322 for the C18 2.6 µm column and 0.4040 for the C18 5 µm column.
The two unknown thistles were compared with the Cirsium species from the Botanical Garden of Maria Curie-Skłodowska University (Lublin, Poland) using the similarity and distance indices. Based on the results in Table 4, the similarity between sample 11 and Cirsium decussatum Janka (sample 4) was confirmed by the first three similarity parameters (R, R² and cosine) for the pentafluorophenyl and phenyl-hexyl columns. The similarity between samples 11 and 5 (Cirsium eriophorum (L.) Scop.) was observed using R, R², cosine and Chebyshev distance for the C18 2.6 µm column, and using Chebyshev distance for the PFP and phenyl-hexyl columns.
The similarity between the second unknown Cirsium species (sample 12) and Cirsium canum (L.) (sample 3) was observed for all four chromatographic columns using R, R² and cosine.
For the PFP and phenyl-hexyl columns, the Euclidean, Manhattan and Chebyshev distances indicate similarity between samples 12 and 10 (Cirsium vulgare (Savi.) Ten.).
Some differences were observed for the two hydrophobic columns (PFP and phenyl-hexyl), which were used as alternatives to octadecyl (C18) columns. Interestingly, these two columns gave the same results regarding the similarity of samples 11 and 12 to the other samples. Sample 11 is most similar to Cirsium decussatum Janka (sample 4) according to Pearson's correlation coefficient, the determination coefficient and the congruence coefficient, and to samples 10, 12, 5 and 2 according to the Euclidean, Manhattan and Chebyshev distances and MS-SSIM, respectively. The second unknown Cirsium species (sample 12) is most similar to Cirsium canum (L.) All. (sample 3) according to R, R² and cosine, to Cirsium vulgare (Savi.) Ten. (sample 10) according to the three distance indices, and to sample 11 according to MS-SSIM.
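One plausible reason the indices nominate different nearest species is that R, R² and cosine are largely insensitive to overall intensity (for example, extract concentration), whereas the distances respond to it. The sketch below picks the most similar known chromatogram for each index, maximizing the similarity indices and minimizing the distances. The helper `nearest_known` is hypothetical, MS-SSIM is not included, and the synthetic Gaussian peaks used to exercise it are not data from this study.

```python
import numpy as np

def nearest_known(unknown, known, labels):
    """For each index, label of the known chromatogram judged most similar to `unknown`.
    Similarity indices are maximized; distances are minimized. Note that argmax of R^2
    would also favour a strongly anti-correlated profile."""
    u = np.asarray(unknown, float)
    K = np.asarray(known, float)                         # rows = known chromatograms
    r = np.array([np.corrcoef(u, k)[0, 1] for k in K])
    similarities = {
        "R": r,
        "R2": r ** 2,
        "cosine": K @ u / (np.linalg.norm(K, axis=1) * np.linalg.norm(u)),
    }
    d = K - u
    distances = {
        "euclidean": np.sqrt((d ** 2).sum(axis=1)),
        "manhattan": np.abs(d).sum(axis=1),
        "chebyshev": np.abs(d).max(axis=1),
    }
    best = {name: labels[int(np.argmax(v))] for name, v in similarities.items()}
    best.update({name: labels[int(np.argmin(v))] for name, v in distances.items()})
    return best
```

If the unknown is a threefold more concentrated copy of known profile A, and known profile B has a similar intensity but a peak shifted by a few points, R, R² and cosine select A while all three distances select B.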
Chromatograms of all extracts obtained on one example column (C18 5 µm) are presented in Figure 2.

PCA analysis
Before PCA, the chromatographic data were preprocessed (smoothing, noise reduction, background subtraction and alignment). The PCA matrix consisted of 20 columns and 6751 rows. The results are presented as PC2 vs. PC3 plots, with the percentage of variance explained by each component shown on the respective axis (Figure 3).
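A minimal PCA sketch via singular value decomposition is shown below. It assumes that each chromatogram (one column of the 6751-row, 20-column matrix) is treated as one observation, so the matrix is transposed before the call, and that the data are only mean-centred without scaling. The software and options actually used for PCA are not stated here, and `pca_scores` is a hypothetical name.

```python
import numpy as np

def pca_scores(X, n_components=3):
    """PCA via SVD of the column-mean-centred matrix X (rows = chromatograms, columns = time points).
    Returns the scores (one row per chromatogram) and the percentage of variance explained per PC."""
    X = np.asarray(X, float)
    Xc = X - X.mean(axis=0)                         # centre each time point
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # equivalent to Xc @ Vt[:n_components].T
    explained = 100.0 * S ** 2 / np.sum(S ** 2)
    return scores, explained[:n_components]
```

For a PC2 vs. PC3 plot, `scores[:, 1]` and `scores[:, 2]` give the coordinates of each chromatogram, and `explained[1]` and `explained[2]` give the axis percentages.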
The close proximity of the lines corresponding to known and unknown Cirsium samples (numbered in accordance with Table 1) indicates their similarity.
The similarity of Cirsium decussatum Janka (sample 4) and Cirsium erisithales (Jacq.) Scop. (sample 6) was also confirmed by PCA. In the PC2 vs. PC3 plots for the PFP and C18 (5 µm) columns (Figures 3a and 3d), the lines corresponding to samples 4 and 6 lie close to each other. For the other chromatographic columns (Figures 3b and 3c), these lines are also near each other, but less close.
The second unknown Cirsium sp. (sample 12) was also compared with the known Cirsium species. For all chromatographic columns (Figures 3a-d), the line corresponding to sample 12 lies near that of sample 3 (Cirsium canum).
In conclusion, the similarity between samples 12 and 3 (Cirsium canum) was confirmed using the similarity indices (R, R² and cosine) and PCA for all chromatographic systems used. Moreover, the similarity between unknown sample 12 and Cirsium vulgare (Savi.) Ten. (sample 10) was confirmed using the distance parameters (Euclidean, Manhattan and Chebyshev distances) for the PFP and phenyl-hexyl columns. The similarity between the other unknown Cirsium species (sample 11) and sample 2 (Cirsium arvense (L.) Scop.) was confirmed using the MS-SSIM parameter for all chromatographic systems, and using PCA for the PFP, phenyl-hexyl and C18 (2.6 µm) columns. The similarity of sample 11 to samples 4 (PFP and phenyl-hexyl), 5 (C18 2.6 µm) and 6 (C18 5 µm) was confirmed using the similarity indices (R, R² and cosine), whereas the distance parameters indicate similarity between sample 11 and samples 10, 12, 5 and 2 (for PFP, phenyl-hexyl and C18 2.6 µm) and samples 4, 12 and 2 (for C18 5 µm).

Conclusions
Chromatographic fingerprints of twelve Cirsium species were obtained using HPLC and chemometric methods. An attempt to identify the standards (naringin, vanillic acid, chlorogenic acid, caffeic acid, rutin, luteolin, apigenin and p-coumaric acid) was made based on their retention times. The presence of some of the standards in all Cirsium methanolic extracts was confirmed.
The similarity between various Cirsium species was evaluated using similarity and distance indices. The similarity between the unknown Cirsium species (sample 12) and Cirsium canum was confirmed using the similarity indices (R, R² and cosine), whereas the distance parameters (Euclidean, Manhattan and Chebyshev distances) indicate similarity between unknown sample 12 and Cirsium vulgare (Savi.) Ten. for the PFP and phenyl-hexyl columns. PCA confirms the similarity between sample 12 and Cirsium canum for all chromatographic systems used.

Table 2. Retention times of standards on the C18 (5 µm) column obtained by high-performance liquid chromatography (HPLC)

Table 3. Presence of standards in the studied extracts (as detailed in Table 2)

Table 4. Measures of the similarity for four chromatographic columns. a Pearson's correlation coefficient; b determination coefficient; c congruence coefficient. MS-SSIM: multi-scale structural similarity.