Classification of Cannabis Cultivars Marketed in Canada for Medical Purposes by Quantification of Cannabinoids and Terpenes Using HPLC-DAD and GC-MS


Introduction
As of 2014, 545 compounds have been identified in cannabis, among which there are 104 cannabinoids [1]. Cannabinoids have been indicated for sixteen potential therapeutic uses, ranging from pain management to neurological disorders [2]. Although the research focus is mainly on psychoactive THC, other non-psychoactive cannabinoids, such as CBD, cannabigerol (CBG), and cannabichromene (CBC), also have broad therapeutic potential without the negative effects of THC and may enhance the beneficial effects of THC [3]. Apart from cannabinoids, a significant number of compounds produced in cannabis are terpenes, which are responsible for cannabis's distinctive odour [4]. Although clinical studies are still nascent, terpenes are receiving increasing attention for their synergistic interactions with cannabinoids in the treatment of pain, inflammation, depression, anxiety, addiction, epilepsy, cancer, and infections [3]. Cannabis, as an herbal medicine, is suggested to be greater than the sum of its individual components [3].
Cannabis for medical purposes is becoming a global trend and is especially popular in North America. Canada and an increasing number of states in the US have legalized medical cannabis. As of January 2017, 28 states and Washington D.C. had legalized the medical use of cannabis [5]. In Canada, 38 licensed producers are authorized to produce and sell dried marijuana, fresh marijuana, cannabis oil, or starting materials to eligible persons [6]. Effective as of August 2016, the new Access to Cannabis for Medical Purposes Regulations (ACMPR) permit self-production of a limited amount of cannabis for medical purposes as a supplement to purchasing cannabis from licensed producers [7]. However, due to the highly varied and complex composition of active components in cannabis, the suitability of each cultivar for treating particular conditions requires further investigation. In addition, a long history of hybridization has resulted in hundreds of cannabis cultivars, many of which have similar chemical compositions. In this respect, cannabis cultivar classification is a foundational requirement for standardizing and controlling the quality of cannabis for medical applications.
Currently, there are three classification systems for cannabis. The first, a botanical perspective, attempts to classify cannabis into different species or subspecies based on appearance, THC content, and geographical origins (gene pools) [8-13]. The second, a chemotaxonomic perspective, describes five chemotypes (chemical phenotypes) based on the ratio of the two major cannabinoids, THC and CBD, which is determined by their corresponding allelic loci [14-21]. More recently, a third perspective seeks to categorize cultivars based on both cannabinoids and terpenes for drug standardization and clinical research purposes [22,23]. However, no systematic classification currently covers the majority of commercially available cultivars. In this project, we aim to classify a portion of the cannabis cultivars currently marketed in Canada based on the content of dozens of potentially therapeutic cannabinoids and terpenes. Cannabis cultivar classification will be the foundation for industrial production, clinical research, and informative guidance for individual growers in Canada.
In this work, we used an HPLC method recommended by the American Herbal Pharmacopoeia (AHP) [24] to quantify 10 cannabinoids (Cannabidiolic Acid (CBDA), Cannabigerolic Acid (CBGA), CBG, CBD, Tetrahydrocannabivarin (THCV), Cannabinol (CBN), Tetrahydrocannabinolic Acid (THCA), ∆9-THC, ∆8-THC, CBC). We also developed a GC-MS method to quantify 14 terpenes (α-Pinene, β-Myrcene, β-Pinene, Δ3-Carene, Limonene, p-Cymene, Eucalyptol, Linalool, Fenchone, Fenchol, Borneol, α-Terpineol, Pulegone, β-Caryophyllene) that have been indicated for pharmacological activities [3,25,26]. We validated these methods for specificity (selectivity), linearity, accuracy, precision (repeatability and intermediate precision), limit of detection (LOD), and limit of quantification (LOQ). We then applied these two quantification methods to 32 medical cannabis samples provided by licensed producers, followed by two classification methods. Hierarchical cluster analysis was carried out first, and principal component analysis (PCA) was then applied to confirm whether the cultivars grouped together in the cluster analysis would also be grouped together by PCA. PCA also revealed the compounds that were responsible for separating cultivars between clusters. These classification results may have value for clinical researchers in the discrimination and selection of cultivars. They may also assist licensed producers in optimizing cultivar selection with regard to medicinal effects.

Sample collection
A total of 32 cannabis samples (dried flower buds) were collected from two licensed producers in Canada. Sample names were provided by the licensed producers and different names may not necessarily represent distinct cultivars. Samples arrived in sealed plastic bags and were stored in a dry and cool storage facility prior to analysis. All samples were pulverized into fine powder. Approximately 1 g of each sample was used for extraction.

Solvents and chemicals
All 10 cannabinoid standards except CBDA were purchased from Cayman Chemical Co. (Ann Arbor, Michigan 48108 USA). CBDA and the internal standard (ISTD) diazepam were purchased from Sigma-Aldrich Company (Oakville, Ontario, Canada). All standards were analytical grade and were provided as 1 mg/mL solutions in methanol or acetonitrile. All 14 terpene standards and the ISTD tridecane were purchased from Sigma-Aldrich Company (Oakville, Ontario, Canada). All standards were analytical grade and came as a pure liquid or white powder.
Hexanes, chloroform, and acetonitrile were purchased from Fisher Scientific Company (Ottawa, Ontario, Canada). Ammonium formate was purchased from Sigma-Aldrich Company (Oakville, Ontario, Canada). Methanol was purchased from EMD Millipore (Etobicoke, Ontario, Canada). Formic acid was purchased from Caledon Laboratory Chemicals (Halton Hills, Ontario, Canada). Water was HPLC grade, produced in-house using a Millipore filtration system that purified water to a resistivity of 18 MΩ·cm.
The stock standard solution was prepared by adding 1 mL each of the 1 mg/mL THCA, CBDA, CBGA, ∆9-THC, ∆8-THC, CBD, CBG, CBC, THCV, and CBN standards into a 10 mL volumetric flask. This mixed standard solution was dried under a gentle stream of nitrogen, and then a 1:1 mixture of water and acetonitrile spiked with 20 ppm diazepam as ISTD was added to volume. The resulting concentration of each cannabinoid in the stock solution was 100 ppm (µg/mL). The 100 ppm mixed standard solution was further diluted to create calibration standard solutions with cannabinoid concentrations of 50 ppm, 25 ppm, 5 ppm, 1 ppm, and 0.5 ppm.
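The dilution arithmetic in this paragraph follows C1·V1 = C2·V2. The following is a minimal sketch of that calculation; the 1 mL final volume per calibration level is an assumption made for illustration and is not stated in the text, and the function names are hypothetical.

```python
def diluted_conc(c_initial_ppm, v_aliquot_ml, v_final_ml):
    """Concentration after dilution, from C1*V1 = C2*V2."""
    return c_initial_ppm * v_aliquot_ml / v_final_ml


def aliquot_volume_ml(c_stock_ppm, c_target_ppm, v_final_ml):
    """Volume of stock needed to reach c_target_ppm in v_final_ml."""
    return c_target_ppm * v_final_ml / c_stock_ppm


# 1 mL of each 1 mg/mL (1000 ppm) standard made up to 10 mL.
stock_ppm = diluted_conc(1000.0, 1.0, 10.0)

# Assumption: each calibration level is prepared in 1 mL final volume.
FINAL_ML = 1.0
levels_ppm = [50, 25, 5, 1, 0.5]
plan_ul = {c: aliquot_volume_ml(stock_ppm, c, FINAL_ML) * 1000.0
           for c in levels_ppm}

for level, microlitres in plan_ul.items():
    print(f"{level:>5} ppm: {microlitres:.0f} uL of stock made up to "
          f"{FINAL_ML * 1000:.0f} uL")
```

In practice the lowest levels would more likely be prepared by serial dilution from an intermediate standard, since pipetting a few microlitres directly from the stock adds avoidable error.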
The analytical method was adapted from a published method in the AHP monograph (revision 2014) [27] by modifying the dilution factor. In the original method, 200 mg of sample was extracted with methanol/chloroform (9/1, v/v) and the extract was diluted by a factor of 10. In this method, we diluted the extract by a factor of 40 in two steps. A 100 µL aliquot of the filtrate of the extract was first diluted to a volume of 1 mL (10-fold). Then, a 30 µL aliquot of the diluted extract was evaporated under a gentle stream of nitrogen, and the residue was dissolved in 120 µL of a mixture of water/acetonitrile (5/5, v/v) with 20 ppm ISTD (4-fold). Finally, 100 µL of the solution was transferred into an amber vial with a spring glass insert for HPLC analysis. Quantification of cannabinoids was achieved by comparing the ratio of sample/ISTD with the ratio of the external standard (ESTD)/ISTD at the target concentration. Cannabinoid analysis results were reported as mass fraction (w/w %).
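As a rough illustration of how the two-step dilution and the ISTD ratio comparison combine into a mass fraction, consider the sketch below. The peak areas, the 25 mL extraction volume, and the 200 mg sample mass used in the example call are hypothetical values chosen for the arithmetic (the extraction volume is not given in the text), and the function names are illustrative only.

```python
def dilution_factor(steps):
    """Overall dilution from a list of (aliquot_uL, final_uL) steps."""
    factor = 1.0
    for aliquot_ul, final_ul in steps:
        factor *= final_ul / aliquot_ul
    return factor


def vial_conc_ppm(sample_area, sample_istd_area,
                  std_area, std_istd_area, std_ppm):
    """Single-point estimate: compare the sample/ISTD ratio with the
    ESTD/ISTD ratio measured at a known standard concentration."""
    sample_ratio = sample_area / sample_istd_area
    std_ratio = std_area / std_istd_area
    return sample_ratio / std_ratio * std_ppm


def mass_fraction_percent(conc_ppm, factor, extract_ml, sample_mg):
    """Convert a vial concentration (ug/mL) back to w/w % in the sample."""
    ug_in_extract = conc_ppm * factor * extract_ml
    return ug_in_extract / (sample_mg * 1000.0) * 100.0


# 100 uL -> 1 mL (10x), then 30 uL -> 120 uL (4x): 40x overall.
DF = dilution_factor([(100, 1000), (30, 120)])

# Hypothetical peak areas, extraction volume (mL) and sample mass (mg).
conc = vial_conc_ppm(1200.0, 1000.0, 1000.0, 1000.0, 25.0)
print(DF, conc, mass_fraction_percent(conc, DF, 25.0, 200.0))
```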

GC-MS systems and terpenes assay
The GC-MS system used in this study was an Agilent 7890A GC system comprising the following components: Agilent 7890A GC (G3440A), Agilent 5975C inert MSD with Triple-Axis Detector, K`Prime GC Sample Injector (MXY 02-01B), and a GC column (Phenomenex, Zebron, ZB-624, 30 m × 0.25 mm ID, 1.40 µm film thickness). A temperature gradient program was used for the separation of terpenes (Table 1). The injector temperature was 250°C. The injection volume was 2 µL. The split ratio was 20:1. The carrier gas (helium) flow rate was 1.2 mL/min. The run time was 20 minutes. Selected ion monitoring (SIM) was carried out to quantify terpenes.
The ISTD solution was prepared by weighing 216.8 mg of tridecane and dissolving it in 1 L of extraction solvent, resulting in an ISTD concentration of 216.8 ppm. A stock standard solution of each terpene was prepared separately by weighing approximately 200 mg of each terpene and dissolving it in 10 mL of ISTD-spiked extraction solvent. A 500 ppm mixed working standard was prepared by taking a calculated volume of each of the stock standards and adding extraction solvent to a final volume of 100 mL. The 500 ppm standard solution was further diluted to create calibration standards with terpene concentrations of 250 ppm, 100 ppm, 50 ppm, 25 ppm, 5 ppm, and 1 ppm.
About 500 mg of dried sample was extracted with 5.0 mL of a 1:1 mixture of hexane and ethyl acetate on a shaker for 20 minutes. After centrifugation at 10,000 rpm for 5 minutes, a 100 µL aliquot of the supernatant was transferred into an amber vial with a spring glass insert for GC analysis. Fifteen compounds (including the ISTD tridecane) were divided into 12 groups in the SIM method, with each group assigned corresponding quantifier and qualifier ions. Quantification of terpenes was achieved by comparing the ratio of sample/ISTD with the ratio of ESTD/ISTD at the target concentration. Terpene analysis results were reported as w/w %.

Quantification and classification
Each sample was quantified for both cannabinoid and terpene content, and the results were then subjected to cluster analysis and PCA to enable cannabis cultivar classification. The software used for both analyses was JMP® 13.0.0. Observations (cultivars in this case) were grouped using hierarchical clustering. The distances between clusters were calculated using Ward's minimum variance equation [2]:
$$D_{KL} = \frac{\lVert \bar{x}_K - \bar{x}_L \rVert^2}{\frac{1}{N_K} + \frac{1}{N_L}}$$
where $D_{KL}$ is Ward's distance between clusters K and L; the subscripts K and L are positive integers up to the number of observations; $\bar{x}_K$ is the mean vector for the Kth cluster $C_K$; $\bar{x}_L$ is the mean vector for the Lth cluster $C_L$; $\lVert x \rVert$ is the square root of the sum of the squares of the elements of x (the Euclidean length of the vector x); $N_K$ is the number of observations in $C_K$; and $N_L$ is the number of observations in $C_L$.
PCA is a commonly used multivariate technique for detecting patterns in high-dimensional data. PCA can also identify the critical compounds for discriminating cannabis cultivars, which is useful in choosing cultivars that are abundant in specific bioactive components. PCA projects the original chemical data into a new coordinate system, which is produced by calculating eigenvalues and eigenvectors from the covariance matrix of the original data matrix. The eigenvectors (principal components, shortened as PCs) are orthogonal to each other and are ordered by significance: the first PC explains the most variance and the last PC explains the least [29]. For better visual interpretation of the data, only the first two or three PCs are retained, resulting in a lossy data compression process.

Method validation for cannabinoids (HPLC)
A solvent blank was injected and no false signal peak was observed at the targeted retention time area. Five levels of cannabinoid standards were injected.

Method validation
Both HPLC and GC-MS methods were validated for specificity (selectivity), linearity, accuracy, precision (repeatability and intermediate precision), LOD and LOQ as instructed by the ICH Harmonised Tripartite Guideline for Validation of Analytical Procedures Q2 (R1) [28]. Specificity (selectivity) was determined by injecting a solvent blank to confirm that there were no false signal peaks at the targeted retention time. Each cannabinoid and terpene standard was individually injected to determine retention times. A linear regression (calibration) curve for each compound was constructed by plotting the peak-area ratio of STD/ISTD (y) against concentration (x, ppm). The slope, y-intercept and coefficient of determination were calculated from the standard curves.
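The calibration step described above is an ordinary least-squares fit of peak-area ratio against concentration. The following sketch shows how the slope, y-intercept, and coefficient of determination could be computed; the peak-area ratios are invented for illustration and are not data from this study.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, with R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Calibration levels (ppm) and hypothetical STD/ISTD peak-area ratios.
conc_ppm = [0.5, 1.0, 5.0, 25.0, 50.0]
area_ratio = [0.021, 0.043, 0.208, 1.015, 2.049]

slope, intercept, r2 = linear_fit(conc_ppm, area_ratio)
print(f"slope={slope:.5f}  intercept={intercept:.5f}  R^2={r2:.5f}")
```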
To test accuracy (recovery), spiked samples were prepared by adding three levels of known concentrations of standards into three replicates of a known sample. Each spiked sample was injected three times (N=3). The spiked levels covered low, medium, and high concentrations of each analyte, as the contents of cannabinoids and terpenes vary significantly in natural samples. Precision includes repeatability and intermediate precision, otherwise referred to as intra-day precision and inter-day precision. Although ICH requires sampling from authentic samples, it was impractical in this case to obtain blank matrices completely free from cannabinoids and terpenes. In addition, the contents of these compounds vary significantly between samples and some may be below the LOQs. It was also impractical to spike every standard into samples to bring each analyte above the LOQ. In this work, we therefore chose to spike the mixed cannabinoid standard or the mixed terpene standard into a solvent blank, which served as the blank matrix. Repeatability (intra-day precision) was determined by assaying a spiked blank matrix 12 times. Twelve data points (N=12) were used to calculate the relative standard deviation (%RSD) of the set. To obtain inter-day precision, six assays were repeated on two different days. Both intra-day and inter-day data were calculated together to determine intermediate precision (N=12).
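The accuracy and precision figures described above reduce to two small formulas: percent recovery of a spiked amount and %RSD of replicate measurements. The sketch below uses invented replicate values purely to show the arithmetic; it uses the sample standard deviation (n - 1), which is an assumption, since the text does not say which estimator was used.

```python
import statistics


def percent_rsd(values):
    """Relative standard deviation in percent (sample SD, n - 1)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def percent_recovery(measured_spiked, measured_unspiked, amount_added):
    """Share of the spiked amount that the assay found, in percent."""
    return 100.0 * (measured_spiked - measured_unspiked) / amount_added


# Hypothetical replicate results (ppm) from two days of six assays each.
day1 = [25.1, 24.9, 25.3, 25.0, 24.8, 25.2]
day2 = [25.4, 25.0, 24.7, 25.1, 25.3, 24.9]

within_day = percent_rsd(day1)          # one day's replicates
pooled = percent_rsd(day1 + day2)       # both days pooled (N=12)
recovery = percent_recovery(14.8, 5.0, 10.0)

print(f"within-day %RSD={within_day:.2f}  pooled %RSD={pooled:.2f}  "
      f"recovery={recovery:.1f}%")
```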
Statistical analysis was applied to the linear regression line in order to determine the standard deviation (SD) and slope (S). From these values, LOD and LOQ were calculated using the following equations:
$$\mathrm{LOD} = 3.3 \times \frac{SD}{S}$$
$$\mathrm{LOQ} = 10 \times \frac{SD}{S}$$
For the cannabinoids, regression (calibration) curves were visibly linear (Figure 2). Additionally, the correlation coefficients for all 10 cannabinoids ranged from 0.9993 to 1.0000 (Table 2). The accuracy (recovery) ranged from 91.3% to 104.4%. The precision (%RSD) for all compounds was less than 1.39%. Inter-day injection precisions were found to be less than 1.45%. The method was precise in terms of repeatability and intermediate precision. Cannabinoid LODs ranged from 0.07 to 0.99 ppm and LOQs ranged from 0.21 to 3.00 ppm (Table 2).
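The ICH Q2(R1) approach based on the calibration curve gives LOD = 3.3·SD/S and LOQ = 10·SD/S, where S is the slope. The sketch below assumes SD is the residual standard deviation of the regression line; the text does not specify which standard deviation was used, so this is one of the options the guideline allows rather than a confirmed detail of this study. The data points are invented.

```python
import math


def residual_sd_and_slope(x, y):
    """Fit y = a + b*x by least squares; return (residual SD, slope b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2
                 for xi, yi in zip(x, y))
    # n - 2 degrees of freedom: two parameters were estimated.
    return math.sqrt(ss_res / (n - 2)), slope


def lod_loq(sd, slope):
    """LOD and LOQ in concentration units, per ICH Q2(R1)."""
    return 3.3 * sd / slope, 10.0 * sd / slope


# Hypothetical calibration points (concentration, response).
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.1, 3.9]

sd, s = residual_sd_and_slope(x, y)
lod, loq = lod_loq(sd, s)
print(f"SD={sd:.4f}  slope={s:.4f}  LOD={lod:.3f}  LOQ={loq:.3f}")
```

Because both limits share the same SD/S term, LOQ is always about 3.03 times LOD, which matches the LOD and LOQ ranges reported for both methods.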

Method validation for terpenes (GC-MS)
A solvent blank was injected and no false signal peak was observed at the targeted retention time area. Six concentrations of terpene standards were injected. Specificity was demonstrated by well-separated peaks (Figure 3).
Regression (calibration) curves were visibly linear (Figure 4). Additionally, the correlation coefficients for all 14 terpenes ranged from 0.9993 to 1.0000 (Table 3). The accuracy (recovery) ranged from 93.0% to 104.8%. The method precisions (%RSD) for all compounds were less than 2.1%. Inter-day injection precisions were found to be less than 1.34%. The method was precise in terms of repeatability and intermediate precision. For all terpenes, LODs ranged from 0.82 to 3.69 ppm and LOQs ranged from 2.47 to 11.2 ppm (Table 3).

Quantification of cannabinoids and terpenes
In this work, each cultivar was labelled with an identifier for convenience (Table 4). Quantitative data for cannabinoids and terpenes are listed in Tables 4 and 5, respectively. Hierarchical cluster analysis was applied first and PCA was used to confirm the grouping results.

Cluster analysis
Total THC (∆9-THC + THCA, denoted T-THC) and total CBD (CBD + CBDA, denoted T-CBD) were calculated and used in cluster analysis because THCA and CBDA become THC and CBD after decarboxylation [24]. The levels of T-THC ranged from 0.24% (LM7) to 7.08% (LM20). The levels of T-CBD ranged from undetectable amounts (LM2, LM11, LM13, LM25, LM26) to 5.52% (LM7). Cluster analysis based only on T-THC and T-CBD classified the 32 samples into four clusters, which are presented as a constellation plot in Figure 5. Cluster 1 and Cluster 2 are THC dominant (chemotype I), with average T-THC more than 3% and average T-CBD less than 1% (Table 6). Cluster 3 has comparable levels of T-THC and T-CBD (chemotype II) at around 1.5% each (Table 6). Cluster 4 has only one cultivar and is CBD dominant (chemotype III), with T-CBD more than 5% and T-THC less than 1% (Table 6).
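To make the clustering step concrete, the sketch below implements agglomerative clustering with Ward's distance, D_KL = ||mean_K - mean_L||² / (1/N_K + 1/N_L), on a handful of invented (T-THC %, T-CBD %) pairs. It is a naive illustration rather than a reproduction of the JMP analysis: the data are hypothetical, the variables are not standardized, and the function names are illustrative.

```python
def mean_vector(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def ward_distance(cluster_k, cluster_l):
    """Squared distance between cluster means, scaled by cluster sizes."""
    mean_k, mean_l = mean_vector(cluster_k), mean_vector(cluster_l)
    squared = sum((a - b) ** 2 for a, b in zip(mean_k, mean_l))
    return squared / (1.0 / len(cluster_k) + 1.0 / len(cluster_l))


def ward_clusters(points, n_clusters):
    """Merge the closest pair (by Ward's distance) until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = ward_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical (T-THC %, T-CBD %) values, not data from this study.
samples = [(6.8, 0.1), (7.0, 0.0), (3.2, 0.2), (3.5, 0.1),
           (1.5, 1.6), (1.4, 1.5), (0.2, 5.5)]

groups = ward_clusters(samples, 4)
for number, members in enumerate(groups, start=1):
    print(f"Cluster {number}: {members}")
```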

However, when samples are grouped into clusters based on the full chemical profile (10 cannabinoids and 14 terpenes), the classification results change (Figure 6). After including more cannabinoids and terpenes, samples were classified into four clusters, with Clusters 1, 2, and 3 being THC dominant and Cluster 4 being CBD dominant (Table 7). Furthermore, cultivars clustered together not only have similar THC and CBD content, but also have similar full profiles. Classification based on the full chemical profile may offer more flexible and reliable choices for clinical researchers and licensed producers when choosing cannabis cultivars. For example, if LM28 is clinically studied and recommended for a particular condition, LM29 in the same cluster can likely serve as an alternative if LM28 is not available, due to the two cultivars' similarities.
Table 4: Quantitative data for cannabinoids w/w (%).

Principal component analysis (PCA)
Although Figure 7 gives a clear profile of all cannabinoid and terpene levels in each cluster, some compounds are more important to the classification than others. In this case, the 10 cannabinoids and 14 terpenes are the original 24 variables (24 dimensions) in PCA. By calculating the covariance matrix between these 24 dimensions, PCA generates 24 new variables (24 PCs) that are orthogonal to each other and together explain 100% of the total variance of the original data. In this work, the first three PCs explain 65.3% of the total variance. Each PC is correlated with the original 24 variables. The first column of the loading matrix (Table 8) contains the correlations of PC1 with each compound. The higher the absolute value, the higher the correlation. For example, PC1 is more correlated with THCA, Limonene, Fenchol, Terpineol, Borneol, Linalool, β-Caryophyllene, Fenchone, and β-Myrcene, which indicates that PC1 is more of a "THCA+terpenes" component. This indicates that cultivars in close proximity along PC1 have similar combinations of THCA and these terpenes, whereas separated clusters have distinct amounts of these compounds. For instance, Clusters 1, 2, and 3 are separated along PC1 (Figure 8) due to different combinations of THCA and these terpenes; this separation corresponds with the cannabinoid and terpene contents in Figure 7.
In addition, PC2 is more correlated with CBN, p-Cymene, CBC, CBDA, CBD, and THCV, which makes PC2 a "cannabinoids" component. For example, Clusters 1, 2, and 3 (THC dominant) are separated from Cluster 4 (CBD dominant) along PC2, mostly due to the distinct CBDA content in Cluster 4. Additionally, Cluster 4 (LM7) may be associated with a higher percentage of p-Cymene (0.037%) compared to cultivars in Clusters 1, 2, and 3. However, Cluster 4 contains only one cultivar, so additional data are required to draw a reliable conclusion. Finally, PC3 is more correlated with α-Pinene and ∆9-THC, which explains the separation between Cluster 2 and Cluster 3 along PC3 (Figure 9). More specifically, LM20 in Cluster 2 has 0.038% α-Pinene and 0.14% ∆9-THC while LM29 in Cluster 3 has 0.337% α-Pinene and 0.59% ∆9-THC, which also matches the average cluster content of α-Pinene and ∆9-THC in Figure 7.
The loading plot for PC1 and PC2 (Figure 8) gives an intuitive explanation: the greater the radial distance of a compound from the center, the more important the compound is in distinguishing cultivars along PC1 and PC2. Mathematically, the radial distance equals the square root of the sum of the squared correlations of the compound with PC1 and PC2 (Table 8). In conclusion, if cultivars are separated along PC1, they contain distinct amounts of THCA and terpenes (Limonene, Fenchol, Terpineol, Borneol, Linalool, β-Caryophyllene, Fenchone, β-Myrcene). If cultivars are separated along PC2, they contain different amounts of cannabinoids (CBN, CBC, CBDA, CBD, and THCV) and p-Cymene. If they are separated along PC3, the α-Pinene and ∆9-THC contents most likely differ.
Table 6: Average levels of total THC and total CBD in each cluster in Figure 5.
After grouping these cultivars and visualizing the clusters in fewer dimensions using PCA, the grouping results were compared to the constellation plot from the cluster analysis (Figures 8 and 9). The cultivars circled in each group in the scatter plot were the same cultivars in each cluster as in the constellation plot. Because Clusters 1, 2, and 4 were separated in the PC1 and PC2 scatter plot, and Clusters 2 and 3 were separated in the PC1 and PC3 scatter plot, the two grouping results visually match.
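The PCA quantities discussed in this section (explained variance, loadings as correlations between variables and PCs, and radial distance in a loading plot) can be sketched in a few lines. The example below is a minimal illustration and not the JMP procedure: it standardizes the variables (so it works on the correlation matrix, an assumption), finds eigenpairs by power iteration with deflation instead of a full eigendecomposition, and uses an invented three-compound profile.

```python
import math


def standardize(rows):
    """Z-score each column using the sample standard deviation (n - 1)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardized rows."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def leading_eigenpair(m, iterations=500):
    """Largest eigenvalue and its unit eigenvector by power iteration."""
    v = [float(i + 1) for i in range(len(m))]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            break
        v = [x / norm for x in w]
    return sum(a * b for a, b in zip(v, mat_vec(m, v))), v


def pca(rows, n_components=2):
    """Return (total variance, [(eigenvalue, eigenvector), ...])."""
    m = correlation_matrix(standardize(rows))
    p = len(m)
    total = sum(m[i][i] for i in range(p))
    components = []
    for _ in range(n_components):
        value, vector = leading_eigenpair(m)
        components.append((value, vector))
        # Deflation: remove the component just found from the matrix.
        m = [[m[i][j] - value * vector[i] * vector[j] for j in range(p)]
             for i in range(p)]
    return total, components


def loadings(components):
    """loadings[k][j]: correlation of variable j with PC k+1."""
    return [[x * math.sqrt(max(value, 0.0)) for x in vector]
            for value, vector in components]


# Hypothetical w/w % values for five cultivars (not data from this study).
names = ["THCA", "Limonene", "CBDA"]
profile = [[18.2, 0.45, 0.1], [16.9, 0.39, 0.2], [12.4, 0.21, 0.1],
           [11.8, 0.25, 0.3], [0.6, 0.05, 5.4]]

total, comps = pca(profile, n_components=2)
load = loadings(comps)
for k, (value, _) in enumerate(comps, start=1):
    print(f"PC{k} explains {100.0 * value / total:.1f}% of the variance")
for j, name in enumerate(names):
    radial = math.hypot(load[0][j], load[1][j])
    print(f"{name}: radial distance in the PC1/PC2 loading plot = {radial:.3f}")
```

Power iteration is used here only to keep the sketch dependency-free; a numerical library's symmetric eigensolver would be the usual choice for 24 variables.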

Conclusions
Existing approaches for cannabis classification may be inadequate because they analyze cannabis from botanical perspectives or based on only the two primary cannabinoids, THC and CBD. In this work, an HPLC method for cannabinoids and a GC-MS method for terpenes were developed and validated. We quantified 10 cannabinoids and 14 terpenes in 32 medical cannabis samples from two licensed producers in Canada. Samples were classified using both cluster analysis and PCA. In cluster analysis, samples were grouped into four clusters, where Clusters 1, 2, and 3 are THC dominant and Cluster 4 is CBD dominant. The result was different from that of cluster analysis using only THC and CBD content, which supports the hypothesis that classification based exclusively on THC and CBD may be insufficient when considering all medically relevant compounds in cannabis. PCA results confirmed the cluster results and also indicated which cannabinoids and terpenes are critical in discriminating cultivars. Currently, a systematic cultivar classification involving all commercially available cultivars in Canada has not been accomplished. However, this is necessary as these relationships will allow clinicians to identify the right cannabis cultivar with the right components to achieve optimal treatment outcomes. The ultimate goal is to develop a systematic classification and standardization method using chemical and genetic analysis techniques in tandem that can link cultivars with morphological characteristics, chemical composition, and medicinal applications.