Molecular phylogeny of Artemisia (Asteraceae-Anthemideae) with emphasis on undescribed taxa from Gilgit-Baltistan (Pakistan) based on nrDNA (ITS and ETS) and cpDNA (psbA-trnH) sequences

1 Department of Biological Sciences, International Islamic University Islamabad, 44000 Pakistan
2 Department of Plant Sciences, College of Agricultural and Environmental Sciences, University of California Davis, 95616 USA
3 School of Biological Sciences and Chemistry, Sungshin Women’s University, Korea
4 Department of Plant Biotechnology, Atta-ur-Rahman School of Applied Biosciences, National University of Sciences and Technology Islamabad, 44000 Pakistan
*Corresponding author: adil.phdbt31@iiu.edu.pk
REGULAR PAPER


INTRODUCTION
The genus Artemisia (family Asteraceae; tribe Anthemideae) is a large, taxonomically challenging group that includes ~500 species of both herbs and shrubs (Martin et al. 2003). Several species of this genus are economically noteworthy because they exhibit antispasmodic, antiseptic, antitumor, antimicrobial, antimalarial, antirheumatic and hepatoprotective properties (Terra et al. 2007; Hussain et al. 2017). The genus is distributed primarily in the temperate zones of the northern hemisphere; a few Artemisia species are also found in the southern hemisphere (Oberprieler et al. 2009). The centre of diversity for Artemisia is Central Asia. The earliest microfossils of the genus are known from the end of the Eocene (Zaklinskaja 1957) and from the Miocene radiation (Wang 2004).
For many years, the infrageneric classification of Artemisia has challenged taxonomists. The historical studies are well summarized by Torrell et al. (1999) and Vallès & McArthur (2001). From the studies of Tournefort (1700) to Bremer (1994) and Ghafoor (2002), all investigations of the classification and taxonomy of Artemisia were based on capitulum morphology. These studies recognized four subgenera in Artemisia (s. lat.), i.e. Artemisia, Absinthium, Seriphidium and Dracunculus, as shown in table 1. Throughout this period, the status of Seriphidium, as a separate genus or as a subgenus of Artemisia (s. lat.), remained a subject of discussion among taxonomists. For example, generic rank was adopted by Ling (1982), Bremer & Humphries (1993), Bremer (1994), Ling (1995) and Ghafoor (2002), whereas subgeneric status was followed by Kornkven et al. (1998, 1999), Torrell et al. (1999), Watson et al. (2002) and D'Andrea et al. (2003). Kornkven et al. (1998) provided a pioneering molecular phylogenetic study of Artemisia based on the nrDNA internal transcribed spacer (ITS) with the aim of resolving its interspecific relationships. Their study supported a North American origin of Tridentatae. They concluded that Tridentatae could be circumscribed as a monophyletic group if A. palmeri A.Gray and A. bigelovii A.Gray were excluded. Subsequently, Torrell et al. (1999) reconstructed the phylogeny of the genus based on ITS sequences and found support for five subgenera of Artemisia: Artemisia, Seriphidium, Absinthium, Dracunculus and Tridentatae. These results were further confirmed by Watson et al. (2002) and followed by numerous other molecular phylogenetic revisions. The detailed history of molecular phylogenetic efforts on the genus is provided in table 2.
The works surveyed here proposed numerous taxonomic rearrangements of the subgeneric classification based on molecular data and compared their outcomes with the traditional morphology-based classifications (table 1). However, the infrageneric classification has not yet been fully settled, because the placement of some taxa in phylogenetic analyses is inconsistent with the classification based on morphology.
In the flora of Pakistan, Ghafoor (2002) treated Artemisia (s. lat.) as two separate genera, Artemisia, with 25 species, and Seriphidium, with 13 species. All 38 species are recorded from the arid and semi-arid areas of Baluchistan, Khyber Pakhtunkhwa and North Punjab and from the temperate areas of Gilgit-Baltistan and the Kashmir territory (Ghafoor 2002). Within Pakistan, the centre of diversity for the genus is the western Himalayan region (Hayat et al. 2009). Hayat (2011) initiated the phylogenetic study of Pakistani Artemisia using ITS and ETS sequences of nrDNA and found support for uniting the two genera. Malik et al. (2017) further confirmed this finding, treating Seriphidium as a subgenus of Artemisia. Mahmood et al. (2011) carried out a molecular phylogenetic study of Artemisia species collected from different localities of Pakistan based on restriction fragment length polymorphism of the chloroplast rps11 gene. They provided evidence that hybridization occurred at the infrageneric level during the evolution of the genus, which is why its natural classification remains a challenging problem.
Here, we determine the phylogenetic position of ten undescribed Artemisia taxa from northern Pakistan, using nrDNA internal transcribed spacer (ITS), external transcribed spacer (ETS) and cpDNA intergenic spacer (psbA-trnH) regions.

Study area
Gilgit-Baltistan is a northeastern region of Pakistan situated between 74°-77.5°E and 34.6°-37.4°N, covering an area of about 45 224 km². The altitude of this region ranges from about 1400 m to 8611 m. The area is divided into seven main districts, i.e. Gilgit, Skardu, Hunza-Nagar, Astore, Diamer, Ghizer and Ghanche. This region includes world-renowned mountain ranges such as the Karakorum, the Hindu Kush and the Himalayas. There are several peaks with heights above 7000 m, including Godwin-Austen (K-2, 8611 m), Rakaposhi (7788 m) and Deran peak (7268 m). Some of the world's largest glaciers are also found in this region, such as the Baltoro Glacier, which extends for about 62 km and covers an area of 529 km² (Anonymous 2003). The area is well known for its great diversity of plants (Shinwari 2010) and is a centre for traditional medicinal herbs (Shinwari & Gilani 2003).

Plant collection and sampling
The plant samples used for molecular phylogenetic analysis were taken from both herbarium specimens and silica-gel-dried samples collected during expeditions to various parts of the Gilgit-Baltistan region of Pakistan, as described in our preceding paper (Hussain et al. 2019). The provenance of the Artemisia populations studied from northern Pakistan, with their collection details, is listed in table 3. The sampling thus covered all the Artemisia taxa endemic to northeastern Pakistan and represented five subgenera of the genus, namely Artemisia, Absinthium, Dracunculus, Pacifica and Seriphidium; the North American endemic A. tridentata Nutt. could not be included because we were unable to obtain material.
Voucher specimens were deposited in the herbarium of the Pakistan Museum of Natural History (PMNH); details are given in table 3. Previously published nrDNA ITS (internal transcribed spacer) and ETS (external transcribed spacer) sequences and cpDNA psbA-trnH (intergenic spacer) sequences representing all subgenera of the genus Artemisia were retrieved from GenBank (supplementary file 1).

Genomic DNA extraction and quantification
After the leaves were cleaned with ethanol (70%), genomic DNA was extracted from dried leaves using the CTAB method (Doyle & Doyle 1990) and, when necessary, the DNeasy Plant kit (QIAGEN). The extracted genomic DNA was quantified by measuring the A260/280 ratio with an ND-2000 spectrophotometer (Nanodrop Technologies, Wilmington, DE, USA), as described by Urreizti et al. (2012). The quality of the extracted DNA was checked visually by 1.5% agarose gel electrophoresis.
PCR conditions for the amplification of the nuclear ITS region (primers ITS9 and ITS6) were: pre-denaturation at 95°C for 2 minutes, followed by 35 cycles of denaturation at 95°C for 30 seconds, annealing at 50°C for 1 minute or 55°C for 30 seconds, and extension at 72°C for 1 minute, with a final extension at 72°C for 5 minutes. PCR conditions for the amplification of the nuclear ETS region were: pre-denaturation at 97°C for 2 minutes, followed by 36 cycles of denaturation at 97°C for 2 seconds, annealing at 55°C for 30 seconds, and extension at 72°C for 30 seconds, with a final extension at 72°C for 7 minutes. PCR conditions for the chloroplast psbA-trnH region were: pre-denaturation at 94°C for 5 minutes, followed by 30 cycles of denaturation at 94°C for 1 minute, annealing at 55°C for 1 minute, and extension at 72°C for 1.5 minutes, with a final extension at 72°C for 7 minutes. The PCR products were electrophoresed at 100 V for 45 min in a 1.5% agarose gel prepared in 1× TBE (Tris-borate-ethylenediaminetetraacetic acid) buffer and then checked on a transilluminator under ultraviolet light. The size of each PCR product was estimated on the gel against a 1 kb DNA size standard (N-3232L, Biolabs Company). After its size had been confirmed, the PCR product was extracted from the gel with the QIAquick gel extraction kit (QIAGEN) following the standard protocol.
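The cycling conditions above can be written down as data, which makes it easy to check a program before entering it on a thermocycler. The following is a minimal sketch in Python for the psbA-trnH program only; the names Step, PcrProgram and total_seconds are illustrative and do not belong to any thermocycler software, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """One hold: temperature in degrees Celsius and duration in seconds."""
    temp_c: float
    seconds: float


@dataclass(frozen=True)
class PcrProgram:
    """Hypothetical container for a three-stage PCR program."""
    initial: Step        # pre-denaturation
    cycle: List[Step]    # denaturation, annealing, extension
    n_cycles: int
    final: Step          # final extension

    def total_seconds(self) -> float:
        """Sum of all hold times; ramping between steps is not modelled."""
        per_cycle = sum(step.seconds for step in self.cycle)
        return self.initial.seconds + self.n_cycles * per_cycle + self.final.seconds


# psbA-trnH conditions as stated in the text.
PSBA_TRNH = PcrProgram(
    initial=Step(94, 5 * 60),
    cycle=[Step(94, 60), Step(55, 60), Step(72, 90)],
    n_cycles=30,
    final=Step(72, 7 * 60),
)

print(PSBA_TRNH.total_seconds() / 60)  # nominal hold time in minutes
```

Under these assumptions the psbA-trnH program amounts to 117 minutes of hold time; the ITS and ETS programs can be encoded in the same way.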

Nucleotide sequencing and alignment
The amplified DNA regions were sequenced in both directions at the UC Davis core sequencing facility on capillary electrophoresis genetic analysers (ABI 3730) with BigDye Terminator version 3.1 cycle sequencing chemistry (ABI), using the primer sets for ITS (ITS9 and ITS6), ETS (ETS-AST-1 and 18SETS) and psbA-trnH (psbA3'f and trnHf) (table 4). The raw sequence data from the studied taxa were assembled using BioEdit version 7.1.9 (Hall 1999) and Sequencher version 5.4.6 (Gene Codes Co.). A total of four multiple sequence alignments (MSAs) were generated. Three single-marker MSAs combined the newly sequenced data of 28 Artemisia species from northern Pakistan with sequences carefully selected from GenBank: nrDNA-ETS (n = 79) (supplementary file 2), nrDNA-ITS (n = 78) (supplementary file 3) and cpDNA-psbA-trnH (n = 65) (supplementary file 4). The fourth MSA was generated by concatenating these three markers with maximum species coverage, at the cost of some missing data (CAT79; n = 79) (supplementary file 5).
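Concatenation with maximum species coverage means that a taxon lacking one marker is still kept, with the absent marker coded as missing data. The sketch below illustrates that idea in Python under stated assumptions: each alignment is a plain dictionary mapping taxon name to aligned sequence, the missing-data symbol is "?", and the function name concatenate and the toy taxa are hypothetical. It is not the procedure or software used to build CAT79.

```python
from typing import Dict, List


def concatenate(alignments: List[Dict[str, str]], missing: str = "?") -> Dict[str, str]:
    """Concatenate per-marker alignments, padding absent taxa with `missing`."""
    lengths = []
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("all sequences in an alignment must have equal length")
        lengths.append(widths.pop())
    # Union of taxon names across all markers (maximum species coverage).
    taxa = sorted(set().union(*alignments))
    return {
        taxon: "".join(aln.get(taxon, missing * n) for aln, n in zip(alignments, lengths))
        for taxon in taxa
    }


# Toy data, invented for illustration only.
ets = {"taxon_A": "ACGT", "taxon_B": "ACGA"}
its = {"taxon_A": "TTG", "taxon_C": "TAG"}
combined = concatenate([ets, its])
for name, seq in combined.items():
    print(name, seq)
```

In the toy example taxon_B has no ITS sequence and taxon_C has no ETS sequence, so both are retained and padded with "?" in the marker they lack.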

Model selection and phylogenetic analysis
First, the Artemisia ITS, ETS and psbA-trnH sequences were analysed independently in order to evaluate congruence among the markers. The sequences from the three regions were then aligned separately (ITS with 657 characters, ETS with 397 characters and psbA-trnH with 396 characters) and concatenated (Haghighi et al. 2014; Holzmeyer et al. 2015) into a final data matrix of 1450 characters (table 5). This concatenated nuclear ribosomal and chloroplast dataset was analysed with maximum likelihood (ML), maximum parsimony (MP) and Bayesian inference (BI) to examine the taxonomic relationships within the genus Artemisia. The best base substitution models were determined for the MSAs of each individual marker (ETS, ITS and psbA-trnH) and were used for phylogeny reconstruction with the ML and Bayesian approaches. In all cases the best models were predicted using jModelTest version 2.1.7 (Darriba et al. 2012) (options: -f -g 4 -i -s 203 -S BEST -t ML). The best model was selected on the basis of the Bayesian information criterion (BIC). The selected model was then passed to GARLI version 2.0.1 (Zwickl 2006) to generate a maximum likelihood tree. GARLI was run with default settings except for the following options: genthreshfortopoterm = 100000, significanttopochange = 0.00001, treerejectionthreshold = 50.0. Parameter values were estimated by GARLI. Four independent searches were performed to reduce the risk of selecting a tree trapped at a local optimum. Branches shorter than 1 × 10⁻⁸ substitutions/site were collapsed. Bootstrap analysis was conducted with 1000 replicates. For the concatenated tree, the region for each marker was partitioned and treated independently.
MrBayes version 3.2.1 (Ronquist et al. 2012) was used for the BI analyses, with the ITS, ETS and psbA-trnH substitution parameters estimated in separate partitions for the pooled data. Two independent Markov chain Monte Carlo (MCMC) analyses, each with four Metropolis-coupled chains, were run for 5 million generations, sampling every 100 generations (Malik et al. 2017). The best-fitting DNA substitution model for the BI analyses was selected with MrModeltest version 2.3 (Nylander 2004): GTR+I+G for the combined data set as well as for the individual cpDNA and nrDNA data sets. After the average standard deviation of split frequencies had been verified to be < 0.0, the first 25% of trees were discarded as burn-in, and the potential scale reduction factor approached 1.0 for all parameters. The remaining samples were merged to construct a 50% majority-rule consensus tree with posterior probabilities.
For the ETS, ITS and psbA-trnH sequences, jModelTest predicted HKY+G, 012030+I+G with equal equilibrium base frequencies, and 012010+G as the best models, respectively. For CAT79, the best models for the partitions representing ETS, ITS and psbA-trnH were HKY+I+G, 012010+G with equal equilibrium base frequencies, and 012010+G, respectively.
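Model selection by BIC rests on the standard formula BIC = -2 ln L + k ln n, where ln L is the maximized log-likelihood, k the number of free parameters and n the sample size (here taken as the alignment length). The Python sketch below shows the comparison in a few lines. The log-likelihoods and parameter counts are invented purely for illustration and are not results from this study, and the function names bic and best_by_bic are hypothetical rather than part of jModelTest.

```python
import math
from typing import Dict, Tuple


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion: -2 lnL + k ln(n). Lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def best_by_bic(fits: Dict[str, Tuple[float, int]], n_sites: int) -> str:
    """Return the model name with the lowest BIC.

    `fits` maps model name -> (log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda name: bic(fits[name][0], fits[name][1], n_sites))


# Invented values for illustration only (not taken from the study).
toy_fits = {
    "JC": (-2100.0, 10),
    "HKY+G": (-2050.0, 15),
    "GTR+I+G": (-2045.0, 20),
}
n_sites = 397  # length of the ETS alignment stated in the text

for name, (lnl, k) in toy_fits.items():
    print(name, round(bic(lnl, k, n_sites), 2))
print("selected:", best_by_bic(toy_fits, n_sites))
```

With these toy numbers the richer GTR+I+G model has the highest likelihood, but its extra parameters are penalized, so the intermediate model is selected. This is the trade-off that BIC is designed to capture.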
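The burn-in and consensus steps can be illustrated with a small Python sketch. It assumes that each sampled tree has already been reduced to the set of clades it contains (each clade a set of taxon names); the function names and the toy sample are hypothetical, and this is a conceptual illustration rather than a description of how MrBayes summarizes its output internally.

```python
from collections import Counter
from typing import Dict, FrozenSet, List

Clade = FrozenSet[str]
Tree = FrozenSet[Clade]  # a sampled tree summarized as the set of its clades


def discard_burnin(samples: List[Tree], fraction: float = 0.25) -> List[Tree]:
    """Drop the first `fraction` of the samples as burn-in."""
    return samples[int(len(samples) * fraction):]


def clade_support(samples: List[Tree]) -> Dict[Clade, float]:
    """Posterior probability of a clade = share of retained trees containing it."""
    counts = Counter(clade for tree in samples for clade in tree)
    return {clade: n / len(samples) for clade, n in counts.items()}


def majority_rule(support: Dict[Clade, float], threshold: float = 0.5) -> Dict[Clade, float]:
    """Keep only clades found in more than `threshold` of the retained trees."""
    return {clade: p for clade, p in support.items() if p > threshold}


# Toy posterior sample of eight trees on taxa A, B, C (illustration only).
ab: Clade = frozenset({"A", "B"})
ac: Clade = frozenset({"A", "C"})
toy_samples: List[Tree] = [frozenset({ac})] * 2 + [frozenset({ab})] * 5 + [frozenset({ac})]

kept = discard_burnin(toy_samples)        # first 25% (two trees) removed
support = clade_support(kept)
consensus = majority_rule(support)
print(len(kept), sorted(consensus[ab] for _ in [0]))
```

In the toy sample, six trees remain after burn-in; clade {A, B} occurs in five of them and is the only clade that passes the 50% majority-rule threshold.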

RESULTS
Data on the lengths of the amplified DNA regions, raw sequences and MSAs, and on the numbers of informative characters in the nuclear ribosomal (ITS and ETS) and chloroplast (psbA-trnH) DNA sequences for all investigated samples of Artemisia, are provided in table 5. All trees obtained from independent ML, MP and Bayesian analyses of the psbA-trnH, ITS and ETS regions recovered similar topologies with no significant conflicts. Some discordance involving weakly supported clades was observed, which can be regarded as soft incongruence. When the data from the three markers were concatenated, the Bayesian, maximum likelihood and maximum parsimony analyses of the combined dataset yielded slightly different phylogenetic reconstructions (supplementary file 6). Nevertheless, the ML and Bayesian trees provide greater resolution than the tree obtained with MP. Only a consensus tree annotated with bootstrap (BS) values from the ML and MP analyses and posterior probabilities (PP) from the Bayesian analysis is provided in fig. 1. The inclusion of subgenus Seriphidium within the genus Artemisia is evident and strongly supported (PP = 1.00; ML-BS = 100%, MP-BS = 100%). In the resulting trees, most backbone nodes were well supported (PP > 0.80; BS > 50%), although a few lineages showed poorly resolved nodes.
We also observed ten undescribed taxa of Artemisia from the northeastern (Gilgit-Baltistan) region of Pakistan. On the basis of our phylogenetic analysis, these undescribed taxa were assigned to three new groups (Groups I, II & III). All clades comprising undescribed taxa received full Bayesian support (PP = 1). One undescribed taxon (Artemisia sp.  . 4).

DISCUSSION
The ML tree (fig. 1) based on the ITS, ETS and psbA-trnH markers shows that northeastern Pakistani Artemisia species are distributed throughout the clades corresponding to the subgenera. The tree indicates that all sampled species of the genus Artemisia form a well-supported monophyletic group (PP = 1; ML-BS = 100%, MP-BS = 100%). From the emerging pattern of the resulting phylogeny, some primary conclusions can be drawn about the inclusion of Seriphidium within Artemisia and about the appearance of some undescribed taxa (Groups in fig. 1).
Subgenus Artemisia was not supported as monophyletic; it appeared polyphyletic, with its species placed in four major clades corresponding to subgenera Absinthium, Artemisia, Dracunculus and Seriphidium. Morphologically, subgenus Artemisia differs from the other subgenera in plesiomorphic characters (heterogamous, disciform capitula with pistillate ray florets and fertile disk florets), and this subgenus needs to be recircumscribed. Subgenera Absinthium and Artemisia were previously pooled as a single subgenus Artemisia (Gray 1884; Watson et al. 2002; Shultz 2009), whereas some studies based on molecular data separated them as distinct subgenera. In this study, the two formed a clade. Further investigation with more species is therefore required to decide whether they should be merged into a single subgenus Artemisia.
Species from subgenus Dracunculus formed a strongly supported clade (PP = 1; ML-BS = 96%, MP-BS = 80%) that is sister to a clade comprising two species of subgenus Artemisia, viz. A. biennis Willd. and A. tournefortiana Rchb. Watson et al. (2002) retained Dracunculus as a subgenus of Artemisia; in our study, subgenus Dracunculus was sister to a group of species from subgenus Artemisia. Morphologically, subgenus Dracunculus possesses heterogamous flower heads with pistillate outer florets and sterile inner florets.
Subgenus Tridentatae formed a monophyletic group with strong support (PP = 1; ML-BS = 97%, MP-BS = 88%) in the ML tree obtained from the combined sequence data of the three markers. However, the monophyly of subgenus Tridentatae was not consistently recovered in the trees generated from the separate sequence data sets. The monophyly of subgenus Tridentatae has been confirmed in many previous studies (Kornkven et al. 1998, 1999; Torrell et al. 1999; Vallès et al. 2008).
Species from subgenus Pacifica also formed a strongly supported monophyletic group (PP = 1; ML-BS = 100%). Its monophyly is thus confirmed, in agreement with Hobbs & Baldwin (2013) and Malik et al. (2017), who retained it as a subgenus. More studies of the large and diverse genus Artemisia (s. lat.) are crucial for further unravelling its phylogeny.
Besides addressing the infrageneric classification of Artemisia, our phylogenetic investigation placed some undescribed taxa of Artemisia from the northeastern (Gilgit-Baltistan) region of Pakistan in three distinct groups (Groups I, II & III) (fig. 1).
One undescribed taxon (Artemisia sp. AD-H) (Group I) appeared with high support values (PP = 1; ML-BS = 83%, MP-BS = 100%) within subgenus Dracunculus. Four undescribed taxa appeared as Group II with high support values (PP = 1; ML-BS = 98%, MP-BS = 76%) in the second clade of subgenus Absinthium. The undescribed taxa within Group II were placed with the A. rutifolia Steph. ex Spreng. lineage. This clade was therefore named the "A. rutifolia complex". Taxonomic complexes have already been reported in the genus Artemisia, for example the A. vulgaris complex, described in detail by Kaul & Bakshi (1984) and reported again by Sanz et al. (2008). A detailed morphological study based on extensive sampling, coupled with modern molecular techniques, might resolve taxon delimitation in the A. rutifolia complex and possibly lead to the identification of new species.
In the first clade of subgenus Absinthium, five undescribed taxa were placed in Group III with strong PP support and moderate ML and MP support (PP = 1; ML-BS = 62%, MP-BS = 65%). A comparison of the branch lengths leading to the terminal nodes in this clade indicates that the five taxa differ from each other, because the terminal branches in Group III are relatively long. The same holds for the sample treated as an undescribed taxon in Group I.
The new groups of undescribed Artemisia taxa shown in this study might represent putative new species. Koloren et al. (2016) observed two new haplotypes among Artemisia samples, including both rare and common ones, from the Ordu province of Turkey. In their resulting phylogenetic trees, the two haplotypes were placed with A. argyi H.Lév. & Vaniot, A. sylvatica Maxim. and A. verlotiorum Lamotte of subgenus Artemisia. We agree with the conclusion of Koloren et al. (2016) that the separate grouping of new Artemisia haplotypes calls for further taxonomic examination using multiple approaches. Such studies must include an extensive number of samples in order to confirm and characterize potential new species or subspecies.

CONCLUSION
This study reports, for the first time, a molecular phylogeny of Artemisia from the northeastern region (Gilgit-Baltistan) of Pakistan using nrDNA (ITS and ETS) and cpDNA (psbA-trnH) sequences. The results confirmed that subgenera Artemisia and Absinthium are polyphyletic. The other subgenera, including Tridentatae, Pacifica and Dracunculus, were found to be monophyletic. Species of subgenus Seriphidium formed a single clade with annual species of subgenus Artemisia. The undescribed Artemisia taxa from the northeastern region of Pakistan were placed in three groups within the resulting phylogenetic tree. One of the new groups belongs to subgenus Dracunculus, and the other two belong to subgenus Absinthium. Within these new groups, the one undescribed taxon in Group I was found with the A. japonica and A. desertorum lineages. The four undescribed taxa in Group II were placed with the A. rutifolia lineage. The five undescribed taxa in Group III were found in the same lineage as A. sieversiana. Based on the current data and those available in the literature, we conclude that morphological studies coupled with modern molecular techniques may lead to a clear infrageneric classification of the genus Artemisia. Such studies would also clarify and characterize the undescribed taxa reported here.