Molecular evolution and phylogenetic relationships of Ligusticum (Apiaceae) inferred from the whole plastome sequences

The genus Ligusticum belongs to Apiaceae, and its taxonomy has long been a major difficulty. A robust phylogenetic tree is the basis of accurate taxonomic classification of Ligusticum. We herein used 26 (including 14 newly sequenced) plastome-scale data to generate reliable phylogenetic trees to explore the phylogenetic relationships of Chinese Ligusticum. We found that these plastid genomes exhibited diverse plastome characteristics across all four currently identified clades in China, while the plastid protein-coding genes were conserved. The phylogenetic analyses by the concatenation and coalescent methods obtained a more robust molecular phylogeny than prior studies and showed the non-monophyly of Chinese Ligusticum. In the concatenation-based phylogeny analyses, the two datasets yielded slightly different topologies that may be primarily due to the discrepancy in the number of variable sites. Our plastid phylogenomics analyses emphasized that the current circumscription of the Chinese Ligusticum should be reduced, and the taxonomy of Ligusticum urgently needs revision. Wider taxon sampling including the related species of Ligusticum will be necessary to explore the phylogenetic relationships of this genus. Overall, our study provided new insights into the taxonomic classification of Ligusticum and would serve as a framework for future studies on taxonomy and delimitation of Ligusticum from the perspective of the plastid genome.

Next-generation sequencing technology provides more DNA sequencing data than before and can be employed for phylogenetic studies within angiosperms [22]. Meanwhile, plastome-scale data has been used successfully to address phylogenetic problems at various taxonomic levels. For example, Li et al. [23] used 2881 plastid genomes to construct angiosperm phylogeny and date the origin of the crown angiosperms to the Upper Triassic. Wang et al. [24] constructed the phylogeny of Angelica and demonstrated the power of plastid phylogenomics in resolving the phylogeny of this complex genus. Wen et al. [25] revealed a new backbone relationship of Apioideae from plastid phylogenomic analysis. At present, there are two major methods to construct phylogenetic trees: the concatenation method and the coalescent method. Generally, the coalescent method can construct a phylogenetic tree more accurately than the concatenation method, and the concatenation method may produce spuriously high bootstrap support but topologically incorrect phylogenetic trees with the addition of more data [26,27]. Recent studies have shown that it is necessary to construct the phylogenetic tree of plastid protein-coding genes by the coalescent method [28][29][30]. Hence, we utilized these two methods to estimate the phylogeny of Ligusticum.
Here, 26 Ligusticum plastomes (including 14 newly sequenced) representing all four currently identified clades in China were used for molecular evolutionary analysis and phylogenetic reconstruction. Our aims were to (1) describe the diversity of plastome characteristics and the evolutionary pattern of plastid protein-coding genes within Ligusticum; (2) obtain a robust Ligusticum phylogeny and assess the power of the plastome-scale data for resolving the phylogeny of this genus; (3) comment on the current taxonomy of Ligusticum in China based on the plastome sequences.

Molecular evolutionary pattern of plastid protein-coding genes
Fifty-three protein-coding sequences (CDSs) of each Ligusticum species were selected to determine the codon    Table S2). We identified 55-60 RNA editing sites for 20-23 protein-coding genes from each Ligusticum species. (Additional file 5: Table S3). Further analysis found that most RNA editing events occurred in the ndh gene (22)(23)(24). Although Ligusticum appeared to have a similar pattern of RNA editing, several specific editing sites have been picked out: petD (1 site; only identified in L. involucratum) and rps8 (1 site; only identified in L. jeholense).
The ω values of CDSs for 79 plastid protein-coding genes ranged from 0.0001 to 0.9065 (Fig. 4), suggesting conservation of plastid protein-coding genes in Ligusticum. Most genes were under strong purifying selection with a very low ω value (ω < 0.5), yet the ω values in the range of 0.5 to 1.0 (indicating relaxed selection) were observed for seven genes petG, ccsA, rps8, rpl33, psaJ, ycf1, and ycf2 ( Fig. 4). However, we found that only three genes (rps8, ycf1, and ycf2) were under relaxed selection due to their significance (P < 0.05) after the likelihood ratio test (LRT) (Additional file 6: Table S4). Nucleotide diversity (Pi) of these 79 CDSs was calculated to assess the sequence divergence level (Additional file 6: Table S4). Among these, Pi values ranged from 0 to 0.02071 (Fig. 4). Six CDSs had relatively higher Pi values, including matK, cemA, ycf1, psbK, ndhF, and atpF genes, which revealed that these genes were more divergent and evolving more rapidly than other genes (Fig. 4). Conversely, CDS of rpl36, psbF, psaI, psbL, psbI, and psbZ genes shared very low Pi values, suggesting that these genes are highly conserved (Fig. 4). Collectively, the low Pi values also indicated that the plastid protein-coding genes were conserved in Ligusticum.

Phylogenetic relationships
We performed a series of phylogenetic analyses using two datasets (complete plastome sequences and 76 common CDSs) and two methods (concatenation and coalescent-based analyses) for 66 species of Apiaceae (Additional file 7: Table S5). The aligned two datasets were 124,230 bp and 65,979 bp long, with 23,598 and 8932 variable sites, and proportions of 18.99% and 13.54%, respectively. As expected, our analyses obtained robust support at most nodes. All phylogenetic analyses produced largely identical tree topologies, the incongruence mainly occurred in the interspecific relationships within clades, and the relationship between the clades was congruent except for the systematic position of Cachrys Clade (Figs. 5, 6; Additional file 2: Fig.  S2). Cachrys Clade was resolved as sister to Sinodielsia Clade + Selineae ((Selineae, Sinodielsia Clade), Cachrys Clade) in the ML tree based on dataset-2 (BS = 54), while it was sister to Apieae in the other four phylogenetic trees with moderate-to-high support (Figs. 5, 6; Additional file 2: Fig. S2). For Ligusticum, it was still a non-monophyletic taxon, and the clades of these species were consistent with previous studies. Although we have enriched the plastome data of Ligusticum, the systematic position of L. pteridophyllum is still unclear in this study. L. pteridophyllum belonged to Sinodielsia Clade based on dataset-1 (BS = 98, PP = 1), while it was resolved as sister to Sinodielsia Clade + Selineae ((Selineae, Sinodielsia Clade), (L. pteridophyllum, L. pteridophyllum_DL)) in the other three phylogenetic trees (BS = 100, PP = 1, LPP = 1) (Fig. 6, Additional file 2: Fig. S2). Given that the variation level of the 76 CDSs and the incongruent topologies of dataset-2 (76 CDSs) obtained by different analyses (ML and BI), as well as the positions of L. pteridophyllum and Cachrys Clade were distinct from dataset-1, we then used 76 CDSs to perform a phylogenetic analysis according to the multi-species coalescent model by ASTRAL v5.7.3 [45] (Fig. 6). Thus, we used this coalescent-based phylogeny and concatenation-based phylogeny (dataset-1) as the basis in this study (Figs. 5,6).

Discussion
The diversity of plastome characteristics By combining the 14 newly sequenced plastome sequences with the 12 published sequences of Ligusticum, we can represent all four currently identified clades in China. The Ligusticum plastomes were variable among the four clades, as well as the plastomes of Selineae and Sinodielsia Clade were significantly different from that of Acronema Clade and East-Asia Clade, which might have phylogenetic and taxonomic significance. Previous studies concluded that the total length of angiosperm plastomes is usually influenced by the contraction and expansion of the IRs [31,32]. Similarly, we noticed that the longer total lengths of six Ligusticum (L. angelicifolium, L. tenuissimum, L. litangense, L. weberbauerianum, L. delavayi, and L. tachiroei) plastomes were determined by the expansion of IRs. We detected four types of LSC/ IRb border (ycf2, petB, rpl22, and rps19 genes) and four types of LSC/IRa border (petB-trnH-GUG , trnL-CAA-trnH-GUG, rps19-trnH-GUG , and rpl2-trnH-GUG ), which were also reported in Apiaceae and other plant lineages [25,28,[33][34][35]. Compared to the dynamically shifted LSC/IR border, the SSC/IR border was more conserved, as most SSC/IR borders were ycf1 genes with a few exceptions in the 26 Ligusticum plastomes. In addition to IR border shifts, IR has been significantly increased, reduced, or even eliminated, such as in Pelargonium × hortorum [36], Cephalotaxus oliveri [37], some species of Erodium and Pinaceae [38,39]. Gene content of Ligusticum was not conserved, mainly due to the increase of gene number caused by the expansion of IRs [40]. For example, L. angelicifolium possessed the most genes. The rpl22, rps3, rpl16, rpl14, rps8, infA, rpl36, rps11, rpoA, and petD genes located in LSC regions of other Ligusticum species, have moved to IR regions to become double-copy genes in L. angelicifolium. GC content of Ligusticum plastomes was close to other Apiaceae [25,41]. High GC content was observed in IRs, which is probably due to the presence of the four rRNA genes [42,43] as they had a GC content of up to 54.9-55.3% (Fig. 2).

The evolutionary conservation of plastid protein-coding genes
Codon usage bias is an important evolutionary feature in the genome that can be influenced by many evolutionary processes [44]. Therefore, codon usage bias provides useful information for studying molecular evolution. GC content is generally the product of directional mutation pressure and is a critical factor affecting codon usage [38,44,45]. All 26 Ligusticum plastomes had a strong bias toward A/T at the third codon position as observed in other angiosperm species [46,47]. High AT content in plastomes is the major reason for bias codons ending with A/T [48]. RNA editing is one of the posttranscriptional maturation processes of primary transcripts, which allows nucleotide insertion/deletion and conversion to alter transcripts [49,50]. The first chloroplast RNA editing was discovered in maize rpl2 transcript, in which an initiation codon ACG changes to AUG [51]. After that, RNA editing has been found in a growing number of higher plant chloroplasts. The ndh genes encode subunits of the plastid NDH (NADH dehydrogenase-like) complex, which contained most RNA editing sites for Ligusticum species. The ndh genes play an important role in mediating cyclic electron flow around photosystem I and facilitating chlororespiration [52]. Therefore, RNA editing on the ndh genes is more likely to ensure the physiological and biochemical processes of the plant. Similar codon usage and RNA editing patterns for 26 Ligusticum plastomes possibly because of the evolutionary conservation of plastomes among angiosperms. The synonymous and non-synonymous nucleotide substitution pattern is a major indicator in the study of gene evolution. The ratio (ω) of dN/dS is generally interpreted as: purifying selection (ω < 1, especially less than 0.5), positive selection (ω > 1), neutral evolution (ω = 1), whereas ω value close to 1 indicates relaxed selection [53,54]. All protein-coding genes held low ω and Pi values, suggesting the conservation of plastid genes in Ligusticum. Three genes (rps8, ycf1, and ycf2) were under relaxed selection. The rps8 is one of the genes that encodes a protein for the small ribosomal subunits, therefore essential for the plastid ribosome [55]. The rps8 gene was under positive selection in Curcuma [56]. The plastid gene rps8 RNA editing defect accounted for the low-temperature sensitivity in rice and maize [57,58]. This indicated that rps8 gene is very important for plant adaptability. The ycf1 gene, the second-largest gene in the plastome, is indispensable for photosynthetic protein import and is therefore vital for plant viability [59]. Positive selection or relaxed selection on ycf1 have been observed in Bulbophyllum [60] and Lennoaceae [61]. ycf2 is a conserved open reading frame with the exact function still unknown, although its putative gene product is a protein of 2280 amino acids [55,62]. Ligusticum is mainly distributed in the alpine and subalpine regions of Southwest China. Consequently, we speculated that the possible relaxed selection pressure on these three genes may be related to adapting to high-altitude living environments. Ren et al. BMC Ecology and Evolution (2022) 22:55

Phylogenetic relationships and taxonomic implications
Early studies have shown that the genus Ligusticum was a non-monophyletic group [14][15][16][17][18][19][20][21], and was divided into six clades: Acronema Clade, Conioselinum chinense Clade, Pyramidoptereae, Selineae, Sinodielsia Clade, and East-Asia Clade [8]. Here, the Ligusticum plastomes of 20 species (26 accessions) representing all four currently recognized clades in China were used to reconstruct the phylogenetic trees. We used different datasets and methods to perform phylogenetic analyses to obtain robust phylogenetic relationships of this genus, which revealed that the plastome-scale data is a promising tool for resolving the phylogeny of the controversial taxon. In the concatenation-based phylogeny analysis, dataset-1 and dataset-2 yielding slightly different topologies may be primarily due to the discrepancy in the number of variable sites. On consideration, we finally decided to use coalescent-based phylogeny and concatenation-based phylogeny (dataset-1) as the basis to explore the phylogeny of Ligusticum.
Among the four clades of Ligusticum, only Sinodielsia Clade was not monophyletic, owing to the two accessions of L. pteridophyllum not being clustered with other Sinodielsia Clade species in the coalescent-based result. This was also observed in previous studies [16]. In fact, the systematic position of L. pteridophyllum has not been correctly described. L. pteridophyllum was once placed in the Selineae [14] or Sinodielsia Clade [8]. Zhou et al. [8] involved more Ligusticum species in their study, and the results were more reliable. Morphologically, L. pteridophyllum does not share several general characteristics of Ligusticum in Selineae, such as pinnate bracteoles, and with fibrous remnant sheaths at the stem bases (Additional file 8: Table S6). Taken together, we agreed that L. pteridophyllum belongs to Sinodielsia Clade, while more species in Sinodielsia Clade should be involved to verify this result. Six other accessions of Ligusticum clustered with C. officinale and formed a clade in Sinodielsia Clade, with C. officinale being closely related to L. sinense. C. officinale has been referred to as L. officinale [8], and the strong cross-hybridization of genomes was found between C. officinale and L. sinense [63].
Selineae contained most Ligusticum species, whereas they did not group together in this tribe. Two accessions of L. thomsonii and L. angelicifolium were not clustered with eleven other accessions of Ligusticum, which may be explained by morphology. The most obvious difference among them is bracteole and fruit. L. thomsonii and L. angelicifolium have linear or lanceolate bracteoles, as well as the prominent dorsal and intermediate ribs, winged lateral ribs. However, the eleven other accessions of Ligusticum have pinnate bracteoles, as well as raised dorsal and intermediate ribs, winged lateral ribs (Additional file 8: Table S6). The genus Ligusticopsis with 14 species was separated from Ligusticum because of its prominent calyx teeth [7], however, these 14 species did not form a monophyletic group. Pimenov [64] proposed several new nomenclarural combinations in Ligusticopsis, which included seven species (L. brachylobum, L. capillaceum, L. daucoides, L. hispidum, L. involucratum, L. likiangense, and L. scapiforme) analysed in this study. We approved this conclusion and suggested that L. oliverianum should be also incorporated into the genus Ligusticopsis based on molecular and morphological evidence (Figs. 5, 6; Additional file 2: Fig. S2; Additional file 8: Table S6).
L. weberbauerianum (= H. weberbaueriana) and L. litangense fell into East-Asia Clade. The two accessions of H. weberbaueriana (= L. weberbauerianum) were sisters in the phylogenetic trees, thus we agreed with this treatment that L. weberbauerianum is a synonym of H. weberbaueriana [65,66]. Several studies have shown that L. litangense should be placed in Hansenia rather than Ligusticum [64]. L. litangense was related to H. phaea within Hansenia, which implied that L. litangense should be merged into Hansenia [8,64,67].
Two species (L. tachiroei and L. delavayi) that were in Acronema Clade formed a clade with strong support. However, the two species should be transferred from Ligusticum to another genus according to prior research [64]. It is worth mentioning that the generic type of Ligusticum (namely, Ligusticum scoticum) was placed in Acronema Clade [8,14]. In the present study, most of the Ligusticum species were not fell into Acronema Clade, except for L. tachiroei and L. delavayi. Moreover, Zhou et al. [8] found that L. scoticum and L. scoticum subsp. hultenii occurred in Acronema Clade formed a monophyletic group with high support, and they were separated from L. tachiroei and L. delavayi. Consequently, in the light of the plastome's results, we concluded that the current circumscription of the Chinese Ligusticum should be reduced, which is consistent with Zhou et al. 's [8] study based on ITS sequences.
Ligusticum is one of the most taxonomically difficult taxa within Apiaceae, largely due to the diversity of flowers, leaves, bracteoles, and mericarps. Ligusticum is described as a dustbin genus, as it contains several species that cannot be classified correctly [8]. In addition, fruit is the most important taxonomic character of Ligusticum, yet most species of the genus grow at high elevations with late fruiting. As a perennial herb, the genus sometimes does not blossom and bear fruit in a year, which was encountered many times during our field sampling. These factors make it difficult to sample the fruits of Ligusticum, resulting in a lack of unique taxonomic characters. Thus, the fruits of Ligusticum should also be collected to provide a morphological foundation for the taxonomic revision of Ligusticum. Together with the molecular phylogenetic analyses, including the use of traditional molecular markers and plastome-scale data [8,[14][15][16][17][18][19][20][21], we therefore strongly argue that a revision of Ligusticum taxonomy is necessary. Further studies will require more taxa of Ligusticum and its allied genera, as well as combine molecular and morphological evidences to resolve the taxonomy and delimitation of Ligusticum. Overall, our study provides new insights into the taxonomic classification of Ligusticum and will serve as a framework for future studies on the taxonomy and delimitation of Ligusticum from the perspective of the plastid genome.

Taxon sampling and DNA extraction
We newly sequenced 14 plastomes, including 13 species covering four clades of Ligusticum. We also recovered 12 plastomes from the NCBI database (https:// www. ncbi. nlm. nih. gov/). In total, we sampled 20 species (26 accessions) within Ligusticum (Additional file 3: Table S1). Fresh leaves from adult plants of each newly sequenced species were collected in the field and immediately dried with silica gel for future DNA extraction. These plants are not protected, therefore permission is not required for sample collection. The species identification of the plant material was undertaken by Xingjin He (Sichuan University, Chengdu, China). Voucher specimens were deposited at the herbarium of Sichuan University (Chengdu, China) (Additional file 3: Table S1). Total genomic DNA was extracted from silica-dried leaves with a CTAB protocol [68]. The quality and concentration of the DNA products were assessed using 1% agarose gel electrophoresis and a Quant-iT PicoGreen dsDNA Assay Kit.

Illumina sequencing, assembly, and annotation
The DNA library with an insert size of 400 bp was constructed using the TruSeq DNA Sample Preparation Kits (Illumina) according to the manufacturer's protocol. The DNA library was sequenced using Illumina NovaSeq platform with an average paired-end read length of 150 bp at Shanghai Personal Biotechnology Co., Ltd. (Shanghai, China). The quality of the newly generated sequencing data was assessed using the FastQC v0.11.9 software [69]. The obtained raw reads were adapter-trimmed and quality-filtered by AdapterRemoval v2 (trimwindows = 5 and minlength = 50) [70], yielding at least 5 GB clean reads for each species. Clean reads were then used to perform a de novo assembly by NOVOPlasty v2.6.2 (K-mer = 39) [71]. The seed sequence was the rbcL gene from the reference plastome sequence of L. delavayi (NC_049052) [16]. The annotation of the 14 plastomes was completed using GeSeq [72], and we manually adjusted the positions of start and stop codons and the exon/intron boundaries in Geneious v9.0.2 (Biomatters Ltd., Auckland, New Zealand) against its congeneric species. The 14 newly obtained plastome sequences are available at the Gen-Bank (Accession numbers: MZ532560-MZ532573). The circle plastome map was generated using the online program OrganellarGenomeDRAW (OGDRAW) [73].

Molecular evolutionary analysis
To identify the codon usage patterns, MEGA6 [74] was employed for the codon usage bias analyses using protein-coding genes with CDS lengths greater than 300 bp to avoid sampling bias [75]. The heatmap was drawn using TBtools [76]. The total GC content and the GC content for the first, second, and third codon positions of these CDSs were also calculated by MEGA6 [74]. To reveal the composition and characteristics of RNA editing, the potential RNA editing sites in protein-coding genes of 26 Ligusticum plastomes were predicted using the PREP-Cp program [77] with a cutoff value of 0.8.
To explore the selection patterns on the plastid proteincoding genes, the nonsynonymous (Ka) and synonymous (Ks) nucleotide substitution rates of 79 protein-coding genes were calculated using a site-specific model implemented in Codeml program (seqtype = 1, model = 0, NSsites = 0, 1, 2, 3, 7, 8) [78] of PAML4.9 software [79]. Codon frequencies were determined using the F3 × 4 model and gapped regions were excluded with the parameter "cleandata = 1" option. For PAML analyses, the ML tree constructed using RAxML v8.2.8 [80] based on 79 plastid protein-coding genes was used as the input treefile. Likelihood ratio test (LRT) with a Chi-square distribution was used to confirm the model fit. The Bayes Empirical Bayes (BEB) analysis was used to statistically identify selected sites with posterior probabilities ≥ 95%. The nucleotide diversity (Pi) of the CDSs of 79 proteincoding genes was also calculated using DnaSP v5.1 [81].

Phylogenetic analysis
Sixty-six species of Apiaceae were used to infer the phylogeny of Ligusticum, among which, two Bupleurum species served as the outgroups (Additional file 7: Table S5). Both concatenation and coalescent-based analyses were carried out. For the concatenation-based approach, two datasets were used to conduct the phylogenetic analysis: dataset-1 was the complete plastomes (excluding one inverted repeat region); dataset-2 encompassed the 76 common protein-coding sequences (CDSs) (Genes list used in the phylogenetic analyses was provided in Additional file 7: Table S5). The number of variable sites of the two datasets was calculated by MEGA6 [74]. To avoid duplicate regions increasing the phylogenetic signal, the second IR was removed from the first dataset. Sequence alignment was achieved using the MAFFT v7.221 [82] and ambiguously aligned areas were removed using Gblocks v0.91b [83] with the default setting. The nucleotide sequences of the 76 common CDSs were extracted and then concatenated into a supermatrix using Phylo-Suite v1.2.1 [84]. The maximum likelihood (ML) analysis was conducted in RAxML v8.2.8 [80] with 1000 bootstrap replicates and GTRGAMMA model. Bayesian inference (BI) was carried out using MrBayes v3.1.2 [85] with the best-fitting evolutionary model determined by Modeltest v3.7 [86]. The selected models for complete plastomes and 76 common CDSs in BI analyses were TVM + I + G and GTR + I + G, respectively. Markov chain Monte Carlo (MCMC) algorithm was run for 5,000,000 generations, with one tree sampled every 1000 generations. The MCMC reached stationarity when the average standard deviation of the split frequencies was less than 0.01. The initial 25% of the sampled data was discarded as burn-in, and the consensus tree was generated using the remaining trees.
Given that the variation level of different genes (Additional file 7: Table S5), and to provide the best estimate of the phylogeny of Ligusticum, we also undertook a coalescent-based analysis using ASTRAL v5.7.3 [87]. This approach inferred a species tree using individual gene trees. The gene trees were separately generated for 76 CDSs using RAxML v8.2.8 [80] with 500 bootstraps and GTRGAMMA model. The 76 RAxML best ML gene trees were used as input for ASTRAL v5.7.3 [87] to estimate a species tree with local posterior probability (LPP) [88].

Conclusions
In this study, we integrated 26 plastomes (including 14 newly sequenced plastomes) to perform molecular evolutionary analysis and phylogenetic reconstruction. These plastid genomes exhibited diverse plastome characteristics. The analyses of codon usage, RNA editing, dN/dS, and nucleotide variability (Pi), have demonstrated the conservation of the protein-coding genes in Ligusticum. The phylogenetic analyses obtained a more robust molecular phylogeny than prior studies and showed the nonmonophyly of Ligusticum containing four clades. Our results emphasized that the current circumscription of the Chinese Ligusticum should be reduced. Wider taxon sampling including related species of Ligusticum will be necessary to explore the phylogenetic relationships of Ligusticum. Overall, our study provided new insights into the phylogenetic relationships of Ligusticum and would serve as a framework for the taxonomy and delimitation studies of this genus.