Comparative plastome analysis of Blumea, with implications for genome evolution and phylogeny of Asteroideae

Abstract The genus Blumea (Asteroideae, Asteraceae) comprises about 100 species, including herbs, shrubs, and small trees. Previous studies could not resolve the taxonomy and phylogeny of the genus Blumea because the molecular markers used showed low polymorphism. Therefore, suitable polymorphic regions need to be identified. Here, we de novo assembled the plastomes of three Blumea species, B. oxyodonta, B. tenella, and B. balsamifera, and compared them with those of 26 other species of Asteroideae after correcting their annotations. All these species have quadripartite plastomes with similar gene content, genome organization, and inverted repeat (IR) contraction and expansion; each plastome contains 113 unique genes, including 80 protein‐coding, 29 transfer RNA, and 4 ribosomal RNA genes. Comparative analyses of codon usage, amino acid frequency, microsatellite repeats, oligonucleotide repeats, and transition and transversion substitutions revealed high similarity among the newly assembled Blumea plastomes. We identified 10 highly polymorphic regions with nucleotide diversity above 0.02, including rps16‐trnQ, ycf1, ndhF‐rpl32, petN‐psbM, and rpl32‐trnL; these regions may be suitable for developing robust, authentic, and cost‐effective markers for barcoding and phylogenetic inference in the genus Blumea. Five of these highly polymorphic regions also co‐occurred with oligonucleotide repeats, supporting the use of repeats as a proxy for identifying polymorphic loci. The phylogenetic analysis revealed a close relationship between Blumea and Pluchea within the tribe Inuleae. At the tribal level, our phylogeny supports a sister relationship between Astereae and Anthemideae, with Gnaphalieae, Calenduleae, and Senecioneae as successive sister groups.
These results contradict recent studies based on nuclear genome sequences, which reported either a sister relationship between “Senecioneae and Anthemideae” and “Astereae and Gnaphalieae” or a sister relationship between Astereae and Gnaphalieae, with Calenduleae, Anthemideae, and then Senecioneae as successive sister groups. The conflicting phylogenetic signals observed at the tribal level between plastid and nuclear genome data require further investigation.

Certain regions of the plastome are more predisposed to mutations than others and evolve at different rates (Iram et al., 2019). Plastome polymorphism is useful for investigating population genetics, phylogenetics, and barcoding of plants (Ahmed, 2014; Shahzadi et al., 2020; Teshome et al., 2020). The plastome is uniparentally inherited and shows appropriate levels of polymorphism (Palmer, 1985). These properties make plastome polymorphisms suitable molecular markers for resolving taxonomic issues and inferring plant phylogeny with high resolution (Daniell et al., 2016). Moreover, plastome markers can be robust, authentic, and cost-effective (Ahmed et al., 2013; Nguyen et al., 2018). Therefore, several studies have focused on identifying suitable polymorphic loci to resolve the taxonomic discrepancies of various plant lineages (Iram et al., 2019; Shahzadi et al., 2020; Yang et al., 2019). Comparative analyses of whole plastomes not only provide in-depth insight into plastome evolution but also help identify polymorphic loci (Abdullah, Henriquez, Mehmood, Shahzadi, et al., 2020; Henriquez et al., 2020b).
Asteraceae is one of the three megadiverse families of flowering plants. Estimates of its species number range from 25,000 to 35,000, and the family comprises up to 10% of all flowering plant species (Mandel et al., 2019). Species of Asteraceae occur in every type of habitat and on every continent, including Antarctica (Barreda et al., 2015; Mandel et al., 2019). The family is divided into 13 subfamilies (Mandel et al., 2019; Panero & Crozier, 2016; Panero et al., 2014).
Like the family as a whole, the subfamily Asteroideae is widely distributed, occurring in America, Asia, Africa, Europe, Oceania, and the Pacific Islands (Mandel et al., 2019; Panero & Crozier, 2016). Watson et al. (2020) referred to the five tribes Senecioneae, Astereae, Anthemideae, Gnaphalieae, and Calenduleae as the "Fab(ulous) Five." These tribes are taxonomically difficult, and conflicting phylogenetic signals have been recorded for them from plastid, nuclear, and transcriptomic data (see the Discussion for details) (Fu et al., 2016; Mandel et al., 2019; Panero & Crozier, 2016; Panero et al., 2014; Watson et al., 2020).
The genus Blumea DC. belongs to the tribe Inuleae of the subfamily Asteroideae (Pornpongrungrueng et al., 2016). It comprises about 100 species of herbs, shrubs, and small trees (Pornpongrungrueng et al., 2009), distributed throughout the Old World tropics (Pornpongrungrueng et al., 2009) and most diverse in Australia, Africa, and Asia (Pornpongrungrueng et al., 2016).
Blumea balsamifera (L.) DC. is considered to be one of the most important medicinal species (Pang et al., 2014).
The genus Blumea is monophyletic if the genera Blumeopsis Gagnep. and Merrittia Merr. are included (Pornpongrungrueng et al., 2007, 2009). Phylogenetic inference using plastid markers (trnL-F and trnH-psbA), the nuclear ribosomal internal transcribed spacer (ITS) region, and the 5S-NTS (nontranscribed spacer) divides the genus into three clades (Pornpongrungrueng et al., 2009; Zhang et al., 2019). However, discrepancies remain within the genus, and several species-level relationships are still unresolved (Chen et al., 2009). Several nodes also received low bootstrap support. Therefore, researchers have suggested identifying suitable polymorphic loci to elucidate these phylogenetic relationships (Pornpongrungrueng et al., 2007).
Here, we aimed to (a) provide new insight into the plastomes of the genus Blumea through comparative plastid genomics with other species of the subfamily Asteroideae; (b) reconstruct the phylogeny of the subfamily Asteroideae; (c) identify suitable polymorphic loci for phylogenetic inference in the genus Blumea; and (d) evaluate the role of repeats as a proxy for identifying mutational hotspots.

| Genome assembly and coverage depth analysis
We downloaded the short reads of all three Blumea species from the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI), available under project numbers PRJNA438407 and PRJNA522689. The whole-genome shotgun data of B. oxyodonta and B. balsamifera were generated on a BGISEQ-500 platform as 100 bp single-end reads (Liu et al., 2019), whereas B. tenella was sequenced with Illumina TruSeq as 250 bp reads. Data accession numbers and quantities are presented in Table 1. To assemble each plastome, we first used the BWA-MEM algorithm with default parameters (Li & Durbin, 2009) to map all reads of each species to the plastomes of Aster hersileoides C.K. Schneid., Tagetes erecta L., Ambrosia trifida L., Bidens torta Sherff, and Artemisia ordosica Krasch. This step was used to avoid contamination by reads of nuclear and mitochondrial origin. The plastomes were then de novo assembled from the extracted reads with Velvet v.1.2.10 (Zerbino & Birney, 2008), as integrated in Geneious R8.1 (Kearse et al., 2012). All Velvet contigs were joined in order using the de novo assembly option of Geneious R8.1 (Kearse et al., 2012), which yielded each Blumea plastome as a single contig assembled without gaps from 5-10 long contigs. The LSC, IR, and SSC regions were defined by manual inspection of the scaffolds. To validate the assemblies and estimate coverage depth, we mapped the original reads of each species back to its assembled plastome using BWA-MEM (Li & Durbin, 2009). No gaps were detected in the assembled plastomes, further supporting their quality, and this read mapping also allowed us to resolve some nucleotide mismatches in the assemblies.
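The back-mapping check can be illustrated with a minimal Python sketch. It assumes that read alignments have already been reduced to 0-based, half-open (start, end) intervals on the assembled circular plastome (for example, parsed from the BWA-MEM output); the function names and toy intervals are hypothetical, and real pipelines usually rely on dedicated tools for this step. Positions with zero depth would flag candidate gaps.

```python
from typing import List, Tuple


def coverage_depth(genome_length: int,
                   alignments: List[Tuple[int, int]]) -> List[int]:
    """Per-base read depth on a circular plastome.

    Each alignment is a 0-based, half-open (start, end) interval; an interval
    with end < start is treated as a read that spans the origin of the circle.
    """
    diff = [0] * (genome_length + 1)  # difference array: +1 at start, -1 at end
    for start, end in alignments:
        if start <= end:
            diff[start] += 1
            diff[end] -= 1
        else:  # wrap-around read: split into [start, L) and [0, end)
            diff[start] += 1
            diff[genome_length] -= 1
            diff[0] += 1
            diff[end] -= 1
    depth, running = [], 0
    for pos in range(genome_length):
        running += diff[pos]
        depth.append(running)
    return depth


def summarize(depth: List[int]) -> Tuple[float, List[int]]:
    """Mean depth and the list of zero-coverage positions (candidate gaps)."""
    mean = sum(depth) / len(depth)
    gaps = [pos for pos, d in enumerate(depth) if d == 0]
    return mean, gaps


if __name__ == "__main__":
    toy = coverage_depth(20, [(0, 10), (5, 15), (12, 20), (18, 3)])
    print(summarize(toy))
```

A difference array keeps the calculation linear in genome length plus the number of reads, which matters for plastome datasets with millions of mapped reads.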
The circular map of the Blumea plastome was drawn using Chloroplot to show genome organization and gene content (Zheng et al., 2020).

| Reannotations of plastomes and comparative genomics among the species of the subfamily Asteroideae
To compare the Blumea plastomes with those of other Asteroideae, we retrieved the plastomes of 26 other species of the subfamily from the NCBI (Table 2). These species were selected to cover all the main tribes of Asteroideae, including Anthemideae, Astereae, Gnaphalieae, the Heliantheae alliance, and Senecioneae, whereas the Blumea species represent the tribe Inuleae. Because previous studies have shown that plastome annotations in public databases contain errors (Amiryousefi et al., 2018a), we first reannotated all 26 plastomes with the same approach used for the Blumea species, to ensure a consistent comparison. We then compared genome features, including the sizes of the complete plastome, LSC, SSC, and IR, as well as gene and intron content. Gene arrangement was examined using Mauve (Darling et al., 2004), and IR contraction and expansion were visualized using IRscope (Amiryousefi et al., 2018b).

| Comparison of codon usage, amino acid frequency, microsatellites, and oligonucleotide repeats among the species of Blumea
The amino acid frequency and relative synonymous codon usage (RSCU) of the Blumea species were determined using Geneious R8.1. Microsatellites were identified in the Blumea plastomes using MISA-web (Beier et al., 2017) with minimum repeat numbers of 10 for mononucleotide, 5 for dinucleotide, 4 for trinucleotide, and 3 for tetra-, penta-, and hexanucleotide repeats. Oligonucleotide repeats were identified using REPuter (Kurtz et al., 2001) with a minimum repeat size of 30 bp and at least 90% similarity; the maximum number of reported repeats was set to 500.
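The MISA thresholds above can be illustrated with a simplified perfect-repeat scanner. This is a hypothetical sketch, not MISA's implementation: it ignores compound microsatellites and the interruption distance that MISA also handles, and it reports only perfect tandem repeats whose motif is not itself a repeat of a shorter unit.

```python
# Minimum numbers of motif copies, taken from the MISA-web settings above.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for perfect tandem repeats, scanning left to right."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        for k in sorted(min_repeats):
            motif = seq[i:i + k]
            if len(motif) < k or not is_primitive(motif):
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_repeats[k]:
                hits.append((i, motif, copies))
                i += copies * k  # jump past the reported repeat
                break
        else:  # no motif length reached its threshold at this position
            i += 1
    return hits
```

For example, a run of 12 A's is reported as a mononucleotide repeat, whereas a run of 9 A's falls below the threshold of 10 and is ignored.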

| Identification of polymorphic loci in the genus Blumea
A multiple alignment of the Blumea plastomes, with IRa removed, was generated using MAFFT (multiple alignment using fast Fourier transform) (Katoh & Standley, 2013). We first checked the alignment manually for small inversions and removed them to avoid false results. The intergenic spacer, intronic, and protein-coding regions were then extracted from the alignment in Geneious R8.1 (Kearse et al., 2012) and analyzed in DnaSP v.6 to determine the nucleotide diversity of each region (Rozas et al., 2017). The rates of transition and transversion substitutions were also determined from pairwise MAFFT alignments.
TABLE 1 Accession numbers, quantity of raw data, and coverage depth of de novo assembled plastomes (columns include Species and SRA accession no.)
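Nucleotide diversity (π) is the number of nucleotide differences per site between two sequences, averaged over all sequence pairs. The following minimal sketch assumes that alignment columns containing gaps or ambiguous bases are excluded; this is one common convention, and DnaSP provides its own options for handling gaps, so values may differ slightly from DnaSP output.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(alignment: Sequence[str]) -> float:
    """Average pairwise differences per site (pi) over gap-free alignment columns."""
    valid = set("ACGT")
    # keep only columns in which every sequence has an unambiguous base
    columns = [col for col in zip(*(s.upper() for s in alignment))
               if set(col) <= valid]
    if not columns:
        return 0.0
    pairs = list(combinations(range(len(alignment)), 2))
    differences = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return differences / (len(pairs) * len(columns))
```

Applied to each extracted region of the three-species alignment, a value above 0.02 would correspond to the threshold used for selecting highly polymorphic regions.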

| Phylogeny in the subfamily Asteroideae
We retrieved the plastomes of 95 species of the subfamily Asteroideae from the NCBI, covering all seven main tribes (Table S1), and used them together with the three Blumea species for phylogenetic inference. Sonchus acaulis Dum. Cours. was also retrieved from the NCBI and used as the out-group from the subfamily Cichorioideae. […] (Kalyaanamoorthy et al., 2017; Nguyen et al., 2015). The IQ-TREE2 analyses searched for the best maximum likelihood (ML) tree under the edge-unlinked partition model (Lopez et al., 2002) and, for a nonpartitioned dataset, under the general heterogeneous evolution on a single topology (GHOST) model (Crotty et al., 2020).
Branch support values were estimated using the ultrafast bootstrap approximation with 10,000 replicates (Hoang et al., 2018). The tree was visualized and annotated using Interactive Tree Of Life version 4 (Letunic & Bork, 2019).
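A bootstrap value at a node can be read as the percentage of bootstrap trees that contain the same clade. UFBoot obtains its bootstrap trees differently from the standard nonparametric bootstrap, but the resulting value is interpreted the same way. The hypothetical sketch below counts clade frequencies for rooted trees written as nested Python tuples; the taxon labels are placeholders.

```python
from collections import Counter


def clades(tree):
    """Clades (frozensets of leaf names) of a rooted nested-tuple tree, root excluded."""
    found = []

    def walk(node):
        if isinstance(node, str):  # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.append(leaves)
        return leaves

    walk(tree)
    return found[:-1]  # the root is appended last (post-order) and is trivial


def clade_support(reference, replicates):
    """Percentage of replicate trees containing each clade of the reference tree."""
    counts = Counter(clade for rep in replicates for clade in set(clades(rep)))
    return {clade: 100.0 * counts[clade] / len(replicates)
            for clade in clades(reference)}
```

With four replicate trees, a clade found in three of them receives a support of 75.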

| Correction of annotations
The initial annotations of plastomes of the subfamily Asteroideae […] (Table S2).

| Plastome organization and features of the genus Blumea and the subfamily Asteroideae
The plastomes were obtained with high average coverage depth, ranging from 98× to 130× (Table 1). The plastome of every species contained 113 unique genes, including 80 protein-coding, 29 tRNA, and 4 ribosomal RNA (rRNA) genes.
The guanine-cytosine (GC) content was highly similar among the three species across plastome regions and genetic features (Table 2). In addition to its functional copy, ycf1 left a truncated copy at the IR/SSC junction. The genetic organization of the Blumea plastomes is shown as a circular map drawn with Chloroplot (Figure 1).
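The origin of a truncated ycf1 copy can be illustrated with IRscope-style junction bookkeeping. This sketch assumes the usual LSC, IRb, SSC, IRa order; all region lengths and gene coordinates in it are hypothetical and are not the Blumea values. Because IRb is the reverse complement of IRa, the part of a gene that extends from the SSC into IRa is also present at the IRb/SSC boundary as a truncated copy.

```python
def junctions(lsc: int, ir: int, ssc: int) -> dict:
    """Junction positions for a plastome ordered LSC-IRb-SSC-IRa (1-based).

    Each value is the last base before the boundary.
    """
    return {
        "JLB": lsc,                 # LSC / IRb
        "JSB": lsc + ir,            # IRb / SSC
        "JSA": lsc + ir + ssc,      # SSC / IRa
        "JLA": lsc + 2 * ir + ssc,  # IRa / LSC (end of the circular sequence)
    }


def spanned_junctions(start: int, end: int, juncs: dict) -> list:
    """Junctions that fall inside a gene with 1-based inclusive coordinates."""
    return [name for name, pos in juncs.items() if start <= pos < end]


def ira_overhang(start: int, end: int, juncs: dict) -> int:
    """Bases of a gene crossing SSC/IRa that lie in IRa.

    This stretch is duplicated at the IRb/SSC boundary as a truncated copy.
    """
    jsa = juncs["JSA"]
    return end - jsa if start <= jsa < end else 0


if __name__ == "__main__":
    toy = junctions(lsc=84000, ir=25000, ssc=18000)  # hypothetical lengths
    print(spanned_junctions(121500, 127600, toy), ira_overhang(121500, 127600, toy))
```

IR expansion shifts JSA into the gene and lengthens the overhang, whereas IR contraction shortens it, which is why the size of the truncated copy tracks junction movement.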
The species of Asteroideae included in the analysis showed similar […]

| Relative synonymous codon usage and amino acid frequency
RSCU values and amino acid frequencies were highly similar among the Blumea species. Codons ending in A or T at the third position generally had RSCU ≥1, indicating preferential use, whereas codons ending in C or G generally had RSCU <1 (Table S3). Leucine was the most frequent amino acid and cysteine the rarest (Table S4).
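RSCU is the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid, so a value of 1 indicates no bias. A minimal sketch of that calculation follows; it assumes plastid genes are read with the standard codon assignments (NCBI translation table 11 assigns amino acids identically to table 1) and treats stop codons as one synonymous group. It is an illustration, not the Geneious implementation.

```python
from collections import Counter, defaultdict

# Standard genetic code in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def rscu(coding_sequences):
    """RSCU = observed codon count / mean count of its synonymous codons."""
    counts = Counter()
    for cds in coding_sequences:
        cds = cds.upper()
        for p in range(0, len(cds) - 2, 3):  # complete codons only
            codon = cds[p:p + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1
    synonyms = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        synonyms[aa].append(codon)
    values = {}
    for codons in synonyms.values():
        total = sum(counts[c] for c in codons)
        if total == 0:  # amino acid absent: RSCU undefined
            continue
        mean = total / len(codons)
        for c in codons:
            values[c] = counts[c] / mean
    return values
```

In a toy gene containing three CTT and one CTG leucine codons, CTT receives an RSCU of 4.5 because leucine has six synonymous codons.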

| Analysis of substitutions and indels
We analyzed the substitution types across the complete plastome and the extent of substitutions and indels in the three main plastome regions (LSC, SSC, and IR).
Transversions outnumbered transitions, giving transition-to-transversion (Ts/Tv) ratios of 0.84-0.87 (Table 3). Most substitutions occurred in the LSC, followed by the SSC and IR (Table 3). Indels showed the same distribution, with most in the LSC, followed by the SSC and IR (Table 4).
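The Ts/Tv ratio reported above can be illustrated with a minimal counting sketch for one pairwise alignment. It counts raw differences only (no correction for multiple hits) and skips gapped or ambiguous columns; these are simplifying assumptions, not necessarily the settings behind Table 3.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def count_ts_tv(seq1: str, seq2: str):
    """Count transitions and transversions between two aligned sequences.

    Columns with gaps or ambiguous bases are skipped; no multiple-hit correction.
    """
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:  # A<->G or C<->T
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions
```

The ratio is transitions divided by transversions; a value below 1, as in the Blumea comparisons, means transversions outnumber transitions.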

| Analysis of microsatellites and oligonucleotide repeats
We found 55-60 microsatellites in the plastomes of the three Blumea species, most of them composed of A/T motifs. Most microsatellites were located in the LSC rather than in the IR or SSC (Figure 4a).
Mononucleotide microsatellites were the most abundant, followed by tetranucleotide microsatellites (Figure 4b). […] Most of the oligonucleotide repeats were located in the LSC, followed by the IR, and some repeats were shared between the LSC, SSC, and IR regions (Figure 4c). Forward repeats were more abundant than other repeat types (Figure 4d). Intergenic spacer regions contained more repeats than intronic and coding regions (Figure 4e). Most repeats were 30-34 bp long (Figure 4f). Details of the oligonucleotide repeats are provided in Table S6.
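Forward repeats of the kind reported by REPuter can be illustrated with a simplified seed-and-extend search. This hypothetical sketch finds only exact forward repeats of at least 30 bp; unlike REPuter, it does not allow mismatches (the 90% similarity setting) and does not search for reverse, complement, or palindromic repeats.

```python
from collections import defaultdict


def forward_repeats(seq: str, min_len: int = 30):
    """Maximal exact forward repeat pairs (i, j, length), i < j, length >= min_len.

    Seeds are exact k-mers of length min_len; each seed pair is kept only if it
    cannot be extended to the left, and is then extended to the right.
    """
    seq = seq.upper()
    index = defaultdict(list)
    for p in range(len(seq) - min_len + 1):
        index[seq[p:p + min_len]].append(p)
    hits = []
    for positions in index.values():
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                i, j = positions[a], positions[b]
                if i > 0 and seq[i - 1] == seq[j - 1]:
                    continue  # reported from the seed one base to the left
                length = min_len
                while j + length < len(seq) and seq[i + length] == seq[j + length]:
                    length += 1
                hits.append((i, j, length))
    return sorted(hits)
```

Indexing every 30-mer once keeps the search close to linear for a plastome-sized sequence, because exact 30 bp matches are rare outside genuine repeats.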

| Identification of polymorphic loci
Intergenic spacer regions showed the highest average polymorphism (0.0121), compared with intronic regions (0.0096) and protein-coding sequences (0.0047). The polymorphism of all regions is shown in Figure 5. We excluded loci shorter than 200 bp and selected 10 polymorphic regions with nucleotide diversity >0.02, of which 6 belonged to intergenic spacer regions, 1 to intronic, 1 to protein-coding, and 1 to both intergenic spacer and coding regions. Instead of the complete ycf1 gene, we selected a 730 bp region of ycf1 using the oligonucleotide repeat as a proxy; this region showed a nucleotide diversity of 0.0252 and contained 28 substitution events with zero missing data. A similar approach was used for ndhF-rpl32, for which we selected an 841 bp region with a nucleotide diversity of 0.0206 and 26 substitutions. The selected regions may act as suitable and cost-effective markers (Table 5).
FIGURE 1 Circular map of the plastomes. Gene colors indicate function. Genes outside the circle are transcribed counterclockwise, and genes inside the circle are transcribed clockwise. Because gene content and organization are similar in all three species, one map is shown as representative of all three.
FIGURE 2 Mauve alignment showing plastome organization based on collinear blocks. The alignment shows high similarity among all 29 Asteroideae plastomes; the inversion of the small single copy is visible as the green block. The small blocks of various colors represent genes: black = transfer RNA (tRNA); red = ribosomal RNA; white = protein-coding; green = intron-containing tRNA.
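The criteria used to select the polymorphic regions (at least 200 bp and nucleotide diversity above 0.02) can be sketched as a small filter over aligned regions. In this hypothetical sketch, segregating sites stand in for substitution events (they are a lower bound when a site carries more than two bases), and gaps are counted together with ambiguous bases as missing data; both are simplifying assumptions.

```python
from itertools import combinations


def region_stats(alignment):
    """Length, pi, segregating sites, and gap/ambiguity fraction of one aligned region."""
    seqs = [s.upper() for s in alignment]
    length = len(seqs[0])
    columns = list(zip(*seqs))
    complete = [c for c in columns if set(c) <= set("ACGT")]  # gap-free columns
    segregating = sum(len(set(c)) > 1 for c in complete)
    missing = sum(ch not in "ACGT" for c in columns for ch in c) / (length * len(seqs))
    pairs = list(combinations(range(len(seqs)), 2))
    diffs = sum(c[i] != c[j] for c in complete for i, j in pairs)
    pi = diffs / (len(pairs) * len(complete)) if complete else 0.0
    return {"length": length, "pi": pi,
            "segregating": segregating, "missing": missing}


def is_candidate(stats, min_length=200, min_pi=0.02):
    """Selection rule: at least 200 bp and pi above 0.02."""
    return stats["length"] >= min_length and stats["pi"] > min_pi
```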

| Phylogenetic inference of the species of the genus Blumea with 95 other species
The Blumea species formed a clade that was sister to Pluchea indica with a bootstrap support of 100. This result, based on complete plastome sequences, places Blumea in the tribe Inuleae. The phylogeny also resolved the relationships among the seven tribes of the subfamily Asteroideae (Figure 6). In our tree, the Heliantheae alliance is the most recently diverged lineage of the subfamily and is sister to the tribe Inuleae. Astereae is sister to Anthemideae, and Gnaphalieae is sister to these two tribes. Calenduleae is sister to the clade comprising Gnaphalieae, Astereae, and Anthemideae, and Senecioneae forms the earliest-diverging branch of the subfamily Asteroideae in our phylogeny.

| DISCUSSION
We de novo assembled the plastomes of three Blumea species and compared them with 26 other Asteroideae species. We provided insight into plastome structure, IR contraction and expansion, and suitable polymorphic loci.

| Plastome comparison of Blumea and Asteroideae
The […] certain errors in annotations, as reported previously in a detailed study of the family Solanaceae (Amiryousefi et al., 2018a) and in a comparative genomic study of two species of Malvaceae. After correction of the annotations, gene content was identical in all species of Asteroideae, in agreement with the gene features of previously reported plastomes of other subfamilies such as Cichorioideae, Pertyoideae, and Carduoideae (Jung et al., 2021; Kim et al., 2019; Lin et al., 2019).

These data indicate that plastomes are highly conserved across Asteraceae. Had we used the previously reported annotations as they were, without reannotation, the analyzed species would have appeared to differ considerably. Based on these observations, together with a previous report on the family Solanaceae (Amiryousefi et al., 2018a), we recommend correcting annotations before comparative genomics to ensure accurate data on plastome structure, gene content, and intron content, and for other analyses of plastome evolution.
Patterns of IR expansion and contraction were highly similar among the species of the subfamily Asteroideae, in agreement with previous studies of other angiosperms such as Malvaceae and Solanaceae (Amiryousefi et al., 2018a). The ycf1 pseudogene originated at the IR junction. Such pseudogenes are expected to arise from IR expansion and contraction and have been observed in other angiosperms (Abdullah, Henriquez, Mehmood, Carlsen, et al., 2020; Iram et al., 2019). Previous studies have […] (Abdullah, Henriquez, Mehmood, Carlsen, et al., 2020; Abdullah, Henriquez, Mehmood, Shahzadi, et al., 2020). IR expansion and contraction can also duplicate single-copy genes (genes that move from the SSC or LSC into the IRs become duplicated) or convert duplicated genes into single-copy genes (genes that move from the IRs into the LSC or SSC become single copy) (Zhu et al., 2016). Such relocation also affects mutation rates: genes that move from the LSC or SSC into the IRs generally evolve more slowly, and genes that move out of the IRs evolve faster (Abdullah, Henriquez, Mehmood, Carlsen, et al., 2020; Zhu et al., 2016). The high similarity of the junctions is also consistent with the presence of the same genes in all species; the total number of genes does not vary as a result of IR expansion and contraction.

| Repeat analysis and utilization of oligonucleotide repeats as proxy to identify polymorphic loci
Microsatellites are very important for population genetic studies. Apart from hexanucleotide microsatellites, we detected mononucleotide, dinucleotide, trinucleotide, tetranucleotide, and pentanucleotide microsatellites, of which mononucleotide repeats were the most abundant, followed by tetranucleotide repeats. A similar pattern was observed in other Asteraceae (Sablok et al., 2019). Most microsatellites were made up of A/T rather than C/G motifs, which may reflect the A/T-rich composition of the plastome, as observed in other angiosperms (Iram et al., 2019). The microsatellites identified here may be helpful in population genetic studies of Blumea. Oligonucleotide repeats are widespread in the plastome (Ahmed et al., 2012, 2013). These repeats play a role in generating mutations and have been suggested as a proxy for identifying mutational hotspots (Ahmed et al., 2012). […] recently reported the co-occurrence of up to 90% of repeats with substitutions, whereas 36%-91% co-occurrence was recorded at the genus level. In the current study, we identified 10 highly polymorphic loci. Five of these loci lie in regions containing repeats, including rps16-trnQ and ycf1, which showed the highest levels of polymorphism. Our findings therefore support the use of repeats as a proxy, and this approach may also help identify suitable polymorphic loci for phylogenetic inference in other taxonomically complex genera. The approach is promising because the plastome of a single species can be used to identify polymorphic regions. However, repeats in coding regions and in the IRs should be avoided, because purifying selection acts on protein-coding genes (Henriquez et al., 2020b) and copy-dependent repair mechanisms in the IRs (Zhu et al., 2016) lead to low mutation rates.
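Using repeats as a proxy amounts to checking whether polymorphic loci overlap repeat coordinates. The sketch below assumes half-open (start, end) coordinates on the same reference; the locus names and coordinates are hypothetical placeholders, not the Blumea data.

```python
def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def loci_with_repeats(loci, repeats):
    """Names of polymorphic loci overlapping at least one repeat interval."""
    return [name for name, span in loci.items()
            if any(overlaps(span, rep) for rep in repeats)]


def cooccurrence_rate(loci, repeats):
    """Fraction of polymorphic loci that overlap a repeat."""
    return len(loci_with_repeats(loci, repeats)) / len(loci) if loci else 0.0
```

In the current study, five of the ten highly polymorphic loci overlapped repeats, which corresponds to a co-occurrence rate of 0.5.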

| Suitable polymorphic loci for resolving phylogenetic discrepancies of Blumea
Plastome regions differ in polymorphism, and certain regions are more predisposed to mutations (Henriquez et al., 2020b; Shahzadi et al., 2020). Therefore, not all regions are equally useful for phylogenetic inference and barcoding of plant species (Daniell et al., 2016; Li et al., 2014). Pornpongrungrueng et al. (2007) […] Hence, to resolve the phylogeny, the authors suggested identifying suitable polymorphic loci. The complete plastome has been suggested for barcoding and phylogenetic inference (Li et al., 2014), but its high cost hinders its use. Therefore, identifying species-specific polymorphic loci can provide a valuable resource for phylogenetic inference and barcoding of plant species (Ahmed et al., 2013; Li et al., 2014). Here, we identified 10 polymorphic loci that differ from, and are more polymorphic than, the plastome loci used previously (trnH-psbA and trnL-F) (Pornpongrungrueng et al., 2007; Zhang et al., 2019). Our loci include intergenic spacer regions as well as coding and intronic regions. This agrees with recent studies that also suggested intergenic spacer regions for phylogenetic inference at low taxonomic levels in Bignonieae of the family Bignoniaceae and in the genus Artemisia of Asteraceae (Thode et al., 2020). Although our loci are less polymorphic than the ITS region, they produce little or no missing data relative to ITS, which makes them appropriate for phylogenetic inference and barcoding of Blumea species.
FIGURE 6 Phylogenetic inference among 98 species belonging to 7 tribes of the subfamily Asteroideae, using Sonchus acaulis as the out-group. Species of each tribe are shown in a different color. Bootstrap values of 100 are not shown at the nodes.
Moreover, ITS regions show low amplification success in polymerase chain reaction and may be contaminated with fungal sequences, which makes downstream processes costly and time-consuming (Li et al., 2014). By contrast, species-specific plastome markers show high amplification success, are not contaminated with fungal sequences, and have been found to be authentic, robust, and cost-effective in recent studies (Abdullah, Henriquez, Mehmood, et al., 2021; Ahmed, 2014; Ahmed et al., 2013; Li et al., 2014; Nguyen et al., 2018). Hence, the polymorphic loci identified here may also be authentic, robust, and cost-effective for barcoding and phylogenetic inference in the genus Blumea.

| Conflicting signals in the phylogeny of Asteroideae
Phylogenetic analysis of 98 species of the subfamily Asteroideae based on complete plastomes places the genus Blumea in the tribe Inuleae, closely related to the genus Pluchea. The same relationship was observed previously with the plastid ndhF gene (Anderberg et al., 2005). The tribal-level relationships in our study agree with previous plastome-based studies of the subfamily Asteroideae (Fu et al., 2016; Panero & Crozier, 2016). However, as in those studies, our complete-plastome phylogeny conflicts with the phylogeny of the family inferred from nuclear genome data (Mandel et al., 2019). Mandel et al. (2019) recovered a sister relationship between "Senecioneae and Anthemideae" and "Astereae and Gnaphalieae." By contrast, our study and previous plastome-based reports (Fu et al., 2016; Panero & Crozier, 2016; Watson et al., 2020) show a sister relationship between "Astereae and Anthemideae," with Gnaphalieae sister to these two tribes, Calenduleae sister to that clade, and Senecioneae at the base of Asteroideae. A recent nuclear phylogeny showed a sister relationship between Astereae and Gnaphalieae, with Calenduleae, Anthemideae, and then Senecioneae as successive sister groups (Watson et al., 2020). These nuclear results are also similar to a previous report (Huang et al., 2016) […] (Panero & Crozier, 2016). Hence, an accelerated rate of diversification, together with the loss of some of the earliest lineages of the Asteroideae, may be responsible for the conflicting signals (Watson et al., 2020). Vargas et al. (2017) also observed conflicting signals among nuclear, plastome, and mitochondrial data sets in Diplostephium and allied genera of Astereae.
They provided evidence for reticulate evolution during rapid diversification in the analyzed species of Astereae and suggested that phylogenies based on plastome and mitochondrial sequences conflict with the nuclear phylogeny because these genomes are uniparentally inherited. In the current study, the conflicting signals among the aforementioned tribes may likewise reflect reticulate evolution during rapid diversification. Moreover, the uniparental inheritance of the plastome may also confound phylogenetic inference, which requires further investigation.
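The plastid-nuclear conflict among the five tribes can be made explicit by comparing the clades of the two tribe-level topologies described above. In this sketch both trees are restricted to the five tribes and written as nested tuples; the variable names are hypothetical, and the number of unshared clades equals the rooted Robinson-Foulds (cluster) distance.

```python
def clades(tree):
    """Non-trivial clades of a rooted nested-tuple tree (root and leaves excluded)."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    root = walk(tree)
    found.discard(root)
    return found


# Tribe-level topologies as described in the text, restricted to the five tribes.
PLASTID = ("Senecioneae",
           ("Calenduleae", ("Gnaphalieae", ("Astereae", "Anthemideae"))))
NUCLEAR_WATSON_2020 = ("Senecioneae",
                       ("Anthemideae", ("Calenduleae", ("Astereae", "Gnaphalieae"))))


def compare_topologies(t1, t2):
    """Shared clades, clades only in t1, and clades only in t2."""
    c1, c2 = clades(t1), clades(t2)
    return c1 & c2, c1 - c2, c2 - c1


if __name__ == "__main__":
    shared, plastid_only, nuclear_only = compare_topologies(PLASTID, NUCLEAR_WATSON_2020)
    print(len(plastid_only) + len(nuclear_only))  # rooted Robinson-Foulds distance
```

Both topologies share only the clade that excludes Senecioneae; every deeper grouping differs, which locates the conflict in the rapid succession of splits among Astereae, Anthemideae, Gnaphalieae, and Calenduleae.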
In conclusion, our study provides insight into plastome structure and evolution in the genus Blumea and the subfamily Asteroideae. The identified polymorphic loci were linked to the locations of oligonucleotide repeats, supporting the use of repeats as a proxy for identifying polymorphic loci. The 10 identified loci may facilitate barcoding and phylogenetic inference in the genus Blumea.
However, the identified loci still require practical validation. Our study also shows conflicting signals between plastome and nuclear phylogenies at the tribal level, which require further investigation.

ACKNOWLEDGMENTS
We […] These figures are available without copyright, and no permission is required for use.

CONFLICT OF INTEREST
The authors have no conflicts of interest to declare. […] Tables 1 and S1. The detailed methodology is described in the manuscript, which makes the study reproducible.