Complete Mitogenome and Phylogenetic Analysis of the Carthamus tinctorius L.

Carthamus tinctorius L. 1753 (Asteraceae), also called safflower, is a cash crop with both edible and medical properties. We analyzed and reported the safflower mitogenome based on combined short and long reads obtained from Illumina and Pacbio platforms, respectively. This safflower mitogenome mainly contained two circular chromosomes, with a total length of 321,872 bp, and encoded 55 unique genes, including 34 protein-coding genes (PCGs), 3 rRNA genes, and 18 tRNA genes. The total length of repeat sequences greater than 30 bp was 24,953 bp, accounting for 7.75% of the whole mitogenome. Furthermore, we characterized the RNA editing sites of protein-coding genes located in the safflower mitogenome, and the total number of RNA editing sites was 504. Then, we revealed partial sequence transfer events between plastid and mitochondria, in which one plastid-derived gene (psaB) remained intact in the mitogenome. Despite extensive arrangement events among the three mitogenomes of C. tinctorius, Arctium lappa, and Saussurea costus, the constructed phylogenetic tree based on mitogenome PCGs showed that C. tinctorius has a closer relationship with three Cardueae species, A. lappa, A. tomentosum, and S. costus, which is similar to the phylogeny constructed from the PCGs of plastid genomes. This mitogenome not only enriches the genetic information of safflower but also will be useful in the phylogeny and evolution study of the Asteraceae.


Introduction
As the center of plant energy metabolism, mitochondria are involved in many aspects of plant growth and development, such as respiration, programmed cell death, and even cytoplasmic male sterility [1]. Based on the endosymbiotic theory, researchers generally believe that plant mitochondria evolved from free-living bacteria [2]. However, compared with animal mitogenomes, plant mitogenomes are more variable and show rich diversity in genome size, structure, gene content, and evolutionary rate [3]. Plant mitogenomes exist not only in ring form but also in other different forms [1]. Compared with microbial and animal mitogenomes, plant mitogenomes are larger in size and more complex in structure due to the migration of large amounts of exogenous DNA and the recombination of repetitive sequences [4][5][6]. The mitogenome has a low synonymous substitution rate compared with the plant nuclear and chloroplast genomes, and many homologous fragments in the plant mitogenomes are derived from the chloroplast and nucleus [7]. For example, in Cucumis melo (2.74 Mb) [8] and Larix sibirica (11.7 Mb) [9], which had the large plant mitogenome, many DNA homologous sequences originate from both the nuclear and chloroplast genomes.
Horizontal gene transfer (HGT) is a relatively rare phenomenon across plant evolution but has been shown to occur in some plant species [10]. Through HGT, plants can acquire new genes or functional modules that enhance their ability to adapt to their environment,

Assembly and Annotation of Mitogenome
We filtered the short reads using fastp [23] to obtain clean short reads. We used GetOrganelle v.1.7.5 [24] to assemble the chloroplast genome of safflower from clean short reads with the following parameters: '-R 15 -k 21,45,65,85,105 -F embplant_pt'. For the mitogenome, we used GetOrganelle v.1.7.5 [24] for de novo assembly and then removed chloroplast genome fragments to obtain a draft mitogenome with the specific parameters '-R50 -k 21,45,65,85,105,127 -F embplant_mt'. Then, minimap2 [25] was used to map the draft mitogenome to the PacBio SMRT reads (~380 GB), the long sequence fragments of mitogenome were extended using samtools v. 1.7 [26], and Canu v. 2.2 [27] was used for the correction and assembly of the long reads. Subsequently, the genomic sequence was further assembled using Flye v.2.9 [28] based on the corrected sequences from Canu v. 2.2 [27]. We evaluated and complemented the two results. Finally, two large overlapping groups were assembled into the mitogenome and formed a mitogenome with two rings.

Collinear Analysis and Rearrangement Analysis of Mitogenomes between Safflower and Two Other Asteraceae Species
To compare the similarity and collinearity between the mitogenomes of safflower and other species in Asteraceae, we performed a collinear analysis. We used the Asteraceae mitogenomes of safflower, Arctium lappa and Saussurea costus, for collinear analysis in NGenomeSyn v.1.4.1 [32] (https://github.com/hewm2008/NGenomeSyn, accessed on 20 April 2023). Then, we used mauve v.2.4.0 software [33] for genomic rearrangement analysis.

Analysis of Codon Usage Bias Patterns for Protein-Coding Genes (PCGs)
Relative synonymous codon usage (RSCU) and effective codon number (ENC) can be used to characterize the pattern of codon usage preference. Among them, ENC was also used to detect the effect of the base composition of genes on codon usage preference. Therefore, we used a cloud platform (http://112.86.217.82:9919/#/tool/alltool/detail/214, accessed on 20 April 2023) to analyze the RSCU and ENC for mitogenomic PCGs [34].

Identification of RNA Editing Sites
To further count the RNA editing sites of PCGs in the safflower mitogenome, we used an online cloud platform (http://112.86.217.82:9919/#/tool/alltool/detail/336, accessed on 20 April 2023) to predict the possible RNA editing sites in the safflower mitochondrial genome, setting a threshold value of C = 0.2. A lower threshold value will predict more complete RNA editing sites but may also increase the probability of misidentification.

Intracellular Gene Transfer Analysis
To identify potential horizontal transfer events that occurred between the chloroplast and mitogenome of safflower, we compared the safflower chloroplast genome (MK983238.1) and mitogenome (OQ621745 and OQ621746) using BLASTn program. The potential transferred DNA sequences were extracted by their genomic position and analyzed and further visualized using Circoletto v.07.09.16 (http://tools.bat.infspire.org/tools/circoletto/, accessed on 20 April 2023).

Phylogenetic Analysis
To understand the phylogeny of safflower in the plant kingdom, a total of 26 complete mitochondrial genomes from different plants were obtained from the GenBank database with Arabidopsis thaliana as the outgroup. We used PhyloSuite v. 1.2.2 [35] to extract common PCGs from both the mitogenomes and chloroplast genomes of these species. Each PCG sequence was then aligned using MAFFT v. 7.4 with default parameters [36]. After the manual corrections, the aligned sequences for each species were further concatenated using PhyloSuite v. 1.2.2 [35]. Based on the matrix of concatenated sequences, the maximumlikelihood (ML) tree was constructed using IQ-TREE v. 2.1.2 [37], and modelfinder was used to find the best-fit model as TVM+F+R2. For the Bayesian inferred (BI) tree, we used PhyloSuite v. 1.2.2 [35] to find the best-fit model as GTR and then used MrBayes v. 3.2.6 software to perform the analysis with the following parameters: ngen = 1,000,000, nruns = 2, nchains = 4, samplefreq = 1000, nst = 6, and rates = invgamma. Then, the first 25% of the trees were discarded as burn-in. Tree visualization was achieved in Figtree v. 1.4.3 (https://github.com/rambaut/figtree/releases, accessed on 20 April 2023). Finally, the support and posterior probabilities on the species branch nodes are manually integrated according to the topology of the ML and BI trees.
The total size of the 34 PCGs was 32,352 bp, accounting for 10.05% of the safflower mitogenome. Each PCG varied from 261 bp (atp9) to 2253 bp (psaB) in length. Except for the mttB and nad3 genes using TTG as the starting codon, other PCGs used ATG as the starting codon. Moreover, eight genes having an intron structure were discovered, and they were nad1, nad2, nad4, nad5, nad7, ccmFC, cox2, and rps3. Furthermore, three trans-spliced introns were observed in nad1, nad2, and nad5 in different chromosomal coding regions. In addition, most of the total identified 18 tRNA genes had only one copy, except for trnQ-UUG with two copies, as well as both trnM-CAU and trnS-GCU with three copies. Furthermore, a total of 241 open reading frames (ORFs), accounting for 35.29% (113,592 bp) of the safflower mitogenome, were identified with the sequence length greater than 100 using Geneious Prime. Most of these function-unknown ORFs had one copy with a variety of length from 300 to 2757 bp, among which 14 ORFs were greater than 1000 bp.
In this study, our reported safflower mitogenome was similar to most Asteraceae plants in terms of the sequences of coding and noncoding regions. We further comparatively analyzed the basic features of the mitogenomes of safflower and other Asteraceae species and found that safflower differed from other closely related species in the presence of a nearly complete psaB gene in the safflower mitogenome. The psaB gene is mainly present in the chloroplast genome, and the existence in the safflower mitogenome may be related to the adaptive evolution of safflower. In addition, eight intron-containing genes were also identified in the safflower mitogenome, including ccmFC, cox2, nad1, nad2, nad5, nad7, and rps3. Meanwhile, the positions and splicing styles of introns in the eight genes showed high similarity to those in the other Asteraceae mitogenomes [38][39][40].
starting codon. Moreover, eight genes having an intron structure were discovered, and they were nad1, nad2, nad4, nad5, nad7, ccmFC, cox2, and rps3. Furthermore, three transspliced introns were observed in nad1, nad2, and nad5 in different chromosomal coding regions. In addition, most of the total identified 18 tRNA genes had only one copy, except for trnQ-UUG with two copies, as well as both trnM-CAU and trnS-GCU with three copies. Furthermore, a total of 241 open reading frames (ORFs), accounting for 35.29% (113,592 bp) of the safflower mitogenome, were identified with the sequence length greater than 100 using Geneious Prime. Most of these function-unknown ORFs had one copy with a variety of length from 300 to 2757 bp, among which 14 ORFs were greater than 1000 bp.
In this study, our reported safflower mitogenome was similar to most Asteraceae plants in terms of the sequences of coding and noncoding regions. We further comparatively analyzed the basic features of the mitogenomes of safflower and other Asteraceae species and found that safflower differed from other closely related species in the presence of a nearly complete psaB gene in the safflower mitogenome. The psaB gene is mainly present in the chloroplast genome, and the existence in the safflower mitogenome may be related to the adaptive evolution of safflower. In addition, eight intron-containing genes were also identified in the safflower mitogenome, including ccmFC, cox2, nad1, nad2, nad5, nad7, and rps3. Meanwhile, the positions and splicing styles of introns in the eight genes showed high similarity to those in the other Asteraceae mitogenomes [38][39][40].    Table 2. Functional annotation of genes in C. tinctorius mitogenome.

Repeats, SSRs, and Collinear Analysis in the C. tinctorius Mitogenome
Repeats are an important data source for developing molecular markers for population analysis. They include tandem, short, and large repeats and are extensively present in the mitochondrial genomes [41][42][43]. Mitogenome repeats are often critical for intermolecular recombination, which can alter the genome size and generate genomic structural variation [3,44]. A total of 47 repeats were identified in the safflower mitogenome, ranging from 31 to 15,273 bp in length (Figure 2a; Table S1). Two types of repeats, including palindromic and forward repeats, were mainly detected in the safflower mitogenome. Of them, three large repeats (R1-R3) were greater than 1000 bp, while three relatively small repeats (R4-R6) varied from 104 to 734 bp. Most large repeats contained two copies of each repeat. In addition, we found a total of 137 SSRs presented in the safflower mitogenome, with 22 mononucleotides, 36 dinucleotides, 13 trinucleotides, 53 tetranucleotides, 9 pentanucleotides, and 4 hexanucleotides, with the largest proportion of tetranucleotides up to 38.69% (Figure 2b). In addition, according to the collinearity analysis result (Figures 3  and S1), there were obvious intragenomic rearrangement events among the different mitogenomes. There was no obvious synteny observed between the mitogenomes of C. tinctorius, A. lappa, and S. costus.
In total, these repeats accounted for 7.75% of the safflower mitogenome. In genomes, repetitive sequences are common, and duplications can often lead to changes in the size of the genome. Especially in angiosperms, it is commonly used to explain the diversity of the mitochondrial genome size and structure [5]. However, these large repeats (>1000 bp) account for only 6.70% (21,579 bp) of the safflower mitogenome, which may not be the major cause of the expanded mitogenome size in safflower. Rearrangements of plant mitogenomes were relatively common, and the mitogenome structure was not as stable as that of the chloroplast genome probably due to the abundance of repetitive sequences and the complex genome structure in mitogenomes. the mitochondrial genome size and structure [5]. However, these large repeats (>1000 bp) account for only 6.70% (21,579 bp) of the safflower mitogenome, which may not be the major cause of the expanded mitogenome size in safflower. Rearrangements of plant mitogenomes were relatively common, and the mitogenome structure was not as stable as that of the chloroplast genome probably due to the abundance of repetitive sequences and the complex genome structure in mitogenomes.

Codon Usage Bias Patterns and RNA Editing Sites Analysis for Protein-Coding Genes (PCGs)
To investigate the codon preferences of protein-coding genes in the safflower mitogenome, we performed ENC-plot, PR2-plot, and RSCU analyses. ENC-plot analysis provides a visual indication of the extent to which codon usage patterns and preference formation are affected by natural mutation and selection. Theoretical ENC values vary from 20 to 61, with values closer to 20 indicating that the gene is less influenced by natural selection and vice versa [34]. The results of the ENC-plot analysis of the safflower mitogenome showed that most of the genes were some distance away from the standard curve, and the actual ENC values were somewhat different from the expected values, indicating that the effect of natural selection on the codons was greater. However, some genes were located around the standard curve, and the actual ENC values were basically similar to the expected values, indicating that mutations had a greater effect on these codons ( Figure S2a, Table S2). If the genes within the PR2-plot are evenly distributed, it indicates that the A, T, C, and G of each base in the codon are used with the same

Codon Usage Bias Patterns and RNA Editing Sites Analysis for Protein-Coding Genes (PCGs)
To investigate the codon preferences of protein-coding genes in the safflower mitogenome, we performed ENC-plot, PR2-plot, and RSCU analyses. ENC-plot analysis provides a visual indication of the extent to which codon usage patterns and preference formation are affected by natural mutation and selection. Theoretical ENC values vary from 20 to 61, with values closer to 20 indicating that the gene is less influenced by natural selection and vice versa [34]. The results of the ENC-plot analysis of the safflower mitogenome showed that most of the genes were some distance away from the standard curve, and the actual ENC values were somewhat different from the expected values, indicating that the effect of natural selection on the codons was greater. However, some genes were located around the standard curve, and the actual ENC values were basically similar to the expected values, indicating that mutations had a greater effect on these codons ( Figure S2a, Table S2). If the genes within the PR2-plot are evenly distributed, it indicates that the A, T, C, and G of each base in the codon are used with the same frequency. The results of the PR2-plot analysis of the safflower mitogenome showed an irregular distribution of mitochondrial genes, most of which were distributed in the lower part of the plot ( Figure S2b, Table S2). The different frequencies of G and C, A, and T suggested that the bias in the use of codons in the safflower mitogenome may be influenced by several factors.
After , and Tyr (UAC))) were higher than 1.0. This phenomenon was also commonly found in the mitochondria of most Asteraceae species. Of the 10,784 codons, 6660 codons (61.76%) had RSCU values above 1.0, indicating that they were used more frequently than other codons (Figure 4a; Table S3). The codon usage frequencies of mitogenomes of four Cardueae species (C. tinctorius, A. lappa, A. tomentosum, and S. costus) suggested that the number of codons with RSCU over 1 in these four species ranged from 5960 to 5970 (Table S4). The most commonly used codons in these four Cardueae plants were GCU (Ala), UAU (Tyr), and CAU (His), all of which had RSCU values over 1.5. RNA editing is a widespread phenomenon in mitochondria, chloroplast, and three RNA-coding genes (mRNA, rRNA, and tRNA) of the nucleus. RNA editing events involve in the additional insertion of bases or substitutions of a single base, resulting in the occurrence of the start or stop codons [45]. Here, we performed further detection of RNA editing sites in 34 mitochondrial PCGs, of which 31 mitochondrial PCGs had RNA editing sites (Figure 4b; Table S5). In the safflower mitochondrial PCG, we identified a total of 504 RNA editing sites, of which the genes with more than 30 RNA editing sites were ccmB, ccmC, ccmFN, mttB, and nad7. In addition, 14 kinds of RNA editing sites were also detected from PCGs in the safflower mitochondrial genome. As for editing efficiency, we found that these mitochondrial PCGs with editing efficiencies higher than 80% were predominated (Table S5).

Intracellular Gene Transfer of C. tinctorius Organelle Genomes
The foreign DNA insertion into the mitogenome, including intracellular gene transfer (IGT) of nuclear and chloroplast DNA or the horizontal gene transfer (HGT) of other mitochondrial DNA, is a frequent phenomenon in the plant kingdom. Both IGT and HGT events contribute to the genomic expansion of mitochondria, eventually resulting in the adaptive evolution of mitochondria. Here, to explore the IGT occurrence between the safflower chloroplast and mitochondria, we identified chloroplast-derived sequences in the safflower mitogenome using BLASTn. In total, 26 similar sequence fragments were identified, all of which are located on chromosome 1, which had high nucleotide sequence identity with the corresponding chloroplast sequence of safflower (Figure 5a). The total length of these chloroplast-derived fragments was 11,156 bp, occupying a proportion of

Intracellular Gene Transfer of C. tinctorius Organelle Genomes
The foreign DNA insertion into the mitogenome, including intracellular gene transfer (IGT) of nuclear and chloroplast DNA or the horizontal gene transfer (HGT) of other mitochondrial DNA, is a frequent phenomenon in the plant kingdom. Both IGT and HGT events contribute to the genomic expansion of mitochondria, eventually resulting in the adaptive evolution of mitochondria. Here, to explore the IGT occurrence between the safflower chloroplast and mitochondria, we identified chloroplast-derived sequences in the safflower mitogenome using BLASTn. In total, 26 similar sequence fragments were identified, all of which are located on chromosome 1, which had high nucleotide sequence identity with the corresponding chloroplast sequence of safflower (Figure 5a). The total length of these chloroplast-derived fragments was 11,156 bp, occupying a proportion of 3.47% of the safflower mitogenome. The length of chloroplast-derived fragments varied from 28 to 2556 bp (Table S5). Meanwhile, some chloroplast-derived genes, such as the psaB and tRNA-related genes, mainly functioned in the processes of photosynthesis, transcription, and translation. These results suggested that extensive IGT events occurred between the safflower mitogenome and chloroplast genome.
3.47% of the safflower mitogenome. The length of chloroplast-derived fragments varied from 28 to 2556 bp (Table S5). Meanwhile, some chloroplast-derived genes, such as the psaB and tRNA-related genes, mainly functioned in the processes of photosynthesis, transcription, and translation. These results suggested that extensive IGT events occurred between the safflower mitogenome and chloroplast genome.
The IGT events between organelle genomes were relatively common in most terrestrial plants. However, the number of transferred sequences and sequence lengths extremely varied among the diverse plant species [46,47]. For example, about 0.35% of chloroplast sequences were transferred to the mitogenome in safflower, which was much less than the numbers of chloroplast DNA sequences in the mitogenomes of some plants (3-6%) [48]. In addition, most of the chloroplast-derived sequences were partially nonfunctional gene fragments with the exceptions for some tRNA genes, which was in agreement with most other terrestrial plants [49]. The almost complete chloroplastderived psaB gene was identified in the safflower mitogenome (Figure 5b), almost identical to the psaB gene in the chloroplast, which was also frequently found in the mitogenomes of most angiosperms. Therefore, the identification of mutually transferred DNA fragments between safflower organelle genomes can help us further understand the evolution of plants. The IGT events between organelle genomes were relatively common in most terrestrial plants. However, the number of transferred sequences and sequence lengths extremely varied among the diverse plant species [46,47]. For example, about 0.35% of chloroplast sequences were transferred to the mitogenome in safflower, which was much less than the numbers of chloroplast DNA sequences in the mitogenomes of some plants (3-6%) [48]. In addition, most of the chloroplast-derived sequences were partially non-functional gene fragments with the exceptions for some tRNA genes, which was in agreement with most other terrestrial plants [49]. The almost complete chloroplast-derived psaB gene was identified in the safflower mitogenome (Figure 5b), almost identical to the psaB gene in the chloroplast, which was also frequently found in the mitogenomes of most angiosperms. Therefore, the identification of mutually transferred DNA fragments between safflower organelle genomes can help us further understand the evolution of plants.

Phylogenetic Analysis
To analyze the phylogeny of safflower in Asteraceae species, the PCGs of the chloroplast and mitochondrial genome were used for phylogenetic analysis. In total, we selected 27 angiosperm species, including 26 Asteraceae species, and A. thaliana as an outgroup. Phylogenetic trees were built based on Bayesian inference (BI) and maximum-likelihood (ML) methods using common PCGs in the chloroplast and mitochondrial genomes, respectively. We found that our phylogenetic results were in general agreement with the previous phylogenetic position of safflower. Our phylogenetic results showed that safflower was more closely related to the three Cardueae species, A. lappa, A. tomentosum, and S. costus ( Figure 6). Our phylogenetic results were almost consistent with those of previous studies, indicating that the phylogenetic trees based on the mitochondrial PCGs were reliable like chloroplast PCGs. Hence, the safflower mitogenome will provide a reference for future phylogenetic studies of other Asteraceae [20,40].

Phylogenetic Analysis
To analyze the phylogeny of safflower in Asteraceae species, the PCGs of the chloroplast and mitochondrial genome were used for phylogenetic analysis. In total, we selected 27 angiosperm species, including 26 Asteraceae species, and A. thaliana as an outgroup. Phylogenetic trees were built based on Bayesian inference (BI) and maximumlikelihood (ML) methods using common PCGs in the chloroplast and mitochondrial genomes, respectively. We found that our phylogenetic results were in general agreement with the previous phylogenetic position of safflower. Our phylogenetic results showed that safflower was more closely related to the three Cardueae species, A. lappa, A. tomentosum, and S. costus ( Figure 6). Our phylogenetic results were almost consistent with those of previous studies, indicating that the phylogenetic trees based on the mitochondrial PCGs were reliable like chloroplast PCGs. Hence, the safflower mitogenome will provide a reference for future phylogenetic studies of other Asteraceae [20,40].

Conclusions
The topological structure of the plant mitogenome is highly dynamic. It has a diversity of conformations, including linear, circular, and branched structures. Usually, a main circle containing all mitochondrial genes represents the main plant mitogenome. In this study, we analyzed and reported the safflower mitogenome, which mainly contained two circular molecular sequences with a total length of 321,872 bp, encoding 55 different genes. In the safflower mitogenome, there were 24,953 bp of repeats over 30 bp in length, accounting for 7.75% of the whole genome. We compared the chloroplast genome and mitogenome of safflower and found partial sequence transfer events, with one gene from the chloroplast (psaB) remaining intact in the mitogenome. Although there were extensive genome rearrangement events between the mitogenomes of C. tinctorius, A. lappa, and S. costus, phylogenetic trees based on the organelle genomes support that safflower was more closely related to A. lappa, A. tomentosum, and S. costus. This safflower mitogenome not only enriched the genetic information but also contributed to the phylogenetic and evolutionary studies of Asteraceae.

Data Availability Statement:
The genomic data that support the findings of this study were available in GenBank of NCBI (http://www.ncbi.nlm.nih.gov/, accessed on 20 April 2023) under the accession no. OQ621745-OQ621746. The associated BioProject, BioSample, and SRA numbers are PRJNA642978, SAMN15402361, and SRR12189953-SRR12189989 of Pacbio SMRT, SRR12142474, and SRR12160977 of Illumina HiSeq, respectively.