Complete Chloroplast Genome of Megacarpaea megalocarpa and Comparative Analysis with Related Species from Brassicaceae

Megacarpaea megalocarpa, a perennial herbaceous species in the family Brassicaceae, has potential medicinal value. We assembled and characterized the chloroplast (cp) genome of M. megalocarpa and compared it with those of closely related species. The chloroplast genome displayed a typical quadripartite structure, spanning 154,877 bp, with an overall guanine–cytosine (GC) content of 36.20%. It contained 129 genes, 105 simple sequence repeats (SSRs), and 48 long repeat sequences. Notably, the ycf1 gene was highly polymorphic at the boundary between the small single copy (SSC) region and inverted repeat a (IRa). Relative synonymous codon usage (RSCU) values were similar across species, and no large rearrangements or inversions were detected. The large single copy (LSC) and SSC regions showed higher sequence variation and nucleotide polymorphism than the IR regions. Thirteen hotspot regions were identified as potential molecular markers. Selection pressure analysis revealed that the protein-coding gene rpl20 is subject to different selection pressures in different species. Phylogenetic analysis placed M. megalocarpa within expanded lineage II of Brassicaceae, and divergence time estimation suggests that M. megalocarpa diverged from M. delavayi approximately 4.97 million years ago. In summary, this study provides baseline information for the molecular identification, phylogenetic study, conservation, and utilization of wild resources of Megacarpaea.


Introduction
Brassicaceae is a large angiosperm family comprising 52 tribes, 321 genera, and approximately 4000 species. Its members are distributed worldwide except Antarctica, mainly in temperate regions [1,2]. Many Brassicaceae plants have significant economic and medicinal value [3] and are used as adjuvant therapy for major illnesses such as cancer [4,5]. For instance, Brassica oleracea has been shown to reduce the risk of bladder and other cancers, as well as the incidence of cardiovascular disease [6]. However, the classification of Brassicaceae has remained controversial because different studies have relied on different molecular markers. Previous studies divided Brassicaceae into four lineages (basal LI-III and expanded LII) [2,7] or six major clades (A-F) [8] based on the nuclear ribosomal internal transcribed spacer (ITS) or single-copy nuclear markers, but statistical support was generally low. Subsequent studies using chloroplast DNA and nuclear genes identified five strongly supported lineages (LI-V) [9], although some tribes remained unassigned to any lineage. The phylogenetic relationships within and between these lineages therefore require further study.
Megacarpaea is a genus of perennial herbs found primarily in Central Asia and the Himalayan region. In China, there are three species, four varieties, and one variant of Megacarpaea, several of which are used medicinally. For example, Megacarpaea delavayi is renowned for its heat-clearing and stomachic effects, and the rhizomes of Megacarpaea polyandra are used by the Bai and Tibetan peoples as a coolant in fever treatment or as an antidote for scorpion stings and snake bites [11,12]. Megacarpaea megalocarpa (Fisch. ex DC.) Schischk. ex B. Fedtsch. is a perennial herb that grows in desert areas at altitudes of 200-3600 m. It is 20-40 cm tall, with erect stems, and its basal leaves have oblanceolate blades with pinnatisect margins. The inflorescences are paniculate, with sessile bracts at the branches. The petals of M. megalocarpa are lavender (Figure 1). This species occurs in sandy deserts and alkaline plains in Kazakhstan, Kyrgyzstan, Russia, Uzbekistan, and China (Qinghai and Xinjiang). As a congener of these medicinal species, M. megalocarpa may also hold medicinal value. Previous studies placed Megacarpaea in expanded lineage II or clade C of Brassicaceae [7,13]. However, more recent research left Megacarpaea unassigned to any lineage [9], so its exact phylogenetic position remains uncertain.
Compared with the mitochondrial and nuclear genomes, the chloroplast genome is highly conserved, with a slow mutation rate, maternal inheritance, and stable sequence [14,15]. As a result, chloroplast genomes are widely used for phylogenetic reconstruction in angiosperms, species identification, and estimating the origin and divergence times of species [16]. Nonetheless, research on the chloroplast genomes of Megacarpaea has been limited, focusing mainly on M. polyandra and M. delavayi. No chloroplast genome of M. megalocarpa has been reported, which impedes our understanding of phylogenetic relationships within the genus. We therefore hypothesized that the chloroplast genome of M. megalocarpa shares the characteristics of other Megacarpaea species and that M. megalocarpa is most closely related to M. delavayi. To test this, we sequenced, assembled, and analyzed the M. megalocarpa chloroplast genome using high-throughput sequencing. Our primary goals were (1) to characterize and compare the chloroplast genomes of the tribe Megacarpaeeae, including M. megalocarpa, M. delavayi (GenBank ID: KX886349), M. polyandra (MK637758), Pugionium cornutum (KT844941), Pugionium dolabratum (KT844940), and Pugionium pterocarpum (MK637779); (2) to examine simple sequence repeats (SSRs) and repeat structures across the whole cp genomes of Megacarpaeeae to provide markers for phylogenetic and genetic studies; and (3) to explore the phylogenomic position of M. megalocarpa. This study offers a molecular foundation for the species identification of M. megalocarpa and for studies of genetic evolution in Megacarpaeeae.

Survey Site Sampling and DNA Sequencing
Fresh, healthy leaves of M. megalocarpa were collected from the desert region near Dure Town (88°32′15″ E, 46°30′36″ N), Altay, Xinjiang. Immediately after collection, the leaves were frozen in liquid nitrogen and stored in a −80 °C ultra-low temperature freezer. The samples were then sent to Genepioneer Biotechnologies (Nanjing, China) for sequencing. DNA was extracted using the Plant Genomic DNA Kit (Tiangen Biotechnology, Beijing, China). Paired-end libraries with an insert size of 350 bp were constructed following Illumina's standard protocol for genomic DNA library preparation and were quality-checked before sequencing. The whole genome of M. megalocarpa was sequenced on the Illumina NovaSeq 6000 platform (PE150; Illumina, San Diego, CA, USA) using sequencing by synthesis (SBS) technology.

Chloroplast Genome Assembly and Annotation Analyses
Clean reads were obtained with Trimmomatic v.0.39 [17] by filtering out low-quality reads (bases with quality value Q ≤ 5) and reads containing more than 5% N bases. The M. megalocarpa chloroplast genome was assembled with the GetOrganelle v.1.7.5 [18] pipeline, using the M. delavayi chloroplast genome as the reference. The genome was automatically annotated with CPGAVAS2 [19] and then manually curated against previously published chloroplast genomes in Geneious v.2021.1.1 [20]. The chloroplast genome map was drawn with the OGDRAW v.1.3.1 [21] online tool (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html, accessed on 20 December 2023). The chloroplast genome sequence of M. megalocarpa was deposited in NCBI under accession number PP234616.
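The read-filtering criteria above can be illustrated with a short Python sketch. It is not how Trimmomatic is configured or implemented; it simply applies the two stated criteria to FASTQ records. The text does not say what share of low-quality bases (Q ≤ 5) makes a read fail, so the 50% cut-off (`max_low_q_frac`) is an assumption, as is Phred+33 quality encoding.

```python
from typing import Iterable, Iterator, List, Tuple

# A FASTQ record reduced to (read_id, sequence, quality_string).
Read = Tuple[str, str, str]


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a Phred+33 quality string into integer quality values."""
    return [ord(ch) - offset for ch in quality]


def passes_filter(seq: str, qual: str,
                  low_q: int = 5,
                  max_low_q_frac: float = 0.5,  # ASSUMPTION: not stated in the text
                  max_n_frac: float = 0.05) -> bool:
    """Keep a read only if it has <= 5% N bases and not too many Q <= 5 bases."""
    if not seq or len(seq) != len(qual):
        return False
    n_frac = seq.upper().count("N") / len(seq)
    low_frac = sum(q <= low_q for q in phred_scores(qual)) / len(qual)
    return n_frac <= max_n_frac and low_frac <= max_low_q_frac


def filter_reads(reads: Iterable[Read]) -> Iterator[Read]:
    """Yield only the reads that pass both criteria."""
    for read in reads:
        _, seq, qual = read
        if passes_filter(seq, qual):
            yield read


if __name__ == "__main__":
    demo = [
        ("r1", "ACGTACGTAC", "IIIIIIIIII"),  # Q40 everywhere, no N: kept
        ("r2", "ACGTNNACGT", "IIIIIIIIII"),  # 20% N: dropped
        ("r3", "ACGTACGTAC", "##########"),  # every base Q2: dropped
    ]
    print([r[0] for r in filter_reads(demo)])
```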

Inverted Repeats Boundary Analysis, Codon Usage Bias, and Genome Comparison
The IR/SSC and IR/LSC junctions of the six species were compared using the online tool IRscope (https://irScope.shinyapps.io/Irapp/, accessed on 4 December 2023) [24], which shows the boundaries between the inverted repeat (IR) and single copy (SC) regions, and the genes adjacent to them, for M. megalocarpa and the other five species. Codon usage bias was estimated using CodonW v.1.4.2 [25]. Relative synonymous codon usage (RSCU) values were calculated from the coding sequences (CDSs) of the protein-coding genes of M. megalocarpa and the other species of the tribe Megacarpaeeae. In addition, the six Brassicaceae chloroplast genomes were compared using mVISTA [26] in Shuffle-LAGAN mode, with M. delavayi as the reference genome. The resulting alignment plots were used to assess gene conservation among the chloroplast genomes and to locate variable regions.
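As an illustration of the RSCU values computed by CodonW, the following Python sketch applies the usual definition: the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid, so that a value of 1 means no bias. It is not CodonW's code; it assumes the standard codon assignments (which plastids share, NCBI translation table 11) and treats the stop codons as their own synonymous family.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable

# Standard codon assignments in TCAG order ('*' = stop).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def codon_counts(cds_list: Iterable[str]) -> Counter:
    """Count in-frame codons over all CDSs (RNA 'U' is read as 'T')."""
    counts: Counter = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons containing N or other ambiguity codes
                counts[codon] += 1
    return counts


def rscu(cds_list: Iterable[str]) -> Dict[str, float]:
    """RSCU = observed count / mean count of the synonymous codons of that amino acid."""
    counts = codon_counts(cds_list)
    synonyms: Dict[str, list] = {}
    for codon, aa in CODON_TABLE.items():
        synonyms.setdefault(aa, []).append(codon)
    values: Dict[str, float] = {}
    for codons in synonyms.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)  # count expected under uniform usage
        for c in codons:
            values[c.replace("T", "U")] = counts[c] / expected
    return values
```

Because Met and Trp each have a single codon, AUG and UGG always receive an RSCU of exactly 1 under this definition, whatever their counts.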

Ka/Ks Analysis and Nucleotide Diversity
To evaluate the effect of selection pressure on the chloroplast genomes of the studied species, homologous protein-coding gene sequences shared by M. megalocarpa and the other species of the tribe Megacarpaeeae were identified using BLASTN. The shared protein-coding genes were aligned using MAFFT v7.427 [27]. The ratios of non-synonymous (Ka) to synonymous (Ks) substitution rates between M. megalocarpa and each of the other species were calculated with KaKs_Calculator 2.0 [28]. Nucleotide diversity (Pi) was calculated using DnaSP v.5.10 [29] with a sliding window analysis, using a window size of 600 bp and a step size of 200 bp.
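Pi in a sliding window can be sketched as the mean number of pairwise differences per site within each 600 bp window, advanced in 200 bp steps. This Python sketch assumes a pre-computed multiple alignment and simply skips columns with gaps or ambiguous bases; DnaSP's own handling of gaps and its estimator options may differ. The hypothetical `hotspots` helper applies the Pi > 0.025 cut-off used later in the text for the 13 hotspot regions.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def window_pi(columns: Sequence[str]) -> float:
    """Mean pairwise differences per site over the gap-free columns of one window.

    Each column holds one aligned base per sequence; columns containing a gap,
    N or another ambiguity code are skipped (a simplifying ASSUMPTION).
    """
    valid = [col for col in columns if all(ch in "ACGT" for ch in col)]
    if not valid:
        return 0.0
    pairs = list(combinations(range(len(valid[0])), 2))
    diffs = sum(col[i] != col[j] for col in valid for i, j in pairs)
    return diffs / (len(pairs) * len(valid))


def sliding_pi(alignment: Sequence[str], window: int = 600,
               step: int = 200) -> List[Tuple[int, int, float]]:
    """Return (start, end, Pi) per window, in 1-based alignment columns.

    A tail shorter than one step beyond the last full window is not scanned.
    """
    seqs = [s.upper() for s in alignment]
    if len(seqs) < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two aligned sequences of equal length")
    length = len(seqs[0])
    columns = ["".join(col) for col in zip(*seqs)]
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        results.append((start + 1, end, window_pi(columns[start:end])))
    return results


def hotspots(results: List[Tuple[int, int, float]],
             threshold: float = 0.025) -> List[Tuple[int, int, float]]:
    """Windows whose Pi exceeds the hotspot threshold."""
    return [r for r in results if r[2] > threshold]
```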

Phylogenetic Analysis and Estimation of Divergence Times
The chloroplast genome sequences of 37 species from nine genera of Brassicaceae were used for phylogenetic reconstruction. Sequences were downloaded from the National Center for Biotechnology Information (NCBI) (Table S1), and Aethionema arabicum and Aethionema grandiflorum were used as outgroups. The sequences were aligned with MAFFT using default parameters [27]. Phylogenetic trees were then constructed using two methods: Maximum Likelihood (ML) and Neighbor Joining (NJ). The ML tree was inferred in PhyloSuite v.1.2.2 [30] under the GTR+I+G nucleotide substitution model with 1000 bootstrap replicates and visualized using FigTree v.1.4.2 (download link: http://tree.bio.ed.ac.uk/software/figtree/, accessed on 16 April 2024). The NJ tree was constructed in MEGA v11.0.13 [31] using the Kimura 2-parameter model with 1000 bootstrap replicates.
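The Kimura 2-parameter model used for the NJ tree corrects the observed proportions of transitions (P) and transversions (Q) between two aligned sequences as d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The Python sketch below computes this pairwise distance; it is not MEGA's implementation, and skipping gaps and ambiguous bases pair by pair is an assumption.

```python
import math


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q).

    P and Q are the proportions of transitions and transversions among compared
    sites. Gaps and ambiguous bases are skipped pairwise (an ASSUMPTION).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    purines = set("AG")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if (a in purines) == (b in purines):  # A<->G or C<->T
            transitions += 1
        else:                                  # purine <-> pyrimidine
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("K2P distance undefined: sequences too divergent")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

A matrix of such distances is what an NJ algorithm clusters; the bootstrap replicates resample alignment columns before recomputing the matrix.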

SSRs and Long Repeat Sequences
Simple sequence repeats (SSRs) are tandem repeats of 1-6 bp motifs distributed throughout the chloroplast genome. In the M. megalocarpa genome, 105 SSRs were identified: 68 mononucleotide, 20 dinucleotide, 5 trinucleotide, 11 tetranucleotide, and 1 pentanucleotide repeat. Pentanucleotide repeats occurred only in M. polyandra (three) and M. megalocarpa (one) (Figure 3A). None of the six species contained hexanucleotide repeats. Mononucleotide repeats, mostly composed of A/T, were the most common type, indicating a bias in base composition.
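A perfect-SSR search of the kind described above can be sketched in Python as follows. The minimum copy numbers per motif length (`MIN_COPIES`) are hypothetical MISA-style values, because the thresholds used in the study are not stated in this section; compound SSRs and repeats spanning the end of the circular genome are ignored.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL minimum copy numbers per motif length (MISA-style values);
# the thresholds actually used in the study are not given here.
MIN_COPIES: Dict[int, int] = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """False for motifs that are themselves repeats of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str,
              min_copies: Dict[int, int] = MIN_COPIES) -> List[Tuple[int, str, int]]:
    """Return (1-based start, motif, copy number) for perfect SSRs with 1-6 bp motifs."""
    seq = seq.upper()
    hits = []
    for k, needed in min_copies.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                copies = 1
                while seq[i + copies * k:i + (copies + 1) * k] == motif:
                    copies += 1
                if copies >= needed:
                    hits.append((i + 1, motif, copies))
                    i += copies * k  # jump past this repeat
                    continue
            i += 1
    return sorted(hits)
```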
Long repetitive sequences are relatively long DNA sequences that occur more than once in the chloroplast genome. In the M. megalocarpa genome, 48 long repeats were identified: 27 forward, 15 palindromic, 3 complementary, and 3 reverse repeats. M. megalocarpa had the most forward and palindromic repeats of the six species, whereas P. dolabratum and P. cornutum lacked complementary and reverse repeats (Figure 3B). Because the minimum repeat length was set to 30 bp, no repeat was shorter than 30 bp. Repeats of 30-35 bp were the most common in all species, and only M. polyandra contained a repeat of 46-50 bp (Figure 3C).
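Tools such as REPuter classify long repeats as forward (F), reverse (R), complement (C), or palindromic (P, reverse complement) pairs above a minimum length. The Python sketch below illustrates those four categories with a 30 bp minimum, but it only finds exact matches by k-mer seeding, whereas dedicated tools can also tolerate mismatches; the function `exact_repeats` and its seed-merging rule are illustrative assumptions, not the method used in the study.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# How the partner position j moves when the seed position i advances by one base:
# +1 for forward/complement copies, -1 for reverse/palindromic copies.
_DIAGONAL = {"F": 1, "C": 1, "R": -1, "P": -1}


def _transform(kmer: str, kind: str) -> str:
    if kind == "F":
        return kmer                               # forward: identical copy
    if kind == "R":
        return kmer[::-1]                         # reverse
    if kind == "C":
        return kmer.translate(_COMPLEMENT)        # complement
    return kmer.translate(_COMPLEMENT)[::-1]      # palindromic: reverse complement


def exact_repeats(seq: str, k: int = 30) -> Dict[str, int]:
    """Count exact repeats of at least k bp for each type (F, R, C, P).

    Every shared k-mer gives a seed pair (i, j); consecutive seeds on the same
    diagonal are merged, so each maximal exact repeat is counted once.
    """
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    counts = {}
    for kind, step in _DIAGONAL.items():
        pairs: Set[Tuple[int, int]] = set()
        for kmer, positions in index.items():
            for j in index.get(_transform(kmer, kind), []):
                for i in positions:
                    if i != j:
                        pairs.add((min(i, j), max(i, j)))
        # a seed pair starts a new repeat unless its predecessor on the diagonal exists
        counts[kind] = sum((i - 1, j - step) not in pairs for i, j in pairs)
    return counts
```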

Expansion and Contraction of the Inverted Repeat Boundaries
Differences in genome size among plant species may arise from shifts of the IR boundaries in the chloroplast genome. In all six species, the rps19 and ndhF genes were located at the LSC/IRb and IRb/SSC borders, respectively, and crossed these boundaries in the same way. The genes rpl22, psbA, and trnH were located entirely within the LSC region, while rpl2 (in the IR region) and ycf1 (in the IRb region) did not cross a boundary. In contrast to the other four species, ycf1 in M. megalocarpa and M. polyandra was located only in the SSC region. Overall, the LSC/IRb, IRb/SSC, and IRa/LSC boundaries were highly conserved among the selected species, whereas the SSC/IRa boundary was more variable (Figure 4).
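The junction positions compared in IRscope follow directly from the region lengths of a quadripartite genome. The Python sketch below derives JLB, JSB, JSA, and JLA from LSC, IR, and SSC lengths and reports how many bases of a gene lie on each side of a junction it crosses. The lengths and gene coordinates in the test are toy values, not those of M. megalocarpa, and genes wrapping around the origin of the circular genome are not handled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Gene:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def junctions(lsc: int, ir: int, ssc: int) -> Dict[str, int]:
    """Position of the last base before each junction, for the order LSC-IRb-SSC-IRa."""
    return {
        "JLB (LSC/IRb)": lsc,
        "JSB (IRb/SSC)": lsc + ir,
        "JSA (SSC/IRa)": lsc + ir + ssc,
        "JLA (IRa/LSC)": lsc + 2 * ir + ssc,  # wraps back to base 1 of the LSC
    }


def boundary_crossings(genes: List[Gene], lsc: int, ir: int,
                       ssc: int) -> Dict[str, List[Tuple[str, int, int]]]:
    """For each junction, list (gene, bp before junction, bp after junction)
    for every gene that spans it."""
    result = {}
    for label, pos in junctions(lsc, ir, ssc).items():
        result[label] = [(g.name, pos - g.start + 1, g.end - pos)
                         for g in genes if g.start <= pos < g.end]
    return result
```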

Codon Usage Bias Analysis
The RSCU values of the M. megalocarpa chloroplast genome were nearly identical to those of the other five species examined. In all species, arginine (Arg), leucine (Leu), and serine (Ser) had the most codons (six each), whereas methionine (Met) and tryptophan (Trp) were each encoded by a single codon. Of the 64 codons, 30 had RSCU values greater than 1, indicating relatively frequent use. AUG and UGG had RSCU values of exactly 1, as expected for the sole codons of Met and Trp, indicating no usage bias. In addition, codons ending in A/U tended to have RSCU values above 1, whereas those ending in C/G had values below 1, a pattern typical of genomes with low GC content (Figure 5).

Comparative Analysis of Chloroplast Genome Sequences
A comparison of the chloroplast genome sequences of the six species revealed no significant rearrangements or inversions in any of the four regions (Figure 6). However, sequence variation was greater in the SSC and LSC regions than in the IR regions. The high conservation of the IRs may be attributed to the conserved rRNA genes located within them. In all six species, coding regions were more conserved than non-coding regions. Variable coding regions included rpoC2, psbG, accD, rpoA, rps11, rpl22, ndhF, and ycf1, whereas variable intergenic spacers were found mainly in trnS-trnT, atpF-atpH, atpH-atpI, psbM-trnD, trnT-psbD, psaA-ycf3, trnT-trnL, trnF-ndhJ, rbcL-accD, petA-psbJ, and rps15-ycf1.

Selective Pressure Analyses
Ka/Ks ratios could not be calculated for some genes (atpA, atpH, ndhC, petB, petG, petL, petN, etc.) because no synonymous substitutions (Ks = 0), or no substitutions at all, were found between the compared species, indicating that these genes are highly conserved. Genes with more than three NA values (Ks = 0, so the ratio is undefined) or 0 values (Ka = 0) were excluded, and the Ka/Ks ratios of the remaining 41 genes were analyzed and visualized (Figure 7). Most protein-coding genes had Ka/Ks ratios below 1, suggesting that they are under purifying selection. The Ka/Ks ratios of rpl20 exceeded 1 in some comparisons, indicating that this gene may be under positive selection.
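The screening and interpretation steps above can be expressed as a small Python sketch. The ratios themselves would come from KaKs_Calculator; the gene names and values in the test are hypothetical, and pooling NA and zero values when applying the more-than-three rule is an interpretation of the text.

```python
from typing import Dict, List, Optional

Ratio = Optional[float]  # None stands for "NA" (Ka/Ks undefined because Ks = 0)


def kaks_ratio(ka: float, ks: float) -> Ratio:
    """Ka/Ks, or None when Ks is 0 and the ratio is undefined."""
    return None if ks == 0 else ka / ks


def filter_genes(table: Dict[str, List[Ratio]],
                 max_uninformative: int = 3) -> Dict[str, List[Ratio]]:
    """Drop genes with more than `max_uninformative` NA or zero ratios
    across the pairwise comparisons (NA and 0 are pooled: an interpretation)."""
    return {gene: ratios for gene, ratios in table.items()
            if sum(r is None or r == 0 for r in ratios) <= max_uninformative}


def selection_mode(ratio: Ratio) -> str:
    """Conventional reading of a single Ka/Ks value."""
    if ratio is None:
        return "NA"
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "purifying"
    return "neutral"
```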

Phylogenetic Analyses and Estimation of Divergence Times
The ML and NJ analyses of M. megalocarpa and related species produced nearly identical topologies with generally high support values (Figure 9). Four major lineages were identified: lineage I (Microlepidieae, Erysimeae, Arabidopsideae, Lepidieae), lineage III (Chorisporeae, Dontostemoneae, Hesperideae, Euclidieae), lineage II (Isatideae, Brassiceae), and expanded lineage II (Megacarpaeeae, Anastaticeae, Cochlearieae, Arabideae, Biscutelleae). The analyses placed M. megalocarpa in expanded lineage II of Brassicaceae. Both trees showed that Megacarpaea is closely related to Pugionium and that M. megalocarpa and M. delavayi are the most closely related species, with bootstrap support higher than 97 in both the ML and NJ trees. Divergence times estimated on this phylogeny indicated that the core Brassicaceae and Aethionemeae began to split about 36.77 Mya in the late Eocene (Figures 10 and S1), while the major lineages or clades originated between the Oligocene and the Miocene. Divergences within lineages I and III were dated to the Oligocene, whereas those within expanded lineage II were estimated at around 28.02 Mya. The divergence between Pugionium and Megacarpaea was estimated at around 8.18 Mya (5.77-12.50 Mya). M. megalocarpa diverged from M. delavayi about 4.97 Mya (2.82-6.76 Mya) and from M. polyandra about 6.63 Mya (3.65-8.54 Mya).
Genes 2024, 15, x FOR PEER REVIEW 13 of 19

Architecture of Chloroplast Genomes in Megacarpaeeae
In this study, we present the first assembly and annotation of the complete chloroplast genome of M. megalocarpa and compare it with five previously published genomes of closely related species to examine relationships within Megacarpaea and with other genera. The size, structure, and gene content of the M. megalocarpa chloroplast genome were highly similar to those of P. dolabratum, P. cornutum [34], and other Brassicaceae species [35,36], indicating that chloroplast genome structure is highly conserved. The GC content of the IRa/IRb regions of M. megalocarpa was higher than that of the LSC and SSC regions, consistent with previous chloroplast genome studies of species such as Sinapis alba and Eutrema japonicum [37,38]. The higher GC content of the IRs is possibly due to the conserved rRNA genes located there, which also contribute to the high conservation of this region. Moreover, GC content varied among species, and such variation may be related to the distribution, environmental adaptability, and lifestyles of species [39].
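Regional GC content, as compared above between the IRs and the single-copy regions, can be computed with a few lines of Python once the region boundaries are known. This sketch assumes the sequence starts at the first base of the LSC and follows the order LSC-IRb-SSC-IRa; the region lengths must come from the annotation and are not given in this section, so the test uses a toy genome.

```python
from typing import Dict


def gc_percent(seq: str) -> float:
    """GC content (%) over unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def regional_gc(genome: str, lsc: int, ir: int, ssc: int) -> Dict[str, float]:
    """GC% of each region, assuming the genome is ordered LSC-IRb-SSC-IRa from base 1."""
    if lsc + 2 * ir + ssc != len(genome):
        raise ValueError("region lengths do not add up to the genome length")
    bounds = {
        "LSC": (0, lsc),
        "IRb": (lsc, lsc + ir),
        "SSC": (lsc + ir, lsc + ir + ssc),
        "IRa": (lsc + ir + ssc, len(genome)),
    }
    result = {name: gc_percent(genome[s:e]) for name, (s, e) in bounds.items()}
    result["total"] = gc_percent(genome)
    return result
```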
Simple sequence repeats (SSRs) have extensive applications across biological fields, such as genetic map construction and crop improvement, and are an important tool for analyzing genetic relationships, population structure, and phylogeny among species [40,41]. The SSRs in the genomes of the six species consisted mainly of mononucleotide repeats. These repeats were biased toward A/T, which could be attributed to A/T being more prone to change than G/C. This bias may be linked to the evolutionary history of the species or their environmental adaptations [42]. Only M. polyandra and M. megalocarpa possessed pentanucleotide repeats (three and one, respectively). Such variants are valuable for identifying polymorphic regions at the individual level and can serve as specific markers for genetic diversity analysis [43]. In this study, most of the long repeats were forward (F) and palindromic (P) repeats, as also observed in other angiosperms [35,44,45]. This suggests that forward and palindromic repeats may play an important role in maintaining the structural and functional stability and integrity of the genome [46]. In addition, the number of long repeats decreased gradually as repeat length increased, a pattern also found in Stemona parviflora [47].
Differences in chloroplast genome length and structure are often attributed to the expansion and contraction of IR boundaries [48]. In M. megalocarpa and M. polyandra, ycf1 contracted at the SSC/IRa boundary, a variability also observed in other genera such as Rheum, Quercus, and Camellia [49][50][51]. This may reflect the high variability of ycf1, which contains many mutation sites and encodes a component of the chloroplast inner envelope membrane protein translocon [50,52]. However, further validation is required to confirm the potential of the highly polymorphic ycf1 gene as a core DNA barcode [53]. The LSC/IRb (JLB), SSC/IRb (JSB), and LSC/IRa (JLA) boundaries were identical among the species, indicating close relationships that are supported by the phylogenetic analyses in this study. Significant differences in codon usage between species have been reported [54]. In this study, Arg, Leu, and Ser had the largest number of synonymous codons, while Met and Trp each had only one, consistent with the results for S. parviflora and Cyathula officinalis [47,55]. Among all codons, UUA (Leu) had the highest RSCU value and CUG (Leu) the lowest, while AUG and UGG showed no usage preference (Table S4). Almost all codons ending in A/U had RSCU values greater than 1, whereas those ending in C/G had RSCU values less than 1. This may be attributed to the high A/T content of the genome, which produces a clear preference for codons ending in A or U; such a preference may arise from evolutionary pressures and genetic alterations [56].
The chloroplast genome is maternally inherited, with relatively few base substitutions and genome rearrangements [57]. Consistent with this, no gene rearrangements or inversions were found in M. megalocarpa or its close relatives. Although the mVISTA analysis revealed some differences in the M. megalocarpa chloroplast genome, most were located in intergenic spacer regions, and the genomes remained conserved overall. Similar patterns have been reported not only in Brassicaceae but also in Orchidaceae and Betulaceae, supporting the conservation of the chloroplast genome [58,59]. The ratio of non-synonymous to synonymous substitution rates (Ka/Ks) in protein-coding genes is an important indicator of selection pressure in molecular evolution [60]. In this study, only rpl20, a gene associated with transcription and translation, showed evidence of positive selection, suggesting that it may be involved in adaptive evolution and environmental adaptation [61,62]. The Ka/Ks ratios of rpl20 indicated positive selection in M. delavayi, P. cornutum, and P. pterocarpum but purifying selection in M. megalocarpa, M. polyandra, and P. dolabratum, suggesting that this gene experienced different evolutionary pressures in different species. Nucleotide diversity analysis showed that the IR regions were less polymorphic than the LSC and SSC regions, which is likely attributable to the low variability of the conserved rRNA genes in the IRs [63]. The 13 regions with high Pi values (> 0.025) may be more susceptible to nucleotide substitution during evolution. Notably, the psaJ gene (Pi > 0.03) in the LSC region could serve as an effective molecular marker for species identification, providing valuable data for phylogenetic and evolutionary analyses.

Phylogeny of Chloroplast Genome of Megacarpaeeae
The ML and NJ trees had similar topologies. However, because a robust, densely sampled Brassicaceae Tree of Life is still lacking, various conflicting phylogenetic relationships have been proposed. Compared with systems of three [64,65] or five [2,9] major lineages, the four-lineage system better reflects the phylogenetic position of Megacarpaeeae. We therefore recognized four major lineages and placed M. megalocarpa in the expanded lineage II of Brassicaceae, consistent with the findings of Kiefer et al. [13]. Notably, Megacarpaea and Pugionium formed a highly supported monophyletic clade, consistent with previous studies [7,37,64]. This finding further confirms the close relationship between Megacarpaea and Pugionium. Additionally, M. delavayi was more closely related to M. megalocarpa than to M. polyandra.
The estimated ages of the major splits within Brassicaceae agree with previously published results, which indicate that most Brassicaceae lineages diverged between the middle Miocene and the Pleistocene [7]. The divergence among M. megalocarpa, M. delavayi, and M. polyandra was estimated to have occurred around 6.63 Mya (3.65-8.54 Mya), which is broadly consistent with previous divergence-time estimates based on the chloroplast genomes of M. delavayi and M. polyandra [9]. M. megalocarpa and M. delavayi diverged from each other about 4.97 Mya (2.82-6.76 Mya). This divergence may have been driven by the rapid uplift of the Tibetan Plateau in the Neogene, which formed mountain ranges such as the Tian Mountains and Qilian Mountains, and by severe drought in northwestern China from the late Miocene to the Pliocene [66,67]. M. megalocarpa grows in sandy deserts and alkaline plains of northwestern China (Qinghai and Xinjiang), whereas M. delavayi grows in swampy meadows, on steep grassy slopes, and in open thickets of southwestern China (Gansu, Qinghai, Sichuan, Xizang, and Yunnan) at elevations of 3300-4800 m [10]. Accordingly, the divergence of M. megalocarpa and M. delavayi may be related to the intense geological activity accompanying the massive uplift of the Tibetan Plateau, as well as to the aridification of northwestern China [67,68].

Conclusions
In this study, the chloroplast genome of M. megalocarpa was assembled, characterized, and compared with those of other species of the tribe Megacarpaeeae. The results confirmed the previously proposed hypothesis that the chloroplast genome of M. megalocarpa (154,877 bp) shares its main characteristics with those of other Megacarpaea species. Notably, the psaJ gene in the LSC region could be used as a molecular marker for species identification. Phylogenetic analysis confirmed that M. megalocarpa and M. delavayi are closely related and diverged around 4.97 Ma, possibly in connection with the intense geological activity associated with the large-scale uplift of the Tibetan Plateau. These results provide valuable genetic resources for understanding phylogenetic relationships within the genus and for refining the complex classification and species identification of Brassicaceae plants.

Figure 2. Chloroplast genome map of Megacarpaea megalocarpa. Genes inside and outside the circle are forward-coding and reverse-coding genes, respectively. The varying shades of gray in the innermost circle represent the GC and AT content.

Genes 2024, 15, x FOR PEER REVIEW 7 of

Figure 3. Simple sequence repeats (SSRs) and long repetitive sequences in M. megalocarpa and five other species. (A) Types and number of SSRs; (B) number of each of the four types of long repetitive sequences; (C) length of long repetitive sequences.

Figure 4. Comparison of the LSC, SSC, and IR region boundaries in the chloroplast genomes of Megacarpaea megalocarpa and five other species. Boxes represent individual genes; colors distinguish the four regions and the gene names.

Figure 5. Relative synonymous codon usage (RSCU) of the 20 amino acids and the stop codons in the CDSs of the chloroplast genomes of Megacarpaea megalocarpa and other species of the tribe Megacarpaeeae. From left to right: Megacarpaea megalocarpa, Megacarpaea polyandra, Megacarpaea delavayi, Pugionium dolabratum, Pugionium cornutum, and Pugionium pterocarpum. Colors represent the different codons encoding each amino acid.

Figure 6. Sequence identity plot comparing the chloroplast genome of Megacarpaea megalocarpa with those of five other species. Exons, UTRs, CNSs, and mRNAs are marked in different colors. The y-axis represents sequence identity from 50% to 100%. Gray arrows above the alignment indicate the annotated genes in the reference genome of Megacarpaea delavayi and their transcription directions.

Figure 7. Ka/Ks values of 41 genes, calculated pairwise between Megacarpaea megalocarpa and each of five other species.

Figure 9. Phylogenetic tree constructed based on 37 species. Numbers at nodes are Maximum Likelihood and Neighbor Joining bootstrap values (BS), separated by "/". Blue letters represent the outgroups; red letters represent the study species; pink letters represent the different lineages.

Figure 10. BEAST-derived chronogram of Brassicaceae based on chloroplast genome sequences, with three calibration points (red pentagrams) derived from previous studies. Abbreviations of geological time: Q, Quaternary; P, Pleistocene; Pli, Pliocene.

Table 1. Chloroplast genome information of M. megalocarpa and other species of the tribe Megacarpaeeae.

Table 2. Functional classification of genes in the chloroplast genome of M. megalocarpa. Note: gene*, gene with one intron; gene**, gene with two introns; gene (2), number of copies of a multi-copy gene.