Comparative Analysis of the Chloroplast Genomes of Three Houpoea Plants

The genus Houpoea belongs to the family Magnoliaceae, and its species have important medicinal value. However, research on the relationship between the evolution of the genus and its phylogeny has been severely hampered by the uncertain circumscription of the genus and the scarcity of research on its chloroplast genome. We therefore selected three Houpoea taxa: Houpoea officinalis var. officinalis (OO), Houpoea officinalis var. biloba (OB), and Houpoea rostrata (R). The complete chloroplast genomes (CPGs) of these three taxa, with lengths of 160,153 bp (OO), 160,011 bp (OB), and 160,070 bp (R), were obtained via Illumina sequencing and then annotated and analyzed. The annotation showed that all three chloroplast genomes have a typical quadripartite structure. A total of 131, 132, and 120 different genes were annotated, respectively. The CPGs of the three taxa had 52, 47, and 56 repeat sequences, respectively, which were found primarily in the ycf2 gene. Approximately 170 simple sequence repeats (SSRs) were identified and provide a useful tool for species identification. Analysis of the boundaries of the inverted repeat (IR) regions showed that they are highly conserved across the three Houpoea taxa, with differences observed only between H. rostrata and the other two. According to the mVISTA and nucleotide diversity (Pi) analyses, several highly variable regions (rps3-rps19, rpl32-trnL, ycf1, ccsA, etc.) have the potential to serve as barcode markers for Houpoea. The phylogenetic relationships indicate that Houpoea is a monophyletic taxon and that its circumscription and systematic position are consistent with the Magnoliaceae system of Sima Yongkang-Lu Shugang, which includes five species and varieties: H. officinalis var. officinalis, H. rostrata, H. officinalis var. biloba, Houpoea obovata, and Houpoea tripetala, which evolved and differentiated from the ancestor of Houpoea in the above order. This study provides valuable information on the genus Houpoea, enriches the CPG data available for the genus, and provides genetic resources for further classification of and phylogenetic research on Houpoea.


Introduction
Houpoea is the name given to Magnolia sect. Rytidospermum Spach by N.H. Xia and C.Y. Wu in 2008 [1]; the group was treated as sect. Rytidospermum in the 1996 edition of Flora Reipublicae Popularis Sinicae. Houpoea plants contain magnolol, magnocurarine, isomagnolol, and other medicinal components [2]. The dried bark, root, and branch bark of H.  [3]. Additionally, they may be used for afforestation and as ornamental garden trees owing to the quality of their wood, their color, and their tree shape [4]. However, because of excessive collection, the natural populations of Houpoea species have rapidly declined [5].
Despite extensive study of the pharmacological components and breeding of Houpoea species [6][7][8], little is known about their genetic traits. In addition, the species composition and range of Houpoea remain unclear: in the Xia Nianhe System [9][10][11], there are nine species of Houpoea, whereas the Sima Yongkang-Lu Shugang System [12] includes only five species and varieties. In the Figlar-Nooteboom System [13] and the Wang Yubing System [14], Houpoea is treated as Magnolia section Rytidospermum and includes only four species.
The chloroplast genome (CPG) is better suited than the nuclear genome to studying nucleotide diversity and reconstructing the phylogeny of related species because of its smaller genome size, lower nucleotide substitution rate, uniparental inheritance, and haploid nature [15]. The cost of obtaining a genome sequence has decreased significantly as a result of the rapid development of next-generation sequencing technology [16][17][18]. Evolutionary relationships at higher taxonomic levels are therefore increasingly inferred from chloroplast genome-scale data, and significant progress has been made even at lower taxonomic levels [19]. CPG comparison and phylogenetic analysis have made possible species identification, the detection of structural changes, the assessment of nucleotide diversity, the resolution of phylogenetic relationships, and the reconstruction of evolutionary history [20]. The complexity of the nuclear genome [21][22][23][24][25] and the morphological similarity of the Houpoea species make the CPG more appropriate for phylogenetic analysis, species identification, and the design of conservation strategies for Houpoea [26].
To achieve these goals, we chose three Houpoea taxa endemic to China: H. officinalis var. officinalis, H. officinalis var. biloba, and H. rostrata (W.W. Smith) N.H. Xia & C.Y. Wu. Although they are mostly found in different regions, including Hunan, Sichuan, and southern Yunnan, and grow in different environments, their chemical and pharmacological components are quite similar [2,27,28]. After obtaining the CPGs of these three taxa using high-throughput Illumina sequencing, we compared them to screen for divergent nucleotide sequences. A phylogenetic tree was then built to determine the genetic relationships among the three taxa. These analyses will serve as valuable tools for the classification of Houpoea as well as for the identification of varieties, the assessment of quality, and the genetic improvement of Houpoea.

DNA was extracted using a DNA kit extraction technique [29]. The chloroplast DNA extracted in this study was purified via the high-salt, low-pH method, and the extracted DNA was checked as follows. First, 2 µL of DNA and 2 µL of 10× loading buffer were mixed and loaded alongside a 2000 bp DNA marker, and electrophoresis was carried out at 150 V for 25 min. The gel was then examined and recorded with a gel imager. Sangon Biotechnology Company (Shanghai, China) carried out the sequencing on the Illumina platform. A DNA library was created using the TruseqTM RNA sample preparation kit, yielding a library with an average insert size of 400 bp. A total of 4.5 GB of 150 bp paired-end reads was acquired and saved in FASTQ format following library sequencing. The quality of the raw data was checked with FastQC. After the low-quality reads and adapters were removed, the high-quality reads were assembled with GetOrganelle (version 1.7.4) [30], and Sequin (version 16.0) [31] was then used for manual correction.

Annotation and Physical Map Drawing
Gene prediction was carried out with Prokka [32]. The predicted genes were compared against the CDD, KOG, COG, NR, and NT databases using NCBI BLAST+ [33]; GO annotations were obtained using UniProt, and KEGG annotation was performed using KAAS (KEGG Automatic Annotation Server) [34]. Additionally, the annotations were compared with those of Magnolia officinalis (GenBank accession MW373503), retrieved from the NCBI repository (https://www.ncbi.nlm.nih.gov/ accessed on 31 October 2022), in Geneious 2022.0.1 [35]. The physical maps were drawn online using Chloroplot (https://irscope.shinyapps.io/Chloroplot/ accessed on 31 October 2022).

SSR and Repeat Sequences Analysis
Forward, reverse, and palindromic repeat sequences were identified with REPuter [36], using a minimum repeat size of 30 bp and a Hamming distance of 3. Tandem repeats were identified with Tandem Repeats Finder using the default parameters [37]. SSR loci were detected with MISA, with minimum repeat numbers of 8, 4, 4, 3, 3, and 3 for mono-, di-, tri-, tetra-, penta-, and hexanucleotide motifs, respectively [38].

Genomic Comparative Analysis
The complete CPGs of H. officinalis var. officinalis, H. officinalis var. biloba, and H. rostrata were compared using the mVISTA program [39]. Variable characters with lengths greater than 200 bp in both coding and non-coding regions were obtained following the approach of Zhang et al. [40].

Phylogenesis Analysis
The CPG sequences of the three Houpoea taxa were combined with 26 other Magnoliaceae CPG sequences downloaded from the NCBI database. The CPGs of these species were aligned using MAFFT [42], and the phylogenetic tree was built in Geneious (2022) using the maximum likelihood (ML) approach [35].

CPG Organizations
The complete CPGs of H. officinalis var. officinalis, H. officinalis var. biloba, and H. rostrata obtained via sequencing were deposited in the NCBI database under accession numbers OM912809, OM912810, and MW800876, respectively. We compared and examined the three CPG sequences. As shown in Figure 1, the three CPGs each have a typical quadripartite structure and range in size from 160,011 bp (H. officinalis var. biloba) to 160,153 bp (H. officinalis var. officinalis). The two IRs (26,569 bp) are separated by the LSC (88,136-88,234 bp) and the SSC (18,781 bp). The GC content of the H. officinalis var. officinalis, H. officinalis var. biloba, and H. rostrata CPGs is 39.26%, 39.22%, and 39.26%, respectively (Table 1), showing similar levels (~39%). The content and order of the non-protein-coding genes in the Houpoea CPGs are similar: all have 8 rRNA genes and 36 or 37 tRNA genes. The protein-coding genes differ, however: H. officinalis var. officinalis, H. officinalis var. biloba, and H. rostrata contain 86, 87, and 76 protein-coding genes, respectively. For other types of genes, H. rostrata also differs markedly from the other two Houpoea plants: clpP and ycf3 are absent (Table 2).

Figure 1. Chloroplast genome maps of the three Houpoea taxa, including (C) H. rostrata. The four partitions displayed are the two IRs, the SSC, and the LSC. The black inner ring indicates the GC content.
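As an illustration of how the GC percentages above are defined, the short Python sketch below computes GC content for a whole sequence and for named regions. The function names and the toy region coordinates are hypothetical; real LSC, SSC, and IR coordinates would come from the annotation.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    counted = [base for base in seq if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)


def gc_by_region(seq, regions):
    """GC percentage per region; regions maps a name to a 0-based, end-exclusive (start, end)."""
    return {name: gc_content(seq[start:end]) for name, (start, end) in regions.items()}


if __name__ == "__main__":
    toy = "ATATATGCGCGCATGC"
    # Toy coordinates only, not the real partition boundaries.
    print(gc_content(toy))
    print(gc_by_region(toy, {"left": (0, 6), "right": (6, 16)}))
```

Ambiguous bases such as N are excluded from the denominator here; that is a design assumption, and other tools may count them differently.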

Analysis of the Repeats and SSRs
The analysis of the repeat sequences in the Houpoea CPGs shows that the distribution and number of the four types of repeats (tandem, forward, reverse, and palindromic) are similar across the three taxa (Figure 2a). H. officinalis var. officinalis has the most tandem repeats (26), whereas H. officinalis var. biloba has the fewest (21). H. rostrata has more forward and palindromic repeats (13 and 17) than the other two taxa (11 and 14), while the number of reverse repeats is the same in all three (1). The repeats in the three taxa follow the same pattern in terms of length: the tandem repeats are mostly 10-20 bp long (Figure 2d), whereas the forward and palindromic repeats are typically 30-40 bp long (Figure 2b,c). In addition, the repeat sequences of the three Houpoea plants are mainly distributed in the ycf2 gene and in intergenic regions such as ycf3-ycf19 (Table S1).
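The repeat categories counted above can be illustrated with a small Python sketch that tests whether two equal-length segments form a forward, reverse, complement, or palindromic (reverse-complement) match within a given Hamming distance. This is only a sketch of the definitions, not REPuter's search algorithm, and the names `hamming` and `classify_repeat` are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def classify_repeat(a, b, max_mismatch=3):
    """List the repeat types under which segment b matches segment a.

    max_mismatch=3 mirrors the Hamming distance stated in the Methods.
    """
    a, b = a.upper(), b.upper()
    candidates = {
        "forward": b,                                   # same orientation
        "reverse": b[::-1],                             # reversed
        "complement": b.translate(_COMPLEMENT),         # complemented
        "palindromic": b[::-1].translate(_COMPLEMENT),  # reverse complement
    }
    return [name for name, cand in candidates.items()
            if hamming(a, cand) <= max_mismatch]


if __name__ == "__main__":
    print(classify_repeat("AACAGT", "ACTGTT", max_mismatch=0))
```

In practice a repeat finder must also locate candidate segment pairs (at least 30 bp long here); the sketch covers only the classification step once a pair is given.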

Comparative Genomics
The IR regions are the most conserved regions of the CPG in most angiosperms. The length of an IR region is inversely proportional to the genome's length [44]. We evaluated the IR, LSC, and SSC border regions of the three Houpoea taxa. The length of the IR region varied between 26,558 and 26,569 bp (Table 1); although the differences among the three taxa were minimal, the IR border regions did show some noticeable disparities. Figure 4 shows that, compared with the other two taxa, H. rostrata lacks the rpl2 gene in the IR/LSC border regions. This suggests that the H. rostrata CPG has a more conserved structure and function. In the mVISTA comparison of the three CPGs using H. rostrata as the reference, the three Houpoea CPGs show a high degree of sequence conservation, as seen in Figure 5, particularly in the coding regions. Mutation sites are found more readily in intergenic regions than in genic regions. The most highly variable regions are psbM-petN, trnfM-rps14, trnL-trnT, psbL-psbJ, etc., which are conserved non-coding sequence (CNS) regions.
The nucleotide diversity (Pi) was calculated using Genepioneer in order to visualize the highly variable regions (Figure 6). Pi ranged from 0 to 2.5%, with an average of 0.16%. Ten windows with a Pi greater than 1% were found: nine fell in intergenic regions (two each in rpl32-trnL, petA-psbJ, trnC-petN, and rps3-rps19, and one in ccsA-ndhD) and one in a gene (trnL). The rpl32-trnL region has the highest Pi value (2.5%). Additionally, these highly variable regions occur only in the LSC and SSC, which is consistent with the structural flexibility of CPGs.
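For readers unfamiliar with the statistic, the following Python sketch shows one common way to compute nucleotide diversity (Pi) as the mean number of pairwise differences per site, evaluated in sliding windows over an alignment. The window and step sizes are free parameters here because the text does not state the values used; the function names are hypothetical, and columns containing gaps or ambiguous bases are skipped as a simplifying assumption.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Pi for aligned sequences: mean pairwise differences per usable site."""
    seqs = [s.upper() for s in seqs]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences must be aligned to equal length")
    # Keep only columns in which every sequence has an unambiguous base.
    columns = [col for col in zip(*seqs) if all(base in "ACGT" for base in col)]
    pairs = list(combinations(range(len(seqs)), 2))
    if not columns or not pairs:
        return 0.0
    diffs = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return diffs / (len(pairs) * len(columns))


def sliding_pi(seqs, window, step):
    """Return [(window_start, Pi), ...] for windows fully inside the alignment."""
    length = len(seqs[0])
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in seqs]))
        for start in range(0, length - window + 1, step)
    ]


if __name__ == "__main__":
    toy_alignment = ["AAAA", "AAAT", "AATT"]
    print(nucleotide_diversity(toy_alignment))
    print(sliding_pi(toy_alignment, window=2, step=2))
```

Windows whose Pi exceeds a chosen cutoff (1% in the text) would then be mapped back to the annotation to name the gene or intergenic spacer they fall in.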

Phylogenetic Analysis
It has been established that the plastid genome offers more phylogenetic signal than DNA fragment markers, which is crucial for resolving deep relationships in plant ancestry [45]. To examine the genetic relationships among H. officinalis var. officinalis, H. officinalis var. biloba, and H. rostrata in more detail, Liriodendron chinense (Hemsley) Sargent was used as the outgroup, and 25 CPG sequences from the Magnoliaceae family were chosen. The maximum likelihood approach was used to build the phylogenetic tree. As seen in Figure 7, Houpoea obovata and H. officinalis var. officinalis clustered together. This branch then clustered with H. officinalis var. biloba and H. tripetala (Linnaeus) Sima, S.G. Lu, N.H. Xia & C.Y. Wu in turn, all with 100% support. Meanwhile, H. rostrata is at the base of the Houpoea branch, forming a sister group to the other branches, also with 100% support. This phylogenetic analysis also provides a reference for the species composition and range of Houpoea. In the Sima-Lu System, Houpoea includes five species and varieties, which is also reflected in the ML tree of this study, while the Xia Nianhe System has a wider range, including four groups: Paramagnolia fraseri var. fraseri

Figure 7. Maximum likelihood tree based on the complete sequences of 29 CPGs. The species name and the GenBank accession number of each sequence are indicated in the figure. Bootstrap support values based on 1000 replicates are displayed next to the nodes. Each genus is labeled according to the Sima-Lu System and distinguished by color.

Table 4 shows that the morphological traits of the flowers and fruits are identical, while the color of the leaf indumentum and the shape of the leaf apex are the key differences among the three Houpoea taxa. The morphological characteristics of H. rostrata differ the most from those of the other two plants: the leaves of H. rostrata bear long reddish-brown hairs and have wide, rounded apexes with a short acute tip. In contrast, the leaves of H. officinalis var. biloba and var. officinalis bear long white hairs and have short acute or obtuse tips. The difference between these two is that the apex of an H. officinalis var. biloba leaf is concave and forms two obtuse, shallow lobes.

Discussion
High-quality DNA sample preparation is the first step in genome sequencing research. Compared with nuclear genomic DNA, chloroplast genomic DNA is more difficult to extract, and intact chloroplasts must be obtained first. During chloroplast DNA isolation, problems such as contamination with nuclear and mitochondrial DNA and loss of chloroplast integrity may arise, and these are the key factors limiting chloroplast DNA extraction. To obtain the high-quality chloroplast genomic DNA required for this research, chloroplast DNA was isolated, purified via the high-salt, low-pH approach [46] to eliminate interfering DNA, and checked via gel electrophoresis.
Most angiosperms have CPGs that are between 120 and 160 kb in size [47]. Our findings show that the CPGs of the three Houpoea taxa are comparable in size (approximately 160 kb) and organization (quadripartite structure) to those of other Magnoliaceae plants [48] and other higher plants [49]. However, there are differences in the types and numbers of genes, with H. rostrata (120) differing the most from the other two taxa (131-132). We hypothesize that H. officinalis var. biloba and var. officinalis have accumulated more variation over the course of evolution, while H. rostrata, the most primitive species of Houpoea [50], has a more primitive and conserved chloroplast genome. It has been shown that the total GC content is connected to phylogenetic position [51]; in particular, the GC content is greater in early-diverging lineages such as the Magnoliaceae. Our findings agree with this earlier research: the overall GC content of the CPGs in the three Houpoea taxa is roughly 39.2%, which is greater than the median GC content of most angiosperms (35%) [52].
Owing to their structural variability across CPG regions and their crucial role in plastid recombination, repeat sequences are frequently used in phylogenetic, population genetic, and evolutionary studies [53]. Both interspersed repeats and tandem repeats were detected in the three Houpoea taxa. All three taxa lack complement repeats, which are the most common kind of interspersed repeat sequence. Differences among CPGs arise from differing proportions of forward, palindromic, and reverse repeats [54]. The results of this study conform to this model: H. officinalis var. biloba (11, 14, 1) and var. officinalis (11, 14, 1) differ from H. rostrata (13, 17, 1). Previous research has shown that repeat sequences are most commonly found in intergenic spacer regions, followed by coding regions [55]. Our study shows that the repeat sequences of the three Houpoea plants are primarily associated with genes such as ycf2 and with intergenic spacer regions (Table S1). The distribution of repeat sequences in CPGs is a significant source of structural diversity and a key factor in genome evolution [56]. As a result, the ycf2 gene, which contains a large number of repeat sequences, may be key to understanding the variation among the three Houpoea taxa.
SSRs are molecular markers with significant application potential that have been used extensively in phylogenetic research, species breeding and protection, species identification, and other fields [57]. H. rostrata carries the most SSR markers among the three Houpoea taxa, and mononucleotide repeats predominate in all of them. Pentanucleotide repeats are absent from all three CPGs, which is consistent with other species of Magnoliaceae, and tetranucleotide repeats are generally somewhat more prevalent than trinucleotide and hexanucleotide repeats [58][59][60]. The SSRs of the CPGs of the three Houpoea taxa identified in this work provide a strong foundation for related research, such as species identification.
Through a comparison of the complete CPG sequences of the three Houpoea taxa, we found that variation in the SSC and LSC regions is greater than that in the IR regions, and that variation in the non-coding regions is greater than that in the coding regions. This finding is in line with previous studies on other Magnoliaceae plants [61]. This study shows that there are clear differences among the three taxa in the IR boundary regions: compared with H. officinalis var. biloba and var. officinalis, H. rostrata has lost the rpl2 gene at the IRb-LSC boundary. We infer that changes in the living environment during evolution drove the divergence of the Houpoea species. Moreover, through the mVISTA comparison, we identified 10 regions showing significant differences among the CPGs of these three plants: nine intergenic spacers (psbA-trnK, trnS-trnG, psbM-petN, trnfM-rps14, trnL-trnT, trnF-ndhJ, psbL-psbJ, rps3-rps19, and rpl32-trnL) and one gene (ycf1). Highly variable regions carry considerable evolutionary information and may also be used as molecular markers to distinguish between related taxa [62]. Many highly variable regions, such as matK and ycf1, have been used to create DNA barcodes in Magnoliaceae [63]. The nucleotide diversity (Pi) analysis allowed us to identify ten further highly variable windows, which include the ccsA gene region and intergenic spacers such as rpl32-trnL, petA-psbJ, trnC-petN, and rps3-rps19. These regions may serve as barcode markers for Houpoea species identification.
CPGs have proven to be powerful tools for resolving the evolutionary relationships among angiosperms [64]. The species range of Houpoea is unclear [11]; hence, further evidence is required to confirm it. Consequently, the CPG sequences of 29 Magnoliaceae taxa were used to build a phylogenetic tree via ML. The findings show that all Houpoea sequences form a monophyletic group, which is compatible with the classification of Houpoea under the Sima-Lu System [12]. Among them, H. rostrata is located at the base of Houpoea, and then H. officinalis var. biloba, H. officinalis var. officinalis, H. obovata, and H. tripetala diverge in turn. The tree also suggests how Houpoea evolved: the genus originated in the Himalayas, spread northeast to Central China, evolved into H. officinalis var. biloba, and differentiated into H. officinalis var. officinalis. Via the Bering Strait, ocean currents, and birds, it migrated to the eastern and southern regions of North America, while at the same time it continued to spread northeast to Japan, where it differentiated into H. obovata.

Conclusions
In this work, the CPGs of three Houpoea taxa (H. officinalis var. biloba, H. officinalis var. officinalis, and H. rostrata) were sequenced, annotated, and compared. The findings indicate that the CPGs of these three taxa are highly conserved in terms of structure and gene content, but they also exhibit distinct differences that reflect their genetic relationships. A total of 170 SSR loci were also identified as molecular markers for investigating the diversity of Houpoea. Ten highly variable regions (rps3-rps19, rpl32-trnL, ycf1, ccsA, etc.) may serve as DNA barcodes for the Houpoea species. The circumscription of Houpoea in the Sima Yongkang-Lu Shugang System is supported by the phylogenetic relationships, which clarify the evolutionary background of Houpoea and group all Houpoea species into a single lineage. Together, this information enriches the genome data for the genus Houpoea and offers valuable resources for future studies on its phylogeny and species identification.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes14061262/s1.
Author Contributions: Q.X.: design and inspiration, ideas and methods, investigation, chart production, interpretation of information, writing-original draft, and writing-review and editing; Z.L.: interpretation of information, writing-original draft, ideas and methods, and investigation; N.W.: investigation and interpretation of data; J.Y.: investigation and interpretation of data; L.Y.: investigation and interpretation of data; T.Z.: investigation and interpretation of data; Y.S.: design and concept, review-ideas and methods, and financing acquisition; T.X.: design and concept, writing-review and editing, financing acquisition, and administration. All authors have read and agreed to the published version of the manuscript.