Structural and Comparative Analysis of the Complete Chloroplast Genome of a Mangrove Plant: Scyphiphora hydrophyllacea Gaertn. f. and Related Rubiaceae Species

Scyphiphora hydrophyllacea Gaertn. f. (Rubiaceae) is an endangered mangrove species found in China, and its only known location is in Hainan Island. Previous studies conducted on S. hydrophyllaceae have mainly focused on its location, biological characteristics, and medical effects. However, to date, there has been no published report regarding the genetics or genome of this endangered mangrove species. In this study, we developed valuable chloroplast genome-related molecular resources of S. hydrophyllaceae by comparing with it related Rubiaceae species. The chloroplast genome of S. hydrophyllaceae was found to be a circular molecule with a total size of 155,132 bp, and it is observed to have a quadripartite structure. The whole chloroplast genome contains 132 genes, of which 88 and 36 are protein-coding and transfer RNA genes, respectively; it also contains four ribosomal RNA genes with an overall GC content of 37.60%. A total of 52 microsatellites were detected in the S. hydrophyllacea chloroplast genome, and microsatellite marker detection identified A/T mononucleotides as majority simple sequence repeats in all nine Rubiaceae chloroplast genomes. Comparative analyses of these nine chloroplast genomes revealed variable regions, including matK, rps16, and atpF. All nine species shared 13 RNA-editing sites distributed across eight coding genes. Phylogenetic analyses based on the complete sequences of the chloroplast genomes revealed that the position of S. hydrophyllaceae is closer to the Coffeeae genus than to Cinchoneae, Naucleeae, Morindeae, and Rubieae in the Rubiaceae family. The genome information reported in this study could find further application in the evolution and population genetic studies, and it helps improve our understanding of the endangered mechanism and the development of conservation strategies of this endangered mangrove plant.


Introduction
Scyphiphora hydrophyllacea is a shrub mangrove that belongs to the Scyphiphora genus (family: Rubiaceae), a monotypic genus whose distribution range extends south India and as well as Ceylon, Indochina, and Hainan in China through the Malay Archipelago and Philippines to Australia and New Caledonia and northward to the Solomon Islands and Palau [1]. Its distinguishing characteristics include rounded glossy leaves, fringed stipules, small white flowers, and eight-ribbed drupe-like fruits. Its terminal nodes and shoots are also distinctively covered by a resinous substance [2]. This species is often located along the high intertidal zones of the midestuarine reaches where it grows in pockets of scattered isolated shrubs and is often regarded as a minor constituent of the mangrove habitat. Based on the categories and criteria of the International Union for the Conservation of Nature Red List of Threatened Species, S. hydrophyllacea has been classified into the Least Concern category, which has a global loss of 20% [3]. In China, this species is found only in Hainan Island, and it has been included in a list of key wild plants under provincial protection in Hainan (2006). Although the incidences of fruit set in this mangrove is high, only a low percentage of seed germination has been reported [1]. Today, S. hydrophyllacea continues to be regarded as an important medical plant because of its medical properties such as antihepatocarcinogenic and antioxidant effects [4]. Several phytochemicals including flavonoids, terpinoids, and iriddoids have been reported in this species [5]. However, so far, no studies have been conducted to investigate its genetic background. The chloroplast (cp) genome of S. hydrophyllacea reported in this paper provides valuable information for further studies of the cp molecular biology of the species. These data will also promote work on genetic breeding and germplasm protective research, and are projected to help clarify the molecular evolution status of S. hydrophyllacea in Rubiaceae.
Cp are well known as the main site of photosynthesis, which is a process that provides the energy required for the synthesis of glucose, important fatty acids, starch, and pigments [6]. The size of 120-210 kb is typical for a higher plant cp genome, which usually encodes 120-140 genes. The typical quadripartite structure of the cp genome consist of a small single-copy region (SSC), a large single-copy region (LSC), and two inverted repeat (IR) regions [7]. Chloroplasts are independent genetic systems with a highly conserved genomic structure. Unlike the nuclear genome, cp DNA has the characteristics of multiple copies, low molecular weight, and a simple structure, which is considered to be beneficial and is rather conservative [8]. With ongoing developments in DNA sequence technologies, and a booming increase in the number of researchers focused on cp genome research, approximately 3621 plant cp genomes are now publicly available in the National Center for Biotechnology Information (NCBI) database. Rubiaceae is one of the largest families of angiosperms and consists of approximately 600 genera and more than 10,000 species [9], and yet only a few cp genome sequences are registered in the NCBI. In Rubiaceae, only two species-S. hydrophyllacea and Rustia occidentalis-belong to mangrove plants. In keeping with the important role that cp plays in the salt tolerance of higher plants [10], the whole cp genome information will provide the molecular data to explain this mangrove adaptation for tidal habitat.

Plant Material, DNA Extraction, and Sequencing
Mature and healthy leaves of S. hydrophyllacea from Sanya (18 • 13 21.09 N, 109 • 36 59.73 E), Hainan, China, were collected and then preserved in ice for further study. The corresponding voucher specimens of S. hydrophyllacea were deposited at the Hainan Normal University herbarium (BHM-001). The total DNA of leaves was extracted by using a plant DNA extraction Kit (Tiangen, Beijing, China) and following the manufacturer's instructions. The Illumina HiSeq platform was used to sequence the total DNA, which was carried out by a genome sequencing company (TGS, Shenzhen, China). The average 350-bp paired-end library was manufactured and sequenced using the Illumina Genome Analyzer (Hiseq PE150, Shenzhen, China).

Genomic Assembly, Annotation and Validation
To evaluate the quality of sequenced raw reads, the software FastQC (0.11.7) was used. Then, the cp genome related reads were filtered by mapping all the raw reads to the published cp genome sequences in Rubiaceae. The SPAdes (3.9.0) software was used to assemble the contig sequence [11]. All the transfer RNA sequences were verified using the software tRNAscan-SE version (2.0) [12]. Then, the Ribosome RNA sequences were analyzed with RNAmmer 1.2 Server. For the annotation of the S. hydrophyllacea cp genome, the DOGMA program was used [13]. Furthermore, the annotation results were checked manually, and then the codon positions were also adjusted via comparison to homologs from other cp genomes in Rubiaceae. The structural features of the S. hydrophyllacea cp genome were illustrated using the software OGDRAW (1.3.1) [14].

Codon Usage Analysis
The condon usage of the S. hydrophyllacea cp genome was analyzed using codonW software. The following conditions were used to minimize deviation in the results: (1) the length of every sequence coding for amino acids in protein must be more than 300 nucleotides (nt); and (2) repeat sequences were removed [15]. Possible RNA-editing sites in the S. hydrophyllacea protein-coding genes were predicted using the program predictive RNA editor for plants (PREPSuite) with the cutoff value set to 0.8 [16].

Phylogenetic Analysis
To gain an insight into the position of Scyphiphora in Gentianales and in an attempt to hypothesize when changes have taken place between/among the species and major clades, 40 cp genomes of Gentianales (data present in NCBI on 20 July 2019), including S. hydrophyllacea, were compared with each other. Furthermore, three mangroves species in Combretaceae-Lumnitzera littorea (Jack) Voigt, Lumnitzera racemosa Willd., and Laguncularia racemosa Gaern. f.-were chosen as out-groups. To minimize the overrepresentation of duplicated sequences, one of the IR regions in each plasmid was removed before analyses. Using the software MAFFT v7.427 [19], multiple sequence alignment was performed using default values. The software IQ-TREEv1.6.10 (http://www.iqtree.org) was used to select the best model, TVM+F+R3, and build the maximum likelihood tree with default parameters. For the insertion and deletion events analysis in Rubiaceae, multiple sequence alignment was analyzed among 10 cp genomes. According to the annotation results of the S. hydrophyllacea cp genome, all compared exons including indel regions were manually extracted using MEGA v6.0. The main indel of the indel length >10 bp was kept [20].

Basic Characteristics of cp GENOME of S. hydrophyllacea
The typical tetrad structure of the cp genome found in most plants [21] was also found in the S. hydrophyllacea cp genome with paired IR sequences encoded in opposite directions and LSC and SSC regions, as shown in Figure 1. The cp genome sequence of S. hydrophyllacea was deposited in GenBank under accession number MN390972. The total cp genome of S. hydrophyllacea was 153,132 bp in length, similar to other Rubiaceae cp genomes [22,23]. The LSC region was 85,239 bp, the SSC region was 18,165 bp, and the IR regions were 25,864 bp in the cp genome of S. hydrophyllacea.
(rRNA) genes ( Figure 1, Table 1). Of these, eight protein-coding genes, four rRNAs, and seven tRNAs are found to be duplicated in the IR regions. The protein-coding genes present in the S. hydrophyllacea cp genome include nine genes encoding large ribosomal proteins, in which rpl2 and rpl23 have two gene copies in IRs and furthermore, rpl2 has one intron; 12 small ribosomal protein genes; five genes encoding photosystem I components, 15 genes related to photosystem II, and six genes encoding adenosine triphosphate (ATP) synthase and electron transport chain component ( Table 1). The gene rps12 is a trans-spliced gene with its 5′ terminal located at the LSC region and the 3′ end with a copy located in each of the two IR regions, which is a common phenomenon in higher plants [24]. Similar patterns of protein-coding genes are also present in other Rubiaceae plants [22,23].  In the S. hydrophyllacea cp genome, a total of 132 genes were found, of which 113 are unique consisting of 80 protein-coding genes, 29 transfer RNA (tRNA) genes, and four ribosomal RNA (rRNA) genes ( Figure 1, Table 1). Of these, eight protein-coding genes, four rRNAs, and seven tRNAs are found to be duplicated in the IR regions. The protein-coding genes present in the S. hydrophyllacea cp genome include nine genes encoding large ribosomal proteins, in which rpl2 and rpl23 have two gene copies in IRs and furthermore, rpl2 has one intron; 12 small ribosomal protein genes; five genes encoding photosystem I components, 15 genes related to photosystem II, and six genes encoding adenosine triphosphate (ATP) synthase and electron transport chain component ( Table 1). The gene rps12 is a trans-spliced gene with its 5 terminal located at the LSC region and the 3 end with a copy located in each of the two IR regions, which is a common phenomenon in higher plants [24]. Similar patterns of protein-coding genes are also present in other Rubiaceae plants [22,23].

Functions Category
Group of Genes Gene Name

Genes of Unknown Function
Conserved open reading frames ycf1 a , ycf2 a , ycf3 c , ycf4, ycf15 a A-Two gene copies in inverted repeat (IRs); b-Gene containing a single intron; c-Gene containing two introns; d-Pseudogene; e-Gene divided into two independent transcription units.

SSR Analysis
SSRs are accepted as important molecular markers for population variation studies in higher plants and are usually composed of 1-6 nt [15]. SSRs in the cp genome, similar to those in the nuclear genomes, are highly variable and are often used as genetic markers [25]. In this study, nine cp genome sequences of Rubiaceae plants (including S. hydrophyllacea) were used to determine SSR loci using MISA software ( Figure 2). A total of 52 microsatellites were identified in the S. hydrophyllacea cp genome ( Figure 2A, Figure S1, Table S1). Moreover, 43, 38, 46, 66, 67, 64, 54, and 45 SSRs were detected in C. arabica, C. canephora, E. henryi, G. aparine, G. mollugo, G. nanlingensis, M. speciosa, and M. officinalis, respectively ( Figure 2A, Table S1). G. aparine (66 SSRs), G. mollugo (67 SSRs), and C. canephora (38 SSRs) have the highest and lowest number of SSRs, respectively. All SSRs were classified into five types of microsatellites: Mononucleotide, dinucleotide, trinucleotide, tetranucleotide, and pentanucleotide ( Figure 2A,B). Consistent with previous reports, most of the SSRs are mononucleotide repeats [26]. In agreement with previous research, the number of mononucleotide repeats is more than the sum of  Figure 2B), and all mononucleotide repeats consist of A or T bases, which is analogous to other land plants [15]. As for the SSR loci, the repeats located in the LSC region are more frequent compared with those in the SSC region and IR regions in all the analyzed Rubiaceae plants ( Figure 2C). The frequency of identified SSR motifs in different repeat class types of these nine species are listed in Figure 2D. Mononucleotide A/T showed the highest frequency in all repeats.

Codon Usage and Putative RNA Editing Sites in cp Genes of S. hydrophyllacea
In this study, the codon usage frequency and the relative synonymous codon usage (RSCU) in the S. hydrophyllacea plastome were analyzed. All protein-coding genes presented a total of 68,907 bp and 22,969 codons in the S. hydrophyllacea cp genome. Among all the codons, leucine (Leu) was the most abundant amino acid with a frequency of 10.58%, followed by isoleucine (Ile) with a frequency of 8.61%, whereas cysteine (Cys) was less abundant with a frequency of 1.06% (Figure 3, Table S2 and S3). Leucine and isoleucine are among the more common codons in comparison with other previously reported land plant cp genomes [27,28]. All 19 A/U-ending codons had an RSCU value of >1, whereas two amino acids, methionine (Met) and tryptophan (Trp), with C/G-ending codons had RSCU values of <1 and showed no codon bias. The results for the number of codons (Nc) of each protein-coding gene ranged from 28.65% (petN gene) to 61.00% (PetG) ( Table S3). The condon usage bias of the cp genome may be caused by selection and mutation [29]; meanwhile, a better understanding of exogenous gene expression and molecular evolution mechanisms of S. hydrophyllacea can be gained from further research on codons.

Codon Usage and Putative RNA Editing Sites in cp Genes of S. hydrophyllacea
In this study, the codon usage frequency and the relative synonymous codon usage (RSCU) in the S. hydrophyllacea plastome were analyzed. All protein-coding genes presented a total of 68,907 bp and 22,969 codons in the S. hydrophyllacea cp genome. Among all the codons, leucine (Leu) was the most abundant amino acid with a frequency of 10.58%, followed by isoleucine (Ile) with a frequency of 8.61%, whereas cysteine (Cys) was less abundant with a frequency of 1.06% (Figure 3, Table S2 and S3). Leucine and isoleucine are among the more common codons in comparison with other previously reported land plant cp genomes [27,28]. All 19 A/U-ending codons had an RSCU value of >1, whereas two amino acids, methionine (Met) and tryptophan (Trp), with C/G-ending codons had RSCU values of <1 and showed no codon bias. The results for the number of codons (Nc) of each protein-coding gene  Table S3). The condon usage bias of the cp genome may be caused by selection and mutation [29]; meanwhile, a better understanding of exogenous gene expression and molecular evolution mechanisms of S. hydrophyllacea can be gained from further research on codons. Potential RNA-editing sites in S. hydrophyllacea plastome were analyzed using the PREP program, and the results showed that the most frequent conversions at the codon positions consist of serine (Ser) changing to leucine (Leu) ( Table 2). A total of 46 editing sites in 18 protein-coding genes were identified, with the ndhB and ndhD genes having the highest number of predicted RNA-editing sites, which is analogous to other land plants [27]. Furthermore, rpoB has four predicted RNA-editing sites, whereas accD, atpA, matK, and ndhA have three editing sites. All the RNA-editing conversions in the S. hydrophyllacea cp genome resulted in hydrophobic products comprising isoleucine, leucine, tryptophan, tyrosine, valine, methionine, and phenylalanine. These results are also congruent with previous reports, which found that the most RNA-editing sites in higher plants led to amino acid change from polar to apolar and resulted in an increase in protein hydrophobicity [15,29,30].  Potential RNA-editing sites in S. hydrophyllacea plastome were analyzed using the PREP program, and the results showed that the most frequent conversions at the codon positions consist of serine (Ser) changing to leucine (Leu) ( Table 2). A total of 46 editing sites in 18 protein-coding genes were identified, with the ndhB and ndhD genes having the highest number of predicted RNA-editing sites, which is analogous to other land plants [27]. Furthermore, rpoB has four predicted RNA-editing sites, whereas accD, atpA, matK, and ndhA have three editing sites. All the RNA-editing conversions in the S. hydrophyllacea cp genome resulted in hydrophobic products comprising isoleucine, leucine, tryptophan, tyrosine, valine, methionine, and phenylalanine. These results are also congruent with previous reports, which found that the most RNA-editing sites in higher plants led to amino acid change from polar to apolar and resulted in an increase in protein hydrophobicity [15,29,30].

Comparison of Basic Characteristics of the cp Genome of Nine Rubiaceae Species
The cp genome of Rubiaceae has the typical circular structure with lengths ranging from 152,712 to 155,600 bp (Table 3), and the cp genome of G. aparine has the shortest one. The LSC length of Rubiaceae is 83,594-86,298 bp, with the longest found in S. hydrophyllacea and the shortest found in G. aparine. The SSC length ranges from 17,054 bp to 18,208 bp, and the IR length varies from 25,594 bp to 26,076 bp. In most cases, the differences in the length of the IRs determine the length differences of the cp genome [31]. However, the largest difference in length was found in the LSC region rather than in the SSC and IR regions among the Rubiaceae cp genomes. The GC content of the Rubiaceae cp genomes was similar, and ranged from 37.18% to 38.52%, in which G. nanlingensis has the highest GC content ( Table 3). The obtained cp genome of S. hydrophyllacea exhibits the typical angiosperm quadripartite structure. Moreover, gene content, order, and GC content were consistent with those of the other members of the Rubiaceae family [22,23,32].
The first discovery of cp RNA-editing in cp came from the maize rpl2 transcript in 1991, in which an ACG codon changed to a start codon AUG, which was defined as the post-transcriptional modification of pre-RNAs [33]. Comparisons of RNA editing sites among all nine studied Rubiaceae species revealed that M. offocinalis has highest number of RNA-editing sites (58 in 23 genes), followed by E. henryi (58 in 21 genes). Meanwhile, the lowest number of RNA-editing sites is found in G. aparine (44 in 19 genes, Table S4). All nine Rubiaceae species shared 13 editing sites distributed in eight genes (Table 4), and the highly conserved RNA-editing sites occurred between genera (Table S4). Even though the most frequent editing events in higher plants are C-to-U/T changes, U/T-to-C editing has also been observed in this research [33]. In S. hydrophyllacea, on the other hand, 46 RNA-editing sites were found in 25 genes, all with C-to-T editing. Furthermore, not one U/T-to-C editing in all RNA-editing sites has been found in the other seven Rubiaceae species (Table S4). In all species except for G. nanlingensis, the ndhB gene was observed to have the highest number of editing sites, followed by the ndhD gene. In the G. nanlingensis cp genome, there are two editing sites in the ndhB gene. At the same time, a notable RNA-editing event was also detected in all nine Rubiaceae species at the initiator codon (ACG), resulting in an ATG translational start codon in the ndhD gene, which is analogous to several other plants [27,33]. For the ycf 3 gene, one editing site is found in both Galium species, and no editing site was found in the other seven tested cp genomes.  Table 4. List of RNA-editing sites shared by the nine plastomes predicted by the PREP program. To compare the sequence variation between species, an alignment of nine Rubiaceae species plastid genome sequences was carried out using the mVISTA program ( Figure 4). Overall, the comparative genomic analysis showed that nine Rubiaceae cp genomes were relatively conserved. In agreement with similar studies in other plants, the IR region appeared to be more conserved than the LSC and SSC regions [15,34]. The noncoding regions appeared to be more variable globally than the coding regions in the cp genomes of Rubiaceae species. In all nine cp genome sequences, some highly divergent regions, including matK, rps16, atpF, psaB, ycf 3, psbH, petD, rpl16, rpl22, ndhF, and ccsA were identified, which might be used as a source of potential molecular markers for Rubiaceae plants. However, further work is necessary to verify the suitability of these potential molecular markers for the phylogenetic studies of Rubiaceae. To compare the sequence variation between species, an alignment of nine Rubiaceae species plastid genome sequences was carried out using the mVISTA program ( Figure 4). Overall, the comparative genomic analysis showed that nine Rubiaceae cp genomes were relatively conserved. In agreement with similar studies in other plants, the IR region appeared to be more conserved than the LSC and SSC regions [15,34]. The noncoding regions appeared to be more variable globally than the coding regions in the cp genomes of Rubiaceae species. In all nine cp genome sequences, some highly divergent regions, including matK, rps16, atpF, psaB, ycf3, psbH, petD, rpl16, rpl22, ndhF, and ccsA were identified, which might be used as a source of potential molecular markers for Rubiaceae plants. However, further work is necessary to verify the suitability of these potential molecular markers for the phylogenetic studies of Rubiaceae. The IR region is always considered to be consistent and stable in the cp genome, and is also common in plant evolution with the events of border region contraction or expansion. In this study, the IR boundaries of the S. hydrophyllacea cp genome were analyzed and compared with those of the other eight Rubiaceae species ( Figure 5). The events of expansion or contraction within the border regions between the two IR regions and the single-copy regions are considered to contribute to the genome size variations among plant lineages [35]. According to our research, IR regions are more conservative than LSC and SSC regions in the cp genomes of Rubiaceae. Although there are still expansion or contraction events in IR regions observed among the studied representatives of Rubiaceae, they contributed little to the observed differences in the overall size of the cp genomes. Interestingly, C. canepora showed obvious differences compared with the other eight Rubiaceae The IR region is always considered to be consistent and stable in the cp genome, and is also common in plant evolution with the events of border region contraction or expansion. In this study, the IR boundaries of the S. hydrophyllacea cp genome were analyzed and compared with those of the other eight Rubiaceae species ( Figure 5). The events of expansion or contraction within the border regions between the two IR regions and the single-copy regions are considered to contribute to the genome size variations among plant lineages [35]. According to our research, IR regions are more conservative than LSC and SSC regions in the cp genomes of Rubiaceae. Although there are still expansion or contraction events in IR regions observed among the studied representatives of Rubiaceae, they contributed little to the observed differences in the overall size of the cp genomes. Interestingly, C. canepora showed obvious differences compared with the other eight Rubiaceae species with the rpl2 gene in LSC/IR, which was found in the IR region in other eight species. The location of ycf2 in the SSC/IR region was replaced by ycf1 in the other eight cp genomes ( Figure 5).

Phylogenetic Relationships in Gentianales
The alignment of complete plastid genome sequences resulted a well-resolved phylogenetic topology of 40 Gentianales taxa ( Figure 6). In general, species representing Ruiaceae, Gentianaceae, Apocynaceae, and Asclepiadaceae were clustered into three groups. Furthermore, Apocynaceae and Asclepiadaceaea had the nearest distance and were clustered into one group. Six subfamilies were clustered from those 10 Rubiaceae species (C. arabica, C. canephora, S. hydrophyllaceae, E. henryi, D. sinensis, M. speciosa, M. officinalis, N. cadamba, G. aparine, and G. mollugo), and the position of S. hydrophyllaceae appeared to be closer to the genus Coffea than to species representing Cinchoneae, Naucleeae, Morindeae, and Rubieae. Previous research discovered the tribal and generic relationships in Rubiaceae via analyses of morphology, nuclear, ribosomal internal transcribed spacer (ITS), the restrictions sites of cpDNA, and single chloroplast gene (rbcL) [9,36]. Four species (M. officinalis, E. hennryi, C. arabica, and C. canephora) in Rubiaceae based on whole protein-coding genes of the cp genome were used to evaluate the phylogenetic relationships in Gentianales plants [32]. Furthermore, the closely related phylogenetic relationships of Rubiaceae plants (two Coffea species) with an out-group plant such as the Solanaceae family was also analyzed using Conserved Ortholog Set II makers [37] or chloroplast genes [22]. Phylogenetic analyses based on the complete plastid genome sequence instead of a few genes have been conducted in several high land plant species [20]. Our phylogenetic analyses resolved similar topologies, which confirm the results of previous phylogenetic analyses in Rubiaceae based on fewer genes [9].
According to the gene annotation of S. hydrophyllacea, the exons including the indel region were analyzed using MEGA, and insertion events were found in the following genes: rpoC2, rpl20, rpl32, ndhF, and rbcL. Deletion events were also found in rpoB, accD, ccsA, and ycf4 ( Figure 6). The insertion

Phylogenetic Relationships in Gentianales
The alignment of complete plastid genome sequences resulted a well-resolved phylogenetic topology of 40 Gentianales taxa ( Figure 6). In general, species representing Ruiaceae, Gentianaceae, Apocynaceae, and Asclepiadaceae were clustered into three groups. Furthermore, Apocynaceae and Asclepiadaceaea had the nearest distance and were clustered into one group. Six subfamilies were clustered from those 10 Rubiaceae species (C. arabica, C. canephora, S. hydrophyllaceae, E. henryi, D. sinensis, M. speciosa, M. officinalis, N. cadamba, G. aparine, and G. mollugo), and the position of S. hydrophyllaceae appeared to be closer to the genus Coffea than to species representing Cinchoneae, Naucleeae, Morindeae, and Rubieae. Previous research discovered the tribal and generic relationships in Rubiaceae via analyses of morphology, nuclear, ribosomal internal transcribed spacer (ITS), the restrictions sites of cpDNA, and single chloroplast gene (rbcL) [9,36]. Four species (M. officinalis, E. hennryi, C. arabica, and C. canephora) in Rubiaceae based on whole protein-coding genes of the cp genome were used to evaluate the phylogenetic relationships in Gentianales plants [32]. Furthermore, the closely related phylogenetic relationships of Rubiaceae plants (two Coffea species) with an out-group plant such as the Solanaceae family was also analyzed using Conserved Ortholog Set II makers [37] or chloroplast genes [22]. Phylogenetic analyses based on the complete plastid genome sequence instead of a few genes have been conducted in several high land plant species [20]. Our phylogenetic analyses resolved similar topologies, which confirm the results of previous phylogenetic analyses in Rubiaceae based on fewer genes [9]. and deletion sequence lengths in each species are listed in Table S5. The accD gene encoding one of the four subunits of the acetyl-CoA carbosylase enzyme in most cps showed insertion events in Nicotianoideae and Solanoideae plants, which is regarded as a possible ancestral trait of these species [33]. In this research, the deletion events in the accD gene were found in M. speciosak, N. cadamba, and two Galium species. Furthermore, no insertion events in the accD gene were found in all the Rubiaceae species investigated in this study.

Conclusions
The complete cp genome sequence obtained from one endangered mangrove, S. hydrophyllacea, was compared with that of eight other Rubiaceae cp genomes. The cp genomes of those Rubiaceae species have undergone evolution at the gene level rather than the genome level, because no significant structural changes were found. The IR/SSC and IR/LSC junctions are relatively conservative in Rubiaceae except for C. canepora. Eleven cp DNA markers were developed from the relatively highly variable regions, which may be used for further studies that focus on the identification of markers. All the chosen Rubiaceae taxa were completely distinguished with high bootstrap support based on the whole cp genome sequences. Gene insertion events in five genes and deletion events in four genes were found in Rubiaceae cp genomes. The data presented in this study will help improve our understanding of the evolutionary history of Gentianales. The availability of this cp genome sequence will serve as a tool to advance the study of protection in S. hydrophyllacea and help researchers explore the endangered mechanism of and genetic questions about this species.
Supplementary Materials: The following are available online at www.mdpi.com/xxx/s1, Table S1: SSRs in chloroplast genomes of nine Rubiaceae plants, Table S2: Codon usage in protein-coding genes from S. hydrophyllacea, Table S3: Codon usage for individual protein genes, Table S4: List of RNA-editing sites predicted by the PREP program in the selected chloroplast genome, Table S5: Insertion and deletion events of According to the gene annotation of S. hydrophyllacea, the exons including the indel region were analyzed using MEGA, and insertion events were found in the following genes: rpoC2, rpl20, rpl32, ndhF, and rbcL. Deletion events were also found in rpoB, accD, ccsA, and ycf4 ( Figure 6). The insertion and deletion sequence lengths in each species are listed in Table S5. The accD gene encoding one of the four subunits of the acetyl-CoA carbosylase enzyme in most cps showed insertion events in Nicotianoideae and Solanoideae plants, which is regarded as a possible ancestral trait of these species [33]. In this research, the deletion events in the accD gene were found in M. speciosak, N. cadamba, and two Galium species. Furthermore, no insertion events in the accD gene were found in all the Rubiaceae species investigated in this study.

Conclusions
The complete cp genome sequence obtained from one endangered mangrove, S. hydrophyllacea, was compared with that of eight other Rubiaceae cp genomes. The cp genomes of those Rubiaceae species have undergone evolution at the gene level rather than the genome level, because no significant structural changes were found. The IR/SSC and IR/LSC junctions are relatively conservative in Rubiaceae except for C. canepora. Eleven cp DNA markers were developed from the relatively highly variable regions, which may be used for further studies that focus on the identification of markers. All the chosen Rubiaceae taxa were completely distinguished with high bootstrap support based on the whole cp genome sequences. Gene insertion events in five genes and deletion events in four genes were found in Rubiaceae cp genomes. The data presented in this study will help improve our understanding of the evolutionary history of Gentianales. The availability of this cp genome sequence will serve as a tool to advance the study of protection in S. hydrophyllacea and help researchers explore the endangered mechanism of and genetic questions about this species.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4907/10/11/1000/s1, Table S1: SSRs in chloroplast genomes of nine Rubiaceae plants, Table S2: Codon usage in protein-coding genes from S. hydrophyllacea, Table S3: Codon usage for individual protein genes, Table S4: List of RNA-editing sites predicted by the PREP program in the selected chloroplast genome, Table S5: Insertion and deletion events of chloroplast genes in Rubiaceae species. Figure S1. The distribution, type, and presence of microsatellites (SSRs) in the chloroplast genome of S. hydrophyllacea. (A) Number of different SSR types; (B) Proportion of SSRs in LSC, SSC, and IR regions; (C) Number of identified SSR motifs in different repeat class types.