Complete chloroplast genome sequence of the mangrove species Kandelia obovata and comparative analyses with related species

As one of the most cold- and salt-tolerant mangrove species, Kandelia obovata is widely distributed in China. Here, we report the complete chloroplast genome sequence of K. obovata (Rhizophoraceae) obtained via next-generation sequencing, compare the general features of this plastome with those of other sequenced mangrove species, and perform a phylogenetic analysis based on the protein-coding genes of these plastomes. The complete chloroplast genome of K. obovata is 160,325 bp in size and has a 35.22% GC content. The genome has a typical circular quadripartite structure, with a pair of inverted repeat (IR) regions 26,670 bp in length separating a large single-copy (LSC) region (91,156 bp) and a small single-copy (SSC) region (15,829 bp). The chloroplast genome of K. obovata contains 128 unique genes, including 80 protein-coding genes, 38 tRNA genes, 8 rRNA genes and 2 pseudogenes (ycf1 in the IRA region and rpl22 in the IRB region). In addition, a simple sequence repeat (SSR) analysis found 108 SSR loci in the chloroplast genome of K. obovata, most of which are A/T rich. IR expansion and contraction regions were compared between K. obovata and five related species: two from Malpighiales and three mangrove species from different orders. The mVISTA results indicated that the genome structure, gene order and gene content are highly conserved among the analyzed species. The phylogenetic analysis using 54 common protein-coding genes from the chloroplast genome showed that the plant most closely related to K. obovata is Ceriops tagal of Rhizophoraceae. The results of this study provide useful molecular information about the evolution and molecular biology of these mangrove trees.


INTRODUCTION
Kandelia obovata Sheue, Liu & Yong, 2003 is a viviparous mangrove species belonging to Rhizophoraceae in Malpighiales that inhabits the intertidal zones of tropical and subtropical coasts. In East Asia, it is distributed from northern Vietnam through southeast China to southern Japan (Sheue, Liu & Yong, 2003; Tomlinson, 1986). This species is naturally distributed in the Hainan, Guangdong (including Hong Kong and Macau), Guangxi, Fujian and Taiwan Provinces of China (Li & Lee, 1997). With strong cold resistance and high salt tolerance, K. obovata is one of the northernmost mangrove species in China (Bin-Yuan et al., 2007). Previous studies have found that this species can accumulate heavy metals (Weng et al., 2012; Weng et al., 2014). Previous studies of the molecular biology of K. obovata have focused on geographical relationships, genetic diversity (Chen et al., 2010) and cold stress (Fei et al., 2015). However, there has been no report of the chloroplast genome of K. obovata, which may be important for illuminating the evolution of mangrove species.
With the development of DNA sequencing technologies, an increasing number of researchers have focused on chloroplast genome research. Since the first two complete chloroplast genomes were reported from liverwort (Ohyama et al., 1986) and tobacco (Shinozaki et al., 1986), approximately 2,300 plant chloroplast genomes have been made publicly available in the National Center for Biotechnology Information (NCBI) database (https://www.ncbi.nlm.nih.gov/genome/browse#!/organelles/). Chloroplasts are organelles that provide energy to the plant and play an important role in photosynthesis and many biosynthetic activities (Douglas, 1998; Keeling, 2004). In most plants, the chloroplast genome is a circular, double-stranded DNA molecule with a typical quadripartite structure, comprising a pair of inverted repeats (IRs) separated by a large single-copy region (LSC) and a small single-copy region (SSC) (Jansen et al., 2005). Generally, the chloroplast genome ranges from 107 to 218 kb in length and includes 110 to 130 genes, which are mainly involved in photosynthesis, transcription and translation; the gene content and gene order of the genome are highly conserved among taxa (Asaf et al., 2016; Daniell et al., 2016; Sugiura, 1992).
Before the present study, 117 full plastid genome sequences from the order Malpighiales had been published in NCBI, none of which are from mangrove plant species. Sequences of the true mangroves Lumnitzera littorea (MG182696) in Combretaceae and Sonneratia alba (MH105772) in Lythraceae and of the semi-mangrove Barringtonia racemosa (NC035705) in Lecythidaceae have been reported, but these species belong to Myrtales or Ericales. Comparisons of the chloroplast genome among K. obovata and mangrove species from different orders that also experience high salt and anoxic stress will improve our understanding of the evolution of stress tolerance.
In this study, we sequenced and analyzed the complete chloroplast genome of K. obovata based on next-generation sequencing methods (Illumina, HiSeq X Ten, San Diego, California, USA), and deposited the annotated sequence into the NCBI database under accession number MH277332. Then, the complete chloroplast genomes of six related species from different orders were compared to explore the evolution of the chloroplast genome. Subsequently, a phylogenomic analysis was performed based on the 54 protein-coding genes of 22 chloroplast genomes. Our study will improve the understanding of the evolutionary relationships among these mangrove species.

Sampling and DNA sequencing
Samples were collected in Xinyin National Wetland Park, Danzhou, China (19°30′N, 109°30′E). Voucher specimens were deposited in the herbarium of Hainan Normal University under accession number Yang Y-201803, and replicate specimens (XY2019061201) were sent to the Traditional Chinese Medicine Herbarium of Hainan Province for conservation. Five fresh leaves were collected from five healthy trees of K. obovata and then stored on ice for the return to the laboratory. Total genomic DNA was extracted from mixed fresh leaf tissues using the CTAB method (Doyle & Doyle, 1987). The extracted total genomic DNA was dissolved in 50 µL of TE buffer. After quality and concentration were analyzed by agarose electrophoresis and spectrophotometry (Beijing Puxi T6, China), a final DNA concentration of >30 ng/mL was used for Illumina sequencing. Library preparation and sequencing were performed at TGS-Shenzhen, China. The genome was sequenced on an Illumina HiSeq X Ten platform (Illumina, San Diego, CA, United States) with 150 bp paired-end reads.

Simple sequence repeat (SSR) analysis
Distributed throughout the genome, SSRs are repeat sequences with a typical length of 1-6 bp that are generally considered to have a higher mutation rate than neutral DNA regions. The distributions of SSRs in the chloroplast genome were predicted by using the microsatellite search tool MISA (Kurtz et al., 2001) with the following parameters: ≥10 for mononucleotide repeats, ≥5 for dinucleotide repeats, ≥4 for trinucleotide repeats, and ≥3 for tetranucleotide repeats, pentanucleotide repeats, and hexanucleotide repeats.
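The repeat thresholds listed above can be illustrated with a minimal Python sketch. This is not MISA itself and makes no claim to reproduce its exact behaviour (for example, compound SSRs are not handled); the function names `is_primitive` and `find_ssrs` and the toy sequence are assumptions introduced here for illustration only.

```python
# Minimum repeat counts per motif length, taken from the thresholds in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs (0-based start)."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            # Skip ambiguous bases and motifs already covered by a shorter unit.
            if not set(motif) <= set("ACGT") or not is_primitive(motif):
                i += 1
                continue
            reps = 1
            while seq[i + reps * k:i + (reps + 1) * k] == motif:
                reps += 1
            if reps >= min_rep:
                hits.append((i, motif, reps))
                i += reps * k  # jump past the repeat tract
            else:
                i += 1
    return sorted(hits)


# Toy sequence (hypothetical, not from the K. obovata plastome).
toy = "GC" + "A" * 12 + "GCGC" + "AT" * 6 + "GG" + "AAT" * 4 + "C"
for start, motif, reps in find_ssrs(toy):
    print(start, motif, reps)
```

Requiring a primitive motif keeps a run such as twelve A's from being reported a second time as a dinucleotide (AA) repeat.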

Comparative genome analysis
To investigate the sequence divergence of the chloroplast genome among the analyzed mangrove species, the whole chloroplast genome sequences of the mangrove species K. obovata and C. tagal of Rhizophoraceae, L. littorea of Combretaceae, and S. alba of Lythraceae; the semi-mangrove species B. racemosa of Lecythidaceae; and the land species E. novogranatense of Erythroxylaceae were analyzed using the mVISTA program in the Shuffle-LAGAN mode (Frazer et al., 2004). The K. obovata annotations were used as references. The differences in the chloroplast genome length, LSC length, SSC length, GC content, encoding gene types and gene numbers among these 6 species were analyzed. The LSC/IR/SSC boundaries among the species were determined by comparative analysis to explore the variation in these angiosperm chloroplast genomes.

Genome organization and gene features
In total, we obtained 5,542 Mb of short-read sequence data with a Q20 of 98.62% using the Illumina HiSeq X Ten platform. The two assembly methods (NOVOPlasty and SOAPdenovo2) yielded the same chloroplast genome sequence for K. obovata. The K. obovata chloroplast genome is a typical double-stranded circular DNA molecule with a quadripartite structure. The length is 160,325 bp (Fig. 1), with a pair of IR regions 26,670 bp in length that separate an LSC region of 91,156 bp and an SSC region of 15,829 bp. The GC content of the chloroplast genome is 35.23% (Table 1), and the GC contents of the LSC, SSC and IR regions are 32.34%, 29.05% and 41.99%, respectively. Because the rRNA genes rrn23, rrn16, rrn5 and rrn4.5 are located in the IR regions, these regions show a higher GC content. These features of the K. obovata chloroplast genome are similar to those of other chloroplast genomes in Malpighiales (de Santana Lopes et al., 2018; Li et al., 2017). However, K. obovata, C. tagal and E. novogranatense, all of Malpighiales, have larger genome sizes and LSC sizes and lower GC contents than the mangrove species from different orders (Table 1).
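As a quick consistency check on the figures above, the region lengths should sum to the genome length, and the length-weighted mean of the regional GC contents should approximate the genome-wide GC content. The short Python sketch below uses only the values reported in the text; the variable names are illustrative.

```python
# Region lengths (bp) and GC contents (%) as reported in the text.
regions = {
    "LSC": (91_156, 32.34),
    "SSC": (15_829, 29.05),
    "IR": (26_670, 41.99),  # length of one copy
}
copies = {"LSC": 1, "SSC": 1, "IR": 2}  # the IR occurs twice (IRA and IRB)

# Total genome length: LSC + SSC + 2 x IR.
total_length = sum(length * copies[name] for name, (length, _) in regions.items())

# Length-weighted mean GC content across the four regions.
weighted_gc = sum(
    length * copies[name] * gc for name, (length, gc) in regions.items()
) / total_length

print(total_length)  # 160325
print(f"{weighted_gc:.1f}%")
```

The sum reproduces the reported genome length of 160,325 bp, and the weighted GC content comes out at roughly 35.2%, in line with the reported genome-wide value; small differences are expected because the regional percentages are themselves rounded.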
The chloroplast genome contains a total of 128 unique genes, including 80 protein-coding genes, 38 tRNA genes, 8 rRNA genes and 2 pseudogenes. Nineteen of these genes occur in the IRs, comprising 8 protein-coding genes (rps19, rpl2, rpl23, ycf2, ndhB, rps7, rps12, ycf1), 7 tRNA genes (trnI-CAU, trnL-CAA, trnV-GAC, trnI-GAU, trnA-UGC, trnR-ACG, trnN-GUU) and 4 rRNA genes (rrn23, rrn16, rrn5, rrn4.5). In total, 15 genes with introns were found. Thirteen of these genes contain one intron, and two (clpP and ycf3) contain two introns (Table 2). Rps12 is a trans-spliced gene with a 5′ exon located in the LSC region and two 3′ exons located in the IR regions, similar to most other plant chloroplast genomes (Asaf et al., 2016; Cheng et al., 2017; Raman & Park, 2016; Jiang et al., 2017). The coding sequence of a protein-coding gene usually begins with ATG. However, there are some exceptions in the K. obovata chloroplast genome in which the first nucleotide has changed from A to G or the second nucleotide has changed from T to C, such as rps19 and cemA, which begin with GTG, and ndhD, which begins with ACG. The K. obovata chloroplast genome with gene annotations was submitted to GenBank under the accession number MH277332.
Table 2. List of annotated genes in the chloroplast genome of K. obovata (columns: Category, Group of genes, Name of genes). Notes: a, gene with two introns; b, gene with one intron; c, genes located in the inverted repeats.
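The start-codon exceptions described above are the kind of feature that can be flagged automatically from annotated coding sequences. The following Python sketch shows one way to do so; the function name `start_codon_exceptions` is hypothetical, and the toy sequences are invented, with only their first codons chosen to match the genes named in the text (rbcL is included merely as an example of a gene with a canonical start).

```python
def start_codon_exceptions(cds_by_gene, canonical="ATG"):
    """Map gene name -> start codon for coding sequences not starting with `canonical`."""
    exceptions = {}
    for gene, seq in cds_by_gene.items():
        codon = seq[:3].upper()
        if codon != canonical:
            exceptions[gene] = codon
    return exceptions


# Toy coding sequences (hypothetical; only the first codon reflects the text).
toy_cds = {
    "rps19": "GTGACACGTTCA",
    "cemA": "GTGAAAAAAGCA",
    "ndhD": "ACGACTAATTTC",
    "rbcL": "ATGTCACCACAA",
}

print(start_codon_exceptions(toy_cds))
```

In practice the coding sequences would be extracted from the annotated GenBank record rather than typed in by hand.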

Simple sequence repeats (SSR) analysis
With the development of next-generation sequencing (NGS) technologies, SSR development has become quicker, more efficient and cheaper than before, even in species for which background genetic information is lacking (Davey et al., 2011; Zalapa et al., 2012). Using the microsatellite identification tool MISA, we identified 108 SSR loci in the complete chloroplast genome sequence of K. obovata, including 92 mononucleotide SSR loci (A/T), 15 dinucleotide SSR loci (AT/TA) and 1 trinucleotide SSR locus (AAT) (Table 3).
A total of 103 of the 108 SSRs are located in intergenic regions, and 5 SSRs are located in gene-coding regions. Furthermore, 83, 6 and 19 SSRs were discovered within the LSC, SSC and IR regions, respectively (Fig. 2).

Comparative analysis of the chloroplast genome sequences of six species
In this study, the chloroplast genomes of several mangrove species and one land species were compared using the mVISTA program. Considerable similarities in genome composition and size were identified among the species (Fig. 3). The coding regions of the two mangrove species in Rhizophoraceae are almost identical, whereas the non-coding regions are more variable. The mangrove species in Malpighiales show a closer relationship with the land plant E. novogranatense, which belongs to the same order, than with the mangrove species from different orders.

IR contraction and expansion
In this study, we aligned the positions of the LSC, IRA, SSC and IRB borders and the adjacent genes among the mangrove species and found that the border locations are generally similar to those of previously reported chloroplast genomes (Jo et al., 2016; Wei et al., 2017; Zhou et al., 2017; Yu et al., 2018). However, K. obovata carries two copies of the rpl22 gene, one at the LSC/IRA junction and one in the IRB region; because the rpl22 copy in the IRB region has no open reading frame that encodes a functional protein, we regarded it as a pseudogene. In the other species, rpl22 is located only in the LSC region, and there is no rpl22 in the IRB region. The IR extends into the ycf1 gene, creating long ycf1 pseudogenes of variable length. The ycf1 pseudogene in the IRA region is 1,353 bp long, whereas the ycf1 at the SSC/IRB junction contains an intact open reading frame (ORF). There are two copies of rps19 in the IR regions of the K. obovata chloroplast genome, but in the other species there is only one copy, located at the LSC/IRA junction. In all the analyzed genomes, the trnH gene is the first gene in the LSC region, although its distance from the IRB/LSC junction ranges from 1 bp to 79 bp. Comparisons among the species revealed that the K. obovata chloroplast genome has lost the ndhF gene (Fig. 4).

Phylogenetic analysis
To analyze the phylogenetic position of K. obovata in Malpighiales, we used the 54 protein-coding genes shared among the 22 complete chloroplast genome sequences to infer phylogenetic relationships. The phylogenetic trees generated by the ML and MP methods have similar topologies (Fig. 5). Both show that K. obovata is most closely related to C. tagal, with 100% bootstrap support; both of these taxa belong to Rhizophoraceae. These two species are more closely related to the land species E. novogranatense (Erythroxylaceae) than to the other species, suggesting that Erythroxylaceae is sister to Rhizophoraceae. Our study will provide valuable genetic information for genome-scale phylogenetic studies in mangrove plants.
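The step of restricting the analysis to genes shared by all genomes can be sketched as a set intersection followed by concatenation of per-gene alignments. The Python example below is a minimal illustration under stated assumptions: this section does not describe how the 54 genes were selected or concatenated, so the functions `shared_genes` and `concatenate` and all toy data are hypothetical.

```python
def shared_genes(genes_by_genome):
    """Return the sorted list of genes annotated in every genome."""
    gene_sets = [set(genes) for genes in genes_by_genome.values()]
    return sorted(set.intersection(*gene_sets)) if gene_sets else []


def concatenate(alignments, genes, taxa):
    """Build a supermatrix: taxon -> aligned sequences joined in the order of `genes`."""
    return {taxon: "".join(alignments[gene][taxon] for gene in genes) for taxon in taxa}


# Toy annotations for three hypothetical genomes.
toy_annotations = {
    "genome_A": ["rbcL", "matK", "ndhF"],
    "genome_B": ["rbcL", "matK"],
    "genome_C": ["matK", "rbcL", "psbA"],
}
common = shared_genes(toy_annotations)

# Toy per-gene alignments (every taxon has the same aligned length within a gene).
toy_alignments = {
    "matK": {"genome_A": "ATG-A", "genome_B": "ATGCA", "genome_C": "ATGTA"},
    "rbcL": {"genome_A": "ATGT", "genome_B": "ATGT", "genome_C": "ATGC"},
}
supermatrix = concatenate(toy_alignments, common, list(toy_annotations))
print(common)
print(supermatrix)
```

The resulting supermatrix would then be passed to a tree-building program for ML or MP inference.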

DISCUSSION
With the development of NGS, chloroplast genome sequences can be obtained efficiently and economically. In the present study, we obtained the complete sequence of the K. obovata chloroplast genome (160,325 bp), which was fully characterized and compared to the chloroplast genomes of species from different orders. The K. obovata chloroplast genome includes 128 unique genes encoding 80 proteins, 8 rRNAs and 38 tRNAs, as well as 2 pseudogenes (ycf1 in the IRA region and rpl22 in the IRB region). rps19 and cemA begin with GTG, and ndhD begins with ACG; these changes are most likely made at the RNA stage. Similar exceptions have been found in the chloroplast genomes of other plants, such as Betula platyphylla (Wang et al., 2018), Panax ginseng (Zhao et al., 2015) and Phoenix dactylifera L. (Yang et al., 2010). The obtained chloroplast genome of K. obovata has a typical quadripartite structure, and its gene content, gene order and GC content are similar to those of most other species from different orders. Although IRs are more conserved than the LSC and SSC regions in chloroplast DNA, the expansion and contraction of the border between the SC and IR regions are common evolutionary events and produce size variation in chloroplast genomes (Raubeson et al., 2007; Wang et al., 2008). Relative to the IR regions of other mangrove species, those of K. obovata contain an additional gene (rps19). Furthermore, K. obovata shows a lower GC content than the other mangrove species, L. littorea and S. alba of Myrtales and B. racemosa of Ericales.
Simple sequence repeats (SSRs) are significant repetitive elements of the entire genome and play important roles in genome recombination and rearrangement. The SSRs in chloroplast genomes are usually distributed in intergenic regions (Zhao et al., 2015). In the SSR analysis, 108 SSR loci were found, and most SSRs were located in intergenic regions. The SSRs identified in the chloroplast genome of K. obovata can be used to analyze polymorphisms at the intraspecific level. They can also be used to develop lineage-specific markers for future evolutionary and genetic diversity studies. The mVISTA results showed that the sequence of the chloroplast genome is highly conserved among the six species. In addition, they showed that the sequence and content of the IR regions are more conserved than those of the LSC and SSC regions among the studied species, possibly because of the rRNA genes in the IR regions. The results showed that the genes of the chloroplast genome are largely identical between K. obovata and C. tagal, whereas the intergenic regions are more variable. Thus, we propose that intergenic regions (such as trnL-trnF, petA-psbJ and ndhC-trnV) could potentially be used as molecular markers for evolutionary and genetic diversity studies of these two species and other species of Rhizophoraceae.
In this study, the phylogenetic position of K. obovata in Malpighiales was inferred by analyzing the complete chloroplast genome and 54 genes shared among 22 species. The results suggest that K. obovata is most closely related to C. tagal of Rhizophoraceae and E. novogranatense of Erythroxylaceae despite their contrasting habitats. The bootstrap values were high for species placed in the same family but were low among different families, suggesting that these protein-coding genes may be conserved within families but vary extensively among families. In addition, the mVISTA results showed that E. novogranatense is more closely related to the species in Rhizophoraceae than to other mangrove species despite having very different living conditions. Compared with the land species, the mangrove species have evolved features such as salt glands and pneumathodium as adaptations to their high-salt, anoxic environments. Although K. obovata is a typical mangrove plant occurring in the coastal intertidal zone, the characteristics of its chloroplast genome are quite similar to those of land angiosperms (Asaf et al., 2017; Bruneau, Doyle & Palmer, 1990; Sugiura, 1992).
We found no character in the chloroplast genome that distinguished the land species from the mangrove species, even though the species in Rhizophoraceae are generally considered mangrove species living in intertidal environments. Therefore, we infer that genetic variation within the chloroplast genome did not contribute to the adaptation of different genera to divergent habitats. Rhizophoraceae includes some land species, such as those belonging to the genera Carallia Roxb. and Pellacalyx Korth. The extent of genetic variation of plastomes among mangrove and non-mangrove species does not correspond to habitat divergence among these taxa. Further studies of this topic are warranted. Our study on the K. obovata chloroplast genome provides information on mangrove plant species in coastal intertidal zones. Moreover, the chloroplast genomic data provided in this study will be valuable for future phylogenetic studies and other studies of mangrove species.

CONCLUSION
We successfully assembled, annotated and analyzed the complete chloroplast sequence of K. obovata, a mangrove species. The chloroplast genome was found to be conserved among several species, with that of K. obovata being very similar to those of both other mangrove species and land species. We identified 108 SSR loci in the chloroplast genome, which can be used for the development of lineage-specific markers. The LSC/IRB/SSC/IRA boundary regions of the chloroplast genome were compared among four mangrove species, and the results revealed that the K. obovata chloroplast genome has lost the ndhF gene. The phylogenetic analyses showed that K. obovata is most closely related to C. tagal among the studied taxa. The molecular data in this study represent a valuable resource for the study of evolution in mangrove species.

ADDITIONAL INFORMATION AND DECLARATIONS
Funding
This study was supported by grants from Hainan Natural Science Foundation (Grant No. 318MS176) and the National Natural Science Foundation of China (Grant No. 41776148 and 31760119). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.