Complete chloroplast genomes of 14 mangroves: phylogenetic and genomic comparative analyses

Mangroves are the main components of an ecosystem that connects land and ocean and is of significant ecological importance. They are found around the world and are taxonomically distributed across 17 families. Until now, there have been no phylogenetic analyses of mangroves based on complete plastome sequences. To infer the relationships between mangroves and terrestrial plants at the molecular level, we generated chloroplast genomes of 14 mangrove species from eight families, spanning six orders: Fabales (Pongamia pinnata), Lamiales (Avicennia marina), Malpighiales (Excoecaria agallocha, Bruguiera sexangula, Kandelia obovata, Rhizophora stylosa, Ceriops tagal), Malvales (Hibiscus tiliaceus, Heritiera littoralis, Thespesia populnea), Myrtales (Laguncularia racemosa, Sonneratia ovata, Pemphis acidula), and Sapindales (Xylocarpus moluccensis). The chloroplast genomes range in length from 149kb to 168kb. They have a conserved structure, with two inverted repeats (IRa and IRb, ~25.8kb), a large single-copy region (LSC, ~89.0kb), and a small single-copy region (SSC, ~18.9kb), as well as ~130 genes (85 protein-coding, 37 tRNA, and 8 rRNA). The number of simple sequence repeats (SSRs) varied among mangrove species. Phylogenetic analysis using complete chloroplast genomes of 71 mangrove and land plants confirmed the previously reported phylogeny within rosids, including the positioning of obscure families such as Linaceae within Malpighiales. Most mangrove chloroplast genes are conserved, and we found six genes subject to positive or neutral selection. Genomic comparison showed that IR regions have lower divergence than other regions. This study provides the first report of several plastid genetic resources for mangroves, and the phylogenetic placements and comparative analyses of these species provide insights for mangrove genetic and phylogenetic research.

associates from five families [4,5]. Due to their wide distribution, the phylogenetic position of each species would be of considerable interest. As one of the most important organelles in plants, the chloroplast has an independent genome with a fixed sequence structure and a relatively conserved number of expressed genes associated with energy production and metabolism. In previous reports [6][7][8], chloroplast genes such as rbcL and psbA were used to infer the evolutionary origins and relationships of mangrove species from different clades or geographical regions. DNA barcodes from the rbcL, matK, and trnH-psbA regions have also been used to identify unknown mangrove species [5]. To date, however, no study has examined complete chloroplast genome sequences of multiple mangroves.

In this study, we sequenced and assembled the complete chloroplast genomes of 14 mangrove plants (Figure S1). The expected four-segment structure of a plant chloroplast genome was identified in all assemblies: two inverted repeat regions (IRa and IRb), a small single-copy (SSC) region, and a large single-copy (LSC) region. The lengths of the four regions were similar within orders but differed slightly between orders. The average IR lengths for the mangrove orders Fabales, Lamiales, Malpighiales, Malvales, Myrtales, and Sapindales were approximately 23.6kb, 25.6kb, 26.3kb, 26.0kb, 25.3kb, and 27.0kb, respectively. The SSC of the fourteen species ranged from 17.9kb to 20.0kb, while the LSC ranged from 83kb to 91kb (Table 1). The GC content was between 35% and 39%. In IR regions the GC content (~43%) was higher than in the SSC (30%) and LSC (34%) regions (Table S1).
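The per-region GC comparison above amounts to counting G and C bases within the coordinates of each region. The following is a minimal Python sketch under stated assumptions: the function names, the toy sequence, and the region coordinates are all hypothetical and are not taken from the assemblies or from the pipeline used in this study.

```python
from __future__ import annotations


def gc_content(seq: str) -> float:
    """Return the GC fraction of a DNA sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def region_gc(genome: str, regions: dict[str, tuple[int, int]]) -> dict[str, float]:
    """GC fraction per region; coordinates are 0-based and end-exclusive."""
    return {name: gc_content(genome[start:end])
            for name, (start, end) in regions.items()}


# Toy input only (hypothetical): four 10 bp "regions" in LSC, IRb, SSC, IRa order.
toy_genome = "ATATATATAT" + "GCGCATGCGC" + "ATATTTAAAT" + "GCGCATGCGC"
toy_regions = {"LSC": (0, 10), "IRb": (10, 20), "SSC": (20, 30), "IRa": (30, 40)}
toy_gc = region_gc(toy_genome, toy_regions)
```

With real data, the region boundaries would come from the annotation of each assembly, and the resulting fractions could be compared with the values reported in Table S1.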

Phylogenetic relationships of mangroves

Similar to mitochondrial genomes in vertebrate genetics, chloroplast genomes are highly useful for resolving phylogenetic and evolutionary questions [11]. To resolve the phylogenetic relationships of these mangroves, 57 previously reported chloroplast genomes, representing 57 distinct terrestrial plant families in 17 orders, were combined with our 14 mangrove species to construct phylogenetic trees (Table S2, Figure 1). From the whole chloroplast genome sequences, we identified 41 highly conserved genes (see Methods) in the 71 species and constructed two Bayesian inference (BI) trees and three maximum likelihood (ML) trees using different data sets and models (Figure 1, Figures S2-S5). We compared the five trees in two ways and found that the trees constructed using whole gene sets had relatively higher posterior probabilities or bootstrap values, and that species positions tended to be consistent, with low log-likelihood differences between trees (Table S3). The tree constructed with the Bayesian algorithm from the 41 gene sequences showed the highest reliability (Table S3), and most of its branches have posterior probabilities greater than 0.9 (Figure 1). These results suggest high confidence in the BI phylogenetic tree in this study.
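The statement that most branches have posterior probabilities above 0.9 can be checked by summarizing per-branch support values. Below is a minimal Python sketch, assuming the support values have already been extracted from a tree file into a list. The function name and the toy values are hypothetical and are not results from this study.

```python
from typing import Iterable


def fraction_supported(supports: Iterable[float], threshold: float = 0.9) -> float:
    """Fraction of branches whose support is strictly greater than `threshold`."""
    values = list(supports)
    if not values:
        raise ValueError("no support values given")
    return sum(v > threshold for v in values) / len(values)


# Toy posterior probabilities (hypothetical), one per internal branch.
toy_supports = [1.0, 0.98, 0.95, 0.72, 0.91]
toy_fraction = fraction_supported(toy_supports)
```

The same helper would apply to bootstrap values from the ML trees if they are first rescaled to the 0 to 1 range.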

From our phylogenetic tree, at the scale of dicots, it is clear that species in the same order group together.

However, experimental research on the specific function of rps7 is limited at present. We speculate that the relaxed neutral selection (Ka/Ks around 1.0) and potential positive selection (Ka/Ks greater than 1.0) of rpl23, rpl36, and rps7 in various species reflect the highly diverse environments and adaptations of plants (Table S4). The gene ycf1 is one of the largest genes in the chloroplast genome and usually has two functional copies [22,25], although in some plants one of the two copies has evolved into a pseudogene [19,31,32,33].

about five hundred million paired-end sequencing data were randomly selected from each species for further assembly.
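The Ka/Ks interpretation used in the selection discussion above (around 1.0 for neutral selection, greater than 1.0 for potential positive selection) can be written as a simple rule. The sketch below is illustrative only: the function name and the tolerance `tol` that defines "around 1.0" are assumptions, since the text gives no numeric cutoff, and the "purifying" label for ratios well below 1 follows common usage rather than wording from this study.

```python
def classify_selection(ka: float, ks: float, tol: float = 0.1) -> str:
    """Label a gene by its Ka/Ks ratio.

    `tol` is a hypothetical tolerance for "around 1.0"; it is not from the paper.
    """
    if ks <= 0:
        return "undefined"  # ratio cannot be computed without synonymous changes
    ratio = ka / ks
    if ratio > 1.0 + tol:
        return "positive"
    if ratio >= 1.0 - tol:
        return "neutral"
    return "purifying"


# Toy values (hypothetical), not taken from Table S4.
toy_labels = {
    "geneA": classify_selection(0.02, 0.10),
    "geneB": classify_selection(0.10, 0.10),
    "geneC": classify_selection(0.30, 0.10),
}
```

In practice, Ka and Ks would be estimated from codon alignments by a dedicated tool, and a ratio near the boundary would call for a statistical test rather than a fixed threshold.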

These data were first assembled with MITObim v1.9 [34], using a reference-guided strategy with the closest available reference. The total length of each chloroplast genome was estimated with SPAdes v3.13.0 [35]. NOVOPlasty v2.7.2 [36] was then used to assemble a complete genome based on the initial assembly and the estimated genome size. Finally, the sequences were manually oriented and connected to obtain circular genomes.
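The final orientation step was done manually in this study. As an illustration of what orienting a circular sequence involves, the Python sketch below rotates a circular genome so that it starts at a chosen anchor sequence, trying the reverse strand if the anchor is not found on the forward strand. The function names and the anchor are hypothetical; this is not the procedure or tooling used by the authors.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def rotate_to_anchor(circular_seq: str, anchor: str) -> str:
    """Rotate a circular sequence so that it begins with `anchor`.

    The forward strand is tried first, then the reverse complement.
    """
    for candidate in (circular_seq, reverse_complement(circular_seq)):
        # Doubling the sequence lets the anchor span the original end/start junction.
        doubled = candidate + candidate
        pos = doubled.find(anchor)
        if 0 <= pos < len(candidate):
            return doubled[pos:pos + len(candidate)]
    raise ValueError("anchor not found on either strand")


# Toy circular sequence (hypothetical) whose anchor spans the linear junction.
toy_oriented = rotate_to_anchor("ACCCTTTGGGAA", "AAAC")
```

A real workflow would typically pick the anchor from a conserved feature, such as the start of the LSC region, so that all genomes share the same starting point before comparison.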

The data that support the findings of this study have been deposited in the CNSA (https://db.cngb.org/cnsa/) of CNGBdb with accession number CNP0000567.