Complete plastome sequencing from Toona (Meliaceae) and phylogenomic analyses within Sapindales

Premise of the Study Toona (Meliaceae, Sapindales) is a small genus of five species of trees native from southern and eastern Asia to New Guinea and Australia. Complete plastomes were sequenced for three Toona species to provide a basis for future plastome genetic studies in threatened species of Toona. In addition, plastome structural evolution and phylogenetic relationships across Sapindales were explored with a larger data set of 29 Sapindales plastomes (including members of six out of nine families). Methods The plastomes were determined using the Illumina sequencing platform; the phylogenetic analyses were conducted using maximum likelihood by RAxML. Results The lengths of three Toona plastomes range from 159,185 to 158,196 bp. A total of 113 unique genes were found in each plastome. Across Sapindales, plastome gene structure and content were largely conserved, with the exception of the contraction of the inverted repeat region to exclude ycf1 in some species of Rutaceae and Sapindaceae, and the movement of trnI‐GAU and trnA‐UGC to a position outside the inverted repeat region in some Rutaceae species. Discussion The three Toona plastomes possess the typical structure of angiosperm plastomes. Phylogenomic analysis of Sapindales recovered a mostly strongly supported phylogeny of Sapindales, including most of the backbone relationships, with some improvements compared to previous targeted‐gene analyses.

are generally conserved in structure, gene content, and gene order (Green, 2011; Ruhlman and Jansen, 2014), although rearrangements and gene loss have been detected in a number of lineages and most differences in plastome gene number are related to fluctuations in the size of the inverted repeat (IR) region (e.g., Guisinger et al., 2011; Knox, 2014; Zhu et al., 2016). To date, complete plastomes of 26 species across six families are available for Sapindales, including one Meliaceae species (Azadirachta indica A. Juss., Melioideae). Although McPherson et al. (2013) sequenced the T. ciliata plastome for phylogeographical study of this species in Australia, the plastome structure of this species was not reported, and the assembled plastome sequences of this species are not openly available. Additional sequenced plastomes from Meliaceae as well as across Sapindales may help to improve our understanding of phylogenetic relationships within the order and would provide insight into plastome evolution in this clade. In this study, we sequenced and characterized the complete plastomes of three Toona species and downloaded all 26 available Sapindales plastomes from GenBank, with the following objectives: (1) to provide a basis for future plastome genetic studies in threatened species of Toona, (2) to determine whether plastomes can resolve phylogenetic relationships among families of Sapindales, and (3) to evaluate plastome structure evolution across Sapindales.

METHODS
Fresh leaves of T. sinensis, T. sureni, and T. ciliata were obtained from Wuhan Botanical Garden (30.54°N, 110.42°E), Lushan Botanical Garden (29.55°N, 115.99°E), and the National Nature Reserve of 109.92°E), respectively. Vouchers were deposited at the Herbarium of Wuhan Botanical Garden, Chinese Academy of Sciences (HIB) (Table 1). High-quality plastid DNA was obtained following the plastid DNA extraction method of Shi et al. (2012). Approximately 30 g of fresh, young leaf tissue was used for each species, and for each plastome a DNA TruSeq Illumina (Illumina Inc., San Diego, California, USA) sequencing library, with 500-bp insert sizes, was constructed at the Beijing Genomics Institute (BGI) in Wuhan, Hubei, China, using 2.5-5 ng of sonicated plastid DNA. An Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, California, USA) and quantitative PCR were used to quantify DNA amounts in the libraries. Libraries were multiplexed by TruSeq adapter and sequenced as 150-bp paired-end reads on an Illumina HiSeq 2000 platform at BGI (Wuhan, Hubei, China). The raw data are available from the National Center for Biotechnology Information Sequence Read Archive (accession nos. SRR6146642, SRR6146640, and SRR6146641).
The raw reads were subsequently filtered for high-quality reads following the method described by Sun et al. (2016). Filtered reads were assembled into contigs with a minimum length of 1000 bp using CLC Genomics Workbench 9 (Girard et al., 2011) with default parameters, except that the k-mer value was set to 60 for T. sinensis and T. sureni, and 64 for T. ciliata, to produce the highest N50 value. The assembly statistics are presented in Appendix 1. After trimming, the contigs were ordered according to the reference genome Azadirachta indica A. Juss. (NC_023792). Plastid genomes were annotated with DOGMA (Wyman et al., 2004), and gene start and stop codons were determined through comparison to start and stop codons in the homologous genes of A. indica. Annotation of tRNA genes was conducted using tRNAscan-SE (Schattner et al., 2005). Junctions between large single-copy regions (LSCs) and IRs and small single-copy regions (SSCs) and IRs of the three plastomes were verified with PCR and Sanger sequencing. Physical maps of plastomes were generated using GenomeVx (Conant and Wolfe, 2008).
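The k-mer choice above rests on the N50 statistic. As a minimal sketch of that criterion (not the CLC Genomics Workbench implementation), N50 can be computed from a list of contig lengths and compared across k-mer settings. The function names `n50` and `best_kmer` and the example contig lengths are hypothetical and are not data from this study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    account for at least half of the total assembled length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def best_kmer(assemblies, min_length=1000):
    """Pick the k-mer value whose assembly has the highest N50.

    `assemblies` maps a k-mer value to the contig lengths it produced;
    contigs shorter than `min_length` are discarded first, mirroring the
    1000-bp minimum contig length stated in the text.
    """
    scores = {}
    for k, lengths in assemblies.items():
        kept = [n for n in lengths if n >= min_length]
        scores[k] = n50(kept)
    return max(scores, key=scores.get), scores


# Hypothetical contig lengths (bp) for two k-mer settings.
example = {
    60: [88000, 26000, 18000, 900],
    64: [60000, 40000, 30000, 2000],
}
chosen_k, n50_by_k = best_kmer(example)
```

With these made-up lengths, the k = 60 assembly would be preferred because its N50 is larger; real assemblies would of course be compared on the assembler's own output.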
In total, 79 protein-coding regions and the ycf15 region were identified from the plastomes of three Toona species and 26 other species of Sapindales, with two taxa of Malvales (Cytinus hypocistis (L.) L. and Hibiscus syriacus L.) as outgroups (Table 1). These sequences were then manually compiled into a single file of the 31-taxon data set and aligned with MAFFT (Katoh et al., 2002) for phylogenetic analyses. GenBank information for all plastomes used for phylogenetic analyses is provided in Table 1. To further investigate the phylogenetic relationships within Sapindales, maximum likelihood (ML) analyses were conducted using RAxML version 7.4.2 (Stamatakis et al., 2008) under the general time-reversible (GTR) substitution model. We conducted both unpartitioned and partitioned analyses. PartitionFinder version 1.1.1 (Lanfear et al., 2012) was employed to determine the best-fit partition scheme for the partitioned ML analysis. Bootstrap support was estimated with 1000 bootstrap replicates. To facilitate subsequent population genetic studies within Toona, simple sequence repeats (SSRs) were detected using MISA (Thiel et al., 2003) with thresholds of 10 repeat units for mononucleotide SSRs, five repeat units for di- and trinucleotide SSRs, and three repeat units for tetra-, penta-, and hexanucleotide SSRs. Additionally, repeat sequences were identified for each plastome using REPuter (Kurtz et al., 2001) with a minimum repeat size of 30 bp. Single-nucleotide polymorphisms (SNPs) and insertion/deletion polymorphisms (indels) were also identified among the three Toona plastomes with Geneious 7.0 (Kearse et al., 2012).
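To illustrate what the SSR thresholds above mean in practice, the following is a simplified regular-expression scan that applies the same minimum repeat counts (10 for mono-, five for di- and trinucleotide, three for tetra- to hexanucleotide motifs). It is a sketch under stated assumptions, not MISA itself: the names `MIN_REPEATS`, `is_primitive`, and `find_ssrs` are hypothetical, compound SSRs are not handled, and the example sequence is invented.

```python
import re

# Minimum number of repeat units per motif length, as stated in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 5, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (1-based start, motif, repeat count) tuples."""
    seq = sequence.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        pos = 0
        while True:
            match = pattern.search(seq, pos)
            if match is None:
                break
            motif = match.group(1)
            if is_primitive(motif):
                repeats = len(match.group(0)) // unit_len
                hits.append((match.start() + 1, motif, repeats))
                pos = match.end()
            else:
                # e.g. "AA" inside a poly-A run: skip it and rescan one
                # base further on so a real SSR nearby is not masked.
                pos = match.start() + 1
    return sorted(hits)


# Invented test sequence containing three SSRs.
demo = "GC" + "A" * 12 + "GCCG" + "AT" * 6 + "CCG" + "AGAT" * 3 + "CC"
demo_hits = find_ssrs(demo)
# demo_hits == [(3, 'A', 12), (19, 'AT', 6), (34, 'AGAT', 3)]
```

Checking that a motif is not a repetition of a shorter unit prevents a poly-A run from also being reported as an "AA" dinucleotide or "AAAA" tetranucleotide repeat.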

RESULTS
Within Toona, the plastome size of T. sureni was 159,371 bp, and those of T. sinensis and T. ciliata were 186 bp and 385 bp longer, respectively (Table 2). These three plastomes possess the typical quadripartite structure of angiosperm plastomes, comprising an LSC, an SSC, and two IR regions (Fig. 2). A total of 113 unique genes, including 30 tRNA genes, four rRNA genes, and 79 protein-coding genes, were found in each plastome. Nineteen genes were duplicated in the IR regions (Table 3). Additionally, 14 genes were found to possess one intron, and three genes (rps12, clpP, ycf3) were found to possess two introns (Appendix 2).
A total of 193 SSRs were identified in the three plastomes of Toona. Among these, 70 were found in T. sureni, 57 in T. sinensis, and 66 in T. ciliata (Appendix 3). The majority of SSRs were A/T mononucleotides. A total of 14 AT dinucleotide repeats were found in the three plastomes, and one TA dinucleotide repeat was detected in T. sinensis, whereas the only AG dinucleotide repeat, from T. sureni, was located in the rpoB-trnC-GCA intergenic region. The other kinds of repeat units (e.g., six dinucleotide; four trinucleotide; three tetra-, penta-, and hexanucleotide) were not found in the three plastomes of Toona. Most SSRs were located in intergenic regions (72.5%), with few in introns (12.5%) and genes (15%). Overall, nine SSRs were shared by all three Toona species, including four in intergenic regions (trnE-UCC/trnT-GGU, trnT-GGU/psbD, ccsA/ndhD, and ycf15/rps12), three in exons (rpoC2, rpoB, and psbF), and two in introns (trnL-UAA and ndhB). In total, 23 repeats were detected in the three Toona plastomes. A majority of the repeats (69.56%) were 30 to 40 bp in length, and 17.40% of the repeats were longer than 50 bp. Four repeats were shared by the three Toona plastomes (Appendix S2). Additionally, we detected 466 SNPs (0.4%) and 90 indels among the three plastomes, and we identified four noncoding regions (psbZ-trnG, psbA-trnK, trnF-ndhJ, trnK-rps16) with potential as loci for the identification of Toona species (Appendix S3).
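The SNP and indel counts above were obtained with Geneious; the sketch below is not that workflow but a minimal illustration of how such counts can be tallied from two already-aligned sequences. It assumes a pairwise alignment with "-" as the gap character and counts each contiguous run of gaps in one sequence as a single indel event; how indels were scored in the study is not stated, so this convention, the function name `count_snps_and_indels`, and the example sequences are all assumptions.

```python
def count_snps_and_indels(aligned_a, aligned_b, gap="-"):
    """Count substitutions and indel events between two aligned sequences.

    A SNP is a column where both sequences carry different bases.
    An indel event is a maximal run of columns gapped in one sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    snps = 0
    indels = 0
    gap_owner = None  # which sequence ("a" or "b") the current gap run is in
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column gapped in both sequences carries no signal
        if a == gap:
            current = "a"
        elif b == gap:
            current = "b"
        else:
            current = None
        if current is not None and current != gap_owner:
            indels += 1  # a new gap run starts here
        if current is None and a != b:
            snps += 1
        gap_owner = current
    return snps, indels


# Invented alignment: one substitution (G/C) and two indel events.
seq_a = "ACGT--ACGTTA"
seq_b = "ACCTGGAC--TA"
result = count_snps_and_indels(seq_a, seq_b)  # (1, 2)
```

Extending this to three plastomes would require a multiple alignment and a decision on how to score columns where only one sequence differs, which the text does not specify.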

DISCUSSION
In most angiosperm plastomes, the IR/LSC boundary lies within the rps19 gene and the SSC/IR boundary lies within the ycf1 gene (Kumar et al., 2009). Among the 29 Sapindales plastomes, the LSC/IRB boundary of the majority lies within the rps19 gene, while nine of these 29 plastomes have experienced an IR region expansion. Obvious IR region expansion into the LSC region has been detected in many other taxa, e.g., in Pelargonium L'Hér. (Chumley et al., 2006), Tetracentron Oliv. (Sun et al., 2013), and Veronica nakaiana Ohwi (Choi et al., 2016). In contrast, within Sapindales, there have been at least eight cases where the SSC/IRA boundary has contracted to exclude all of ycf1 (Fig. 4). IR region contraction has been found to occur in several ways, ranging from complete IR loss (e.g., Geraniaceae [Blazier et al., 2011], Cephalotaxus oliveri Mast. [Yi et al., 2013], and Agathis dammara (Lamb.) Rich. & A. Rich. [Wu and Chaw, 2014]), to the loss of tRNA genes within the IR region (e.g., Epifagus virginiana (L.) W. P. C. Barton [Morden et al., 1991] and Bergera koenigii L. [Shivakumar et al., 2016]), to the rpl22 loss in rosids, and to contraction at the IR/SSC boundaries reported in a number of early-diverging angiosperms (e.g., Buxus L., Epimedium L., and Macadamia F. Muell.) (Hansen et al., 2007). Notably, in Rutaceae, all Clauseneae genera are characterized by the absence of trnI-GAU and trnA-UGC in the IR region. Tsuji et al. (2007) indicated that the tRNA loss may be caused by the RNA editing during the tRNA mutation. Pseudogenization of the infA gene has been detected in a number of angiosperm plastomes such as tobacco (Shinozaki et al., 1986), Arabidopsis Heynh. (Sato et al., 1999), and Oenothera elata Kunth (Hupfer et al., 2000), whereas among the 29 Sapindales plastomes this was only detected in four plastomes (Boswellia sacra, Acer davidii, A. morrisonense, and A. miaotaiense) of Sapindaceae (Blazier et al., 2016).
In some cases, plastid-to-nucleus gene transfer has been shown to underlie the pseudogenization of this gene (Millen et al., 2001).
As has been found in many other studies involving plastome-scale phylogenetic analysis (Parks et al., 2009), we recovered improved phylogenetic support along the backbone of Sapindales compared to previous targeted-gene analyses. We recovered Meliaceae as sister to the clade formed by Simaroubaceae (only one species included) + Rutaceae with maximal support, differing from the topology recovered by Muellner et al. (2007) and Muellner-Riehl et al. (2016), where a moderately supported clade of Meliaceae + Simaroubaceae was sister to Rutaceae. Our result is consistent with the earlier work of Gadek et al. (1996) based on trnL-F sequences, although they recovered only weak support. Unfortunately, the previously unsupported relationship of Sapindaceae with other Sapindales (Muellner-Riehl et al., 2016) could not be resolved by our plastome data analysis either. These results should, however, be interpreted with caution. Additional taxon sampling for complete plastomes, including additional lineages of already-sampled families as well as the early-diverging Sapindales families Biebersteiniaceae, Kirkiaceae, and Nitrariaceae, may affect topology and support. Likewise, the plastome itself can be treated as a single locus for the purpose of phylogenetics, and genomic-scale nuclear data may provide different estimates of phylogeny, especially for short branches.
Gene list (table fragment):
rps2, rps3, rps4, rps7 (×2), rps8, rps11, rps12* (×2), rps14, rps15, rps16, rps18, rps19
Ribosomal protein large subunit: rpl2* (×2), rpl14, rpl16, rpl20, rpl22, rpl23 (×2), rpl32, rpl33, rpl36
RNA polymerase: rpoA, rpoB, rpoC1*, rpoC2
Photosynthesis, Photosystem I: psaA, psaB, psaC, psaI, psaJ
Photosynthesis, Photosystem II: psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT
Within Rutaceae, our results are highly congruent with those of the previous study (Shivakumar et al., 2016), which also found a clade of Citrus + Merrillia sister to a clade composed of (Micromelum + Glycosmis) + (Murraya + Clausena), although in the latter clade the bootstrap support was low. In our tree, all of the taxa sampled in Shivakumar et al. (2016) formed a clade, which is sister to Zanthoxylum. Our analysis suggests that tribe Clauseneae sensu Swingle and Reece (1967; Micromelum Blume, Glycosmis Corrêa, Clausena Burm. f., Murraya J. Koenig, and Merrillia Swingle) is not monophyletic because Merrillia is sister to Citrus L. of the tribe Citreae. The genera of Clauseneae are characterized by the absence of two tRNA genes (trnI-GAU and trnA-UGC), while this is not found in the genus Citrus (Fig. 4). Additionally, four genera (Micromelum + Glycosmis + Murraya + Clausena) in Rutaceae and two species (Acer davidii + Acer morrisonense) in Sapindales, characterized by the absence of ycf1 in the SSC region, each formed a clade in our phylogenetic tree (Fig. 4). This gene loss shared by multiple taxa shows a particularly strong case of homoplasy in the phylogeny. Within Sapindaceae, Sapindus L. is sister to a clade containing Dipteronia Oliv. and Acer L. Although the support value is weak (57%), the two species of Dipteronia do not form a clade, instead forming a grade with respect to Acer.
The plastome structure and gene content of Toona reported in the present study enrich the available plastome resources within Sapindales, the comparative analyses among 29 plastomes provide insight into the plastome evolution of Sapindales, and the phylogenomic analyses of Sapindales improve our understanding of phylogenetic relationships within this order. In addition, the SSRs detected in three Toona species could provide a basis for future plastome genetic studies in Toona, especially in the threatened species.