The complete chloroplast genome of Fagus crenata (subgenus Fagus) and comparison with F. engleriana (subgenus Engleriana)

This study reports the whole chloroplast genome of Fagus crenata (subgenus Fagus), a foundation tree species of Japanese temperate forests. The genome has a total length of 158,227 bp and contains 111 genes, including 76 protein-coding genes, 31 tRNA genes and 4 ribosomal RNA genes. Comparison with the only other published Fagus chloroplast genome, that of F. engleriana (subgenus Engleriana), shows that the genomes are relatively conserved, with no inversions or rearrangements observed, and that the proportion of nucleotide sites differing between the two species is 0.0018. The six most variable regions were, in increasing order of variability, psbK-psbI, trnG-psbfM, rpl32, trnV, ndhI-ndh and ndhD-psaC. These highly variable chloroplast regions, together with the 160 chloroplast microsatellites identified (of which 46 were variable between the two species), will provide useful genetic resources for studies of the inter- and intra-specific genetic structure and diversity of this important northern hemisphere tree genus.


INTRODUCTION
The genus Fagus is a major tree genus of the temperate forests of the northern hemisphere, with two informal subgenera recognized (Shen, 1992): Engleriana with three species and Fagus with seven species (Oh, 2015; Renner et al., 2016). The genus has been the focus of intensive genetic studies over the last 30 years, enabling insights into the relationships of the extant species (Denk, Grimm & Hemleben, 2005), the impact of interglacial-glacial cycles on extant genetic diversity (Fujii et al., 2002; Magri et al., 2006) and predictions of the impacts of ongoing climate change (Csilléry et al., 2014). However, despite the significance of the genus, there remains a dearth of next-generation sequencing-based genetic resources for Fagus, including for the chloroplast genome: the whole chloroplast genome of only a single species, the Chinese endemic F. engleriana of subgenus Engleriana (Yang et al., 2018), has so far been published.
This study reports the whole chloroplast genome of the Japanese endemic Fagus crenata, the first reported for subgenus Fagus. This species is a foundation tree of Japan's cool temperate forest ecosystem and is distributed widely from the mountains of southern Kyushu (31.4°N, 130.8°E) to southern Hokkaido (42.8°N, 140.2°E). Phylogeographic studies based on Sanger sequencing of small portions of the chloroplast genome have revealed strong geographic structuring of chloroplast haplotypes (Fujii et al., 2002; Okaura & Harada, 2002) that, combined with fossil pollen data (Tsukada, 1982), suggests the species persisted in multiple coastal refugia and occupied most of its current wide geographic range during the postglacial. Here we report the whole chloroplast genome sequence of F. crenata and compare it to the genome of F. engleriana (subgenus Engleriana). These data will be a useful genetic resource for investigating the phylogenetic relationships of Fagus and for developing chloroplast-based genetic markers, including both single nucleotide polymorphism- and microsatellite-based markers.

Next Generation Sequencing and chloroplast genome assembly
Whole genomic DNA was extracted from a single sample of F. crenata collected from Daisengen Peak, Hokkaido, Japan (41.616°N, 140.1333°E), representing F. crenata chloroplast haplotype A (following Fujii et al., 2002), using a modified CTAB protocol (Doyle, 1990). DNA concentration and quality were assessed by agarose gel electrophoresis and with a Qubit 2.0 fluorometer (Life Technologies). A total of 9 µg of DNA was sent to the Beijing Genomic Institute, where short-insert TruSeq DNA libraries were constructed and paired-end sequencing (2 × 100 bp) was performed on an Illumina HiSeq2000 Genome Analyser, resulting in a total of 7,223,910 reads (the raw sequence reads are deposited in the NCBI BioProject database under accession number PRJNA528838).
Assembly of chloroplast DNA from the whole genomic sequencing data was undertaken in NOVOPlasty 2.6.3 (Dierckxsens, Mardulyn & Smits, 2016), a seed-and-extend algorithm designed specifically for assembling chloroplast genomes from whole genome sequencing data, starting from a chloroplast seed sequence (trnK-matK of haplotype A: GenBank accession AB046492). This resulted in nine chloroplast contigs varying in length from 2,748 to 43,982 bp, constructed from 230,360 chloroplast reads (3.19% of the total reads), with an average read coverage of the chloroplast genome of 145×. The nine contigs were ordered and oriented using the F. engleriana whole chloroplast genome (KX852398) as a reference, and the complete chloroplast sequence of F. crenata was constructed by connecting overlapping terminal sequences. Sanger sequencing was undertaken to check the accuracy of the assembly at the joins between the nine contigs, at the boundaries between the inverted repeat and single copy regions, and at the most diverged sites between F. crenata and F. engleriana (see "Results and Discussion"). A total of 8,146 bp was sequenced using 15 primer pairs, and no differences from the assembled F. crenata genome were observed apart from those due to inaccurate sequence at the terminal ends of the Sanger reads.
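The read fraction and coverage reported above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming that every chloroplast read contributes its full nominal length of 100 bp; the function name `coverage_summary` is hypothetical and is not part of NOVOPlasty.

```python
def coverage_summary(chloroplast_reads, total_reads, read_length_bp, genome_length_bp):
    """Return (fraction of reads that are chloroplast, approximate mean depth).

    Assumption: every read contributes its full nominal length, so the
    depth is an upper-bound style approximation, not an exact value.
    """
    fraction = chloroplast_reads / total_reads
    mean_depth = chloroplast_reads * read_length_bp / genome_length_bp
    return fraction, mean_depth


# Values taken from the text: 230,360 chloroplast reads out of 7,223,910
# total reads, 100 bp reads, and a 158,227 bp genome.
fraction, depth = coverage_summary(230_360, 7_223_910, 100, 158_227)
print(f"chloroplast reads: {fraction:.2%}, approximate mean depth: {depth:.1f}x")
```

Under these assumptions the result agrees with the reported 3.19% of reads and a coverage of roughly 145-fold.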

Chloroplast genome annotation
The chloroplast genome was annotated using the online program Dual Organellar GenoMe Annotator (DOGMA; Wyman, Jansen & Boore, 2004). The initial annotation and the putative start, stop and intron positions were checked by comparison with homologous genes of the F. engleriana chloroplast genome using Geneious v9.0.5 (Biomatters, Auckland, New Zealand). A circular gene map was drawn with the OrganellarGenomeDRAW tool (OGDRAW; Lohse, Drechsel & Bock, 2007), followed by manual modification.

Phylogenetic analysis and assessment of divergent regions
A multiple sequence alignment of F. crenata, F. engleriana, representative whole chloroplast genomes of the Fagaceae family and outgroups from Betulaceae, Juglandaceae and Myricaceae obtained from GenBank was constructed in T-Coffee under default parameters (Notredame, Higgins & Heringa, 2000). Subsequently, Gblocks v0.91b (Castresana, 2000) was used to identify homologous blocks of DNA and remove poorly aligned and divergent regions of the chloroplast genomes. RAxML-NG (Kozlov et al., 2018) was then used to construct a maximum likelihood phylogenetic tree with 1,000 bootstrap replicates under the most appropriate DNA substitution model, TVM+I+G, as estimated in jModelTest 2.1.10 (Darriba et al., 2012). Pairwise nucleotide differences (p-distance) between the sequences of the Gblocks alignment were calculated in MEGA 7 (Kumar, Stecher & Tamura, 2016), excluding parts of the sequence alignment with gaps. The coding genes, non-coding regions and intron regions were compared in the alignment of the two Fagus chloroplast genomes to detect divergence hotspots. We examined 101 regions (39 coding genes, 52 intergenic spacers and 10 intron regions) of the two Fagus species for nucleotide variability (Pi), calculated in DnaSP v5.0 (Librado & Rozas, 2009).
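To make the p-distance concrete, the sketch below computes the proportion of differing sites between two aligned sequences while skipping any column that contains a gap. It is an illustration only, not the MEGA 7 or DnaSP implementation; the function name `p_distance` and the decision to also skip `N` and `?` characters are assumptions made here. For a comparison of exactly two sequences, the nucleotide diversity (Pi) of a region reduces to this same proportion calculated over that region.

```python
def p_distance(seq_a, seq_b, skip_chars="-?N"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap or an ambiguous base
    (skip_chars, an assumption of this sketch) are excluded.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue  # column excluded from the comparison
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites after excluding gaps")
    return differences / compared


# Toy alignment: one gapped column is skipped, leaving nine compared
# sites, of which one differs (G versus C).
toy_distance = p_distance("ACGT-ACGTA", "ACGTTACCTA")
```

In this toy case the distance is one difference over nine compared sites. Applied to the full two-genome alignment, the same calculation would correspond to the overall p-distance, and applied to each of the 101 regions separately it would give per-region values suitable for ranking divergence hotspots.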

Identification of chloroplast microsatellites
Chloroplast microsatellite regions shared by F. crenata and F. engleriana were searched for in an alignment of the two full chloroplast genomes (constructed in MAFFT v7.308 (Katoh et al., 2002) under default settings) using Phobos Tandem Repeat Finder (Mayer, 2008) implemented in Geneious v9.0.5. Microsatellites in either of the sequences with a repeat unit length of 1-2 bp were retained if they had a minimum length of 10 bp, while those with a repeat unit length of 3-6 bp were retained if they displayed three or more repeats.
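The selection criteria above can be illustrated with a simple regular-expression scan. This is a minimal sketch of the thresholds only (unit length 1-2 bp with at least 10 bp total; unit length 3-6 bp with at least three repeats) and is not the Phobos algorithm: it finds perfect repeats only, scans without overlaps, and may therefore miss a repeat that begins inside a run of a shorter motif. The function name `find_microsatellites` is hypothetical.

```python
import re


def find_microsatellites(seq):
    """Return sorted (start, end, unit, n_repeats) tuples for perfect repeats.

    Thresholds follow the text: 1-2 bp units need >= 10 bp in total,
    3-6 bp units need >= 3 repeats. Coordinates are 0-based, end-exclusive.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(1, 7):
        if unit_len <= 2:
            min_repeats = -(-10 // unit_len)  # ceiling of 10 / unit_len
        else:
            min_repeats = 3
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are themselves built from a shorter motif
            # (e.g. "AA" or "ATAT"); those are handled at the shorter length.
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            n_repeats = (match.end() - match.start()) // unit_len
            hits.append((match.start(), match.end(), unit, n_repeats))
    return sorted(hits)


# Toy sequence with an A(12) run, an (AT)5 repeat and a (TTC)3 repeat.
toy = "GGC" + "A" * 12 + "CGC" + "AT" * 5 + "GA" + "TTC" * 3 + "G"
toy_hits = find_microsatellites(toy)
```

Running the same scan on each genome and comparing repeat counts at homologous positions in the alignment is one way that size-variable microsatellites between the two species could be flagged.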

RESULTS AND DISCUSSION
The assembled whole chloroplast genome of F. crenata has a total length of 158,227 bp (Fig. 1) and contains 111 genes, including 76 protein-coding genes, 31 tRNA genes and 4 ribosomal RNA genes. In the phylogenetic analysis (Fig. 2), Fagus crenata and F. engleriana formed a strongly diverged clade, consistent with previous evidence of the large divergence of Fagus from all other Fagaceae genera (Heenan & Smissen, 2013). The proportion of nucleotide sites that differed (p-distance) between F. crenata and F. engleriana was 0.0018, lower than any other pairwise difference observed, including those among five Quercus species.
The two Fagus chloroplast genomes were relatively conserved (Fig. 3), with the IR region more conserved than both the large single copy (LSC) and small single copy (SSC) regions. We did not detect inversions or translocations between the two genome sequences, and no rearrangement in gene organization was found after verification (Fig. 4). Nucleotide diversity varied widely among the 101 regions of the two Fagus species, with values ranging from 0.0003 (ycf2 gene) to 0.0781 (ndhD-psaC) (Fig. 5). The six most variable regions were, in increasing order of variability, psbK-psbI, trnG-psbfM, rpl32, trnV, ndhI-ndh and ndhD-psaC, of which four are located in the LSC region and two in the SSC region (Fig. 5). The nucleotide diversities of these variable regions between F. crenata and F. engleriana were higher than those observed in some other studies of Fagaceae genera, including East Asian (Yan et al., 2018) and Mediterranean oaks (Vitelli et al., 2017).
A total of 160 chloroplast microsatellites with a repeat unit length between 1 and 6 bp were identified in the two species based on the selection criteria. Mono- and tri-nucleotide repeat microsatellites were the most abundant, with frequencies of 38.7% and 43.1%, respectively. This abundance of mono- and tri-nucleotide repeats in the chloroplast is similar to that in a range of other angiosperms (Melotto-Passarin et al., 2011). Of these microsatellites, 46 displayed size variation between F. crenata and F. engleriana (see Dataset S4 for a table with details of all 46 variable chloroplast microsatellites). The majority (66.1%) of the variable chloroplast microsatellites were mono-nucleotide repeats, while 20% of di-nucleotide repeats and both of the two hexa-nucleotide repeats were variable. In contrast, none of the tri-, tetra- and penta-nucleotide repeats showed size variation between the two species (Fig. 6). The lengths of variable and non-variable chloroplast microsatellites were similar, but variable microsatellites showed greater variation in length in both F. crenata and F. engleriana (Fig. 7).

CONCLUSION
Overall, the chloroplast genome of F. crenata will provide a useful genetic resource for future genetic studies of the foundation temperate tree genus Fagus. Specifically, the chloroplast genomes of both informal subgenera will provide useful references and sources of molecular markers for investigating phylogeographic patterns of the chloroplast within and between Fagus species. Some major questions are yet to be resolved in Fagus, including the taxonomic boundaries of western Eurasian Fagus populations, which have remained a recalcitrant problem due to low marker resolution and high within-species genetic diversity (Denk et al., 2002), and the non-monophyly of the chloroplast of East Asian species suggested by Sanger sequence-based data (Manos & Stanford, 2001; Okaura & Harada, 2002).