Characterization and comparative analysis of the complete chloroplast genome sequence from Prunus avium ‘Summit’

Background Sweet cherry (Prunus avium) is one of the most popular temperate fruits. Previous studies have demonstrated that there are several haplotypes in the chloroplast genome of sweet cherry cultivars. However, no chloroplast genome of a sweet cherry cultivar had yet been released, and the phylogenetic relationships within Prunus based on chloroplast genome data were unclear. Methods In this study, we assembled and annotated the complete chloroplast genome of the sweet cherry cultivar P. avium ‘Summit’ from high-throughput sequencing data. Gene Ontology (GO) terms were assigned to classify the functions of the annotated genes. Maximum likelihood (ML) trees were constructed to reveal the phylogenetic relationships among Prunus species, using the LSC (large single-copy) region, SSC (small single-copy) region, IR (inverted repeat) region, CDS (coding sequence), intergenic region, and whole cp genome datasets. Results The complete plastid genome was 157,886 bp in length, with a typical quadripartite structure of LSC (85,990 bp) and SSC (19,080 bp) regions separated by a pair of IR regions (26,408 bp each). It contained 131 genes, including 86 protein-coding genes, 37 transfer RNA genes and 8 ribosomal RNA genes. A total of 77 genes were assigned to the three major GO categories: molecular function, cellular component and biological process. Comparison with other Prunus species showed that P. avium ‘Summit’ was highly conserved in gene content and structure. The non-coding regions ndhC-trnV, rps12-trnV and rpl32-trnL were the most variable sequences between wild Mazzard cherry and ‘Summit’ cherry. A total of 73 simple sequence repeats (SSRs) were identified in ‘Summit’ cherry, most of which were mononucleotide repeats. The ML phylogenetic tree of Prunus species revealed four clades: Amygdalus, Cerasus, Padus, and Prunus. The SSC and IR trees were incongruent with the results from other cp data partitions. 
These data provide valuable genetic resources for future research on sweet cherry and Prunus species.


INTRODUCTION
The chloroplast (cp) is generally situated in the cytoplasmic matrix and plays an important role in photosynthesis and in fatty acid, starch, and amino acid synthesis (Wicke et al., 2011). The size of cp genomes ranges from about 100 kb to 200 kb (Daniell et al., 2016). The cp genome typically has a quadripartite structure consisting of a large single-copy (LSC) region, a small single-copy (SSC) region, and two inverted repeat (IR) regions, and it is well known to be highly conserved in gene structure and content. Nevertheless, point mutations and small structural changes, such as insertions, deletions, inversions, and translocations, have been identified in cp genomes. These mutational changes in cp genome sequences therefore provide valuable information for phylogenetic and genetic diversity analyses and for molecular marker development.
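The quadripartite layout can be made concrete with a small worked example. The Python sketch below assumes the conventional LSC, IRb, SSC, IRa order starting at the first base of the LSC, and uses the 'Summit' region lengths given in the abstract; the function name is illustrative, and a real GenBank record may start the circular sequence at a different position.

```python
def quadripartite_layout(lsc: int, ir: int, ssc: int) -> dict[str, tuple[int, int]]:
    """Return 1-based, inclusive coordinates of each region, assuming the
    conventional order LSC, IRb, SSC, IRa starting at position 1."""
    layout = {}
    start = 1
    for name, length in (("LSC", lsc), ("IRb", ir), ("SSC", ssc), ("IRa", ir)):
        layout[name] = (start, start + length - 1)
        start += length
    return layout


# Region lengths reported for P. avium 'Summit' (bp).
layout = quadripartite_layout(lsc=85_990, ir=26_408, ssc=19_080)
total_length = layout["IRa"][1]  # last base of IRa equals the genome length

for region, (start, end) in layout.items():
    print(f"{region}: {start:,}-{end:,}")
print(f"total: {total_length:,} bp")  # 85,990 + 2 x 26,408 + 19,080 = 157,886

# Reported gene tally: protein-coding + tRNA + rRNA genes.
assert 86 + 37 + 8 == 131
```

Because the two IR copies are inverted relative to each other, a further sanity check on an assembled genome would be that the IRa sequence equals the reverse complement of the IRb sequence.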
Prunus L., a large and diverse genus, comprises more than 400 species, including many economically important fruit crops as well as many ornamental species. Owing to parallel evolution of morphological traits and interspecific hybridization, the botanical classification of Prunus L. has long been controversial and complicated. As early as 1700, six subgenera within Prunus were recognized based on fruit morphology: Amygdalus L., Armeniaca Mill., Cerasus Mill., Laurocerasus Duhamel, Persica Mill., and Prunus sensu stricto (Bouhadida et al., 2007). Later, other schemes were proposed, such as a single genus Prunus subdivided into seven sections by Bentham and Hooker in 1865, and four subgenera within Prunus by Koehne in 1911. Currently, the most widely accepted classification of Prunus is the one defined by Rehder in 1940, which divides the genus into five subgenera: Amygdalus, Cerasus, Laurocerasus, Padus, and Prunus (=Prunophora) (Potter, 2011).
Sweet cherry (Prunus avium L.) is an important Prunus fruit crop in temperate and subtropical regions. Traditional intraspecific and interspecific hybridizations have been carried out in this species for genetic improvement by introducing additional desirable characters (Sansavini & Lugli, 2008; Potter, 2011; Carrasco et al., 2013), and numerous cultivars have been released to meet market demands worldwide. Owing to long-term natural propagation and artificial cultivation, genetic diversity among cultivars and populations has been documented by several researchers (Frascaria, Santi & Gouyon, 1993; Beaver, Iezzoni & Ramm, 1995; Lacis et al., 2009). Because of the conserved properties of chloroplast DNA (cpDNA), many researchers believed that the chances of detecting intraspecific cpDNA variation were low. However, several haplotypes have been reported in sweet cherry populations and cultivars (Mohanty, Martín & Aguinagalde, 2001a; Mohanty, Martín & Aguinagalde, 2001b; Panda et al., 2003), which provides a good opportunity to study plastome sequence variation below the species level. Recently, the complete chloroplast genome of wild Mazzard cherry (P. avium) was deposited in GenBank (Chen et al., 2018a). However, no chloroplast genome sequence of a sweet cherry cultivar has yet been released, which hinders studies of haplotype variation in cherry.
In this study, we assembled and analyzed the chloroplast genome of the sweet cherry cultivar 'Summit' using next-generation sequencing. Furthermore, we carried out a comparative analysis with other Prunus species to obtain the basic features of cp genomes in Prunus. In particular, general cp genome features and sequences were compared between wild Mazzard cherry and 'Summit'. Phylogenetic trees were also constructed based on the LSC, SSC, IR, CDS (coding sequence), intergenic region, and whole chloroplast genome datasets to study the relationships within the genus Prunus. The results may benefit the genetics and breeding of cultivated sweet cherries and related Prunus species.

Sampling and DNA extraction
The sampled 'Summit' tree was grown at the Baima Teaching and Research Base of Nanjing Forestry University, Jiangsu Province, China. The voucher specimen was deposited in the Nanjing Forestry University Herbarium (NF0000016). Total genomic DNA was extracted from fresh leaves using a CTAB method (Li et al., 2013) with slight modifications. DNA concentration was checked using a NanoDrop ND-2000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA).

Sequencing, assembly, annotation, and Gene Ontology (GO) analysis
A shotgun DNA library was constructed, and high-throughput sequencing was carried out on the Illumina HiSeq 2500 Sequencing System (Illumina, CA, USA). Raw paired-end reads were trimmed using fastp 0.20.0 (Chen et al., 2018b) to obtain clean data. De novo assembly of the complete cp genome was performed with NOVOPlasty v3.1 (Dierckxsens, Mardulyn & Smits, 2017), using the complete cp genome of Prunus persica (HQ336405) as the reference and rbcL as the seed sequence. The online program GeSeq (Tillich et al., 2017) was used to annotate the cp genome, and the annotation was inspected in Geneious 8.0.4 (Kearse et al., 2012) and corrected manually as needed. The sequence was deposited in GenBank under accession number MK622380. A physical map of the genome was drawn with the online tool OGDRAW (Lohse, Drechsel & Bock, 2007). GO annotation was performed with TBtools 1.6 (Chen et al., 2018a) to assign GO terms to the annotated genes.
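To illustrate what the read-trimming step does, the sketch below implements a generic 3' sliding-window quality trim on a single read. It is similar in spirit to fastp's sliding-window cutting but is not fastp's implementation; the window size (4) and mean-quality cutoff (Q20) are hypothetical, since the settings actually used for 'Summit' are not reported, and qualities are assumed to be Phred+33 encoded.

```python
def trim_3prime(seq: str, qual: str, window: int = 4, min_mean_q: float = 20.0,
                offset: int = 33) -> tuple[str, str]:
    """Slide a window from the 5' end; at the first window whose mean Phred
    quality falls below `min_mean_q`, drop that window and everything 3' of it.
    Reads shorter than `window` are returned unchanged."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")
    scores = [ord(c) - offset for c in qual]  # Phred+33 decoding (assumed)
    for i in range(len(scores) - window + 1):
        if sum(scores[i:i + window]) / window < min_mean_q:
            return seq[:i], qual[:i]
    return seq, qual


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(trim_3prime("ACGTACGTAC", "IIIIII####"))  # ('ACGTA', 'IIIII')
```

In paired-end data a trimmer would also drop or orphan pairs whose reads become too short; that bookkeeping is omitted here.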

Phylogenetic analysis
In addition to P. avium 'Summit', 19 other Prunus species were included in the phylogenetic analysis, with Malus prunifolia (KU851961) as the outgroup. Their complete cp genome sequences were downloaded from GenBank. Phylogenetic analyses were conducted using the whole cp genome as well as the LSC, SSC, IR, CDS, and intergenic region datasets. The sequences of each partition were aligned using MAFFT v7.308 (Katoh & Standley, 2013). Maximum likelihood (ML) trees were inferred in IQ-TREE v1.6.8 (Nguyen et al., 2015) under the best-fitting model TVM+F+R2, and branch support was assessed with 1,000 bootstrap replicates. Phylogenetic trees were visualized using FigTree v1.4.3.
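The bootstrap step can be illustrated with a minimal Python sketch of the classic nonparametric bootstrap, which builds each pseudo-replicate by sampling alignment columns with replacement. The paper does not state whether IQ-TREE's standard or ultrafast bootstrap was used, so this shows only the general resampling idea; tree inference on each replicate (done by IQ-TREE in the study) is omitted, and the function name and toy alignment are hypothetical.

```python
import random
from typing import Dict, Iterator


def bootstrap_replicates(alignment: Dict[str, str], n_reps: int,
                         seed: int = 1) -> Iterator[Dict[str, str]]:
    """Yield `n_reps` pseudo-alignments, each made by drawing as many columns
    as the original alignment has, uniformly and with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    n_sites = lengths.pop()
    rng = random.Random(seed)
    for _ in range(n_reps):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        # The same column indices are used for every taxon, so columns stay intact.
        yield {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


# Hypothetical toy alignment (not real Prunus data).
toy = {"taxon1": "ACGTTA", "taxon2": "ACGTCA", "taxon3": "ATGTCA"}
reps = list(bootstrap_replicates(toy, n_reps=1000))
```

Support for a clade is then the proportion of the replicate trees (here 1,000) that contain that clade.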
When Gene Ontology (GO) annotation was conducted, only three genes (rps19, pbf1, and lhbA) could not be annotated. According to the GO results, the largest number of functional groups (15) was identified for the psaA and psbA genes (Table S1). The predicted cp genes were functionally classified into 58 functional groups under the three main GO categories (Fig. 2, Table S2). Molecular function categories were strongly represented by … Genes in the biological process category were primarily sorted into metabolic process and biosynthetic process.

Comparative analysis of the cp genomes of genus Prunus
The complete cp genome sequence of P. avium 'Summit' was compared with those of previously reported Prunus species. The results (Table 1) showed that the sequenced plastid genomes were similar in organization, gene content, gene order, and GC content. In terms of genome size, P. cerasoides had the smallest cp genome, with the smallest LSC region (85,792 bp), while P. serotina had the largest cp genome, with the largest LSC region (87,289 bp) (Table 1). We further assessed sequence similarity among the cp genomes of six species using mVISTA, with P. avium 'Summit' as the reference (Fig. 3). As expected, the LSC and SSC regions were more divergent than the IR regions. The highly divergent regions among the six chloroplast genomes mainly occurred in intergenic spacers such as trnH-psbA, trnK-rps16, rps16-trnQ, trnS-trnG, trnR-atpA, atpH-atpI, rpoB-trnC, trnC-petN, petN-psbM, trnT-psbD, psbC-trnS, psbZ-trnG, ycf3-trnS, trnF-ndhJ, ndhC-trnV, psbE-petL, ndhF-rpl32, rpl32-trnL, and ndhG-ndhI. Sequence similarity between Mazzard cherry and 'Summit' cherry was relatively high, but several non-coding regions, such as ndhC-trnV, rps12-trnV and rpl32-trnL, showed divergence.
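The kind of similarity profile plotted by mVISTA can be approximated with a sliding-window percent identity over a pairwise alignment. The Python sketch below is not mVISTA's algorithm: it assumes the two sequences are already aligned (gaps as '-'), treats a base-versus-gap position as a mismatch, ignores columns that are gaps in both, and uses a hypothetical 100-bp window and 25-bp step. A GC-content helper is included because GC content was also compared.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous A/C/G/T bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "GC" for b in bases) / len(bases) if bases else 0.0


def window_identity(a: str, b: str, window: int = 100,
                    step: int = 25) -> list[tuple[int, float]]:
    """Percent identity in sliding windows over two aligned sequences.
    Returns (1-based alignment start of window, percent identity) pairs."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    a, b = a.upper(), b.upper()
    profile = []
    for start in range(0, len(a) - window + 1, step):
        cols = [(x, y) for x, y in zip(a[start:start + window], b[start:start + window])
                if not (x == "-" and y == "-")]  # skip columns that are gaps in both
        same = sum(x == y for x, y in cols)
        profile.append((start + 1, 100.0 * same / len(cols) if cols else 0.0))
    return profile


# Toy example: 10 substitutions in the last 10 bp of a 200-bp alignment.
ref = "A" * 200
qry = "A" * 190 + "C" * 10
for pos, ident in window_identity(ref, qry):
    print(pos, ident)
```

Only the window covering the last 100 columns drops below 100% (to 90.0), which is how a divergent spacer shows up as a dip in a similarity plot.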

IR expansion and contraction
The IR-SSC and IR-LSC boundaries, together with the adjacent genes, were aligned among the cp genomes of five Prunus species and M. prunifolia. As shown in Fig. 4, P. avium 'Summit' had nearly the same IR/SC structure as the other congeneric species, in which the IRb/SC boundaries lay within the coding regions of rps19 and ndhF, respectively. In P. avium 'Summit', P. avium, P. tomentosa and P. padus, ndhF extended 19 bp into IRb, whereas in P. persica the extension was shorter (10 bp). Similarly, the IRb/LSC junction was located within the complete rps19 gene in all six cp genomes, and rps19 extended into the LSC region by different lengths depending on the species: 93 bp in P. avium versus 240 bp in P. padus. A truncated rps19 was found in the IRa region; it lay only 1 bp from the JLA (LSC/IRa) junction in P. avium, P. tomentosa, P. padus, and M. prunifolia, and 3 bp from it in P. avium 'Summit'. In addition, the rps19 fragment of P. padus in the IRa region was only 39 bp long, much shorter than in the other five species (180 bp, 186 bp, 183 bp, 187 bp, and 120 bp, respectively).

Table 2 List of genes annotated in the cp genome of P. avium 'Summit' (excerpt). Self-replication, large subunit of ribosome: rpl2 (2; footnote a); rpl14; rpl16; rpl20; rpl22; rpl23 (2).
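Extension lengths such as ndhF reaching 19 bp into IRb follow from simple interval arithmetic on gene and junction coordinates. The Python sketch below shows that arithmetic; the junction is assumed to lie between positions `junction` and `junction + 1`, and the ndhF coordinates in the example are hypothetical, chosen only so that the IRb/SSC junction implied by the abstract's region lengths (112,398, assuming the LSC starts at position 1) yields a 19-bp extension.

```python
def split_at_junction(gene_start: int, gene_end: int, junction: int) -> tuple[int, int]:
    """Split a gene (1-based, inclusive coordinates) at a region junction that lies
    between positions `junction` and `junction + 1`.
    Returns (bp on the upstream side, bp on the downstream side)."""
    if gene_start > gene_end:
        raise ValueError("pass gene_start <= gene_end regardless of strand")
    upstream = max(0, min(gene_end, junction) - gene_start + 1)
    downstream = max(0, gene_end - max(gene_start, junction + 1) + 1)
    return upstream, downstream


# Hypothetical coordinates: IRb ends at 112,398 (LSC 85,990 bp + IRb 26,408 bp),
# and a made-up ndhF span starts 19 bp before the IRb/SSC junction.
in_irb, in_ssc = split_at_junction(112_380, 114_600, junction=112_398)
print(in_irb, in_ssc)  # 19 2202
```

The same function applied at the LSC/IRb junction would give the number of rps19 bases lying in the LSC and in IRb.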

SSR analysis
In our study, a total of 73 SSRs were identified in the cp genome of P. avium 'Summit', most of which were located in the LSC region (Table 3). Among them, 54 (74.0%) were mononucleotide SSRs, mostly of the A/T type; 13 (17.8%) were dinucleotide SSRs; five (6.8%) were tetranucleotide SSRs; and one (1.4%) was a pentanucleotide SSR. No tri- or hexanucleotide SSRs were found. Only 24 SSRs were located within genes; the remaining 49 were in intergenic regions.
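Perfect SSRs of the kind tallied above can be located with back-referencing regular expressions. The paper does not name its SSR-search tool or thresholds, so the minimum copy numbers below (10, 5, 4, 3, 3, 3 for mono- to hexanucleotide motifs) are assumptions, and the search is simplified: it reports only perfect repeats and skips motifs that are themselves repeats of a shorter unit (such as AA or ATAT). The last lines recompute the motif-class percentages from the counts reported for 'Summit'.

```python
import re

# Assumed minimum copy numbers per motif length (not taken from the paper).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(unit: str) -> bool:
    """True if `unit` is not a tandem copy of a shorter motif (e.g. 'ATAT' is not)."""
    k = len(unit)
    return all(unit != unit[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (1-based start, motif, copy number) for perfect SSRs, sorted by start."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{min_rep - 1},}}")
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if is_primitive(unit):
                hits.append((m.start() + 1, unit, len(m.group(0)) // k))
    return sorted(hits)


toy = "GC" + "A" * 12 + "GC" + "AT" * 6 + "G"
print(find_ssrs(toy))  # [(3, 'A', 12), (17, 'AT', 6)]

# Motif-class shares from the counts reported for 'Summit' (73 SSRs in total).
counts = {"mono": 54, "di": 13, "tetra": 5, "penta": 1}
total = sum(counts.values())
shares = {cls: round(100 * n / total, 1) for cls, n in counts.items()}
print(total, shares)  # 73 {'mono': 74.0, 'di': 17.8, 'tetra': 6.8, 'penta': 1.4}
```

Dedicated SSR tools additionally handle compound and imperfect repeats and merge overlapping hits, which this sketch does not attempt.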

Phylogenetic analysis
Six datasets derived from 20 Prunus cp genome sequences were used to build phylogenetic trees. Comparing the six trees, we found that the topologies based on the LSC region, CDS region, intergenic region and whole cp genome datasets were similar (Fig. 5).
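One way to quantify how similar two partition trees are is a Robinson-Foulds-type distance. The paper does not say it used such a metric, so the Python sketch below is only an illustration: it computes the rooted (cluster) version, counting clades present in one tree but not the other, for trees written as nested tuples rooted on the outgroup. Taxon labels are placeholders, not the study's trees.

```python
from typing import FrozenSet, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]  # a leaf name or a tuple of subtrees


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the non-trivial clades (leaf sets of internal nodes other than the root)."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree on the same taxa
    return found


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Rooted Robinson-Foulds (cluster) distance: size of the symmetric difference."""
    return len(clusters(t1) ^ clusters(t2))


# Placeholder topologies rooted on outgroup "O".
tree_a = ("O", (("A", "B"), ("C", "D")))
tree_b = ("O", (("A", "C"), ("B", "D")))
print(rf_distance(tree_a, tree_a), rf_distance(tree_a, tree_b))  # 0 4
```

A distance of 0 means identical clade content; comparing every pair of the six partition trees this way would give a simple numeric summary of which partitions agree.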

DISCUSSION
The cp genome normally has a circular structure composed of an LSC region, an SSC region and two IR regions. Our results showed that the genome structure, gene order and GC content of P. avium 'Summit' were highly similar to those of previously reported Prunus cp genomes (Cho et al., 2016; Feng et al., 2018; Luan et al., 2018). Comparative analysis of complete cp genome sequences can reveal considerable genetic information. Our results revealed that sequence divergence in the IR regions was lower than that in the LSC and SSC regions, a pattern also reported in many land plants. In angiosperm cp genomes, highly divergent intergenic regions, especially rpl32-trnL, have been used for phylogenetic and evolutionary studies even at the species level (Dong et al., 2012; Zecca et al., 2012; Jara-Arancio, Vidal & Arroyo, 2018). The highly divergent non-coding regions revealed by our comparative analysis thus show potential for genetic analysis in the genus Prunus. Raubeson et al. (2007) pointed out that contraction and expansion at the borders of IR regions are common evolutionary events and may be the main reason for cp genome size variation. Chang et al. (2006) demonstrated that the entire ycf1 gene in Phalaenopsis aphrodite did not cross the IR/SSC boundary but lay within the SSC region. In addition, deletion of the ycf1 gene at the IRb/SSC border region has been reported in P. maximowiczii and Cerasus humilis (Mu et al., 2018). The role of the ycf1 gene in chloroplast genome evolution requires further investigation. In this study, two rps19 genes were found at the IR/SC boundaries. In Dianthus, one copy of the rps19 gene was found at the IRb/SSC junction and a truncated copy, a pseudogene, at the IRa/LSC junction (Raman & Park, 2015). Lu, Li & Qiu (2017) also reported that in three Cardiocrinum (Liliaceae) species, the rps19 gene located at the LSC/IRa boundary had apparently lost its protein-coding ability due to partial gene duplication. 
In our study, the status of the putative rps19 pseudogene at the LSC/IRa boundary remains to be elucidated, especially in P. padus, in which a much shorter rps19 was found at this boundary.
Further analysis of the cp genomes of wild Mazzard cherry and 'Summit' cherry revealed a relatively conserved structure, although there were some variations between the two cp genomes. Contraction and expansion of the IR regions resulted in minor variation in the extension lengths of rps19 and ycf1 at the IR/SC boundaries. The sequence variations between Mazzard cherry and 'Summit' cherry were mostly restricted to non-coding regions, such as ndhC-trnV, rps12-trnV and rpl32-trnL. Wang et al. (2018a) reported that intraspecific variation among the cp genomes of four peanut varieties was also relatively limited. Owing to the conserved properties of cpDNA, cp genome sequence variation has rarely been used below the species level. However, the variations in these non-coding regions provide potential for developing molecular markers for cultivar identification, as has been reported in fig (Baraket et al., 2008) and olive (Mariotti et al., 2010). Nuclear SSRs have been recognized as powerful genetic markers owing to their abundance in genomes, high degree of polymorphism, and co-dominance. A variety of SSR markers have been applied to the analysis of genetic variability, cultivar identification, parentage assessment, and quality control of rootstocks in P. avium (Guarino et al., 2009; Lacis et al., 2009; Turkoglu et al., 2012; DeRogatis et al., 2013; Ivanovych & Volkov, 2017). In addition, cpDNA molecular markers have been successfully used to assess genetic diversity in P. avium cultivars and populations. The haplotype diversity in sweet cherry populations and cultivars helped to clarify the maternal inheritance of the chloroplast genome in sweet cherry (Mohanty, Martín & Aguinagalde, 2001a; Mohanty, Martín & Aguinagalde, 2001b; Panda et al., 2003). Khadivi-Khub et al. (2014) observed intraspecific polymorphism with cpSSR primers in P. avium and other related Prunus species. 
This intraspecific polymorphism revealed by cpSSRs was also consistent with the views of Powell et al. (1995) and Provan et al. (1997). In addition, chloroplast SSRs in P. salicina have been shown to be highly useful markers for phylogenetic studies in the genus Prunus (Ohta, Nishitani & Yamamoto, 2005). Furthermore, Turkec, Sayar & Heinze (2006) suggested that cpDNA analysis is a straightforward way to classify cherry cultivars. The cpSSRs identified in this study may be further developed into candidate markers for detecting genetic diversity among cultivars and populations of sweet and sour cherry, which would help breeders select parental genotypes for cherry breeding programmes. Many studies have attempted to construct a phylogenetic framework for Prunus from different perspectives, but interspecific relationships within the genus remain ambiguous and need further investigation. Linnaeus divided Prunus into Amygdalus, Padus, and Prunus, and later recognized four genera: Armeniaca, Cerasus, Padus (including Laurocerasus) and Prunus. In the analysis of Shi et al. (2013), Amygdalus and Prunus were merged into one subgenus, Prunus, and three subgenera, Cerasus, Prunus and Padus, were recognized based on cp regions and nuclear gene data. Our phylogenetic results based on the LSC region, CDS region, intergenic region and whole cp genome datasets recognized four subgenera: Amygdalus, Cerasus, Padus, and Prunus, which is in accordance with the earlier phenotype-based classifications of Linnaeus in 1754 and Koehne in 1911 (Lee & Wen, 2001). The absence of a Laurocerasus (laurel cherry) clade in our results may reflect the limited cp genome datasets available, and additional cp genome data will be necessary to test the relationship of Laurocerasus within Prunus. P. padus and P. serotina were assigned to subgenus Padus, in line with previous results (Wen et al., 2008; Shi et al., 2013). 
The position of subgenus Amygdalus as sister to Prunus is in accordance with the results of Yazbek & Oh (2013) and Yazbek & Al-Zein (2014). However, the monophyly of subgenus Amygdalus in our trees contrasts with the results of Lee & Wen (2001), who reported Amygdalus to be paraphyletic. This difference between mono- and paraphyly of Amygdalus may be due to differences in markers or sampling. Bortiri, Heuvel & Potter (2006) also considered that molecular data alone do not support the monophyly of subgenus Amygdalus. More molecular data combined with morphological data are needed to address this question thoroughly.
The two subgenera Prunus and Amygdalus, especially P. tomentosa, P. pedunculata, P. mongolica and P. davidiana, were intermixed in the SSC and IR trees (Fig. 5), which indicates a close tie between Prunus and Amygdalus. Previous results also demonstrated that subgenera Prunus and Amygdalus are more closely related to one another than either is to subgenus Cerasus (Badenes & Parfitt, 1995; Lersten & Horner, 2000; Lee & Wen, 2001; Wen et al., 2008). According to Rehder's classification, P. tomentosa belongs to subgenus Cerasus. However, hybridization studies (Kataoka, Sugiura & Tomana, 1988) and isozyme results (Mowrey & Werner, 1990), together with cp region and nuclear gene data, demonstrated that it is closer to subgenus Prunus than to Cerasus (Bortiri et al., 2001; Bortiri et al., 2002; Shi et al., 2013). In addition, P. pedunculata was traditionally classified as a member of genus Amygdalus (Lu & Bartholomew, 2003). However, Yazbek & Oh (2013), Yazbek & Al-Zein (2014), and Duan et al. (2018) suggested that P. pedunculata should be excluded from subgenus Amygdalus and placed in subgenus Prunus. P. mongolica and P. davidiana are closely related to peach (P. persica), and previous molecular and morphological analyses all supported their placement in subgenus Amygdalus (Yazbek & Oh, 2013; Yazbek & Al-Zein, 2014). Because neither these previous findings nor our LSC, CDS, intergenic region and whole cp genome trees supported the intermixed placement of P. tomentosa, P. pedunculata, P. mongolica and P. davidiana seen in the SSC and IR trees, we consider that the LSC, CDS, intergenic region and whole cp genome datasets can be employed for phylogenetic inference in Prunus.
Numerous phylogenetic studies based on cpDNA sequences have been carried out in recent years. Cp genome approaches, together with nuclear and phenotypic data, can provide complementary information for genetic analysis in Prunus. The incongruent phylogenetic relationships among Prunus species in our results illustrate that plastome data partitions should be chosen with meticulous care when phylogenetic analyses are conducted.

CONCLUSIONS
This study reports the first complete cp genome of a sweet cherry (P. avium) cultivar, 'Summit'. Comparison with other Prunus species revealed that P. avium 'Summit' is highly conserved in structure as well as gene content. The cpSSRs and several intergenic regions identified through comparison with other Prunus species could be developed into valuable DNA markers in further studies. The phylogenetic analyses using the SSC and IR region datasets were not in accordance with the results from other cp data partitions or with other published phylogenies. The LSC, CDS, intergenic region and whole cp genome datasets could be employed to evaluate phylogenetic relationships in Prunus.
• Yu Ding performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, approved the final draft.
• Yan Huo analyzed the data, prepared figures and/or tables, approved the final draft.
• Zhaohe Yuan conceived and designed the experiments, authored or reviewed drafts of the paper, approved the final draft.

Data Availability
The following information was supplied regarding data availability: The complete chloroplast genome sequence is available in GenBank (MK622380), and the raw sequencing data are available at NCBI under BioProject PRJNA579503 (SRA).

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/peerj.8210#supplemental-information.