The first chloroplast sequence of Rosa davurica Pall. var. davurica

Abstract
Rosa davurica Pall. var. davurica is a member of the plant family Rosaceae. Although R. davurica has high application value, its chloroplast genome sequence has not been reported. This study aims to reveal the genetic characteristics of the chloroplast genome of R. davurica. The total length of its chloroplast DNA is 156,971 bp, with a G/C content of 37.22%. The chloroplast genome has two inverted repeat regions (IRa and IRb) of 26,051 bp each, which are separated by a large single copy (LSC) region of 86,032 bp and a small single copy (SSC) region of 18,837 bp. The genome contains 131 genes (86 protein-coding, 37 tRNA, and 8 rRNA), of which 18 are repeated within the IR regions. Among these genes, 17 contain one or two introns. The phylogenetic analysis showed that R. davurica is closely related to other Rosa species, such as the Rosa hybrid.


Introduction
The chloroplast plays an important role in plant processes such as photosynthesis, and is a key organelle in the biosynthesis of fatty acids, starches and amino acids (Neuhaus and Emes 2000). Since the complete chloroplast genome of tobacco was first sequenced (Shinozaki et al. 1986), an increasing number of other plant chloroplast genomes have been sequenced. The growing number of chloroplast genome sequences provides a broad foundation for studying the evolution of the plant chloroplast genome and the function and development of chloroplast genes (Shinozaki et al. 1986; Liu et al. 2012). In contrast to the nuclear genome, the chloroplast genome is relatively conserved in terms of gene composition and structure. Currently, hundreds of plant chloroplast genome sequences are available in NCBI for research.
Rosa davurica Pall. var. davurica (Jisaburo Ohwi 1966) is an important flowering plant in Rosaceae and is valued as both a greening and a medicinal plant (Hu et al. 2011). Studies have shown that its fruits possibly contain compounds that inhibit the degranulation of rat mast cells (Kim et al. 1999). Eight known tetracyclic triterpene acids and three known flavonoids were found in R. davurica, which is used in traditional Chinese medicine (Kuang et al. 1989; Huo et al. 2017). Rosa davurica fruit and leaves could significantly prevent lipid peroxidation and inflammation (Jiao et al. 2004; Jung et al. 2011). One study found that the leaf extract of R. davurica had seven main components, which had a protective effect against inflammation in vitro and in vivo (Hwang et al. 2020). Studies have found that the phenolic compounds in R. davurica were the main biological components in its extract, which showed the strongest antioxidant effect on human low-density lipoprotein (LDL) oxidation (Sa et al. 2004). Information about the chloroplast genome sequence of R. davurica would help facilitate genetic research and breeding. In this paper, we report the complete chloroplast genome sequence of R. davurica, providing information for further research on the medicinal value of R. davurica.

Materials and methods
Fresh R. davurica leaves were collected from Jiulong Mountain in Mentougou District, Beijing, China (39.944 N, 116.019 E) (Figure 1). A specimen and DNA were deposited at the herbarium of the Experimental Center of Forestry in North China (39.970 N, 116.096 E) (https://www.ncbi.nlm.nih.gov/nuccore/MW381769; contact person: Shaofeng Li, email: Lisf@caf.ac.cn) under voucher number cimei-001. Total leaf DNA was extracted by the CTAB method described by Wang et al. (2015). Sequencing libraries were generated with a TruSeq DNA sample preparation kit (Illumina, USA) and a template preparation kit (Pacific Biosciences, USA). Genome sequencing was performed on the Pacific Biosciences platform and the Illumina NovaSeq platform. We obtained a total of 21,586,904 reads (including 20,985,892 high-quality reads), and SPAdes (Bankevich et al. 2012) and A5-miseq (Coil et al. 2015) were used to assemble contigs and scaffolds from the clean reads. The assembled complete chloroplast genome sequence was uploaded to the online program GeSeq (https://chlorobox.mpimp-golm.mpg.de/geseq.html) for functional annotation. The genome of a close relative was provided as a reference, and all other parameters were left at their defaults. The maximum, minimum and average sequencing depths were 2468×, 323×, and 895.714×, respectively (Figure S1). The plastome map and the structure of the genes (Figures S2 and S3) were drawn using the CPGview program (http://www.1kmpg.cn/cpgview/) (Liu et al. 2023). The phylogenetic analysis was based on the complete chloroplast genome sequences of 16 Rosa species and 2 other plants, namely P. armeniaca and P. salicina. The sequences were aligned with MAFFT (Katoh et al. 2002), and the phylogenetic relationships were inferred using the maximum-likelihood method.
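The depth summary reported above (maximum, minimum and average) can be reproduced from a per-base depth table. The following is a minimal sketch, not the authors' procedure: the source does not name the tool used to compute depth, so the input layout is an assumption (a three-column, tab-separated table of reference name, position and depth, as written by tools such as `samtools depth`), and the function name `summarize_depth` is hypothetical.

```python
def summarize_depth(lines):
    """Return (max, min, mean) depth from tab-separated depth records.

    Assumed input (not stated in the source): one record per base,
    formatted as "reference<TAB>position<TAB>depth".
    """
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        depths.append(int(fields[2]))  # third column holds the depth
    if not depths:
        raise ValueError("no depth records found")
    return max(depths), min(depths), sum(depths) / len(depths)


# Toy records for illustration only; these are not data from the study.
toy_records = ["cp\t1\t10", "cp\t2\t20", "cp\t3\t30"]
depth_max, depth_min, depth_mean = summarize_depth(toy_records)
```

With a real depth file, the same function could be called on an open file handle, for example `summarize_depth(open(path))`, where `path` is a placeholder for the user's own file.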

Results
The length of the chloroplast genome sequence of R. davurica (accession number MW381769) is 156,971 bp, comprising an LSC region of 86,032 bp, an SSC region of 18,837 bp and two IR regions of 26,051 bp each. The total GC content of the chloroplast genome was 37.22% (LSC, 35.22%; SSC, 31.06%; and IR, 42.74%) (Figure 2). The chloroplast genome encodes 131 genes, including 86 protein-coding, 8 rRNA, and 37 tRNA genes. In total, 18 genes are repeated in the IR regions, including 7 tRNAs, 4 rRNAs, and 7 protein-coding genes. In total, 17 genes contain introns, including 6 tRNAs, while ycf3 and clpP1 contain two introns. The rps12 gene is a trans-spliced gene; the 5′-end is located in the LSC region, and the 3′-end is located in the two IR regions.
To confirm the phylogenetic position of this newly described genome, we downloaded 16 complete plastid datasets from Rosa and 2 other plants. The phylogenetic trees generated in this study also indicated a close relationship between R. davurica and the R. hybrid, with high bootstrap support (Figure 3). The chloroplast genome of R. davurica is a useful gene resource that could be used to improve the ecological, genetic and medicinal values of Rosaceae plants. The complete chloroplast genome provides a wealth of genetic information and molecular markers that underlie the evolutionary relationships of land plants (Luo et al. 2014). This paper is the first to characterize the complete chloroplast genome of R. davurica. The chloroplast genome of R. davurica is 156,971 bp (Figure 2), which provides an important basis for analyzing the evolutionary relationships of R. davurica within Rosaceae. The R. davurica chloroplast genome contains an LSC region, an SSC region and two IR regions. It encodes 131 genes, including 86 protein-coding, 8 rRNA and 37 tRNA genes, 17 of which contain one or two introns. The total GC content of the chloroplast genome was 37.22% (Figure 2), and these results are similar to those of other flowering plants.
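The reported figures can be cross-checked with simple arithmetic: the total length should equal LSC + SSC + 2 × IR, and the overall GC content should equal the length-weighted mean of the regional GC values, counting the IR twice. The sketch below uses only numbers stated in the text; the function names are illustrative, and the GC check is approximate because the regional percentages are themselves rounded.

```python
# Region lengths (bp) and GC percentages as reported in the Results.
LSC_BP, SSC_BP, IR_BP = 86_032, 18_837, 26_051
TOTAL_BP = 156_971
GC_PERCENT = {"LSC": 35.22, "SSC": 31.06, "IR": 42.74}


def quadripartite_length(lsc, ssc, ir):
    """Total plastome length: LSC + SSC + two copies of the IR."""
    return lsc + ssc + 2 * ir


def weighted_gc(lsc, ssc, ir, gc):
    """Length-weighted mean GC percentage across LSC, SSC, IRa and IRb."""
    weighted_sum = lsc * gc["LSC"] + ssc * gc["SSC"] + 2 * ir * gc["IR"]
    return weighted_sum / quadripartite_length(lsc, ssc, ir)


total_bp = quadripartite_length(LSC_BP, SSC_BP, IR_BP)
overall_gc = round(weighted_gc(LSC_BP, SSC_BP, IR_BP, GC_PERCENT), 2)

# Gene tallies from the text: 86 protein-coding + 37 tRNA + 8 rRNA,
# and 7 tRNA + 4 rRNA + 7 protein-coding genes duplicated in the IR.
total_genes = 86 + 37 + 8
ir_duplicated_genes = 7 + 4 + 7
```

Both checks agree with the reported values: the regions sum to 156,971 bp and the weighted GC content rounds to 37.22%, while the gene tallies give 131 genes in total and 18 duplicated in the IR.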

Discussion and conclusion
Chloroplast genomes have many unique advantages, including efficient transgene expression (Cosa et al. 2001). To date, transgenes have been stably integrated and expressed through the chloroplast genome to confer several useful agronomic traits, including drought tolerance (Lee et al. 2003) and salt tolerance (Kumar et al. 2004). The complete chloroplast genome sequence of R. davurica reported in this paper provides a basis for future research on the medicinal value and other agronomic traits of R. davurica.

Figure 2. Physical map of the Rosa davurica whole chloroplast genome. Genes inside the circle are transcribed clockwise; genes outside are transcribed counterclockwise. The dark grey inner circle corresponds to the GC content. The chloroplast genome of R. davurica is 156,971 bp. The R. davurica chloroplast genome contains an LSC region, an SSC region and two IR regions. It encodes 131 genes, including 86 protein-coding, 8 rRNA, and 37 tRNA genes, and the genes contain one or two introns. The total GC content of the chloroplast genome was 37.22%.

Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors. In this experiment, we did not collect any human or animal samples.

Author contributions
SL conceived and designed the experiment. YX performed the experiments. YX and HW analyzed the data. YX and HW contributed reagents/materials/analysis tools. YX and SL wrote the paper.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by the Fundamental Research Funds of CAF [CAFYBB2020QD001].

Data availability statement
The genome sequence data that support the findings of this study are openly available in GenBank of NCBI at https://www.ncbi.nlm.nih.gov/ under accession no. MW381769. The associated BioProject, SRA, and BioSample numbers are PRJNA701780, SRR13718192, and SAMN17910923, respectively.

(Jeon and Kim 2019). Note: The sequence type was complete chloroplast genome; the maximum-likelihood method was adopted, with GTR as the substitution model. The number of bootstrap replicates was 1000, and the bootstrap score is the number before each branch in the diagram.