A draft genome at chromosome level and metabolomes of leaf, root and flowers provide insights into the molecular basis of medicinal ingredients of loquat (Eriobotrya japonica (Thunb.)

Eriobotrya japonica, commonly called loquat, is both a fruit crop and a well-known traditional Chinese medicinal material. Here, a high-quality draft genome of the E. japonica cultivar Big Five-pointed Star is reported. It contains 733.32 million bases (Mb), covering approximately 98% of the estimated whole genome size (749.25 Mb). A total of 45,492 protein-coding genes were predicted. Metabolomes of the flower, leaf and root of this loquat variety were also determined using a UPLC-ESI-MS/MS system. A total of 577 metabolites were identified, including 98 phenolic acids, 95 flavonoids, 33 alkaloids, 28 terpenoids and one steroid. Differences in the accumulation of these metabolites among leaf, flower and root were also clarified. Based on KEGG annotation, genes related to the biosynthesis of medicinal ingredients, including some flavonoids and terpenoids, were identified. Overall, this study provides global fundamental molecular insights into the medical biology of Eriobotrya japonica.


Introduction
E. japonica (Maloideae: Rosaceae), commonly known as loquat, is an evergreen fruit tree with nutrient-rich fruit [1]. According to documentary records and archaeological relics, E. japonica was first domesticated in China during the Han dynasty, about 2000 years ago [2]. Today, E. japonica is planted in more than thirty countries including China, Japan, the United States, France, Italy, and Spain, and the annual output exceeds 1.2 million tons worldwide [3]. E. japonica is also an important medicinal plant. Its roots, leaves, and flowers have long been used in traditional Chinese medicine, and have good therapeutic efficacy for inflammation, diabetes, cancer, bacterial infection, aging, pain and allergy. Rich bioactive components, including phenols, flavonoids, terpenoids and polysaccharides, have been found in these tissues [4][5][6][7].
A draft genome of a white-fleshed E. japonica cultivar named 'Seventh Star' has recently been published [25]; however, most E. japonica cultivars are yellow/red-fleshed. Although E. japonica is both a fruit and a medicinal plant, genome-level studies on the biosynthesis of its medicinal ingredients are still very limited, which greatly hampers the full understanding and utilization of the species. In this study, the whole genome of 'Big Five-pointed Star', a yellow/red-fleshed E. japonica cultivar, was sequenced, assembled and analyzed. Metabolomes of the flower, leaf and root from a loquat tree of this cultivar were also determined and analyzed. The major aims were to offer another high-quality reference genome for further research and utilization, and to provide insight into the molecular mechanisms underlying the biosynthesis of medicinal ingredients in E. japonica.

Results
Sequencing and assembly of a high-quality draft genome
An individual of Big Five-pointed Star was selected as sequencing material. Approximately 688.18 million clean short reads, totalling 51.54 Gb of data, were first generated using the Illumina HiSeq 4000 sequencing platform (Table S1). Using these data, a K-mer analysis was performed, and the genome size of Big Five-pointed Star was estimated to be 749.25 Mb (Table S2; Fig. 1A), which is almost identical to the result (749 Mb) determined using flow cytometry [26]. The heterozygosity and GC content were estimated to be 0.31% and 38.58%, respectively (Table S2). Whole genome sequencing was then performed using PacBio long-read sequencing technology, and more than six million clean subreads were obtained with an average length of 6,121 bp (N50 = 11,469 bp). A total of 36.90 Gb was obtained (Table S4; Fig. S1). With these clean subreads, an initial draft genome composed of 3,677 contigs with 733.32 Mb of non-redundant sequence was assembled (Table S4), covering approximately 97.87% of the estimated whole genome size. Three measures were used to evaluate the assembly completeness of the initial draft genome. The first was a screening of 458 core eukaryotic genes and 248 conserved sequence datasets in the CEGMA database [27], which identified 447 (97.06%) and 238 (95.97%) matches, respectively (Table S5). The second was a query of the BUSCO database, which contains 1,440 plant-specific orthologous genes [28]. A total of 1,359 (94.38%) complete BUSCO genes, comprising 921 single-copy and 438 duplicated genes, were identified. The number of missing BUSCO genes was only 66 (4.38%) (Table S5). The third was to map the short-read data onto the draft genome. It was found that 93.81% of the draft genome could be aligned (Table S5). These results suggest that the initial draft genome has good assembly completeness.
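The two summary statistics quoted in this paragraph, the subread N50 and the fraction of the estimated genome covered by the assembly, can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names are hypothetical, and the read lengths are toy values chosen only to show the calculation. The 733.32 Mb and 749.25 Mb figures are the ones reported in the text.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def percent_of(part, whole):
    """Express part as a percentage of whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Hypothetical read lengths (bp), for illustration only.
toy_lengths = [2000, 3000, 4000, 5000, 6000]
print(n50(toy_lengths))

# Assembly size and K-mer genome size estimate (Mb) as reported in the text.
print(percent_of(733.32, 749.25))
```

With the toy lengths, the two longest reads already hold more than half of the 20,000 bases, so the N50 is the length of the second-longest read.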
Further corrections to the initial assembly and placement of the contigs on the chromosomes were performed using Hi-C technology, with approximately 96 Gb of data from about 321.2 million reads generated by the Illumina HiSeq 4000 sequencing platform (Table S6). Among these, approximately 159.7 million read pairs were uniquely mapped on the initial draft genome, and more than 74.6 million read pairs were valid interactions (Table S6). These read pairs were used to scaffold the contigs onto chromosomes, and the number of contigs was finally corrected to 3,938, of which 3,725, totalling 727.40 Mb and covering 99.19% of the draft genome sequence, were anchored on chromosomes. The order and orientation on the chromosomes could be determined for 2,181 contigs, accounting for 644.88 Mb (Fig. S3; Table S7). These results indicate that the final assembled draft genome has good integrity and is sufficient to serve as a reference for whole-genome re-sequencing data, for molecular marker and gene studies, and for genome utilization.
Other metabolites in leaf, flower and root of loquat
In addition to phenolic compounds, terpenoids, alkaloids and other secondary metabolites, polysaccharides also play an important role in the medicinal application of loquat [46]. A previous study showed that the polysaccharide contents of different parts of the tree differed significantly among loquat cultivars [47].
In this study, 16

Identification of genes responsible for metabolism of medicinal ingredients using KEGG annotation
Previous studies suggested that the main components that endow E. japonica with medicinal value are flavonoids, terpenoids and polysaccharides [32][33][34]. Among the 577 metabolites identified, 271 could be annotated by the KEGG database (Data file 4, 9), but only 110, including 15 phenolic acids, six flavonoids and 12 alkaloids, could be assigned to specific pathways such as 'Metabolic pathways' (ko01100), 'Biosynthesis of secondary metabolites' (ko01110), 'Phenylpropanoid biosynthesis' (ko00940), 'Flavonoid biosynthesis' (ko00941), 'Stilbenoid, diarylheptanoid and gingerol biosynthesis' (ko00945), 'Caffeine metabolism', 'Isoflavonoid biosynthesis' (ko00950), etc. (Data file 10). According to the KEGG annotation, 71 genes responsible for the biosynthesis of flavonoids were identified (Fig. S2; Data file 10), along with 92, 32, 56, and 37 genes involved in the biosynthesis pathways of terpenoid backbones, monoterpenoids, diterpenoids, and sesquiterpenoids and triterpenoids, respectively (Fig. S3-6; Data file 10). Quercetin is an important flavonoid that has been shown to modify eicosanoid biosynthesis (anti-prostanoid and anti-inflammatory responses) and to have other therapeutic functions [48]. It is abundant in loquat, where it occurs almost entirely in the form of quercetin glycosides, as mentioned above. Genes encoding key enzymes in the quercetin biosynthesis pathway, including three genes (EVM0007289.1; EVM0018354.1; EVM0040197.1) encoding flavonoid 3'-monooxygenase-like enzymes [EC:1.14.13.21], were identified in the loquat genome (Data file 10). Chlorogenic acid is one of the phenolic acids abundant in the leaf, flower and root of loquat, and plays a key role in Phenylpropanoid biosynthesis (ko00940), Flavonoid biosynthesis (ko00941), Stilbenoid, diarylheptanoid and gingerol biosynthesis (ko00945 [...] residues from activated nucleotide sugars to acceptor molecules (aglycones), and play a key role in regulating the solubility of the acceptors within cells and throughout the organism [49].
In this study, several UDP-glucuronosyl coding genes involved in the metabolic pathways of polysaccharides were found, including five genes (EVM0001836.3; EVM0024320.2; EVM0026146.1; EVM0036105.1; EVM0038009.1) encoding UDP-N-acetylglucosamine transferase subunit ALG14-like enzymes [EC:2.4.1.141] and two genes (EVM0001695.1; EVM0001956.1) encoding UDP-N-acetylglucosaminephosphotransferase-like enzymes [EC:2.7.8.15] (Data file 10). These results offer a possible molecular clue to explain why the content of water-soluble polysaccharides in E. japonica is high, and suggest that these polysaccharides could play an important role in its health effects.

Discussion
E. japonica blooms in late autumn/winter, and its fruit ripens in spring/early summer. At this time, few other fresh fruits are on the market in the Northern Hemisphere, so fresh E. japonica fruit plays an important role in filling the gap in the fruit basket, which greatly adds to its commodity value. In recent years, E. japonica has increasingly become an important fruit worldwide. However, due to the lack of information on its genome, studies on the genetics and molecular biology of E. japonica are limited. This has greatly impeded understanding of its growth and development, as well as variety breeding. Recently, a draft genome of the E. japonica cultivar Seventh Star was published [25]. Here, a new high-quality draft genome of the E. japonica cultivar Big Five-pointed Star is described. Big Five-pointed Star is a cultivar with yellow/red flesh and has the largest planting area in China [50]. In contrast, Seventh Star is a mutant with white flesh and was bred recently. The assembly and annotation results of these two draft genomes are not exactly the same. For example, the estimated genome size of Seventh Star is ∼710.83 Mb, while that of Big Five-pointed Star is ∼749.25 Mb. The number of predicted coding genes in Seventh Star is 45,743, while the number in Big Five-pointed Star is 45,450. This new E. japonica draft genome from Big Five-pointed Star will provide a more solid foundation, and a wider choice of reference genomes, for further E. japonica studies in molecular biology, genetics, and breeding.
More than forty metabolites with potential medicinal value, including ursolic acid, ursolic acid methyl ester, acetyl ursolic acid, oleanolic acid, chlorogenic acid, neochlorogenic acid and caffeic acid, have been detected in different organs of loquat by classical instrumental analyses [51,52]. However, traditional phytochemistry methods are time-consuming, labor-intensive and low-throughput. So far, there has been only limited global profiling of the metabolites in loquat tissues or organs that have potential health effects, including phenols, flavonoids, terpenes and other secondary metabolites. This is not conducive to further research on, and utilization of, the medicinal value of loquat. In recent years, research on traditional Chinese medicine (TCM) has been greatly accelerated by the analytical technologies and methodologies of genomics, proteomics and metabonomics [53,54]. Among these omics approaches, metabolomics is very valuable for the analysis of the components, including various metabolites, of TCM [55]. Here, the metabolomes of the leaf, flower and root of loquat were determined using a widely targeted metabolomic analysis based on liquid chromatography and tandem mass spectrometry (LC-MS/MS), which was first described by Chen et al. [56]. Many new metabolites, including phenols, terpenoids and alkaloids, were detected in loquat, which should greatly deepen knowledge of the biochemical substances in loquat that produce medicinal efficacy. In addition, many genes encoding enzymes related to the biosynthesis of metabolites including phenols, terpenoids and alkaloids were identified, which lays a good foundation for further study of the molecular pharmacology of loquat.

Conclusion
In this study, a high-quality genome of a yellow-fleshed loquat was assembled, and three high-throughput metabolomes of the leaf, flower and root of loquat were determined. A total of 45,490 putative protein-encoding genes and 577 metabolites were detected. Among them, metabolites belonging to phenols, terpenoids, alkaloids and other classes with potential health value, together with genes related to the biosynthesis of these metabolites, were identified and analyzed. Overall, this study provides global fundamental molecular insights into the nutrient and medical biology of Eriobotrya japonica.

Materials and Methods
Brief introduction of the sequencing subject: loquat cultivar Big Five-pointed Star
Big Five-pointed Star, also called Dawuxing, shows excellent characteristics, including high average single fruit weight, edible rate, and soluble solids content. The peel and flesh are easily separated, and the flesh is sweet, juicy, soft, and delicate. Big Five-pointed Star has become a popular cultivar with the most rapid development and widest planting area in China [50].

DNA extraction
Total DNA was extracted from young leaves of Big Five-pointed Star using a modified protocol based on the CTAB method [57]. RNase was then used to remove RNA contamination from the total DNA in a 37 °C water bath for 1 h. The quality and quantity of the extracted total genomic DNA were checked using agarose gel electrophoresis and spectrophotometry (Nanodrop WND-1000, Nano-Drop Technologies Inc, Delaware, USA).
Illumina short-read library construction, sequencing and raw data statistics
The Illumina paired-end read library (350 bp) was constructed strictly according to the guidelines of the sequencer manufacturer using the following steps: qualified total DNA was fragmented into small segments; segments approximately 350 bp in length were selected on a 3% agarose gel for further analysis; end repair and A-tailing were performed, and Illumina-compatible adaptors were added to the selected DNA fragments; PCR amplification was carried out using Illumina adapter-specific primers, completing the paired-end sequencing library. After the quality and quantity of the sequencing library were checked, it was sequenced with 150-base reads using v3 chemistry in a paired-end flow cell on the Illumina HiSeq 2000 (Illumina, San Diego, CA, USA).

Survey of Big Five-pointed Star based on short-reads data by K-mer analysis
A K-mer analysis was performed using the KAT program [58] to make an initial estimate of the genome size (C-value), heterozygosity, and repeat content of the genome of Big Five-pointed Star according to the following formula: genome size = (total number of nucleotides)/(average sequencing depth) = (total number of K-mers)/(average K-mer depth). K was set to an odd number satisfying the following formula: 4^K/(genome size) > 200.
PacBio long-read library construction, sequencing, and raw data statistics
The long-read sequencing library (20 kb) was constructed strictly according to the guidelines of PacBio (Menlo Park, CA, USA) using the following steps: the genomic DNA of Big Five-pointed Star was fragmented using a g-TUBE; damage repair was performed on the fragmented DNA; the DNA ends were repaired; the fragments were ligated to dumbbell-shaped adaptors; the DNA was digested with exonuclease; and the target fragments were selected using BluePippin, completing the sequencing library.
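The arithmetic behind the formula above can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the KAT implementation: the function names are hypothetical, the histogram is a toy example, the main peak of the K-mer depth histogram is used as a stand-in for the average K-mer depth, low-depth K-mers are treated as sequencing errors, and the helper for K returns the smallest odd value meeting the stated inequality, which is only one reading of the selection rule.

```python
def peak_depth(histogram, min_depth=5):
    """Depth with the most distinct K-mers, ignoring low-depth (likely error) K-mers.

    histogram maps K-mer depth -> number of distinct K-mers at that depth.
    min_depth is an assumed cut-off, not a value taken from the paper.
    """
    candidates = {d: c for d, c in histogram.items() if d >= min_depth}
    return max(candidates, key=candidates.get)


def genome_size_from_histogram(histogram, min_depth=5):
    """genome size = (total number of K-mers) / (average K-mer depth)."""
    total_kmers = sum(d * c for d, c in histogram.items() if d >= min_depth)
    return total_kmers / peak_depth(histogram, min_depth)


def smallest_odd_k(genome_size, ratio=200):
    """Smallest odd K with 4**K / genome_size > ratio (integer arithmetic)."""
    k = 1
    while 4 ** k <= ratio * genome_size:
        k += 2
    return k


# Toy histogram: depths 1 and 2 mimic error K-mers, the peak sits at depth 20.
toy_histogram = {1: 1000, 2: 200, 18: 10, 19: 30, 20: 50, 21: 30, 22: 10}
print(genome_size_from_histogram(toy_histogram))

# Using the 749.25 Mb estimate reported in the text.
print(smallest_odd_k(749_250_000))
```

In the toy case the retained K-mers sum to 2,600 and the peak depth is 20, so the estimated size is 130 positions.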

Assembly and integrity evaluation of draft genome
To assemble the loquat draft genome from the long-read sequencing data, subreads of low quality (<Q20) or short length (<500 bp) were filtered out, and the remaining subreads were corrected using Canu software [59]. The corrected data were then assembled into draft genome sequences separately by WTDBG [60], Falcon [61], and Canu software. The three assemblies were then merged and optimized using Quickmerge [62]. Finally, the assembly was further improved by correcting errors with the short-read data using Pilon software [63].
Three methods were used to evaluate the completeness of the draft genome. The first was to search the assembled draft genome, with an identity threshold of more than 70%, against the Core Eukaryotic Genes Mapping Approach (CEGMA) database [27], which includes 458 core eukaryotic genes (CEGs) and 248 highly conserved CEGs. The second was to search the assembled draft genome, with at least 70% identity, against the embryophyta_odb9 dataset in the BUSCO v2.0 database [28], which includes 1,440 conserved core plant genes. Finally, the Illumina short-read sequencing data were mapped to the assembled draft genome using BWA software [64].
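The subread filtering step can be pictured with a short Python sketch. It is a hypothetical illustration and not the tool used in this study: the `Subread` record, its fields, and the assumption that quality is a single Phred-scaled mean per read and that length is measured in bp are all introduced here for the example.

```python
from typing import NamedTuple


class Subread(NamedTuple):
    name: str
    length: int          # read length in bp (unit assumed)
    mean_quality: float  # Phred-scaled mean quality (assumed definition)


def passes_filter(read, min_quality=20.0, min_length=500):
    """Keep reads that are neither low-quality (<Q20) nor short (<500)."""
    return read.mean_quality >= min_quality and read.length >= min_length


def filter_subreads(reads):
    """Return only the subreads that pass both thresholds."""
    return [r for r in reads if passes_filter(r)]


# Hypothetical reads, for illustration only.
reads = [
    Subread("r1", 6121, 25.0),   # kept
    Subread("r2", 450, 30.0),    # dropped: too short
    Subread("r3", 11469, 15.0),  # dropped: below Q20
    Subread("r4", 500, 20.0),    # kept: thresholds are inclusive
]
print([r.name for r in filter_subreads(reads)])
```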

Hi-C sequencing library construction
A Hi-C sequencing library was constructed according to the standard protocol described by Servant et al. (2015) [65]. Briefly, the cells of young leaves were fixed with formaldehyde. The fixed cells were dissociated, and the cross-linked product was cut with a restriction endonuclease to produce sticky ends. A biotin marker was introduced into the sticky ends, which were repaired to produce blunt ends, and the blunt ends were ligated. The cross-links were then reversed to separate DNA from protein and protein from protein, and the DNA was extracted. The DNA was sheared to the appropriate size using a Covaris instrument, and the ends were repaired. The sheared DNA fragments were purified and recovered by gel electrophoresis, and fragments without the biotin marker were removed. An A base was added to the remaining biotin-labelled fragments, PCR adaptors were added, and PCR was performed. The PCR products were purified and recovered by gel electrophoresis, completing the Hi-C sequencing library.
Hi-C sequencing, data summary and estimation
The Illumina HiSeq 4000 (Illumina) sequencing platform was used for paired-end sequencing by synthesis. The paired reads were mapped to the assembled loquat draft genome using the BWA program (version: 0.7.10-r789; aln mode; other parameters were set to default) [64].

Repetitive sequence prediction and annotation
To annotate the genomic sequence, a custom database for identifying repetitive sequences in the loquat genome was constructed with LTR FINDER v1.05 [66], MITE-Hunter [67], RepeatScout v1.0.5 [68] and PILER-DF v2.4 software [69], based on the principles of structure prediction and de novo prediction. This custom database was then merged with the REPBASE database [70] to form the final repetitive sequence database. The PASTEClassifier software [71] was used to classify the database. Finally, the RepeatMasker v4.0.6 software [72] was used to predict repetitive sequences based on the constructed repetitive sequence database.

Sampling, sample preparation and extraction for metabolites
Nine samples, including three from leaf, three from flower and three from root of Big Five-pointed Star at different developmental stages, were collected for metabolome analysis.

UPLC Conditions
The sample extracts were analyzed using a UPLC-ESI-MS/MS system (UPLC, Shim-pack UFLC SHIMADZU CBM30A system, www.shimadzu.com.cn/; MS, Applied Biosystems 4500 Q TRAP, www.appliedbiosystems.com.cn/). The analytical conditions were as follows. UPLC: column, Agilent SB-C18 (1.8 µm, 2.1 mm × 100 mm). The mobile phase consisted of solvent A, pure water with 0.1% formic acid, and solvent B, acetonitrile. Sample measurements were performed with a gradient program that employed starting conditions of 95% A, 5% B. Within 9 min, a linear gradient to 5% A, 95% B was programmed, and a composition of 5% A, 95% B was kept for 1 min. Subsequently, a composition of 95% A, 5% B was adjusted within 1.10 min and kept for 2.9 min. The column oven was set to 40 °C, and the injection volume was 4 μL. The effluent was alternatively connected to an ESI-triple quadrupole-linear ion trap (QTRAP)-MS.
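The gradient program described above can be written as a piecewise-linear timetable. The Python sketch below is only an illustration: the breakpoints are derived from the stated durations on the assumption that each step starts when the previous one ends (ramp 0 to 9 min, hold to 10 min, return by 11.1 min, hold to 14 min), and the function name is hypothetical.

```python
# (time in min, % solvent B) breakpoints derived from the stated step durations.
GRADIENT = [(0.0, 5.0), (9.0, 95.0), (10.0, 95.0), (11.1, 5.0), (14.0, 5.0)]


def percent_b(t):
    """Percentage of solvent B (acetonitrile) at time t, by linear interpolation."""
    if t < GRADIENT[0][0] or t > GRADIENT[-1][0]:
        raise ValueError("time is outside the gradient program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


def percent_a(t):
    """Percentage of solvent A (water with 0.1% formic acid) at time t."""
    return 100.0 - percent_b(t)


for minute in (0.0, 4.5, 9.5, 12.0):
    print(minute, percent_a(minute), percent_b(minute))
```

Under these assumptions a full run, including re-equilibration at the starting composition, takes 14 min per injection.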

ESI-Q TRAP-MS/MS
LIT and triple quadrupole (QQQ) scans were acquired on a triple quadrupole-linear ion trap mass spectrometer (Q TRAP), API 4500 Q TRAP UPLC/MS/MS System, equipped with an ESI Turbo Ion-Spray interface, operating in positive and negative ion modes and controlled by Analyst 1.6.3 software (https://sciex.com/products/software/analyst-software; AB Sciex). The ESI source operation parameters were as follows: ion source, turbo spray; source temperature, 550 °C; ion spray voltage (IS), 5500 V (positive ion mode)/-4500 V (negative ion mode); ion source gas I (GSI), gas II (GSII) and curtain gas (CUR) were set at 50, 60, and 30.0 psi, respectively; the collision gas (CAD) was high. Instrument tuning and mass calibration were performed with 10 and 100 μmol/L polypropylene glycol solutions in QQQ and LIT modes, respectively. QQQ scans were acquired as MRM experiments with collision gas (nitrogen) set to 5 psi. The declustering potential (DP) and collision energy (CE) for individual MRM transitions were determined by further DP and CE optimization. A specific set of MRM transitions was monitored for each period according to the metabolites eluted within that period [56].
Qualitative and quantitative analysis of metabolites
Qualitative analysis was carried out against the MWDB (Metware database) built by the Maiwei company, according to secondary spectrum information. Repeated signals of K+, Na+ and NH4+ ions, as well as repeated signals of fragment ions derived from substances of larger molecular weight, were removed. Metabolite quantification was performed by multiple reaction monitoring (MRM) analysis using triple quadrupole mass spectrometry [86].

Data analysis
The mass spectrum data were processed using Analyst 1.6.3 software. Samples from the same organ were regarded as replicates. Principal component analysis (PCA), Pearson's correlation analysis, differential expression analysis, and heat map drawing were performed using the statistical modules of the R software package (https://www.r-project.org/). The Kyoto Encyclopedia of Genes and Genomes (KEGG) database [87] was used to annotate and display the biosynthetic pathways of different metabolites.
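As an illustration of the Pearson's correlation step used to compare replicate samples, the following minimal Python sketch computes the coefficient from first principles. The analysis in this study was run in R; this code is not the authors' script, the function name is hypothetical, and the abundance values are invented for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical metabolite abundances for two replicates of the same organ.
leaf_rep1 = [120.0, 35.0, 900.0, 14.0, 260.0]
leaf_rep2 = [118.0, 40.0, 870.0, 10.0, 275.0]
print(round(pearson(leaf_rep1, leaf_rep2), 3))
```

A coefficient close to 1 between replicates of the same organ would indicate good reproducibility, which is the usual reason for running this check before differential analysis.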

Declarations
Data availability: The draft genome sequence in FASTA format and the protein-coding gene sequences in GFF format were deposited in the China National Center for Bioinformation (https://bigd.big.ac.cn/gsub/) under accession ID GWHAOTB00000000. The reviewer can log in with username: wys3269@126.com code: wys123456 to check it.
Chromosome-wide distribution of tRNA, rRNA, miRNA, repeats, genes, and GC content, from inside out
The category (indicated by the Y-axis) and quantity (indicated by the X-axis) of metabolites in leaf, flower and root of loquat