Candidate genes associated with red colour formation revealed by comparative genomic variant analysis of red- and green-skinned fruits of Japanese apricot (Prunus mume)

The red-skinned fruit of Japanese apricot (Prunus mume Sieb. et Zucc) appeals to customers due to its eye-catching pigmentation, while the mechanism related to its colour formation is still unclear. In this study, genome re-sequencing of six Japanese apricot cultivars was carried out with approximately 92.2 Gb of clean bases using next-generation sequencing. A total of 32,004 unigenes were assembled with an average of 83.1% coverage rate relative to reference genome. A wide range of genetic variation was detected, including 7,387,057 single nucleotide polymorphisms, 456,222 insertions or deletions and 129,061 structural variations in all genomes. Comparative sequencing data revealed that 13 candidate genes were involved in biosynthesis of anthocyanin. Significantly higher expression patterns were observed in genes encoding three anthocyanin synthesis structural genes (4CL, F3H and UFGT), five transcription factors (MYB–bHLH–WD40 complexes and NAC) and five anthocyanin accumulation related genes (GST1, RT1, UGT85A2, ABC and MATE transporters) in red-skinned than in green-skinned Japanese apricots using reverse transcription-quantitative polymerase chain reaction. Eight main kinds of anthocyanin s were detected by UPLC/MS, and cyanidin 3-glucoside was identified as the major anthocyanin (124.2 mg/kg) in red-skinned cultivars. The activity of UDP-glucose flavonoid-3-O-glycosyltransferase enzyme determined by UPLC was significantly higher in all red-skinned cultivars, suggesting that it is the potential vital regulatory gene for biosynthesis of anthocyanin in Japanese apricot.


INTRODUCTION
Japanese apricot (Prunus mume Sieb. et Zucc), an attractive fruit tree, originated from Southwest China and has been extensively cultivated in all of East Asia and Japan. China, being the origin of Japanese apricot, is rich in good quality germplasm with an approximately 190 fruiting cultivars (Chu, 1999). The fruit is usually processed into many regulated directly by TFs whose functions have been demonstrated in many plants. An intricate of MYB TF, basic helix-loop-helix (bHLH) TFs and WD-repeat proteins in MYB-bHLH-WD40 (MBW) complex combined with promoters of structural genes in anthocyanin biosynthesis (Gonzalez et al., 2008). In apple and pear, expression of the MYB10 gene has been shown to control red apple fruit flesh and foliage anthocyanin accumulation to the exclusion of other non-red pigments (Takos et al., 2006;Espley et al., 2007). The biosynthesis of anthocyanins in lotus (Nelumbo Adans) was regulated by MBW complex (NnMYB5, NnbHLH1 and NnTTG1); the study showed that NnMYB5 was a transcriptional activator of anthocyanin synthesis that interacted with NnbHLH1 and NnTTG1 (Sun et al., 2016). TTG1, a WD40 protein, participated in anthocyanin synthesis had also been characterized from fruit species including grapevine (Kobayashi, Goto-Yamamoto & Hirochika, 2004) and strawberry (Schaart et al., 2013).
In Japanese apricot, fruit colour is ascertained by the accumulation of anthocyanins, most abundant flavonoids. Cyanidin 3-glucoside remained the major anthocyanin in several diversities of red-skinned cultivars. In this present study, red-skinned cultivars were found to be more attractive to consumers than their green counterparts due to their eye-catching appeal and therefore were used for more findings of anthocyanin biosynthesis.

Plant materials and DNA extraction
Fruits of six different cultivars of Japanese apricot were collected from National Field Genebank for P. mume and Waxberry at the Jiangpu Agricultural Research Station, Nanjing Agricultural University, Nanjing, P.R. China. Based on fruit skin colour, these cultivars were classified into two groups: green-and red-skinned cultivars. Green-skinned cultivars included 'QingjiaNo.2' (QJM), 'Yanglao' (YLM) and 'Shinano koume' (SKM), and red-skinned cultivars included 'Ruantiao hongmei' (RHM), 'Xiaoye zhugan' (XZM) and 'Zhonghong' (ZHM). Red-skinned cultivars can be easily distinguished from green-skinned by their specific phenotypic traits. DNA samples were extracted from young leaves of all cultivars using CTAB method following the manufacturer's protocol to perform genome re-sequencing. Fruits were harvested in triplicate for re-sequencing on 28th May and 2nd June.

Exploration of physiological indicators and phenotypic traits
Physiological indicators were measured at the same ripening time (Fig. 1). The vertical height, flank diameter and width were measured by vernier calliper (GuangLu, Guilin, China). Single fruit weight was measured with an electronic balance (METTLER TOLEDO, Zurich, Switzerland). Fruits were peeled and pulp was squeezed in a cheesecloth, and filtered, homogenized juice was used for determination of soluble solids contents using a digital refractometer (ATAGO, Tokyo, Japan).
The colour of the fruits was observed and calculated with a Minolta CR-400 Chroma Meter (Konica Minolta Sensing, Inc., Osaka, Japan) using an illumination of D75 and at measuring angle of 10 after calibration with an ordinary white plate (Y = 94.00, x = 0.3158, y = 0.3322). Three records of L Ã represents lightness, a Ã and b Ã represent red-green and yellow-blue chromaticity which coordinates were noted for individually Japanese apricot sunny surface. Each sample was chosen from 10 replicates by changing the position of sunny surface at maturity stage. Chroma and hue angle were calculated as Chroma = (a Ã2 + b Ã2 ) 1/2 and Hue = tan -1 (b Ã /a Ã ) (Oms-Oliu, Soliva-Fortuny & Martín-Belloso, 2008).

Construction of libraries and Illumina sequencing
To construct a whole genome shotgun sequencing library for whole genome resequencing, high-quality genomic DNA was extracted and randomly interrupted by ultrasonic shearing at intervals of 150-800 bp (pair-end library). DNA fragments of required length were isolated by electrophoresis, and then Klenow DNA polymerase, T4 DNA polymerase and T4 PNK were utilized to convert sticky ends to blunt ends. Amplified DNA fragments were finally sequenced using an Illumina HiSeq TM 2500 platform (1gene, Zhejiang, China). The sequences were aligned to the Japanese apricot reference genome (https://doi.org/10.6084/m9.figshare.6197219.v1) using the SOAP2 (Short Oligonucleotide Alignment Program 2) software (http://soap.genomics.org.cn/soapaligner. html), and SOAP2 was matched to the single-end and paired-end fragment. According to alignment results, it was possible to calculate the depth and coverage of sequencing with respect to reference genome. Statistical analyses were carried out by means of SPSS 18.0 software (SPSS Inc., Chicago, IL, USA).

Single nucleotide polymorphisms detection
Single nucleotide polymorphisms (SNPs) were recognized using SOAPsnp (http://soap.genomics.org.cn/soapsnp.html) to explain the effects of substitutions (including synonymous SNPs and non-synonymous SNPs). According to the comparison results, a Bayesian model was used to determine the probability of each possible genotype based on actual observed data, while considering data characteristics, sequencing quality and experimental factors. The genotype of specific locus of sequenced individual was selected as the maximum of probability and a quality value reflecting genotype was assigned on the basis of that probability, then consensus sequence was obtained. Based on this consensus sequence, reference sequence exists, polymorphism screening and filtering was done. The SAM tools function 'mpileup' was used to identify unprocessed SNPs population by means of reads through a mapping value !20. Using the SAM tools database 'vcfutils,' SNPs extracted using the beyond procedure were initially filtered to yield sequencing depths between 35 and 91. Raw SNPs sites were additionally filtered based on subsequent criteria: copy number 2, and a least of 5 bp apart, with the exclusion of insignificant allele frequencies (MAF ! 0.05) where SNPs were engaged when distance between SNPs was <5 bp. To check the SNPs calling accuracy of SAMtools, fragments were casually selected and amplified using corresponded primers, and subsequent PCR products exposed to Sanger sequencing for concordance rates. Both HTML and text production files were created from SOAPsnp; output included the chromosome (Chr), position, reference base <-> sample base, SNPs status, strand, annotation type, feature type, codon phase (phase of the codon mutation), codon mutate (the codon before mutation <-> the codon after mutation), amino acid mutate (amino acids before mutation <-> amino acids after mutation), synonymous mutations, non-synonymous mutations, start position, end position, gene id and its functions. Using these annotation files, SNPs could correspond to a homologous functional element region, such as coding sequence (CDS) region, exon region, UTR region and others. This information could help us to select desired SNPs.

Insertions or deletions detection
SOAPindel software was designed for all paired-end sequencing data, wherein SOAPindel comparison allowed open gaps, and short insertions or deletion (InDel) detections was performed. During short InDels testing, all reads met paired-end requirements, and only one end gaps contained the alignment. In this process, the length of gap was set, not more than 5 bp. In the detection of InDels, at least three pairs of reads were required to define an InDel. For InDels annotation, analysis of regional information and impact of InDels on CDS region, such as in a frameshift mutation, was possible.

Structural variations detection
The SOAPsv (http://soap.genomics.org.cn/SOAPsv.html) program was used to detect structural variations by whole genome de novo assembly system requirements. Structural variations (SVs) is one of the important variations between individuals of the same species. In contrast to paired-end, if there is a structural difference an individual sequence and the reference sequence, a comparison cannot be made. The abnormal condition of this double end alignment can be used to detect SVs. At present, the main types of SVs can be detected, such as insertion, deletion, duplication (tandem duplication and dispersed duplication), inversion (upstream and downstream) and complex. Sometimes, two types of SVs can occur simultaneously.

DNA level functional annotation and screening of variant genes
Gene functions were annotated by BLAST alignment with Japanese apricot reference genes (Zhang et al., 2012)  Explanation of data based on homologous genes was utilized to monitor for unique genes associated with colour formation. Non-synonymous SNPs, InDels or SVs were examined on the basis of green-skinned ('QJM,' 'YLM' and 'SKM') and red-skinned ('RHM,' 'XZM' and 'ZHM') cultivars phenotypes.

Reverse transcription-quantitative polymerase chain reaction
Reverse transcription-quantitative polymerase chain reaction (qRT-PCR) was carried out to identify the comparative alteration in expression of genes identified in re-sequencing analysis. Total RNA samples were extracted from peels using CTAB extraction method and used for expression analysis (Tong et al., 2009).
SuperScript II RT (Invitrogen) was used to synthesize the first-stand cDNA from Individual sample total RNA with an Oligo primer. qRT-PCR reactions were performed in 96-wells plates using a 7300 Real-Time PCR System (Applied Biosystems, Foster City, CA, USA) using SYBR Green PCR Master Mix (Applied Biosystems, Foster City, CA, USA). Beacon designer 7.0 program (Premier Biosoft International, Palo Alto, CA, USA) was used to design gene-specific primers. Each qRT-PCR reaction (20 mL) confined 8.6 mL ddH 2 O, 0.2 mL (100 nM) of each primer, 10 mL SYBR Green II Master Mix and 1 mL of diluted cDNA. To standardize the entire quantity of cDNA in individually reaction, a housekeeping gene, RPII (sense: 5′-TGAAGCATACACCTATGATGATGAAG-3′, antisense: 5′-CTTTGACAGCACCAGTAGATTCC-3′) was used as a reference gene in control reactions (Tong et al., 2009). All primers that were used in reactions are listed in Table 1. The expression levels of relative genes were measured using the 2 -ÁÁCt method (Ramakers et al., 2003).

Extraction and analysis of anthocyanin
Extraction of anthocyanins was carried out conferring to previously reported methods with some alterations (Pomar, Novo & Masa, 2005). Peels (2 g) were detached from pulp to obtain juice with 2% formic acid in methanol (10 mL) for 24 h in dark. The process was reiterated four times. The extracts attained were assorted and centrifuged at 12,000 rpm for 25 min, and then the supernatant was evaporated to dryness at 40 C and reconstituted in 2% formic acid in methanol (5 mL). Individual sample was accomplished for third times, and the extracts were used for anthocyanin pigmentation identification. Cyanidin 3-glucoside (Cy-3-glc) (Sigma Chemical, St. Louis, MO, USA) was used as a standardized, dissolved in acidified MeOH (2% formic acid) to obtain a concentration of 1 mg/mL. The UPLC system with the Xevo Ò G2-XS QTof Mass Spectrometer was used for separation according to the method described by Hosseinian, Li & Beta (2008).

UFGT enzyme activity
Pericarp (1 g) was pulverized with a mortar and pestle in liquid N 2 . Reaction substrate was prepared following the method (Wu et al., 2017) with some modifications. Reaction tubes including enzyme were incubated at 25 C for 50 min, and stopped through accumulation of 20% trichloroacetic acid in methanol (75 mL). An ultra-performance liquid chromatograph (UPLC) was fortified with a opposite segment column ACQUITY UPLC HSS C18 (1.8 mm particle sizes, 100 mm Â 2.1 mm I.D.) (Waters, Milford, MA, USA) for separation. The UPLC technique was carried out with the method described by Wu et al. (2017). UFGT were enumerated using quercetin-3galactoside (Sigma Chemical, St. Louis, MO, USA) as an ordinary. Lastly, UFGT action was characterized by mg quercetin-3-gal · g -1 FW.

Statistical analysis
Analysis of variance (ANOVA) was carried out to compare cultivar mean values using IBM SPSS Statistics 18 (SPSS Inc., Chicago, IL, USA). The least significant difference test was employed to determine variances between means at a 5% significance level. GraphPad Prism version 6.0 (GraphPad Software, San Diego, CA, USA) was used for graph plotting.

Variation in phenotypic traits of six Japanese apricot cultivars
Phenotypic observations showed a significant difference between red-and green-skinned cultivars due to red pigmentation of fruit skin ( Fig. 1). 'XZM' showed highest red pigmentation from all cultivars followed by 'RHM' and 'ZHM,' whereas green-skinned cultivars 'SKM' and 'YLM' had trace amounts or no colour pigmentation. Phenotypic indicators revealed no significant differences in flank diameter, vertical diameter and width among all red-skinned, whereas the values for 'QJM' and other two green-fleshed cultivars, 'SKM' and 'YLM,' were significantly lower than those of all other cultivars ( Table 2). The 'QJM' cultivar had the highest fruit weights, while other two green-skinned cultivars were approximately 1/8 to 1/7 the weight of 'QJM.' The recorded fruit weights of three green-skinned cultivars 'QJM,' 'SKM' and 'YLM' were 36.89, 5.27 and 4.36 g, respectively. The vertical diameter of all red-skinned were more than all green-skinned cultivars, for 'width and flank diameter,' 'QJM' and all red-skinned cultivars were close in value and had no significant difference except two green-skinned cultivars, 'SKM' and 'YLM.' The weights of red-skinned cultivars 'RHM,' 'XZM' and 'ZHM' were recorded as 26.68, 24.22 and 30.17 g, respectively ( Table 2). The highest soluble solids were found in 'QJM' (9.16%), closely matched by 'SKM' (9.12%), whereas the concentration in other cultivars ranged between 5.79% and 8.83%. The soluble solids contents in red-skinned cultivars were recorded lower than those in green-skinned cultivars (Table 2).
CIELab parameters were also measured for these cultivars (Table 2). Significant differences appeared between red-and green-skinned cultivars. 'XZM' exhibited highest a Ã (16.58) value and lowest b Ã (11.18) value, representing the highest amount of red pigmentation among the studied cultivars, followed by 'RHM' and 'ZHM' with a Ã values of 15.93 and 12.62, respectively, and b Ã values of 11.93 and 13.37, respectively. 'QJM' showed the lowest a Ã (-18.83) value and a higher b Ã (34.26) value, revealing the highest amount of green pigmentation, whereas 'YLM' revealed the second lowest a Ã (-17.01) value above 'SKM' (-9.77) and a lower b Ã (40.52) than 'SKM' (43.91), indicating higher amount of green pigmentation but a lower amount of yellow pigmentation. Moreover, lightness (L Ã ), hue angle (H Ã ) and Chroma (C Ã ) varied significantly between red-and green-skinned cultivars, all three parameters were higher in green-skinned than in red-skinned cultivars.

Note:
L * means lightness, a * means red-green chromaticity, b * means yellow-blue chromaticity. Each sample chose from 10 replications by changing the position of the sunny surface at maturity stage. C means Chroma, and h means hue angle, they were calculated being Chroma = (a * 2 +b * 2 ) 1/2 and Hue = tan -1 (b * /a * ), respectively. Values represent as a mean ± standard deviation, n = 3. Different letters in rows indicate significant differences among mean values of treatments (p < 0.005).

Re-sequencing of six cultivars
Genomic variants amongst the six genotypes incorporated SNPs, InDels and SVs. Six DNA libraries were built in order to perform whole genome re-sequencing. In total, 92.2 gigabits (Gb) of crude sequence data, about 15.3 Gb (clean data) per individual remained after filtering the paired-end sequences of individual cultivar (

SNPs detection
SOAPsnp (http://soap.genomics.org.cn/soapsnp.html) software was used to detect SNPs and considered the main variant among these six genotypes. On the basis of same

Notes:
Values represent as a mean ± standard deviation, n = 3. Different letters in rows indicate significant differences among mean values of treatments (p < 0.005). ND, no detected. sequence information (i.e. after comparing all SNPs of the six samples), filtering out the polymorphic loci detected between reference sequence and genotype by low quality and depth, high reliability of SNP data was attained. A total of 1,387,518 SNPs were detected in 'QJM,' of which 53.09% were homozygous and 46.91% were heterozygous (Fig. 3A) (Fig. 3A). The distribution of SNPs corresponding to functional element regions (such as CDS, exons, UTR, etc.) revealed that the mRNA region (46.15%) occupied the majority of SNPs among all functional regions, followed by the gene region (41.05%). Exons accounted for third most enriched region, and considered to be an important CDS region, the other  Single nucleotide polymorphisms in CDS region are worthy of attention based on explanation of the P. mume reference genome. SNPs that change the translation start-stop site or variable splice sites would cause phenotypic change. Approximately 10,397 (0.14%) of identified SNPs were considered large-effect SNPs consisting of premature stop (5,787 SNPs), stop-codon to non-stop codon (1,997 SNPs), start-codon to non-start codon (616 SNPs) and splice site (1,997 SNPs). The cultivar 'QJM' demonstrated the highest number of large-effect SNPs (1,882 SNPs), 'RHM' (1,878 SNPs) showed no significant difference from 'QJM,' followed by 'ZHM' (1,768 SNPs), 'YLM' (1,762 SNPs) and 'SKM' (1,626 SNPs), whereas 'XZM' revealed the lowest number of large-effect SNPs (Fig. 3C).
The distribution of InDels was not significantly different from that of SNPs among eight chromosomes in all six cultivars. 'SKM' (82072) showed highest number of InDels of all six genotypes, followed by 'ZHM' (79160), 'QJM' (77390) and 'YLM' (77381), while 'XZM' contained the lowest number of InDels of all cultivars. The percentages of insertions and deletions among eight chromosomes in all six cultivars were similar: insertions ranged from 47.1% to 49.5%, slightly less than deletions (50.5% to 52.9%). Chr 2 had the most abundant variation of all eight chromosomes, accounting for 20% of all variation (Table S2).
Frameshift mutations are genetic mutations affected by insertions and deletions of multiple nucleotides in a DNA sequence. Some InDels were projected to be frameshift mutations, but the lengths of InDels in coding regions were more probable from 1 to 5 (the length of a codon), than InDels in the rest of the genome (Fig. 5). In this study, two types of mutation were detected; the most frequent frameshift was a 3X-shift mutation in CDSs (3987), including phase0 (1,320) and phaseNo0 (2,667), followed by frameshift mutations in CDSs, which were the second highest number of mutations (3,245), whereas 'QJM' had the highest number of two-shift mutations (Fig. 3F).

SVs detection
Structure variation was important between individuals of the same species, and the types detected could be insertions and deletions, repetitions, inversions and translocations. Because of their small contribution to variation, types of variation other than insertions and deletions were classified as others. The distribution of SVs in different regions showed (Fig. 3E) no significant difference between SNPs and InDels, and this region was the highest amount of structure variation among all regions. 'XZM' had the most variation in CDSs, the regions of most interest to us, 'QJM' cultivar has the least variable while other four cultivars were not significantly different. The distribution of SVs was not significantly different from that of InDels and SNPs among eight chromosomes in all six cultivars (Table S3). The highest variation was observed on chr 2 (16.6%-18.4%), and distribution of other chromosomes was similar. The percentage of insertions and deletions was significantly different among all cultivars between green-and red-skinned: approximately 60% of SVs was insertion in 'QJM,' while 'YLM' had the lowest number of insertions of all cultivars (7%). A total of 40,055 SVs were detected in 'QJM,' in which insertions accounted for 60% (the highest amount), and deletions accounted for 43.76% (the lowest amount) of the identified SVs. The opposite trend was observed in 'YLM': it had the lowest number of insertions and the highest number of deletions, whereas the distributions of other types of SVs differed little, one to another.

Functional annotation of databases
A change in function of a gene was usually because of the existence of non-synonymous (NS) mutation, frameshift (F) and SVs; in turn, expression of related protein was influenced. All tested and functionally explained genes were categorized into GO (Fig. 6). Approximately 4,736 genes per cultivar were separated into three groups: molecular function, cellular components and biological process. Among all categories, the highest number of genes (2,060 on average) were recorded in molecular function, while the lowest number of genes were observed in cellular components (1,657) and biological process categories (1,020), respectively. GO enrichment classification recommended that the genes from molecular function, cellular component and biological process groupings could be classified into three, 11 and 18 groups, respectively.
Genes explained by BLASTx, with a threshold of 10 -5 , were noted in six public databases comprising the GO, NCBI non-redundant (NR), nucleotide sequence (NT), Swiss-Prot, Clusters of Orthologous Groups of Proteins (COG) and KEGG databases. A total of 23,698 genes in 'QJM,' 23,554 genes in 'SKM,' 23,572 genes in 'YLM,' 23,597 genes in 'RHM,' 23,458 genes in 'XZM' and 23,640 genes in 'ZHM' were annotated successfully (Fig. 7). All the genes were noted in the NR database, and the distributions of genes annotated in six databases were not significantly different. At least 8,605 genes ('XZM') Overall, CDS is a region of DNA that is translated to form protein, and detection of sequence variation occurring in the CDS region becomes extremely important for gene function analysis. Through genome-wide comparative analysis of six re-sequence data sets in the CDS region, a total of 181,331 positions of SNPs, 3,318 positions of InDels and 51,427 SVs detected 19,774, 1,937 and 13,850 unigenes, respectively. The significant difference between green-and red-skinned cultivars is the colour pigmentation due to existence of anthocyanin, in order to obtain the same sequence variations occurring in all green-skinned cultivars, while red-skinned cultivars have either no variation or synonymous mutations only associated with the biosynthesis and accumulation of anthocyanin. After comparison found that a sum of 3,143 SNPs (1,350 unigenes), 60 InDels (32 unigenes) and 337 SVs (122 unigenes) differed between green-and red-skinned cultivars (Table 5), including 25 unigenes containing two sequence variations at same time, it was a simple task to find the important variations which leads to pigmentation changes: there are four pathways, namely phenylpropanoid biosynthesis (31 unigenes, ko00940), flavonoid biosynthesis (15 unigenes, ko00941), flavone and flavonol biosynthesis (seven unigenes, ko00944), isoflavonoid biosynthesis (two unigenes, ko00943), that were found to be related to biosynthesis of anthocyanin according to previous research results. TFs (26 unigenes), as a series of regulatory genes that control structural gene transcription, were detected, including genes encoding MYB (four unigenes), bHLH (two unigenes), WD40-repeat protein (one unigenes), NAC (one unigenes), etc. Twenty-one unigenes may be related to accumulation of anthocyanin.

Genes validation by qRT-PCR at transcriptional level
Fruit colour is determined by the accumulation and biosynthesis of anthocyanins, mediated by multiple enzymes in the phenylpropanoid and flavonoid pathways, regulatory genes and TFs. According to functional annotation results, 13 candidate homologous genes were revealed related to colour pigmentation, the translated proteins were detected as NS, InDels or SVs and 13 candidate genes participated in the biosynthesis of anthocyanin (Table 6). The expression patterns of genes encoding three anthocyanin synthesis structural genes (4CL, F3H and UFGT), five TFs (MBW complexes and NAC) and five related regulated genes (GST1, RT1, UGT85A2, ABC and MATE transports) were definite through quantitative reverse transcription-polymerase chain reaction (qRT-PCR), expression pattern of these candidate genes were not significantly different, and transcription levels of genes in red-skinned cultivars were significantly higher than those of the genes in green-skinned cultivars (Fig. 8).

UFGT enzyme analysis
UDP-glucose flavonoid-3-O-glycosyltransferase is assumed to be an important enzyme for anthocyanin biosynthesis in many plants. In this study, UFGT activity was identified in all cultivars. While the lowest activity was 0.01 mg g -1 in 'YLM' and 0.03 mg/g in 'SKM,' compared with green-skinned cultivars, red-skinned cultivars showed significantly higher UFGT activity. 'XZM' cultivars had the highest activity (1.3 mg/g) of all cultivars, followed by 'RHM' (0.5 mg/g) and 'ZHM' (0.2 mg/g) (Fig. 9).

DISCUSSION
In the present study, we mainly focused on understanding the colouring mechanism in Japanese apricot. Different experiments were conducted for this study. Firstly, we used UPLC/MS method to identify the main anthocyanin components, we identified a component 'cyanidin-3-glucoside' a major anthocyanin in Japanese apricot fruit, consistent with previous studies conducted in flowers (Changling, Weiming & Junyu, 2004). However, the molecular mechanisms of anthocyanin biosynthesis in Japanese apricot are not yet clear. After that, we performed genome re-sequencing to investigate the molecular mechanism of synthesis and accumulation of anthocyanin, comparing gene variations between red-and green-skinned cultivars and confirmed using quantitative qRT-PCR to find the expression pattern of those candidate genes, we found that three anthocyanin synthesis structural genes (4CL, F3H and UFGT), five TFs (MBW complexes and NAC) and five genes (GST1, RT1, UGT85A2, ABC and MATE transports) were possibly involved in the anthocyanin metabolic pathway of Japanese apricot. In addition, developing SNP makers in a wider range of Japanese apricot varieties by using re-sequencing data.

Genes encoding enzymes in the anthocyanin biosynthetic pathway
Multiple enzymes participated in anthocyanin biosynthesis in phenylpropanoid and flavonoid pathways. In this study, three candidate enzymes were discussed (Fig. 8). 4CL was the final enzyme of the general phenylpropanoid pathway, converting phenylalanine to 4-coumaroyl-CoA, a product that resulted in a range of flavonoid compounds, such as pro-anthocyanidins and anthocyanins (Boss, Davies & Robinson, 1996b). Ozeki (Ozeki & Komamine, 1985) studied the regulated relationship between 2,4-D and anthocyanin synthesis, and when no anthocyanin synthesis occurred, the activities of 4CL increased one day after transfer due to transfer effect but consequently decreased and remained at low levels. When the activities of 4CL increased, and reached up to maximum, anthocyanin was synthesized most rapidly. Christie (Christie, Alfenito & Walbot, 1994) found anthocyanin accumulation in all tested lines with 4CL-homologous transcripts increased at least threefold over levels in unstressed plants (no anthocyanin accumulation). These observations agree well with those qRT-PCR results mentioned above. The primary phases of the flavonoid pathway were from 4-coumaroyl CoA through chalcone and naringenin to dihydroflavonol, in which F3H was one of three main enzymes (other two were CHS, CHI converted naringenin flavanone to dihydroflavonols). F3H gene expression seems to be fundamental in regulation at bifurcation of anthocyanin and flavonol branches, catalyses the stereospecific hydroxylation of (2S)-flavanones at 3-position of C-ring to (2R, 3R)-dihydroflavonols (Sparvoli et al., 1994). Zuker (Zuker et al., 2002), working on antisense suppression to block the expression of a flower colour gene encoding F3H of carnation (Dianthus caryophyllus L.), found that transgenic plants showed flower colour alterations ranging from attenuation to complete loss of their orange/reddish colour. F3H expression was explored to clarify the molecular mechanism of red colouration in apple; results indicated that the F3H gene was co-ordinately expressed during fruit development, and its levels of expression was positively related to the degree of anthocyanin concentration (Honda et al., 2002). Castellarin (Castellarin et al., 2007) performed dehydration to magnify the expression of F3H gene, which described well the increase of anthocyanin content in grapevines (Vitis vinifera L.) during experimental seasons. In radish (Raphanus sativus) varieties, the F3H gene was remarkably expressed through accumulation of sucrose in red radish; in contrast, the expression of F3H was powerfully suppressed in the white variety despite the accumulation of sucrose, showing that F3H activated the biosynthesis of anthocyanins via regulation of TFs (Hara et al., 2004).
UDP-glucose flavonoid-3-O-glycosyltransferase, an enzyme participated in the late step in anthocyanin biosynthesis, transfers the glucosyl moiety from UDP-glucose to the 3-hydroxyl group of acceptor molecules in a glucosyltransferase catalytic reaction. The role of UFGT is to anthocyanidins were glucosylated by UFGT during red-skinned fruit ripening. Dissimilarities in colour strength make contribution to the expression differences of structural genes. High expression of UFGT has been detected in red skin of grapes, while in white grapes, almost extremely low expression was found (Boss, Davies & Robinson, 1996b). The lack of anthocyanins detected in white Malay apple fruits was due to the absence of detectable levels of UFGT transcripts (Kotepong, Ketsa & van Doorn, 2011). Takos (Takos et al., 2006) confirmed that the transcript levels of genes encoding UFGT in multiple anthocyanin pathways was much higher in red-skinned apples than in others. The transcript expression pattern of gene encoding UFGT reach the fastest accumulation speed at both primary and late developmental stage was confirmed in other fruits like peach (Tsuda et al., 2004), bilberry (Jaakola et al., 2002).
In the current study, transcript levels of 4CL, F3H and UFGT, consistent to the flavanol and anthocyanin levels, were greater in red-skinned fruits than in green-skinned cultivars (Fig. 8); up-regulated structural genes in red-skinned fruits indicated three genes contributed in the accumulation of anthocyanin that led to red pigmentation in plants.
The anthocyanin pathway genes are regulated by interaction of DNA binding R2R3-MYB TFs and MYC-like bHLH and WD40-repeat proteins (MBW complexes) in plants (Chagné et al., 2007). Two MYB genes, MYB29 and MYB114, were found in this present study to be pertain in regulation of anthocyanin synthesis (Li et al., 2014;Schwinn et al., 2016;Sun et al., 2016) (Fig. 8). Schwinn (Schwinn et al., 2016) reported that MYB29 was associated with bulb colour (Allium cepa L., Allioideae, Asparagales): as an R2R3-MYB TF that regulated the flavonoid pathway, it was demonstrated that MYB29 belonged to sub-group that carried out biosynthesis of flavonoids. Overexpression of MYB5 or MYB114 strongly induces proanthocyanidin (PA) accumulation in Medicago truncatula hairy roots, and both MYB5 and MYB114, mutants of M. truncatula, which exhibit darker seed coat colour than wild-type plants, were revealed to physically interact and synergistically activate the promoters of anthocyanidin reductase and leucoanthocyanidin reductase (Liu, Jun & Dixon, 2014). Gonzalez (Gonzalez et al., 2008) reported that overexpression of MYB114 resulted in enhanced pigment production in a TTG1-and bHLH-dependent manner. Li (Li et al., 2014) found that JA promotion of anthocyanin accumulation under far-red light is dependent on the phytochrome A signalling pathway; at the same time, knockdown expression of MYB114 significantly reduces methyl jasmonic acid promotion of anthocyanin accumulation.
As a regulatory complex comprising another candidate TF, bHLH30-like (bHLH30), and WD repeat-containing protein 75 (WD40), the expression arrangements of these two candidate genes, bHLH30 and WD40, in six cultivars were the same as those of MYB family genes (MYB5 and MYB114), which are upregulated throughout fruit ripening in red-skinned associated with green-skinned cultivars. Lc (leaf color) as the first plant bHLH protein involved in maize tissue-specific anthocyanin pigmentation by mutant analyses (Dias et al., 2003). Zhou (Zhou, Shi & Xie, 2012) reported that nitrogen regulated the synthesis of anthocyanin in red cells operates through the mechanism by which the expression levels of genes encoding equally major constituents of TTG1-GL3/TT8-PAP1 complex and negative regulators was effected by nitrogen, which was a complex of bHLH and WD40; this may be same as in our present study. All three regulatory complex genes were highly upregulated in red-skinned relative to green-skinned cultivars.
Like MBW regulatory complex, the TF NAC was also found and related to red colour formation. The NAC family are plant-particular TFs and have 106 and 149 members pretend in Arabidopsis and rice, correspondingly (Gong et al., 2004;Xiong et al., 2005). Morishita (Morishita et al., 2009) found that NAC078 protein regulated the expression of genes related to the biosynthesis of flavonoids and the level of anthocyanins were expressively increased in NAC078-containing plants and reduced in NAC078-knockout plants of Arabidopsis. In orange fruit, a gene encoding TF NAC domain protein was induced the biosynthesis of anthocyanins in blood oranges but not in common oranges. The same phenomena occurred in raspberry in a previous study: a candidate gene encoding NAC (CUC2-like) was accountable for the accumulation of cyanidin and pelargonidin pigments (Kassim et al., 2009). Our study showed that NAC TF 29 was highly upregulated in red-skinned relative to green-skinned cultivars, which suggested that it is a potential TF contributing to the regulation of red colour formation in Japanese apricot.
Among these candidate TFs, the functions of only two MYB TFs (MYB29 and MYB114) have been studied in Arabidopsis and other plants, whereas reports on TF function are rare in Japanese apricot (Fig. 8). The other three candidate genes have not been stated to be intricate in the biosynthesis of anthocyanins. Advance studies are desired to regulate whether changes in the transcription of these candidate genes are correlated to regulation of anthocyanin metabolism. Moreover, in Japanese apricot, the relationship between regulatory complex (MBW) and anthocyanin biosynthesis leftovers is unclear. Subsequently, this matter should also be explored.

Related genes contributed to accumulation of anthocyanin
There were several candidate genes participating in biosynthesis or the accumulation of anthocyanin. However, these genes were not the same as structural genes, which could directly impact the anthocyanin synthesis pathway, and neither were they like TFs, which could control structural gene transcription; they may be related to regulation of the anthocyanin biosynthesis pathway.
In this study, seven candidate genes were identified that might be associated to the biosynthesis of anthocyanin (Fig. 8). Glutathione S-transferase (GST1) probably have the same function as Bz2 gene in maize (Marrs et al., 1995), which encodes a protein with GST activity conjugated with glutathione and the inhibitor vanadate in plant tonoplast and has been shown to inhibit the accumulation of anthocyanins in vacuole. Alfenito (Alfenito et al., 1998) revealed that An9, which encoded a kind of glutathione S-transferase which transported glutathionated cyanidin 3-glucoside (C3G) to the vacuole, regulated transcriptional activator of the anthocyanin pathway. In Vitis vinifera L., two other GST genes (VvGST1 and VvGST4) showed different induction patterns, but all their transcriptional profiling showed that they induced intensively different accumulation of anthocyanin from that of Bz2 and An9 in maize and petunia, correspondingly (Conn et al., 2008). TT19 was demonstrated in Arabidopsis that works as a carrier for uptake of anthocyanins or proanthocyanidin precursors into the vacuole and may protect cyanidin from degradation during transport process (Kitamura, Shikazono & Tanaka, 2004). GST in strawberry was also mentioned as one of the list of differentially expressed gene related to biosynthesis anthocyanin (Gu et al., 2015).
In petunia hybrida, expression of UDP-rhamnose/rhamnosyltransferse (Rt) gene was induced by intense light, and transcription of UDP-rhamnose: anthocyanidin-3-glucisde rhamnosyltransferse (3RT) played a key role in promoting the accumulation of anthocyanin in epidermal cells of the petal (Morita et al., 2005). Regarding secondary metabolism in Norton grape (Ali et al., 2011), the transcriptional profile of UDP-rhamnose/rhamnosyltransferse was proposed to be associated with the anthocyanin pathway. The RT gene downregulated the activity of PAL when it was low; however, PAL was a structural gene in the anthocyanin pathway, indicating that RT gene influenced the biosynthesis of anthocyanin (Given, Venis & Grierson, 1988).
The transcriptional expression of UDP-glycosyltransferase 85A2-like (UGT85A2) was highly upregulated in red-skinned comparative to green-skinned cultivars, indicating that it may be associated with the biosynthesis of anthocyanin. In Crocus species, transcription of UDP-glycosyltransferase 703B1 was found only in stigmas and petals where anthocyanin accumulates (Nørbaek et al., 2002;Moraga et al., 2009). A relationship between Arabidopsis thaliana mutant UGT72B1 and decreased accumulation of anthocyanin in floral stems was confirmed, and an association between UGT86A1 with the anthocyanin biosynthesis pathway in strawberry was also mentioned (Gu et al., 2015). The ABC and MATE transporter families are two important groups of proteins whose members participate in an extensive variety of transport processes. The upregulation of ABC transporters demonstrated that ABC transporters take part in transport of anthocyanin into vacuoles of Arabidopsis (Lu, Li & Rea, 1997). In yeast, two multidrug resistance-associated protein-type ABC transporters were found to facilitate vacuolar uptake of anthocyanin-glutathione conjugates (Rea, 1999), and same results were reported in maize (Debeaujon et al., 2001) and tea (Li, 2016). The MATE transporters TT12 of Arabidopsis work as a precursors that active in cells synthesizing proanthocyanidins, thus inducing vacuolar accumulation of proanthocyanidins in the seed (Gomez et al., 2009).
All related genes intricate in regulation of the biosynthesis of anthocyanin were studied in this experiment and showed same expression pattern as the structural genes and TFs, also representing that these genes may be associated to anthocyanin accumulation.

Gene expression encoding UFGT enzyme
As the gene encoding enzyme accountable for preceding step in biosynthesis of anthocyanin, the UFGT gene played an important role in preeminent substances of anthocyanins and flavonols in red pericarps and their effective absence in green pericarps. In this study, gene expression of UFGT was significantly upregulated in red pericarps, while almost no one was detected in green pericarps (Fig. 9). At the same time, the enzyme activity was detected among all Japanese apricot cultivars, whether in coloured skin where anthocyanin was accumulated or in green pericarps where no colour pigmentation was present, but the activity of UFGT was expressively greater in red pericarps than in green pericarps (Fig. 9). Studies of other plant species have shown that the expression of UFGT gene was acute for anthocyanin biosynthesis. For example, the expression of StUFGT, which induced corresponding by gibberellic acid and sucrose, was detected in roots, leaves and stems, suggested that UFGT was related to the accumulation of anthocyanin in potato (Hu et al., 2011). In grape berry, UFGT gene expression was only identified in red-skinned spots, and in Kyoho it was detected by northern blot analysis, indicating that the UFGT gene is a main regulator of anthocyanin biosynthesis (Kobayashi et al., 2001). Unlike other genes of the anthocyanin pathway, expression of UFGT was expected to mirror the anthocyanin content, suggesting that fruit colouration was strongly influenced by UFGT expression in litchi . In conclusion, the UFGT gene was the most important gene correlated with the biosynthesis and accumulation of anthocyanins, contributing to red pigmentation of Japanese apricot.

CONCLUSION
Empiric genome re-sequencing on an Illumina platform was a realistic means to scientifically investigate the universal genomic variations associated with the accumulation of anthocyanins in Japanese apricot fruits. We investigated that the genes participating in developments of anthocyanin biosynthesis, TFs and related genes were only upregulated in red-skinned cultivars, confirming that these genes may have the function of regulating anthocyanin biosynthesis. Generally, colourful fruits such as those of the ornamental Japanese apricot are very attractive to consumers. Considering the molecular mechanisms underlying their formation is imperative, predominantly for understanding the transcriptional regulatory networks of dissimilar coloured fruits and alterations in accumulation and distribution of different compounds in coloured cultivars. This data can provide valuable information and will subsidize to the selection of plant germplasm resources for marker-assisted selection as well as for developing the nutritional and medicinal importance of Japanese apricot in the future.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
This work was supported by the Six Telent Fund of Jiangsu Province (No. NY068) and the project of crop germplasm collection and evaluation (2016NWB032). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Muhammad Khalil-ur-Rehman authored or reviewed drafts of the paper. Zhihong Gao conceived and designed the experiments, approved the final draft.

Data Availability
The following information was supplied regarding data availability: The raw sequencing data in this paper are uploaded to the GenBank SRA database. The BioProject is PRJNA371370. The data for the 'Xiaoyezhugan' cultivar have been deposited at SRA under accession SRR5241555. The data for 'Zhonghong' cultivar have been deposited at SRA under accession SRR5241554. The data for the 'Shinanokoume' cultivar have been deposited at SRA under accession SRR5241553. The data for the 'Yanglao' cultivar have been deposited at SRA under accession SRR5241552. The data for the 'Ruantiaohongmei' cultivar have been deposited at SRA under accession SRR5241550. The data for the 'QingjiaNo.2' cultivar have been deposited at SRA under accession SRR5241549.

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/ 10.7717/peerj.4625#supplemental-information.