Genome-wide identification of ubiquitin proteasome subunits as superior reference genes for transcript normalization during receptacle development in strawberry cultivars

Gene transcripts that show invariant abundance during development are ideal as reference genes (RGs) for accurate gene expression analyses, such as RNA blot analysis and reverse transcription–quantitative real time PCR (RT-qPCR) analyses. In a genome-wide analysis, we selected three “Commonly used” housekeeping genes (HKGs), fifteen “Traditional” HKGs, and nine novel genes as candidate RGs based on 80 publicly available transcriptome libraries that include data for receptacle development in eight strawberry cultivars. The results of the multifaceted assessment consistently revealed that expression of the novel RGs showed greater stability compared with that of the “Commonly used” and “Traditional” HKGs in transcriptome and RT-qPCR analyses. Notably, the majority of stably expressed genes were associated with the ubiquitin proteasome system. Among these, two 26 s proteasome subunits, RPT6A and RPN5A, showed superior expression stability and abundance, and are recommended as the optimal RGs combination for normalization of gene expression during strawberry receptacle development. These findings provide additional useful and reliable RGs as resources for the accurate study of gene expression during receptacle development in strawberry cultivars.


Background
The cultivated octaploid strawberry (Fragaria × ananassa) is an important fruit crop grown worldwide. The wild diploid strawberry (Fragaria vesca) has emerged as a model system for strawberry made possible by the availability of a draft genome sequence (~240 Mb) and its relative transformability [1]. In botanical terms, the fruit of strawberry is an aggregate fruit composed of multiple achenes on the surface of the juicy flesh, which is accessory tissue developed from the enlarged receptacle (Fig. S1). The process of strawberry fruit development is divided into the early phase dominated by growth, and the ripening phase when the achenes enter dormancy accompanied by dramatic developmental changes in the receptacle, such as color changes, softening, and flavor development. The regulatory mechanism of fruit development is of considerable interest to plant scientists and breeders. In particular, elucidation of the molecular events involved in fruit development is required. Quantification of gene expression levels is crucial to unravel this complex regulatory network. Reverse transcription-quantitative real time PCR (RT-qPCR) is a favored approach used for quantification of gene expression on account of its specificity, accuracy, and reproducibility. Accurate normalization is fundamental for reliable analysis of RT-qPCR data. Therefore, this technology requires stably expressed reference genes (RGs) for expression normalization of target genes. Failure to use an appropriate RG may lead to biased gene expression profiles and low reproducibility.
Traditional housekeeping genes (HKGs) are used commonly as RGs on the basis of their essential cellular roles and therefore are thought to be stably expressed. To date, the RG transcripts most frequently used for RT-qPCR in strawberry fruit studies include three traditional HKGs that encode the 26-18S rRNA intergenic spacer [2,3], Actin [4,5], and glyceraldehyde-3-phosphate dehydrogenase (GAPDH) [6,7]. Unfortunately, traditional HKGs, including the four HKGs used in strawberry fruit studies, are utilized generally without validation of their stability and based on the supposition that the genes are expressed at constant levels under all conditions. Increasing evidences question the reliability of traditional HKGs, which can be subject to considerable variation under certain conditions, including different developmental stages [8]. For instance, traditional HKGs analyzed from a developmental series of Arabidopsis seed and pollen samples show highly variable expression [9]. Therefore, it is essential to evaluate appropriate RGs for the experimental system under study. For this purpose, several research groups have developed software, such as geNorm [10], BestKeeper [11], NormFinder [12], and Delta CT [13], which are commonly used for statistical analyses and selection of the most stably expressed RGs. In previous researches, a few members of traditional HKGs as candidate RGs were assessed in studies of strawberry fruit ripening, of which FaRIB413 (26-18S rRNA), FaACTIN, FaHISTH4, FaDBP and FaUBQ11 were recommended as appropriate RGs [14][15][16][17]. Unfortunately, these results were restricted in scope and rationalization to selection of the candidate genes evaluated.
Transcriptomic analyses are extensively used in investigations of complex molecular processes in plants. Deep RNA sequencing (RNA-seq) as a global evaluation technique provides a representative snapshot of a transcriptome given its globality, high resolution, and sensitivity. One strategy is to mine RNA-seq data sets for identification of the optimal RGs that are stably expressed over a diverse set of conditions. This approach has been successfully employed in several plant species, such as Arabidopsis [9], rice [18], and soybean [19]. Previously, Clancy et al. (2013) have identified a set of strawberry (Fragaria spp.) constitutively expressed RGs during strawberry fruit ripening by merging digital gene expression data with expression profiling; among these, FaCHP1 and FaENP1 were recommended as appropriate RGs [20]. However, this result were restricted in the statistical limitations of the study due to the small sample size. The extensive RNA-seq data sets previously generated for stages of receptacle development in strawberry provide valuable resources for screening of the optimal RGs across receptacle developmental stages [21][22][23][24].
In this study, we selected 3 "Commonly used" HKGs, 15 "Traditional" HKGs, and 9 novel genes as candidate RGs based on genome-wide and available RNA-seq data, which were assessed during receptacle development in nine independent experiments from eight strawberry cultivars. The results revealed a tendency for all novel RGs to show greater expression stability, compared with that of the "commonly used" and "traditional" HKGs, in transcriptome and RT-qPCR analyses. The genes RPT6 and RPN5A, subunits of ubiquitin proteasome, are recommended as the optimal combination of RGs in strawberry receptacle development. These findings provide additional useful and reliable RGs as resources for the accurate study of gene expression during receptacle development in cultivars of strawberry.

Identification of HKGs with stable expression during receptacle development in strawberry
Among the most frequently used RGs for RT-qPCR in studies of strawberry fruit are the genes encoding 26-18S rRNA, Actin, and GAPDH. These genes have been recognized as stably expressed HKGs and historically used as RGs in other plants. Previously, the potential of 16 pre-selected traditional HKGs were evaluated during fruit ripening [14][15][16][17]. However, the existence of additional superior RGs among these gene families has not been investigated previously. To address this shortcoming, we identified 6,6,13,3,16,19,8,42, 102, 54, 3 and 8 members of the Actin, GAPDH, Tubulin, EF1α, SWIB, QUL, FHA, bZip, ERF, UBC, PDC and HISTH4 gene families, respectively, in version 4 of the F. vesca genome assembly [25] (Figs. S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12 and S13). Here, 26-18S rRNA, CHP1, ENP1 and UBQ11 were not analyzed because they were not annotated as a gene in the strawberry genome, or no sequence information is provided in the previous reports. Then, 80 publicly available RNA-seq libraries, which includes data for strawberry receptacle development, were mined. These libraries include four receptacle development experiments for three cultivars of F. vesca, comprising 'Hawaii-4', 'Yellow Wonder 5AF7', and 'Ruegen', and five experiments for receptacle development for five cultivars of F. × ananassa, consisting of 'Sweet Charlie', 'Camarosa', 'Toyonoka', 'Benihoppe', and 'Neinongxiang'. For a detailed description of the RNA-seq samples see Table S1. All 80 libraries were mapped to the F. vesca genome assembly v4.0 (Table S1). To identify eligible RGs from the aforementioned HKG families for strawberry receptacle development, we used a similar approach as described by Dekkers et al. [26]. For identification, the expression level and stability of candidate RGs were evaluated: (i) expression abundance, with a cut-off mean FPKM value ≥100, and (ii) expression stability, with a cut-off mean CV value ≤0.2. The thresholds were applied to the mean of the nine experiment data sets. Genes with higher FPKM values showed increased expression abundance and those with a lower CV value were more stably expressed. A total of 15 transcripts from the HKG families showed superior abundance and stability of expression, namely FveACT6, FveTUA2, FveEF1ɑ1, FveEF1ɑ2, FveGPDH4.1, FveUBC5, FveUBC10, FveUBC12, FveUBC16, FveUBC18, FveUBC21, FveUBC46, FveUBC51, FveUBC50 (FaDBP) and FveHISTH4.1 (Fig. S14). Thus, we defined 26-18S rRNA, ACT6, and GPDH4.1 as the "Commonly used" HKGs set and the remaining eligible genes were defined as the "Traditional" HKGs set. .
To confirm further the expression stability in strawberry receptacle development, the "SRDS" RGs set were compared with the "Commonly used" and "Traditional" HKGs. We calculated the expression ratio per gene were obtained by dividing the expression value per sample by the average expression level in each experiment set from the RNA-seq data to evaluate expression stability (plotted in Fig. S16). The "Commonly used" HKGs showed considerable variation in expression over the 80 strawberry fruit libraries. In comparison, a majority of "Traditional" HKGs showed greater stability of expression. However, an even higher degree of expression stability was exhibited by the "SRDS" RGs, which suggested that this set contained superior RGs from these candidates (Fig. S16). To test this hypothesis, we ranked the candidate RGs into nine lists according to the expression stability based on the CV value in each experiment set from the RNA-seq data. Discrepancies in the rank positions of candidate RGs were observed among these lists. To provide a consensus, we used RankAggreg, a package for R using a Monte Carlo algorithm and establish a consensus ranking [27], to merge the nine outputs. The merged list revealed that "SRDS" RGs also showed greater expression stability except for UBC12 (Fig. 1c). Among the "SRDS" RGs, RPT6A and RPN5A were the top-ranked genes. In contrast, the "Commonly used" HKGs received the lowest rankings, which revealed their inferior expression stability.
The RNA-seq expression data for these candidate RGs were also analyzed using geNorm, which evaluates the expression stability of genes by calculating a stability value (M) for each gene. The greater expression stability of a gene, the lower the M value. A similar ranking trend was obtained in this analysis, although a slight change in the order of RGs in the middle rankings was observed (Fig. 1d). The results of these RNA-seq data analyses implied that on the basis of expression stability the "SRDS" RGs outperformed the "Commonly used" and "Traditional" HKGs in strawberry receptacle development.

Detection by RT-qPCR of RGs expression stability in strawberry receptacle development
To test the hypothesis that the "SRDS" RGs list included superior RGs for strawberry receptacle development, we validated the expression stability of the candidate RGs in strawberry receptacles by RT-qPCR. Eight visual developmental stages for Fragaria vesca cultivar 'Ruegen' and F. × ananassa were sampled: small green, big green, degreening, white, initial turning, late turning, partial red, and full red stages (Fig. 2). The quality of the isolated RNA from the fruit samples (Fig. S17) and specificity of RT-qPCR primers (Fig. S18) were thoroughly checked before further processing. For further confirmation, 27 candidate RGs (26-18S rRNA, UBQ11, CHP1, ENP1 were included in this analysis) were validated by RT-qPCR analysis. For a detailed description of the detection procedure see the Materials and Methods.
A flowchart of the procedure to evaluate the expression stability of the candidate RGs in the RT-qPCR analysis was shown in Fig. S19. The cycle threshold (CT) value is an index that represents gene expression in the RT-qPCR analysis. Gene with a lower variation of CT value show more expression stability, and with a high CT value show low expression abundance. If CT value are too high (> 30) or too low (< 15), a gene is generally considered inappropriate as an RG, because it's unreasonable expression abundance. The CT values for the 23 candidate RGs were pooled to evaluate their expression profiles, and a box-whisker plot showing the CT variation among 16 test samples was generated (Fig. 3). All candidate RGs exhibited appropriate CT values except 26-18S rRNA. The average CT values ranged from 9.51 (26-18S rRNA) to 28.91 (UBC10). The "SRDS" RGs showed appropriate average CT values ranging from 26.64 to 27.88, and lower expression variation (less than 0.76 cycles) compared with "Traditional" and "Commonly used" HKGs [expression variation ranged from 0.86 cycles (UBC50) to 2.58 cycles (TUA2)] (Fig. 3). These results indicated that the "SRDS" RGs showed greater expression stability than "Traditional" and "Commonly used" HKGs and were more suitable for normalization of genes with low-to medium-abundance expression profiles.
In addition, we evaluated and ranked the candidate RG expression stability in all samples, considering 'Ruegen' and 'Monterey' together, on the basis of different stability indices calculated using four software programs Fig. 1 Identification of specific reference genes in strawberry receptacle development based on RNA-seq data. To discover additional superior RGs during receptacle development, we adopted a screening procedure with cut-off values for coefficient of variation (CV) ≤ 0.15 and reads per kilobase per million (FPKM) ≥ 100 in nine RNA-seq data sets that include receptacle development experiments in strawberry. a Venn diagram showing nine candidate RGs identified from the complete genome. The numbers represent the gene numbers meet the criteria for the each RNA-seq data set. b Statistical analysis of CV and FPKM values of strawberry receptacle development specific ("SRDS") RGs, "Commonly used" HKGs and "Traditional" HKGs identified from the nine RNA-seq data sets. The CV analysis is shown on the left side of the figure, and the FPKM analysis is shown on the right side of the figure. Each data point in the box-plot is derived from one RNA-seq data set. The horizontal line in the box represents the median. The red dashed lines indicate the cut-off values. c Ranking of the candidate RGs into nine lists on the basis of expression stability from CV values in each experiment of the RNA-seq data set. The RankAggreg package for R was used to generate a consensus ranking from the nine lists. The merged list revealed that "SRDS" RGs showed greater expression stability except for UBC12. d Expression data for the candidate RGs were analyzed using geNorm to evaluate their expression stability by calculating a stability value (M) for each gene. Increase in gene expression stability corresponds with a lower M value. The results were consistent with the ranking of the RGs. The RNA-seq data implied that the expression stability of "SRDS" RGs was superior to that of the "Commonly used" HKGs and "Traditional" HKGs during strawberry receptacle development. The colors indicate different sets of candidate RGs in b-d. Note: 26-18S rRNA was not analyzed here because it was not annotated on the F. vesca genome assembly v4 (geNorm, NormFinder, BestKeeper, and Delta CT), which have been widely applied in studies of internal reference evaluation. The results were consistent in revealing that "SRDS" RGs showed superior expression stability compared with that of "Traditional" and "Commonly used" HKGs ( Fig. 4a, c, d, e). Among these genes, RPT6A and RPN5A were the most stable RGs. Strikingly, the ranking of "Commonly used" HKGs in the lowest ranks revealed their inferior expression stability compared with "SRDS" RGs. We next used RankAggreg to merge the four rankings (Fig. 4f). The results corroborated the aforementioned rankings from geNorm, NormFinder, BestKeeper, and Delta CT analysis (Fig. 4), and also  corresponded with the results of the RNA-seq data analysis (Fig. 1). Normalization of gene expression using multiple RGs may increase measurement accuracy in RT-qPCR analyses. Thus, we investigated the optimal number of RGs for normalization in strawberry receptacle development. This analysis was performed by computing the pairwise variation (PV; V n /V n + 1 ) using geNorm software. Once the PV value for n genes is below a cutoff of 0.15, which is a recommended threshold that is universally accepted, additional genes are considered not to improve normalization. The pairwise variation V2/3 value (0.126) was less than the threshold (Fig. 4b). Therefore, two RGs (RPT6A and RPN5A) in combination were sufficient for RankAggreg package for R was employed to merge the stability measurements obtained from the four tools using a Monte Carlo algorithm and to establish a consensus ranking of the RGs (f). The pairwise variation (V n /V n + 1 ) was calculated to determine the optimal number of RGs for normalization of gene expression (b) normalization in strawberry receptacle development. We next analyzed the stability and optimal number of candidate RGs in strawberry receptacle development, in separate analyses for 'Ruegen' and 'Monterey', which showed similar results for each cultivar (Figs. S20 and S21). These results indicated that the recommended "SRDS" RGs were applicable to different cultivars of strawberry.

Validation of recommended RGs for normalization of FveCHS1 expression
Chalcone synthase (CHS) is involved in anthocyanin biosynthesis during fruit ripening. Suppression of CHS activity in strawberry fruit by transient or stable transgenesis results in a substantial decrease in anthocyanin content [28]. By conjoint analysis of the F. vesca genome and RNA-seq data, we detected expression of a CHS gene (FveCHS1) that was strongly associated with anthocyanin content in cultivars of strawberry with red fruit. In detail, the expression of FveCHS1 in the receptacle slightly decreased after the green stage, increased sharply and peaked at the turning stage, and thereafter slightly declined at the red stage (Fig. 5a). By contrast, this expression profile was absent in cultivars with white fruit, such as 'Hawaii-4' (Fig. 5a). These results suggested that FveCHS1 facilitated anthocyanin biosynthesis during receptacle ripening in cultivars of strawberry. Thus, the FveCHS1 expression profile was employed to validate the reliability of the recommended RGs by RT-qPCR. Unsurprisingly, relative expression levels of FveCHS1 obtained by normalization against combination of RPT6A and RPN5A, the preferred recommendation above, indicated similar trends to those from the RNA-seq data analysis during receptacle development in cultivars of strawberry with red fruit (Fig. 5b). Meanwhile, the similar results also were exhibited by normalization against individual "SRDS" RGs (Fig. 5b). However, the trend in FveCHS1 relative expression level during receptacle development differed to varying degrees when normalized against "Commonly used" HKGs (Fig. 5c). For instance, when ACT6 was used for normalization, the FveCHS1 relative expression level at the FR stage was equal to that at the LT stage; a similar alteration was observed when GAPDH4.1 was used for normalization; although the overall trend was consistent, the extent of variation in FveCHS1 relative expression level was notably increased when 26-18S rRNA was used for normalization (Fig. 5c). Pairwise correlation analyses were performed by calculating the Pearson correlation coefficient (R) to estimate the similarity of the trend in CHS1 relative expression level when normalized against candidate RGs. Genes with higher R values showed the more closely resemble normalization effect. The result revealed that a considerably high survival R values were obtained when the CHS1 relative expression level was normalized by individual "SRDS" RGs compared to normalization by combination of RPT6A and RPN5, which as a reference standard. However, the R values were relatively low when in the CHS1 relative expression level was normalized against "Commonly used" HKGs compared to normalization by combination of RPT6A and RPN5 (Fig.  5d). Thus, in practical application, these results confirmed that use combination of RPT6A and RPN5A or individual "SRDS" RGs were superior to use of "Commonly used" HKGs for normalization of gene expression during receptacle development in strawberry cultivars.

Discussion
The inaccurate normalization are inadequate for analyzing gene expression simultaneously track more transcripts with greater sensitivity. In a species with diverse cultivars like strawberry, identifying a universally applicable constitutive RG transcripts in fruit development study increasing the challenge. To date, three traditional HKGs that encode the 26-18S rRNA, ACTIN, and GAPDH are most frequently used for RT-qPCR in strawberry fruit studies. Unfortunately, these RGs are utilized generally without validation of their stability, while directly learned from experience in other plants. Constitutive expression of some HKG transcripts, including the three "Commonly used" HKGs in strawberry, are often been queried, because they are experimentally validated as inappropriate RGs. For instance, traditional HKGs from a developmental series of Arabidopsis seed and pollen samples show highly variable expression [9]. Specifically, the ACTIN transcript as a research case has been widely tested and shown to be inconsistent in poplar [29], potato [30], and rice [31]. It improves the idea of setting aside the preconception that HKGs are evenly expressed under difference species, tissues and certain conditions is generally accepted. To address this shortcoming, several researches have taken the tries that selected several members of HKGs were assessed during strawberry fruit ripening, of which RIB413 (26-18S rRNA), ACTIN, HISTH4, DBP (UBC50) and UBQ11 were recommended as appropriate RGs [14][15][16][17]. However, in this study showed that these recommended transcripts are not stably expressed and are inappropriate for normalization during fruit development in strawberry cultivars (Figs. 1, 4 and 5). Similar findings were also observed in other Rosacea species like peach [32]. The results from these rosaceous crops revealed that few transcripts are truly stable expression over broad ranges of fruit developmental states. Although, the same effort was worked in this study, we selected a total of the addition 13 "Traditional" HKGs transcripts from the HKG families showed more stability of expression, the weaknesses for this method is restriction in the scope of selecting the candidates evaluated.
The promising mRNAs for normalization proved to be those identified from transcriptomic sequencing efforts, and represent transcripts from some uncharacterized genes [33]. Such genome-based approaches have proven useful in Arabidopsis [9] where several unexploited transcripts were identified that improved the precision of RT-qPCR analysis over traditional reference transcripts. Previously, Clancy et al. (2013) have been attempted to screen the superior RGs during strawberry fruit ripening by merging digital gene expression data with expression profiling, and CHP1 and ENP1 were considered as constitutively expressed RGs [20]. However, these potential constitutive candidates needed to be confirmed through more transcriptome sets, as a small sample size could include bias. In this study, extensive RNA-seq data (80 libraries) previously generated for receptacle development in strawberry were mined to validate superior RGs. These RNA-seq data including nine experiments set c Relative expression level of FveCHS1 when normalized against "Commonly used" HKGs. The results confirmed that an individual "SRDS" RG or combination of RPT6A and RPN5A were superior to "Commonly used" HKGs for normalization of gene expression during receptacle development of strawberry. The receptacle samples were collected at eight visual developmental stages from strawberry 'Ruegen' in c and d. SG (small green), BG (big green), DG (degreening), WT (white), IT (initial turning), LT (late turning), PR (partial red), and FR (full red). d Pairwise correlation analyses were performed by calculating the Pearson correlation coefficient (R) to estimate the similarity of the trend in CHS1 relative expression level when normalized against candidate RGs with eight cultivars of strawberry were particularly noteworthy since it rendered our more possibly to obtain superior RGs which are universally applied during fruit development in diverse strawberry cultivars (Table S1). Nine "SRDS" RGs were identified from the complete genome by a very strict screening criteria (Fig. 1). Specifically, the use of multiple contrasting detection methods is strength of this work. Viewed as groups, ranking the candidates with four RNA-seq analysis (CV, FPKM, geNorm and expression ratio assessment) and five RT-qPCR statistical methods (CT pooled, geNorm, BestKeeper, NormFinder and Delta assessment) is largely consistent. Finally, the reliability of the recommended RGs were once again validated by normalization of CHS1 expression during strawberry receptacle development. This study merges these three independent assessment strategy sets were consistently revealed that the "SRDS" transcripts as optimal RGs showed most stability expression and improved the precision of RT-qPCR, compared with that the "Commonly used" and "Traditional" HKGs, including CHP1 and ENP1, are shown to be less reliable. It forcefully demonstrates that these results are reliable and repeatable. Although it also exist at slightly different among the ranked candidates in orders due to various algorithms of methods were used for assessment [10][11][12][13].
Notably, these novel transcripts were recommended in this study are never expanded as RGs in any plant. Surprisingly, the majority of stably expressed candidates were associated with the ubiquitin proteasome system (UPS), which is highly conserved in eukaryotes and is important for plant development. Among these, seven of the nine "SRDS" RGs were associated with the UPS proteolytic pathway (Fig. S15), which revealed that this pathway continuously operates as a basic cellular function throughout strawberry receptacle development. These candidate RGs covered most of the processes of the UPS, including proteasome assembly, ubiquitination, energy supply, and ubiquitin recycling (Fig. S15). UBC genes encode E2 ubiquitin-conjugating enzymes, which catalyze the transfer of activated ubiquitin from E1 to the active site cysteine of the E2 via a trans (thio) esterification reaction. This reaction is achieved by connecting the SKP1-Cullin-F-box (SCF) complex to approximate the substrate. SKP is a core component of the SCF complex and serves as an adaptor protein to connect the Fbox and Cullin protein to recognize the target protein during ubiquitination [34]. Once a protein is tagged with a single ubiquitin, the signal is activated to attach additional ubiquitin molecules. This results in a polyubiquitin chain, which is bound to the 26S proteasome, allowing the latter to degrade the tagged protein. RPN and RPT genes encode subunits for assembly of the 26S proteasome [35]. The content of ubiquitin protein is dynamically balanced in plant cells. It requires the deubiquitination of the Endosomal Sorting Complex Required for Transports (ESCRTs) complex to recycle ubiquitin and maintain cytoplasmic levels of free ubiquitin [36]. VPS34 is a subunit of ESCRT-III, which is a complex for constituting the ESCRTs. ESCRTs profoundly shape signal transduction is based on the degradative sorting of ubiquitinated membrane proteins and deubiquitylation [37]. The UPS requires abundant ATP for normal operation [38]. F 0 F 1 -ATP synthase, a large multi-subunit complex within the inner mitochondrial (mt) membrane, is the site of conversion of the energy stored in complex molecules into ATP. In the present study, two fruit-specific RGs, ATPD and ATPE, encode two different subunits of the mt ATP synthase, namely the δ (ATP5) and the ε (ATP3) subunits. The role of subunit δ is to combine the pore-forming c and a subunits in the last step of ATP synthase assembly. Subunit ε combines with subunit γ to form a central stalk that non-covalently interacts with the membraneembedded, hydrophobic c-ring [39,40]. In addition, two uncorrelated UPS genes, AKR2B and YLS8, were also identified as receptacle-specific RGs in the current study. Membrane proteins are usually synthesized on free ribosomes in the cytoplasm, subsequently received by molecular chaperones before being targeted to their membranes. In Arabidopsis, AKR2B and its close homolog, AKR2A, are cytosolic factors that recognize the protein-targeting signals and deliver them to the chloroplast outer membrane [41]. YLS8 encodes a mitosis protein, but its precise function is unknown. YLS8 was previously observed to outperform traditional RGs, with regard to expression stability, throughout development in Arabidopsis [9]. Subsequently, in several plant species, YLS8 has been assessed and confirmed to show enhanced expression stability during development compared with conventional RGs using transcription data [42,43].
In general, the nine "SRDS" RGs identified in the present study perform basic physiological functions in plant cells and participate throughout the entire plant development period, which thus provides a biological basis for their stable expression.

Conclusions
In this study, we identified nine RGs specific to receptacle development in strawberry based on genome-wide analysis. The results of the multifaceted assessment consistently showed that the novel RGs showed improved expression stability compared with that of the "Commonly used" and "Traditional" HKGs in transcriptome and RT-qPCR analyses. Notably, the majority of stably expressed genes were associated with the UPS proteolytic pathway. Among these, two 26S proteasome subunits, RPT6A and RPN5A, are recommended as the optimal RGs combination. This finding provides additional useful and reliable RGs as resources for the accurate study of gene expression during receptacle development in cultivars of strawberry.

Plant material
Plants of Fragaria vesca cultivar 'Ruegen' and F. × ananassa cultivar 'Monterey' from the germplasm garden of College of Horticulture, Fujian Agriculture and Forestry University (Fuzhou City, Fujian Province, China) were prepared for this study. Plants were cultured in a greenhouse in Fuzhou, China. Growing conditions were maintained at 25°C under a 16 h photoperiod providing 120 μmol m − 2 s − 1 irradiance. We modified a previously reported six-stage division of strawberry fruit development [44] and exploited higher-density sampling. Fruit were harvested at eight developmental stages: small green (7 days after flowering, DAF), big green (14 DAF), degreening (18 DAF), white (21 DAF), initial turning (23 DAF), late turning (25 DAF), partial red (27 DAF), and full red (29 DAF) stages (Fig. 2). The achenes were separated from the receptacle. Each sample were randomly selected from eight fruits, then receptacle was cut into small pieces for mixing, immediately frozen in liquid nitrogen, and stored at − 80°C before isolating the total RNA. For all samples, four biological replicates were performed.

Identification and phylogenetic analysis of the HKG families in strawberry
Multiple strategies were used to search for members of the HKG families in strawberry. (1) Keyword searches were performed against the annotated strawberry protein databases; (2) hidden Markov model (HMM) searches with the conservative domain HMM profile for proteins, and all significant hits (HMMER E-value < e − 5 ) were subsequently evaluated; (3) Arabidopsis thaliana HKG sequences were used as queries in exhaustive BLASTP searches with standard parameters of the strawberry protein databases. All candidate protein sequences were analyzed using the InterProScan5 database (http://www. ebi.ac.uk/interpro/search/sequence-search) to verify the presence of the necessary domain; protein sequences lacking this domain were removed (Figs. S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12 and S13). The amino acid sequences of the selected HKGs were aligned using ClustalX [45]. Phylogenetic trees were constructed via the maximum likelihood method in MEGAX using the best amino acid substitution model [46]. Bootstrap analysis was performed with 1000 iterations using the parameters p-distance and partial deletion with 50% site coverage cutoff option. Strawberry HKGs were named based on the nomenclature established for Arabidopsis (Figs. S2, S3, S4, S5, S6, S7, S8, S9, S10, S11, S12 and S13).

Transcript assembly and quantification for RNA-seq
Eighty available RNA-seq libraries, which includes studies of strawberry receptacle development, from the National Center of Biotechnology Information database (NCBI) (https://www.ncbi.nlm.nih.gov/) were mined. All libraries data were mapped to the F. vesca reference genome assembly v4.0 using the HISAT2 aligner [47]. The percentage of uniquely aligned reads varied among samples ≥77.55% (Table S1), which indicated that the RNAseq data quality was acceptable. The StringTie software was used to calculate the fragment per kilobase of transcript per million mapped reads (FPKM) values [47]. Genes with a high FPKM value showed a high abundance of transcripts.

RNA extraction and cDNA preparation
Total RNA was extracted using the TRIzol reagent (TaKaRa, Dalian, China) in accordance with the manufacturer's instructions. RNA samples were assessed with ratios OD 260/280 > 2.0 and OD 260/230 > 1.8 (Fig.  S17). To eliminate genomic DNA contamination, equal amounts of total RNA (2 μg) in all RNA samples were treated with DNAase I (Fig. S17). Then, cDNA was synthesized using the PrimeScript™ RT Reagent Kit (Perfect Real Time; TaKaRa, Dalian, China). RNA extraction and cDNA synthesis from all samples was performed with four biological replicates.
Primer design, amplification specificity, and efficiency analysis for candidate RGs in RT-qPCR We used the primers for GAPDH4.1 [6,7], ACT6 [4,5], and 26-18S rRNA [2,3] confirmed in previous studies and widely used in studies of strawberry fruit, and designed new primers for the remaining candidate RGs using Premier 5.0 software ( Table 1). The PCR efficiency and specificity of the primers were thoroughly evaluated (Table 1, Fig. S18). In all indices the primers met the MIQE guidelines [48], which provide guidelines on the minimum information for publication of RT-qPCR experiments. Lin-RegPCR software was used to evaluate efficiency by calculating correlation coefficients (R 2 ) and amplification efficiency of the candidate RG primers in RT-qPCR (Table 1) [49]. The specificity of the primers for the candidate RGs was verified as follows. The presence of a single peak in the melting curve was confirmed after RT-qPCR amplification (Fig. S18a). Subsequently, the amplified products were analyzed by 2.5% agarose gel electrophoresis, and only one band of the expected product size was detected (Fig. S18b).
To assay for gene expression, 20 μl of reaction mixture corresponded to 5 ng total cDNA in one PCR reaction system (TaKaRa SYBR® PrimeScript™ RT-PCR Kit (Perfect Real Time)). The reaction was carried out using a Roche LightCycler® 480II real-time PCR cycler (Roche Diagnostics, Mannheim, Germany) in accordance with the kit manufacturer's instructions (TaKaRa, Dalian, China). Reaction mixtures were incubated for 10 min at 95°C for pre-incubation followed by 45 amplification cycles of 15 s at 95°C, 15 s at 60°C, and 20 s at 72°C. A dissociation curve was generated from 60 to 95°C. Four biological replicates were amplified for all samples. Expression levels of candidate RGs in all samples were determined on the basis of their quantification cycle values (CT). The ΔΔC q method was used to determine CHS1 mRNA levels among samples.

Statistical analysis of the expression stability of the candidate RGs
To determine the expression stability of the candidate RGs, gene-stability measure (M), stability (Stab), coefficient of variance (CV), and standard deviations (SD) values were calculated using geNorm [10], BestKeeper [11], NormFinder [12], and Delta CT [13] software, respectively. Increasing stability of gene expression corresponds to lower M, Stab, CV, and SD values. The RankAggreg package for R was employed to merge the stability measurements obtained from the four tools using a Monte Carlo algorithm and establish a consensus ranking of RGs [27]. The rankings from the four terms M, Stab, CV, and SD were used as input.
Additional file 1: Figure S1. The structure of strawberry fruit in 'Ruegen'. Figure S2. Identification, phylogenetic and domain analyses of the Actin gene family in strawberry and Arabidopsis. Figure S3. Identification, phylogenetic and domain analyses of the GAPDH gene family in strawberry and Arabidopsis. Figure S4. Identification, phylogenetic and domain analyses of the Tubulin gene family in strawberry and Arabidopsis. Figure S5. Identification, phylogenetic and domain analyses of the EF1α gene family in strawberry and Arabidopsis. Figure S6. Identification, phylogenetic and domain analyses of the QUL gene family in strawberry and Arabidopsis. Figure S7. Identification, phylogenetic and domain analyses of the SWIB gene family in strawberry and Arabidopsis. Figure S8. Identification, phylogenetic and domain analyses of the FHA gene family in strawberry and Arabidopsis. Figure  S9. Identification, phylogenetic and domain analyses of the UBC gene family in strawberry and Arabidopsis. Figure S10. Identification, phylogenetic and domain analyses of the AP2/ERF gene family in strawberry and Arabidopsis. Figure S11. Identification, and phylogenetic and domain analyses of the bZip gene family in strawberry and Arabidopsis. Figure S12. Identification, phylogenetic and domain analyses of the PDC gene family in strawberry and Arabidopsis. Figure  S13. Identification, phylogenetic and domain analyses of the HISTH4 gene family in strawberry and Arabidopsis. Figure S14. Identification of qualified HKGs in strawberry receptacle development based on RNA-seq data. Figure S15. Schematic illustration of the cellular functions of nine "SRDS" RGs. Figure S16. Relative expression of candidate reference genes from RNA-seq data. Figure S17. Strawberry receptacle RNA sample quality assessment. Figure S18. Specificity assessment of RT-qPCR primers. Figure S19. Flow chart showing procedure for RT-qPCR analysis of candidate reference genes during strawberry receptacle development. Figure S20. Expression stability of candidate reference genes of 'Ruegen' analyzed by RT-qPCR. Figure S21. Expression stability of candidate reference genes of 'Monterey' analyzed by RT-qPCR. Table S1. Description of 80 samples with receptacle development for RNA-seq in strawberry. Table S2. The information of HKG families in strawberry and Arabidopsis.