Selecting optimal reference genes for breast cancer research

Background Quantitative reverse transcription-polymerase chain reaction (qRT-PCR) is the most sensitive technique for evaluating gene expression levels. Choosing appropriate reference genes (RGs) is critical for normalizing and evaluating changes in the expression of target genes. However, uniform and reliable RGs for breast cancer research have not been identied, limiting the value of target gene expression studies. Here, we provide a novel approach for mining RGs by using the RNA-seq dataset to identify reliable and accurate RGs that can be applied to different types of breast cancer tissues and cell lines. Methods First, we compiled the transcriptome proling data from the TCGA database involving 1217 samples to identify novel RGs and then ten genes (SF1, TARDBP, THRAP3, QRICH1, TRA2B, SRSF3, YY1, DNAJC8, RNF10, and RHOA) with relatively stable expression levels were chosen as novel candidate RGs. Additionally, six conventional RGs (ACTB, TUBA1A, RPL13A, B2M, GAPDH, and GUSB) were also selected. To determine and validate the optimal RGs we performed qRT-PCR experiments on 87 samples from 5 types of surgically excised breast tumor specimens including HR+HER2-, HR+HER2+, HR-HER2-, HR-HER2+, breast cancer after neoadjuvant chemotherapy (NAC) and their matched para-carcinoma tissues, furthermore, we also included a benign breast tumor sample. Six biological replicates were included for each tissue. Moreover, we assessed 7 breast cancer cell lines (MCF-10A, MCF-7, T-47D, MDA-MB-231, MDA-MB-468, as well as MDA-MB-231 with either CNR2 knockdown or overexpression; 3 biological replicates for each line). Five statistical algorithms (geNorm, NormFinder, ΔCt method, BestKeeper, and ComprFinder) were used to assess the stability of expression of each RG across all breast cancer tissues and cell lines. Results

Methods First, we compiled the transcriptome pro ling data from the TCGA database involving 1217 samples to identify novel RGs and then ten genes (SF1, TARDBP, THRAP3, QRICH1, TRA2B, SRSF3, YY1, DNAJC8, RNF10, and RHOA) with relatively stable expression levels were chosen as novel candidate RGs. Additionally, six conventional RGs (ACTB, TUBA1A, RPL13A, B2M, GAPDH, and GUSB) were also selected. To determine and validate the optimal RGs we performed qRT-PCR experiments on 87 samples from 5 types of surgically excised breast tumor specimens including HR+HER2-, HR+HER2+, HR-HER2-, HR- Results Our results show that RG combinations SF1+TRA2B+THRAP3 and THRAP3+RHOA+QRICH1 showed stable expression in breast cancer tissues and cell lines, respectively, and that these two combinations displayed good interchangeability. Therefore, we propose that the above two combinations are optimal triplet RGs for breast cancer research.
Conclusions In summary, we identi ed novel and reliable RG combinations for breast cancer research based on a public RNA-seq dataset which lays a solid foundation for accurate normalization of qRT-PCR results across different breast cancer tissues and cells.

Background
Quantitative reverse transcription-polymerase chain reaction (qRT-PCR) is a highly sensitive and low-cost technique that is widely used in molecular biology [1]. However, the accuracy and interpretation of its results are determined by the stability of the selected reference genes (RGs) [2]. Hence, the selection of suitable RGs is the rst aim of any research system dedicated to the investigation of differential gene expression [3]. Furthermore, the simultaneous use of multiple RGs will result in more accurate data on target gene expression [2,4].
Breast cancer is the most common malignancy in females and accounts for approximately 30% of all cancers diagnosed [5]. Based on the expression of hormone receptors (HR), including the estrogen receptor (ER), progesterone receptor (PR), and the human epidermal-growth-factor receptor 2 (HER-2), breast cancer can be classi ed into four subtypes including HR + HER2-, HR + HER2+, HR-HER2+, and HR-HER2- [6]. During the course of breast cancer treatment, subtype status determines the use of neoadjuvant chemotherapy (NAC). In addition, breast disease also includes benign tumors [7].
Tumorigenesis and breast cancer metastasis are associated with gene expression changes that are most commonly detected using qRT-PCR [8]. In previous breast cancer studies commonly used RGs included beta-actin (ACTB), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), beta-glucuronidase (GUSB), ribosomal protein L13a (RPL13A), and tubulin alpha 1a (TUBA1A) [3,8,9]. However, research has indicated that these RGs are not consistently expressed across different tissues and experimental conditions [8,10,11]. Therefore, it is crucial to identify new RGs whose expression across various breast cancer tissues is more consistent.
Many novel RGs have been predicted and validated in many species and disease models, such as traumatic brain injury [12], Euscaphis konishii Hayata [13], Salix viminalis [4], Oryza sativa [14], Gentiana macrophylla [15], Homo sapiens [16], and Rhizophora apiculata [17]. However, to our knowledge, few systematic studies to validate potential RGs have been conducted for breast cancer. Available studies involved either tissues or cell lines (but not both), and the RGs concerned were not novel [3,8,[18][19][20]. Considering the enormous threat breast cancer poses to human health, it is urgently necessary to identify RGs that are more relevant to a wide range of breast cancer tissues and cells across several conditions [21][22][23]. In this work, we hypothesized that novel RGs for breast cancer research could be identi ed and validated using an mRNA-seq dataset.
To this end, we employed the mRNA-seq datasets from The Cancer Genome Atlas (TCGA) to discover novel RGs. Ten relatively stably expressed genes (SF1, TARDBP, THRAP3, QRICH1, TRA2B, SRSF3, YY1,  DNAJC8, RNF10, and RHOA) and six traditional RGs (ACTB, TUBA1A, RPL13A, B2M, GAPDH, and GUSB) were selected as the candidate RGs. The qRT-PCR experiments were performed on different experimental samples including six types of breast cancer tissues and seven different breast cancer cell lines. The stability of expression of these candidate RGs was calculated using geNorm [24], NormFinder [25], ΔCtmethod [26], BestKeeper [27], and ComprFinder [28]. Finally, the optimal RGs were validated and con rmed. Our study signi cantly improves upon previous work in breast cancer research.

Breast cancer tumor
Breast tumor and para-carcinoma tissues were supplied by the Breast Tumor Biobank of the Three Gorges Hospital A liated with Chongqing University. The fresh tissues were obtained from patients with written informed consent and with permission of the Three Gorges Hospital A liated with Chongqing University Clinical and the Laboratory Research Ethical Council. All tissues were stored frozen at -80℃ after pathologic evaluation. We collected a total of 66 tissue samples including benign tumor tissues (n = 6), as well as tissues from four subtypes of breast cancer including HR+/HER2-(n = 6), HR+/HER2+ (n = 6), HR-/HER2-(n = 6), HR-/HER2+ (n = 6) and their paired para-carcinoma tissues (n = 6 for each) from 24 patients who were diagnosed with breast cancer and from 6 patients who were diagnosed with breast cancer and then were treated with NAC before surgery. The para-carcinoma tissue samples had been taken from outside of the histopathological tumor border (3 cm) in the same excisional biopsy specimen. The clinical patient information is shown in Additional le 1: Table S1. . All culture media were supplemented with 20U/mL penicillin, 100 mg/mL streptomycin, and 10% heat-inactivated fetal bovine serum (FBS, Australia, Gibco). Cells were grown at 37℃ in a humidi ed atmosphere including 5% CO 2 . At the end-point of each experiment, the nal pH of the supernatant was always measured by a digital pH-meter (pH 301, HANNA Instruments, USA).

Total RNA Extraction and cDNA Synthesis
Total RNA was isolated with RNAiso Plus (Takara, Dalian, China) using the phenol-chloroform method. Extracted RNA was quanti ed using Nanodrop One (ThermoFisher, Wilmington, USA) and its integrity was checked on a 1% agarose gel. Only RNA samples with A260/A280 ratios between 1.9 and 2.2 and A260/A230 ratios greater than 2.0 were used for cDNA synthesis. Total RNA (1 µg) was reverse transcribed into cDNA using random primers or an oligo dT primer using a PrimeScript RT reagent Kit with gDNA Eraser (Takara, Dalian, China), according to the manufacturer's protocol [29]. All cDNA samples were diluted 1:8 with RNase-free water and stored at -20℃.
The CV was de ned as the CV value of the 1217 FPKM values of every gene. The DPM parameter was introduced for the identi cation of the RGs on the Pattern Gene Finder [33]. The FC-5% was de ned as the fold change between the top 5% high expression levels divided by the bottom 5% within 1217 pro les. The standard criteria of candidate RGs were relatively high expression levels and low variation according to the results from FPKM, CV, DPM, and FC-5% analyses.
Furthermore, two frequently used RGs (ACTB and GAPDH) and four RGs (GUSB, RPL13A, TUBA1A, and B2M) from previous studies were also assessed along with the novel candidate RGs. All RGs were ampli ed using qRT-PCR experiment for subsequent determination and validation. The probability density curves were drawn using Matlab scripts from our previous study [28]. Venn diagram analysis was performed using a webtool (http://www.omicshare.com/tools).

Primer Design and Ampli cation E ciency Analysis
The sequences of all genes used in this study were obtained from the National Center for Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov/). Using Primer-BLAST, primers were designed to all transcripts, with T m values around 60 °C, GC percent 45-55%, primer lengths of 18-24 bp, and product length of 80-250 bp. Primers were analyzed with Oligo Analyzer v3.1 (https://eu.idtdna.com/calc/analyzer) to detect potential self-and hetero-dimers [34]. The primers were synthesized by the Beijing Genomics Institute (Beijing, China). Primer speci cities were con rmed by melting curve analysis.

qRT-PCR analysis
All qRT-PCR runs were carried out in a qTower2.2 PCR System (Analytik Jena, Germany). Reaction mixtures containing 7.5 µL TB Green Premix Ex Taq II (2X, Tli RNaseH Plus), 0.3 µL ROX Reference Dye II (50X, TaKaRa, Dalian, China), 1.5 µL cDNA, 0.6 µL each of forward and reverse primers ( nal concentration 1 µM), and 4.6 µl nuclease-free water were prepared in MicroAmp fast optical 96-well plates (ThermoFisher, USA). Ampli cation conditions were set as follows: 95℃ for 30 s, followed by 40 cycles of 95℃ for 5 s and 60℃ for 34 s. Melting curve analysis was performed from 60-95℃. Reaction mixtures containing no template were used as negative controls. All samples were analyzed with three technical replicates. To test the ampli cation e ciency of each paired primer, serial ten-fold dilutions (1:10 3 -1:10 10 ) of the primer corresponding PCR product were used to generate a standard curve [35]. The coe cient of determination (R 2 ) and slope (S) values were calculated from the standard curves and primer e ciencies (E) were calculated as 10^(1/S)-1. The qRT-PCR experiments and analyses in this study were performed according to the Minimum Information for Publication of Quantitative Digital PCR Experiments (MIQE) guidelines [36].

Analysis of Gene Expression Stability
The cycle threshold (Ct) results from all runs were integrated into a data matrix. Then the data matrix was evaluated by four algorithms: geNorm [24], NormFinder [25], ΔCt method [26], and BestKeeper [27]. Finally, the gene stability values from the above four algorithms were further evaluated by the ComprFinder method [28] (Fig. 1D).

Validation of the candidate reference genes
To verify the reliability of the stable RGs, four target genes including MAPK9 and MAPK3 from the extracellular signal-regulated kinase (ERK) signal pathway, and two other vital functional genes (FAAH, encoding fatty acid amide hydrolase, and HIF1A, encoding hypoxia-inducible factor 1-alpha) were chosen for validation (Fig. 1E). These target genes play an important role in the initiation and metastasis of breast cancer [37][38][39][40][41]. The independent-sample t-test was performed using Microsoft Excel, and the graphs were plotted using GraphPad Prism 7. The results are presented as Mean ± standard error of the mean (SEM), * P < 0.05, ** P < 0.01. For multiple gene combinations, the geometric mean of their Ct values was calculated. The relative expression levels were calculated using the 2 −ΔΔCt method. To further evaluate the internal relationship of these different types of single-or multi-RG combinations, correlation analysis was performed as previously described [28]. Additionally, correlation analysis was also performed on the p-value dataset yielded in t-test analysis under different types of normalized factors [42].

Identi cation of candidate RGs based on a public transcriptomic database
Transcriptome sequencing data of 1217 breast cancer samples were obtained from the TCGA database. Next, 15450 unigenes that were identi ed after processing were evaluated by FPKM (high expression level, FPKM ≥ 10), CV (low variability as determined by the coe cients of variation, CV ≤ 40%), FC-5% (the top 5% of 1217 samples divided by the lowest 5%, FC-5%≤5), and DPM (DPM ≤ 0.3) values. The results for the different statistical algorithms, shown in Fig. 2, were as follows: (1) FPKM. A total of 4723 genes satis ed the requirement (30.57% of 15450, the blue area in Fig. 2A).
Gene overlap between the four algorithms was identi ed using a Venn diagram with 4-color blocks (blue, purple, green, and red), showing that 197 genes (Fig. 2E) satis ed all four requirements. Of these 197 genes, 10 genes (SF1, TARDBP, THRAP3, QRICH1, TRA2B, SRSF3, YY1, DNAJC8, RNF10, and RHOA) were selected as novel candidate RGs due to their higher FPKM values and easier primers design. In addition, GUSB, TUBA1A, RPL13A, and B2M, which previous studies suggested being stable RGs in breast cancer research, and two classical RGs, ACTB and GAPDH, were also considered. These genes were ranked based on their CV values (Table 1).  Figure S1).

Ct values of candidate reference genes
The mean Ct value (average of 3 technical replicates) for every sixteen RGs are shown in Fig. 3 and Additional le 1:  91). However, the Ct values of breast cancer cell lines, were overall lower than those of breast cancer tissues (Fig. 3B). A similar result of standard deviations was obtained in the breast cancer cells. To estimate the gene expression stability of these candidate RGs, more scienti c algorithms will have to be introduced and used.

Expression stability of candidate reference genes
In this study, the qRT-PCR data matrix was analyzed using ve differential algorithms: geNorm, NormFinder, BestKeeper, ΔCtmethod, and ComprFinder.

geNorm analysis
Gene expression stability was evaluated by the M value using geNorm analysis. This program determines the pairwise variation of each gene with all other analyzed genes under the same experimental conditions, the lower the M value, the higher the gene expression stability. In the breast cancer tissue group, the three most stably expressed genes (with the lowest M value) were SF1, THRAP3, and TARDBPwhile GAPDH, DNAJC8, and B2M were the least stably expressed genes ( Table 2). In the breast cancer cell group THRAP3, RHOA, and QRICH1 were the top three stably expressed genes, while B2M, TUBA1A, and ACTB were the least stably expressed genes (Table 3). For all samples, TARDBP was the most stably expressed gene, followed by SF1 and QRICH1. Conversely, TUBA1A, B2M, and ACTB were the least stably expressed RGs (Additional le 1: Table S4).

NormFinder analysis
Based on variance analysis to calculate the stable value of each gene, a higher NormFinder value indicates a less stably expressed gene. In the breast cancer tissue group, TRA2B, RHOA, and THRAP3 were the most stable genes, and DNAJC8, GAPDH, and B2M were the most unstable genes ( Table 2). In the breast cancer cell group, THRAP3, DNAJC8, and RPL13A were the three most stably expressed genes, while TUBA1A, B2M, and ACTB were the least stably expressed genes (Table 3). For all breast cancer tissue and cell samples, THRAP3, RHOA, QRICH1 were the most stably expressed genes, and TUBA1A, B2M, ACTB were the least stably expressed RGs (Additional le 1: Table S4).

BestKeeper analysis
To further analyze the expression stability of the RGs, BestKeeper was applied, in which a lower std-value indicates a more stably expressed RG. As shown in Table 2, in the breast cancer tissue group DNAJC8, RPL13A, and TARDBP were the most stably expressed genes, while ACTB, B2M, and GAPDH were the least stably expressed genes ( Table 2). In the breast cancer cell line group, THRAP3, RHOA, and QRICH1 were the three most stably expressed genes, while B2M, ACTB, and TUBA1A were the least stably expressed genes (Table 3). For all samples combined, DNAJC8, RPL13A, and TUBA1A were the most stably expressed genes, while GAPDH, RNF10, and ACTB were the least stably expressed RGs (Additional le 1: Table S4).

ΔCt analysis
According to the analysis results of the ΔCt method, TRA2B, RHOA, and THRAP3 were the most stably expressed genes, while DNAJC8, GAPDH, and B2M were the least stable genes in the breast cancer tissue group (Table 2), which is consistent with the analysis according to NormFinder. In addition, in the breast cancer cell lines THRAP3, TARDBP, and YY1 were the most stably expressed genes, while B2M, TUBA1A, and ACTB were the least stably expressed genes (Table 3). For all samples combined, THRAP3, RHOA, and QRICH1 were the most stably expressed genes, while TUBA1A, B2M, and ACTB were the least stable RGs (Additional le 1: Table S4).
A comprehensive ranking of the four methods examined The ComprFinder algorithm was carried out to obtain a nal score (FS) which was used to rank the potential RGs. In the breast tumor tissue group, the 3 most stably expressed RGs were SF1, TRA2B, and THRAP3 (Table 2). In the breast cancer cell lines, THRAP3, RHOA, and QRICH1 were the most stably expressed RGs (Table 3). For all samples combined, we ranked the RGs from the highest to the lowest stability as follows: THRAP3 > RHOA > QRICH1 > YY1 > TRA2B > RPL13A > SF1 > SRSF3 > GUSB > TARDBP > DNAJC8 > RNF10 > GAPDH > TUBA1A > B2M > ACTB. Interestingly, the top 5 most stable genes (THRAP3, RHOA, QRICH1, YY1, and TRA2B) were novel RGs. In contrast, the traditionally used RGs TUBA1A, B2M, and ACTB were the least stably expressed RGs.
The research presented here con rmed that THRAP3, RHOA, QRICH1, YY1, and TRA2B were the most stable RGs in all samples with FS values of 0.064, 0.101, 0.122, 0.151, and 0.161, respectively (Table S4). These promising results warranted further validation of the selected RGs.

Validation of the selected genes (1): comparison of target gene pro les when using different normalized RGs
To verify the reliability of the selected RGs, the expression pro les of MAPK3, MAPK9, FAAH, and HIF1A were determined in different breast cancer tissues and cell lines with different normalization factors (NF, i.e., the single-or multi-RG combinations). Our results indicated that SF1, TRA2B, and THRAP3 were the top 3 stably expressed RGs in breast cancer tissues and that THRAP3, RHOA, and QRICH1 were the top 3 stably expressed RGs in breast cancer cell lines. Moreover, ve genes (SF1, TRA2B, THRAP3, RHOA, and QRICH1) were the top 5 stably expressed candidate RGs in all samples. Therefore, we considered the multi-RG combination SF1 + TRA2B + THRAP3 + RHOA + QRICH1 as the most promising NF for breast cancer research (both in breast cancer tissues and cells). Thus, the multi-gene combinations including SF1 + TRA2B + THRAP3 + RHOA + QRICH1, SF1 + TRA2B + THRAP3, THRAP3 + RHOA + QRICH1, SF1 + THRAP3, THRAP3 + RHOA, and single RGs including SF1, TRA2B, THRAP3, RHOA, and QRICH1 were compared. In addition, ACTB, GAPDH, and ACTB + GAPDH were also used for comparison with the novel RGs. In total, 13 different multi-RG combinations or single RGs were assessed as NFs. For multiple gene combinations, the geometric average of their Ct value was calculated. The relative gene expression level was calculated as 2 −ΔCt , ΔCt = Δ (Ct targetgenes -Ct RGs ).
As shown in Fig. 4A, the expression of MAPK3 was signi cantly higher (P 0.05) in HR + HER2-cancer tissue than in para-carcinoma tissue or benign tumor tissue when assessed by 5 or 3 multi-gene RG combinations. However, the expression pattern of MAPK3 changed when we used single or 2 multi-gene RG combinations, such as SF1 + THRAP3, SF1, RHOA, or QRICH1. Importantly, when we investigated the least stably expressed RGs (ACTB, GAPDH, or ACTB + GAPDH), the expression of MAPK3 was signi cantly changed compared with the most stably expressed RGs.
As shown in Fig. 4B, when using 5 or 3 multi-gene combinations, the expression level of the MAPK9 gene was higher in HR + HER2-cancer tissue than in para-carcinoma tissue (P 0.05), while there was no signi cant difference between para-carcinoma tissue and benign tumor tissue. This may lead to small errors when using single or two multi-gene combinations. For example, when the less stably expressed genes ACTB, GAPDH, or ACTB + GAPDH, were used as the NF, the expression of MAPK9 did not show a clear expression trend compared with those of 5 or 3 multi-gene combinations.
The complete relative expression levels (2 −ΔCt ) of MAPK3, MAPK9, FAAH, and HIF1A genes normalized using all 13 types of single or multiple RG combinations are listed in Additional le 1: Table S5 and Table   S6.
Validation of the selected genes (2): the relationship among different normalized RGs Based on the method described in our previous study [28], the relationship among different normalized RGs was explored. As shown in Additional le 1: Figure S2, there was a high correlation (R 2 from 0.815 to 0.979 in breast cancer tissues, and R 2 from 0.927 to 0.995 in breast cancer cells) between stable RGs and SF1 + TRA2B + THRAP3 + RHOA + QRICH1. There was also a moderate to high correlation (R 2 from 0.621 to 0.709 in breast cancer tissues, and R 2 from 0.600 to 0.916 in breast cancer cell s) between unstable RGs and SF1 + TRA2B + THRAP3 + RHOA + QRICH1. There were few differences between the most stably expressed RGs and the least stably expressed RGs. Therefore, we performed additional analyses of their normalized e cacy, including a correlation analysis on the p-value yielded by the t-test analysis (see Method section).

Discussion
The qRT-PCR technique is one of the most valuable and reliable research tools used to quantify the expression of a target gene under different experimental conditions. Proper use of NFs is necessary to get a reliable estimate of gene expression in different types of breast cancer tissues and cells to avoid detecting variations that are not cancer-speci c [43][44][45]. Therefore, the selection of the appropriate RGs for breast cancer research is important when using qRT-PCR to quantify gene expression. Many studies use a single endogenous control for normalization, which can in uence the statistical results and may lead to erroneous data interpretation [2,46]. In fact, in the present study no single RGs were identi ed that were stably expressed in all tissues or cell types across different types of breast cancer [7,47,48].
Theoretically, RGs should be stably expressed in all samples, and their expression levels should be unaffected by the external environment, e.g. during tumorigenesis [31,49]. The selection and validation of RGs have to be corroborated by using a large number of samples [50,51]. To implement this idea, in this study we collected a large number (n = 87) of samples including 6 types of breast cancer tissues and 7 types of breast cancer cell lines. This allowed us to obtain strong results and conclusions. There was a great diversity of samples in our research for the following reasons: a) both benign and malignant tumor types were chosen, b) breast cancer samples following neoadjuvant chemotherapy were included, c) the breast cancer cell lines included overexpression and knockdown groups. With the above caveats explained, we propose that we have identi ed combinations of RGs that have high applicability in breast cancer research and treatment.
The target genes that were used in this study are involved in different biological processes of breast carcinogenesis and metastasis. Particularly, tumorigenesis, proliferation, apoptosis, and invasion are associated with many genes and signaling pathways. For example, genes like MAPK3 and MAPK9 encoding MAP kinases of the ERK signal pathway participate in transcription factor regulation of many biological processes [52,53]. Recently, novel results have indicated proteins that serve important roles during the process of cancer development. FAAH is a membrane-bound protein belonging to serine hydrolase family of enzymes that plays a signi cant role in the termination of signalling of fatty acid amides (FAAs), a class of bioactive lipids, both in the central nervous system and in some cancer tissues [54]. Hypoxia-inducible factors play an important role in the development of tumors, therefore the study functions of Hypoxia-inducible factors are also indispensable about cancer samples (HIF1A) [40,55]. Therefore, to con rm the effects of these genes on the vital regulatory mechanisms in breast cancer, we compared the potential role of novel RGs (SF1, TRA2B, THRAP3, RHOA, QRICH1) vs. traditional RGs (ACTB, and GAPDH) in target gene expression normalization.
In our study, ve algorithms were used to determine the stability of the expression of 16 candidate RGs across several different types of breast tumors and breast cancer cell lines. However, even for the same algorithm, the results varied between breast cancer tissues and cell lines. The top three genes for breast cancer tissues and cell lines were SF1 + TRA2B + THRAP3 and THRAP3 + RHOA + QRICH1, respectively, so a total of 5 RGs (SF1, TRA2B, THRAP3, RHOA, QRICH1) should be considered. However, simultaneous investigation of all ve RGs would require a lot of effort.
There are still no speci c literature reports prescribing how many candidate RGs should be used for qRT-PCR-dependent studies [56]. In particular, it is unknown which single or multiple gene combinations (SF1 + TRA2B + THRAP3 + RHOA + QRICH1, SF1 + TRA2B + THRAP3, THRAP3 + RHOA + QRICH1, SF1 + THRAP3, THRAP3 + RHOA, SF1, TRA2B, THRAP3, RHOA, or QRICH ) should be used. Considering that our results indicate that the single gene performances of both novel and traditional RGs, are not adequate, we propose that these types of studies should not be based on the use of single RG, even if they are top level RGs. The double gene combinations SF1 + THRAP3 and THRAP3 + RHOA (Fig. 4A-D) showed similar gene expression pro les consistent with SF1 + TRA2B + THRAP3 + RHOA + QRICH1, SF1 + TRA2B + THRAP3, and THRAP3 + RHOA + QRICH1. However, the SF1 + THRAP3 combination behaved similarly to the 5 or 3 gene combinations except for the MAPK3 and MAPK9 expression. Meanwhile, the THRAP3 + RHOA combination behaved similarly to the 5 or 3 gene combinations except for the MAPK9 expression. Therefore, considering the need for normalization accuracy, double RGs are not the best choice either.
The expression pattern of target genes was the same when 3-gene combinations or 5-gene combinations were used and they can be applied to various factors in breast cancer research. However, 3 RGs is a more manageable number for qRT-PCR experiments than 5 RGs. Therefore, we recommend that SF1 + TRA2B + THRAP3 and THRAP3 + RHOA + QRICH1 are adopted as the RG combinations for breast cancer tissue and cell line research, respectively. In the case of studies including both breast cancer tissue and cell line research, the THRAP3 + RHOA + QRICH1 combination would be optimal.
In this study, we not merely veri ed the use of conventional RGs but also identi ed and selected more appropriate novel RGs for breast cancer research. The use of a single RG is not recommended for breast cancer research. Similarly, the use of double RGs should be avoided. These ndings are similar to what has been suggested in most of the studies using transcriptomic datasets [57]. Our results suggest that the recommended number of RG is at least three for breast cancer tissues or cell lines. Nevertheless, these promising results require further veri cation of target genes in order to obtain more reliable data sets.

Conclusions
In this study, we tested sixteen different candidate RGs in six different breast cancer tissues and seven breast cancer cell lines, using ve different statistical algorithms for evaluation. Our results indicate that SF1 + TRA2B + THRAP3 and THRAP3 + RHOA + QRICH1 are promising RG combinations for e cient gene normalization under different conditions. Furthermore, the availability of these RGs and the stability of their expression in various tumor tissues and cells will allow performing future studies focusing on genes essential for breast cancer biology, and choosing a reliable and appropriate RG combination will allow more accurate assessments of differential gene expressions in breast cancer research. Declarations analyzed and interpreted the data, MYW, JPZ and XHL revised the manuscript. All authors read and approved the nal manuscript.

Funding
This work was supported by the Natural Science Foundation Project of Chongqing, China (cstc2016jcyjA0338; cstc2020jcyj-msxmX0049; cstc2018jcyjAX0732; cstc2019jcyj-msxmX0823). This work was also supported in part by the National Natural Science Foundation of China (81900525) and the Doctoral Research Fund of Chongqing University Three Gorges Central Hospital (2017BSKYQDJJ06).