Identification of Novel Reference Genes Using Multiplatform Expression Data and Their Validation for Quantitative Gene Expression Analysis

Normalization of mRNA levels using endogenous reference genes (ERGs) is critical for an accurate comparison of gene expression between different samples. Despite the popularity of traditional ERGs (tERGs) such as GAPDH and ACTB, their expression variability in different tissues or disease status has been reported. Here, we first selected candidate housekeeping genes (HKGs) using human gene expression data from different platforms including EST, SAGE, and microarray, and 13 novel ERGs (nERGs) (ARL8B, CTBP1, CUL1, DIMT1L, FBXW2, GPBP1, LUC7L2, OAZ1, PAPOLA, SPG21, TRIM27, UBQLN1, ZNF207) were further identified from these HKGs. The mean coefficient variation (CV) values of nERGs were significantly lower than those of tERGs and the expression level of most nERGs was relatively lower than high expressing tERGs in all dataset. The higher expression stability and lower expression levels of most nERGs were validated in 108 human samples including formalin-fixed paraffin-embedded (FFPE) tissues, frozen tissues and cell lines, through quantitative real-time RT-PCR (qRT-PCR). Furthermore, the optimal number of nERGs required for accurate normalization was as few as two, while four genes were required when using tERGs in FFPE tissues. Most nERGs identified in this study should be better reference genes than tERGs, based on their higher expression stability and fewer numbers needed for normalization when multiple ERGs are required.


Introduction
Gene expression analysis is becoming more important in diagnostic fields as it allows for the identification of novel biomarkers relevant to diseases. Endogenous reference genes (ERGs) are widely used to normalize the mRNA level in the relative quantification to provide an accurate comparison of gene expression between different samples [1]. Traditional ERGs (tERGs), such as GAPDH and ACTB, have been used in expression studies without proper validation because of the assumption that they are expressed at constant levels across different samples and regardless of experimental treatments [2,3]. However, several reports have shown that the expression of tERGs can vary in different tissues and be regulated by experimental treatments or pathological state [2,[4][5][6][7][8][9][10]. As the use of inappropriate ERGs in relative quantification of gene expression can result in biased expression profiles [4,11,12], the selection of proper ERGs is essential for accurate measurement especially in quantitative methods including qRT-PCR, which is a highly sensitive and accurate method [13].
Although there have been a number of previous studies aimed at finding suitable ERGs, most of them have focused on selecting the most stable genes from commonly used ERGs [14][15][16][17]. Moreover, the identification of novel ERGs (nERGs) has been based primarily on microarray data [10,[18][19][20][21]. Although short-oligo microarrays such as Affymetrix GeneChips, have the advantage of being highly sensitive in detecting low abundance transcripts (nearly 3-20 transcripts per million (tpm)) [22], they have some disadvantages such as inaccurate cross hybridization between probes and transcripts, differences in hybridization efficiencies between probe sets, limited linear range of signal, and incorrect annotation of transcripts [23,24]. Therefore, an approach using only microarray data may not be sufficient to identify the most suitable ERGs.
Although an ideal universal ERG may not exist [1,15], a combination of large expression data from different platforms is expected to complement the limitation of each platform [25] and allow for the identification of more suitable ERGs.
Here, we describe an algorithm for the identification of nERGs using the publicly available human gene expression datasets in addition to in-house microarray data. The expression of selected nERGs in datasets was validated by qRT-PCR in 108 human samples and their expression stability was compared to that of tERGs.

Ethics Statement
This study was approved by the institutional review board of Samsung Medical Center. Patients' written informed consent was not required to be obtained as the institutional review board approved the use of human tissues in this study without patients' consent with the reason that the data were analyzed anonymously and patients could not be identified.

EST, SAGE, microarray gene expression dataset construction
EST and SAGE human gene expression data were collected from the publicly available Cancer Genome Anatomy Project (CGAP) site (http://cgap.nci.nih.gov/). Microarray data was obtained from the Cancer Genome Expression Database of LG Life Sciences, which is based on the Affymetrix HG-U133 (for the samples included in microarray data, see Table S1) [26]. A detailed description of each dataset construction is provided in the Supplementary Materials and Methods (Text S1) and shown schematically in Figure 1.

Algorithm for the identification of candidate HKGs and 13 nERGs
The methodology used to identify nERGs is outlined in Figure 1. First, to identify nERGs, we searched for candidate HKGs whose expression is detected in most tissues using 0's proportion in EST and SAGE datasets. 0's proportion is defined as the proportion of different tissues in which a given gene is not expressed and was calculated as follows: Figure 1. Flowchart of the methodology for identification of nERGs. 2,087 candidate HKGs were first identified by selecting the genes meeting the following criteria: 0's prop ,0.4 in EST, ,0.1 in shortSAGE and ,0.3 in longSAGE. 0's prop represents 0's proportion (number of tissues in which the gene is not expressed/total number of tissues, 0#0's prop#1). Among the candidate 2,087 HKGs, 13 nERGs with the lowest CVs were further identified by selecting the genes common to all four datasets among the genes with the 400 lowest CVs (approximately 20% of candidate HKGs). In this equation a value of 0 indicates that the gene is expressed in all tissues, and a value of 1 indicates that the gene is not expressed in any tissue. If a gene was expressed in any one library of multiple libraries corresponding to the same tissue, that gene was considered to be expressed in that tissue. A total of 2,087 candidate HKGs with low 0's proportions in all three datasets were selected (Table S2). Cut-off values for 0's proportion in each dataset were determined to include the majority of the 575 HKGs identified from a previous Affymetrix U95A chip [27] (see Figure 1, Text S1). Affymetrix (Affy) data for 5,317 probe sets corresponding to 2,087 HKGs were obtained. Mean gene expression and CV (%) for each gene were calculated in each dataset. The Student's t-test was used to assess whether the mean gene expression between candidate HKGs and non-HKGs was statistically different. When this set of 2,087 HKGs was classified functionally using the Functional Classification Catalogue (FunCat, Version 2.0) [28], known functions were available for only 1,318 genes of them (for details, see Text S1). For CpG island analysis, sequences upstream of the annotated transcription start site of RefSeq genes were obtained from the UCSC site and CpG islands in their upstream sequences were searched using the CpGIE software [29]. The statistical significance of differences in the frequency of genes with CpG islands between HKGs and non-HKGs was determined using Z-test (R program, http://www. R-project.org). P,0.05 was considered to be significantly different.
Among the 2,087 HKGs, potential ERGs were further identified according to the following process: 1) the CV was calculated for each UniGene cluster and 2) the genes in each dataset were ranked by ascending CV values. Among the first 400 genes with the lowest CVs (approximately 20% of 2,087 HKGs), we found 13 nERGs common to all four datasets (i.e. EST, LongSAGE, ShortSAGE, and Affy). The CV values between tERGs and nERGs were compared using Wilcoxon rank sum test.

Genomic variations of nERGs and tERGs
The Database of Genomic Variants (http://projects.tcag.ca/ variation/) was used to search for genomic variations of the 2,087 HKGs, including nERGs and tERGs.

RNA preparation from human frozen, FFPE tissues and human cancer cell lines and qRT-PCR
The Department of Pathology at Samsung Medical Center provided 26 human frozen tissues and 60 FFPE tissues and the Korea Cell Line Bank (KCLB) or ATCC provided 22 human cancer cell lines (Table S3). The 60 FFPE tissues consisted of 10 breast cancers, 8 normal stomach, 9 stomach cancers, 10 normal ovaries, 4 ovarian adenomas, 9 ovarian borderline tumors and 10 ovarian cancers.
Total RNA isolation and cDNA synthesis from human samples are described in detail in the Supplementary Materials and Methods (Text S1). Total RNA from frozen tissues and cell lines was isolated using Trizol (Invitrogen Life Technologies). The inclusion criteria for the RNA samples were A260/280$1.80 and rRNA (28S/18S) ratio$1.0. The integrity of RNA from frozen tissues and cell lines was assessed using an Agilent Bioanalyzer 2100 (Agilent Technologies, Palo Alto, CA). Paradise Whole Transcript RT Reagent system (Arcturus, CA, USA) was used for RNA isolation and RT of FFPE tissues. PCR primers and Universal Probe Library (UPL) numbers for this study are provided in Table 1. Primers and probes for each gene were designed to have a short amplicon size. All PCR reactions were performed in a Lightcycler 2.0 (Roche Applied Science) according to standard procedures (for details, see Text S1). PCR efficiency for each gene in frozen tissues and cell lines was determined by both the serial dilution method using Lightcycler 4.0 software and estimation using the LinRegPCR program [30] (Table 1, Text S1). To minimize experimental variation, the same gene in different samples was tested in the same PCR run.

Gene expression stability analysis in qRT-PCR
To analyze gene expression stability, we used the geNorm v.3.4 [1] and NormFinder software [31]. Cp values were converted into relative quantities for analysis with geNorm and NormFinder, taking into consideration the PCR efficiencies of the genes, as shown in Table 1 (for details, see Text S1). The optimal number of ERGs for normalization was also determined using the geNorm program. The statistical differences in gene expression stability values between nERGs and tERGs were determined using Wilcoxon rank sum test. The Pearson and Spearman correlation between the gene expression stability calculated by geNorm or NormFinder (M for geNorm and S for NormFinder) and the CV calculated in each dataset were analyzed using R statistical software. P,0.05 was considered to be significant.

Results
Identification and characterization of 2,087 candidate HKGs 2,087 candidate HKGs were first identified using 0's proportion in each dataset ( Figure 1, Table S2). According to the functional classification by FunCat [28], genes with a variety of basic cellular functions were included in this list. In particular, proteins mediating protein fate (23%) and cellular transport (21%) had the highest frequency ( Figure 2A). This is in contrast to the previously reported classifications of HKGs, in which metabolic and ribosomal proteins were enriched [27,32]. We compared the frequency of genes with CpG islands in the upstream sequences of transcription start sites in HKGs relative to non-HKGs. Most HKGs (70%) were found to possess a CpG island within 1,000 bp from the transcription start site, consistent with previous studies [33,34], while fewer CpG islands were found in the upstream sequences of non-HKGs (P,0.001) ( Table 2). Mean expression level of HKGs was significantly higher than that of non-HKGs in all datasets (P,0.001) ( Figure 2B), also consistent with previous work [27] (for detailed description on the expression of 2,087 HKGs in 4 datasets, see Text S2). CV values of the 2,087 genes showed a poor correlation between the four datasets, whereas gene expression showed a relatively higher correlation (Table S4).
The gene expression for each of the 13 nERGs showed a significant correlation between datasets (P,0.001, were not significant. CVs between the datasets were not significantly correlated (P.0.05, Table S6).
To further confirm the suitability of the nERGs, gene copy number variations (CNVs) of the nERGs, which can affect the gene expression, were investigated by searching the Database of Genomic Variants. As shown in Table 4, only OAZ1 and DIMT1L were found to be located in a chromosome region where CNVs were reported, indicating their expression might be deregulated by genomic aberrations.

Comparison of tERGs and nERGs in dataset
We compared the 13 nERGs with 13 commonly used tERGs: GAPDH, ACTB, HPRT1, PPIA, B2M, TBP, HMBS, RPLP0, PGK1, GUSB, TFRC, H6PD, and ALAS1. The mean expression of the nERGs was relatively lower than the highly expressed tERGs, including GAPDH, ACTB, B2M, and PPIA, in all datasets and was expressed at levels similar to those tERGs, which had lower expression levels ( Figure 3, Table S7). With respect to variation, most of the tERGs showed relatively higher variation than the nERGs and the mean CV values of nERGs were significantly lower than those of tERGs (P,0.001 in EST, ShortSAGE, and Affymetrix, P = 0.003 in LongSAGE, Table S8), supporting the notion that the identified nERGs are generally expressed more stably and at relatively lower levels than tERGs.
A search for genomic variation of tERGs revealed that many of them (ACTB, GAPDH, PGK1, B2M, TBP, TFRC, ALAS1) were located in the genomic locus where CNVs are known, whereas only 2 genes among the nERGs were found in those regions ( Table 4), suggesting that the higher expression variation of some tERGs might be due in part to CNVs.

Validation of nERGs by qRT-PCR
To validate both the level and stability of gene expression of nERGs selected from the four datasets, the expression of 13 nERGs and 8 tERGs was measured in a total of 108 human samples, including 26 frozen tissues, 22 cancer cell lines and 60 FFPE tissues, by qRT-PCR using Taqman probes (refer to Text S1 for an explanation for why 8 tERGs among 13 tERGs were chosen for qRT-PCR). Except PPIA, a small amplicon for each gene was designed for the reliable measurement of its expression, especially in FFPE tissues where RNA from these samples is frequently degraded. When the PCR efficiency of each gene was determined using the serial dilution method, each gene was amplified at 90-100% efficiency ( Table 1). The CVs of Cp values confirmed that the between-assay precision in two or three repeats was within 5% (data not shown).
First, the expression profiles of these genes in each of the 48 samples, including frozen tissues and cancer cell lines, are presented in Figure 4A. The 13 nERGs were constitutively expressed in all 48 samples. Seven tERGs showed a wide range of expression (Cp: 13.52 to 29.39), but H6PD was not widely expressed in frozen tissues and this gene was consequently excluded from subsequent calculations. The Cp values of 13 nERGs ranged from 18.90 to 28.79 ( Figure 4B). tERGs could be divided into highly expressed genes (median ,20 cycles) and lowly expressed genes (median.23 cycles). Highly expressed genes included B2M, PPIA, GAPDH, ACTB and lowly expressed genes consisted of HPRT1, TBP and HMBS. Of the nERGs, 12 genes displayed an intermediate expression level between the highly expressed and the lowly expressed tERGs ( Figure 4B). The mean expression level of nERGs was significantly lower than that of highly expressed tERGs, whereas it was significantly higher than that of lowly expressed tERGs (P,0.001, Table S9). ZNF207 was the most highly expressed gene, followed by UBQLN1 and CUL1. OAZ1 was the gene with the weakest expression.
We further investigated the expression of the 13 nERGs by qRT-PCR in 60 FFPE tissues to test whether the nERGs could be used in such tissues showing the significant degradation of mRNA. Except DIMT1L, the expression of all genes was measurable in all 60 samples. The Cp range for tERGs was 18.85 to 33.02 and for nERGs was 23.33 to 31.58 ( Figure 4B). DIMT1L was not amplified in 5 samples and therefore was excluded from further expression stability analysis. The expression pattern in the FFPE tissues was similar to that of previous 48 samples (26 frozen tissues and 22 cancer lines) despite the discrepancy in sample types. Remarkably, PPIA expression which was detected at high level in frozen tissues/cell lines was observed at markedly decreased level in FFPE tissues. This observation might be due to the long amplicon size of PPIA (326 bp), whereas the amplicon size of other genes is small ranging from 60 to 110 bp ( Table 1), indicating that small size of amplicon is required for the detection of gene expression in FFPE tissues in which RNA is frequently degraded.

Gene expression stability of nERGs
We first assessed the gene expression stability (detailed in Text S1) in 48 samples, including 26 frozen tissues and 22 cell lines based on qRT-PCR using two programs, geNorm and NormFinder. All genes tested displayed relatively high expression stability with low M values (,0.9), which were below the default limit of 1.5 in geNorm (Table 5a). GPBP1 and CUL1 were identified as the two most stable genes. B2M was the least stable gene and had the highest M value (0.888), followed by ACTB (0.843), HMBS (0.815), and GAPDH (0.793). When calculated by NormFinder, TBP and  PAPOLA were the two most stable genes (i.e. having the lowest S values) (Table 5a). Similar to the results from geNorm, tERGs including B2M, ACTB, GAPDH, HMBS and HPRT1, were found to have less stable expression than nERGs. The mean gene expression stability values in nERGs by geNorm and NormFinder were significantly lower than those in tERGs in both 48 samples and 60 FFPE tissues (P,0.05, Tables S10). Both analyses demonstrated that most nERGs showed relatively higher expression stability compared to tERGs, suggesting that nERGs are more suitable for normalization. Moreover, even when gene expression stability was analyzed with relative expression level calculated by PCR efficiencies using the LinRegPCR program, most of nERGs showed a more stable expression than tERGs (data not shown).
Pearson correlation analysis revealed the higher concordance of both M and S values with CVs in the EST and shortSAGE ( Table 6) than in the Affymetrix. High correlation between M and S (Pearson correlation: 0.953, P,0.001) was also observed, indicating that both analyses produced similar results.
Consistently with frozen tissues and cell lines, both gene expression stability values demonstrated that most nERGs with low stability values are superior to tERGs in terms of expression stability in FFPE tissues (Table 5b), proving the usefulness of our nERGs as reliable measurements of gene expression in those tissues. GPBP1 and PAPOLA were the two least variable genes in geNorm and ARL8B and LUC7L2 were the top two ranked genes in NormFinder. However, in the analysis by each tissue, FBXW2, TRIM27, and CUL1 showed high stability values in breast, ovary, and stomach, respectively ( Table 7), suggesting that they have high expression variation in each tissue. Also, S values by NormFinder in the ovary and stomach FFPE tissues were calculated based on the combination of intra-and inter-group variations between normal and tumor samples. The relatively high S values of TRIM27 in the ovary and CUL1 in the stomach suggest that their expression might be regulated in specific tumors compared to their normal tissues.
The optimal number of ERGs for normalization was determined using geNorm. In both the 48 human frozen and cell line samples and 60 FFPE tissues, the optimal number of nERGs required for normalization was fewer than when using tERGs ( Figure 5). Four tERGs and three nERGs were calculated as the optimal number of ERGs needed in the 48 samples when using a V of 0.15 as the cut-off value [1]. In the FFPE samples, V2/3 was under 0.15 when using nERGs, suggesting that only two genes are sufficient for optimal normalization, whereas four of seven tERGs were necessary for accurate normalization. This indicates that fewer ERGs are required for optimal normalization when using our nERGs rather than using tERGs.

Discussion
In the present study, we identified nERGs in human samples using a comparative analysis of different large datasets of human gene expression profiles, while previous attempts to identify nERGs that are superior to tERGs were limited to the analysis of a microarray [10,[18][19][20][21] or EST data [35].
Candidate HKGs were first selected and included 2,087 genes, which is a larger number of genes than previously identified by other groups [27,32,36]. Their characteristics, including high levels of expression [27,36] and the prevalence of CpG islands in the promoter regions [33,34], were in line with previous studies based on smaller numbers of HKGs, reflecting that ''real'' HKGs showing constitutive expressions in all tissues are enriched in our list. Thus, this list can be used as a reliable source for the study of HKGs.
The 13 nERGs further identified from candidate HKGs showed relatively lower CV and lower expression than most of the tERGs in all datasets. These findings were further confirmed by qRT-PCR using frozen tissues and cell lines. Generally, the expression of 13 nERGs was lower than the highly expressed tERGs and higher or similar to the weakly expressed tERGs. The expression stability values of the nERGs calculated by both programs also demonstrated that nERGs are generally more stably expressed than tERGs. Although there were slight differences in their rankings between the two programs, PAPOLA, CUL1, TBP, LUC7L2, GPBP1 and TRIM27 were found to be the genes with the most stable expression in 48 samples. The observation that TBP is one of the most stable genes is not surprising because relatively lower variability of TBP among traditional ERGs was already expected in datasets including EST and ShortSAGE (Figure 3). On the other hand, our data further supported the unsuitability of the most commonly used ERGs, like GAPDH, ACTB, for normalization, in line with previous works [7,8,14].
Our nERGs were also successfully validated by qRT-PCR in FFPE tissues. Despite the usefulness of archival FFPE tissue specimens in conjunction with clinical data, frequent degradation of RNA from FFPE tissues has been regarded as an obstacle in the gene expression analysis of those samples [37]. The expression of nERGs was measurable with reliable Cp values in all 60 FFPE tissues and most of the nERGs outperformed tERGs with respect to expression stability. Furthermore, the expression level of target gene can be calculated relative to the expression level of one or multiple nERGs using standard curve or comparative Ct(Cp) method in quantitative gene expression analyses, including qRT-PCR. The most suitable ERG or ERGs in the designed experiment can be selected among 13 nERGs or combination of tERGs and 13 nERGs based on gene expression stability values calculated by the geNorm and/or NormFinder program. Recently developed PCR array, high throughput gene expression measurement using qRT-PCR, also requires more suitable ERGs than conventional tERGs for accurate quantification of gene expression [38]. Recently, normalization using the geometric mean of multiple ERGs has been considered to be more accurate for normalization [1], especially in situations when no optimal ERG has been identified  [31]. Moreover, the use of multiple ERGs is more important in the expression analysis using FFPE tissues, where poor RNA quality causes additional variations in gene expression [39]. Five tERGs are included for the normalization of gene expression in the Oncotype DX TM assay (Genomic Health) which is a clinical validated multigene qRT-PCR test using FFPE tissues to predict recurrence of breast cancer [40]. However, the superiority of our nERGs was demonstrated by showing that a fewer number of genes is required for normalization when using the nERGs as compared with using the tERGs. Considering the limited amounts of samples or experiment cost, nERGs also outperformed previous tERGs when multiple genes are needed for reliable normalization. Taken together, nERGs will be a better choice for more reliable normalization in gene expression analysis using FFPE tissues. The superiority of nERGs over tERGs is based in their lower expression level as well as their high expression stability. Use of ERGs with expression levels similar to target genes is recommended so that the comparisons fall on the same linear scale [19,41,42]. Therefore, our nERGs, with relatively lower expression, rather than GAPDH or ACTB showing high expression, can be better candidates for normalization of a wide range of genes, including weakly expressed genes. This is significant given that the majority of transcripts in human tissues are expressed in low abundance [36].
Remarkably, we observed no significant correlation between the stability values calculated from qRT-PCR data and CV in the Affymetrix data, which is in contrast to the significant correlation between stability values and CV in the EST and SAGE dataset. This suggests that EST or shortSAGE may be better sources for exploring nERGs rather than microarrays, which have traditionally been used as a source for screening ERG and supports our initial hypothesis that the microarray might not be a good source for ERGs. Furthermore, nERGs identified here might be used as reference for relative measurements of gene amplification, which is a frequent genetic alteration leading to unregulated gene expression in cancer [43]. As the relatively constant expression of these genes in both normal and tumor tissues provides the possibility that these genes are located in a chromosomal region in which no genetic alterations are found in human tumors, we investigated the genomic CNVs of nERGs using publicly available databases. Most nERGs, except OAZ1 and DIMT1L, were located in genomic regions where CNVs were not reported, whereas many tERGs were located in regions with CNVs. The relatively lower expression stability of OAZ1 and DIMT1L, as well as tERGs like GAPDH and ACTB, might be explained by these genomic aberrations. However, the suitability of nERGs as a reference for the relative measurement of gene amplification remains to be further investigated and validated through experiments. Meanwhile, even genes with genomic variations can be used for the normalization in gene expression, provided that their expression is not affected by genomic aberrations.
In conclusion, we have identified a set of candidate HKGs and nERGs based on a comparative analysis of EST, SAGE, and Affymetrix datasets. This is the first study using three different platforms to identify nERGs, and most of the 13 ERGs identified in these datasets were demonstrated to be more stably expressed than tERGs through the validation using qRT-PCR in large human samples including cell lines, frozen tissues and FFPE tissues. Moreover, these nERGs were expressed at relatively lower levels than most commonly used high expressing tERGs, making them more suitable for normalization of transcripts from a wide range of expression levels. We have also shown that fewer ERGs are required for accurate normalization using nERGs than using tERGs, especially in FFPE tissues when the use of multiple ERGs is required.