Comparison and identification of estrogen-receptor related gene expression profiles in breast cancer of different ethnic origins.

Interactions between genetic variants in the estrogen receptor (ER) have been associated with an increased risk of breast cancer. Available evidence indicates that genetic variance within a population plays a crucial role in the occurrence of breast cancer. Thus, the comparison and identification of ER-related gene expression profiles in breast cancer of different ethnic origins could be useful for the development of genetic variant cancer therapy. In this study, we performed microarray experiments to measure the gene expression profiles of 59 Taiwanese breast cancer patients and, through comparative bioinformatics analysis against published U.K. datasets, revealed differences in ER-related gene expression between Taiwanese and British patients. In addition, SNP databases and statistical analysis were used to elucidate the SNPs associated with ER status. Our microarray results indicate that the expression pattern of 65 differentially expressed genes in ER+ patients was dissimilar from that in ER− patients. We identified 17 genes that were differentially expressed in ER-related breast cancer in only one of the two populations and that had more than one SNP differing significantly in genotype and allele frequency. These 17 genes and their related SNPs may be important in population-specific ER regulation of breast cancer. This study provides a global and feasible approach to studying population-unique SNPs in breast cancer of different ethnic origins.


Introduction
Breast cancer is one of the most common cancers among women worldwide: it ranks first among cancers in developed countries and fourth in Taiwan. Current research suggests that interactions between genetic variants and a wide range of environmental factors may contribute to the development of breast cancer. Available evidence indicates that genetic variance within the population plays a role in the probability of breast cancer development, with incidence ranging from low in certain groups of Asian women to highest in Caucasian women (Hsiao et al. 2004).
With the microarray technique, large amounts of gene expression data can be obtained in a short period of time. Gene expression profiling is a powerful tool for identifying gene activity patterns, which enables the distinction among various subtypes of breast cancer (including luminal subtypes A and B, ERBB2, basal, and normal) (Sorlie et al. 2001). According to data from both clinical and animal studies, estrogen is crucial to the development and progression of breast cancer. Estrogen mediates its effects through the estrogen receptor (ER), which serves as the basis for many therapeutic interventions (Deroo and Korach, 2006). More than two-thirds of breast cancers show estrogen receptor expression at the time of diagnosis, and immunohistochemical detection of estrogen receptor expression is routinely used in making decisions on hormonal therapy for breast cancer (Holst et al. 2007). Gene variants in the steroid hormone related genes ESR1, ESR2, PGR, and HSD17B1 have been associated with either an increased or a decreased risk of breast cancer (Feigelson et al. 2006; Gold et al. 2004); however, the exact associations remain unclear. Furthermore, ER-α allelic variants have been reported to be associated with the risk for breast cancer in Caucasians (Gold et al. 2004) and in Taiwanese (Hsiao et al. 2004). Certain single nucleotide polymorphisms (SNPs) may influence how ERs and their coregulators regulate tumor development and progression. In this study, we developed an approach to find population-unique SNPs in breast cancer of different ethnic origins by comparing the gene expression profiles of two different populations and the related SNP data.

Materials and Methods
Tumor tissue samples and examination of ER
Surgical specimens of breast cancer tumor tissue were freshly collected and snap frozen from patients who underwent surgery at National Taiwan University Hospital (NTUH) between 2002 and 2005. Cancer samples containing relatively pure tumor, as defined by greater than 50% tumor cells per high-power field examined in a section adjacent to the tissue used, were included in this study. All the paraffin sections of breast cancer specimens (3-5 μm in thickness) on slides were processed in Ventana's automated staining system (BenchMark LT) (Ventana Medical System Inc., Tucson, AZ, U.S.A.) for immunohistochemical (IHC) staining. First, the slides were probed with CONFIRM™ anti-Estrogen Receptor (SP1) rabbit monoclonal primary antibody (Ventana Medical System Inc.). Second, to localize and visualize ER protein within the specimen, the iVIEW™ DAB Detection kit (Catalog # 760-091, Ventana Medical System Inc.) was applied. The negative control slides for tumor specimens were stained using only the iVIEW™ DAB Detection kit (Catalog # 760-091, Ventana Medical System Inc.).

RNA extraction and oligo microarray
Total RNA was extracted with Trizol® Reagent (Invitrogen, U.S.A.), followed by the RNeasy Mini Kit (Qiagen, Germany). Purified RNA was quantified at OD260nm with an ND-1000 spectrophotometer (Nanodrop Technology, U.S.A.) and quality-controlled with a Bioanalyzer 2100 (Agilent Technology, U.S.A.). A human reference RNA pooled from 10 cell lines (Stratagene, U.S.A.) served as the reference in the microarray comparison. A total of 0.5 μg of total RNA was amplified with a Low RNA Input Fluor Linear Amp kit (Agilent Technologies) and labeled with Cy3 or Cy5 (CyDye, PerkinElmer, U.S.A.) during the in vitro transcription process. Tumor RNA was labeled with Cy5 and the Universal Human Reference RNA was labeled with Cy3. Then 2 μg of Cy-labeled cRNA was fragmented to an average size of about 50-100 nucleotides by incubation with fragmentation buffer at 60 °C for 30 minutes. The correspondingly fragmented, labeled cRNA was then pooled and hybridized to a Human 1A (version 2) oligo microarray (Agilent Technologies) at 60 °C for 17 h. After washing and drying with a nitrogen gun, the microarrays were scanned with an Agilent microarray scanner (Agilent Technologies, U.S.A.) at 535 nm for Cy3 and 625 nm for Cy5. Scanned images were analyzed with Feature Extraction software 6.0 (Agilent Technologies, U.S.A.), which quantified the signal and background intensity of each feature; the data were subsequently normalized by the rank-consistency-filtering LOWESS method.

Microarray data analysis
In this study, we used the microarray technique to profile the gene expression of 59 breast cancer patients in Taiwan with primary invasive breast carcinoma. To compare the gene expression profiles of Taiwanese and U.K. patients, we used U.K. breast cancer data from the National Cancer Institute (NCI), obtained from the online supplementary materials of Sotiriou et al. (2003). Detailed information and clinical characteristics for the breast cancer patients in the U.K. and Taiwan are shown in Table 1. The gene expression datasets from our microarray results and from the NCI were each grouped into ER+ and ER− according to their clinical prognosis variables, as shown in Figure 1. To identify differentially expressed genes between the two groups and to increase the accuracy of significant gene selection, Significance Analysis of Microarrays (SAM) (Tusher et al. 2001) and the Optimal Discovery Procedure (ODP) (Storey, 2005) were used jointly.
[...] were manually screened for further analysis. In order to collect detailed information on the rest of the SNPs, we referred to dbSNP and retrieved the genotype and allele frequencies of the two populations. Using Pearson's chi-square tests, we identified the SNPs of our candidate genes that differed significantly in both genotype and allele frequencies between European Americans and Chinese. We used R software for the statistical analysis of SNPs.

Results
Differentially expressed genes in ER+/ER− breast cancer subgroups
First, we used immunohistochemical staining to determine the ER+/ER− status of the breast cancer tissues (Fig. 2). The microarray data were grouped into ER+ and ER− groups based on the ER status of the breast cancer patients. In order to select the differentially expressed genes, we combined two algorithms, Significance Analysis of Microarrays (SAM) (Tusher et al. 2001) and the Optimal Discovery Procedure (ODP) (Storey, 2005). Using two-class unpaired SAM supervised analysis, with the K-nearest neighbors imputation engine set to 10 neighbors, we found 78 differentially expressed genes in our data, of which 31 were up-regulated and 47 were down-regulated. After imputing missing data and applying the ODP algorithm, 737 genes with a q-value below the cut-off of 0.01 were selected. The 65 genes picked up by both SAM and ODP analysis were regarded as the most significant genes. The flowchart for candidate gene selection is shown in Figure 1. The log2-transformed data of the genes differentially expressed between the ER+ and ER− groups are shown in the hierarchical clustering diagram (Fig. 3a).
The expression pattern of the 65 genes in ER+ patients (Fig. 3a, left) was dissimilar from that of the ER− patients (Fig. 3a, right). Similarly, in the NCI dataset, 107 genes were selected using the SAM algorithm (genes with a missing gene symbol were excluded); 56 of them were up-regulated and 51 were down-regulated. The ODP algorithm considered 900 genes significant; the 88 genes selected by both SAM and ODP were regarded as the most significant genes in the NCI dataset. The hierarchical clustering diagram in Figure 3b displays the expression pattern of these 88 genes.
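The consensus step described above (keeping only genes called by both SAM and ODP) can be sketched as a set intersection. This is a minimal illustration in Python rather than the software actually used; the function name, the toy gene lists, and the q-values are hypothetical, and only the 0.01 q-value cut-off comes from the text.

```python
def select_consensus_genes(sam_genes, odp_qvalues, q_cutoff=0.01):
    """Return the genes called by SAM that also pass the ODP q-value cut-off.

    sam_genes   -- iterable of gene symbols reported as significant by SAM
    odp_qvalues -- dict mapping gene symbol to its ODP q-value
    q_cutoff    -- q-value threshold (0.01 in the text)
    """
    odp_genes = {gene for gene, q in odp_qvalues.items() if q < q_cutoff}
    return sorted(set(sam_genes) & odp_genes)


# Hypothetical toy inputs, not data from the study.
sam_genes = ["ESR1", "GATA3", "TFF3", "GENE_X"]
odp_qvalues = {"ESR1": 0.0004, "GATA3": 0.002, "TFF3": 0.03, "GENE_Y": 0.001}

consensus = select_consensus_genes(sam_genes, odp_qvalues)
print(consensus)  # ['ESR1', 'GATA3']
```

Requiring agreement between two selection procedures trades sensitivity for specificity, which is consistent with the much smaller consensus lists (65 and 88 genes) compared with the ODP lists (737 and 900 genes).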

Comparison of differentially expressed genes in our and NCI data
The microarrays used in our dataset and the NCI dataset have 3197 gene probes in common, which were designated as our candidate gene pool for analysis. As shown in Figure 4, 9 of the 65 significant genes in the NTUH dataset fell within this pool (the 3197 genes common to the NTUH and NCI datasets), while 68 of the 88 significant genes in the NCI dataset fell within it. Five genes in the overlapping region were common to the significant genes of both datasets, implying that these genes were differentially expressed in both Taiwanese and U.K. patients. They are basic transcription factor 3 (BTF3), cyclin-dependent kinase inhibitor 2A (CDKN2A), estrogen receptor 1 (ESR1), GATA binding protein 3 (GATA3), and trefoil factor 3 (TFF3). All five genes have already been reported to be associated with ER status in human breast cancers. There were 67 differentially expressed candidate genes appearing exclusively in either the Taiwanese or the U.K. patients, as depicted by the Venn diagram (Fig. 4). These 67 mutually exclusive genes drew our attention since they might reflect the differences between the two populations and may potentially influence the ER status in each population. Further details about these genes are shown in Table 2.
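The count of 67 mutually exclusive genes follows from the figures given above: 9 NTUH genes and 68 NCI genes in the common pool, of which 5 are shared. A short check in Python (the function and key names are illustrative only):

```python
def venn_counts(n_a_in_pool, n_b_in_pool, n_shared):
    """Split two overlapping gene lists into exclusive and shared counts."""
    only_a = n_a_in_pool - n_shared
    only_b = n_b_in_pool - n_shared
    return {
        "only_a": only_a,
        "only_b": only_b,
        "shared": n_shared,
        "exclusive_total": only_a + only_b,
    }


# Numbers taken from the text: 9 NTUH genes, 68 NCI genes, 5 in common.
counts = venn_counts(n_a_in_pool=9, n_b_in_pool=68, n_shared=5)
print(counts)
# {'only_a': 4, 'only_b': 63, 'shared': 5, 'exclusive_total': 67}
```

That is, 4 genes are exclusive to the NTUH dataset and 63 to the NCI dataset, giving the 67 mutually exclusive genes.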

Association analysis of SNPs in different ethnic origins
Since SNP information is available for many human genes, we searched public databases for possible SNP variations in these 67 genes between populations representative of the Taiwanese and U.K. patients. The results, namely the population-unique SNPs in the mutually exclusive genes, may imply associations of these SNPs with ER status in breast cancer of different ethnic origins. Two public SNP databases, Perlegen Genotype Browser and dbSNP, were used for our association analysis of SNPs. Perlegen Genotype Browser includes SNP information on three ethnic populations: African American, European American, and Chinese. We chose the SNPs of European Americans and Chinese to represent the U.K. and Taiwanese populations, respectively, for further statistical analysis. Using a cutoff p-value of 0.001, Pearson's chi-square test identified 17 candidate genes that could be potential focuses in breast cancer development: damage-specific DNA binding protein 2 (DDB2); ATP-binding cassette, sub-family D (ALD), member 3 (ABCD3); ATPase, Na+/K+ transporting, beta 3 polypeptide (ATP1B3); sphingosine kinase type 1 interacting protein (SKIP); developmentally regulated GTP binding protein 1 (DRG1); interleukin-1 receptor-associated kinase 1 (IRAK1); keratin 7 (KRT7); dipeptidyl-peptidase 6 (DPP6); E2F transcription factor 3 (E2F3); fucosyltransferase 8 (FUT8); hydroxysteroid (17-beta) dehydrogenase 4 (HSD17B4); lipin 1 (LPIN1); myosin VI (MYO6); nuclear factor I/B (NFIB); protein tyrosine phosphatase, non-receptor type substrate 1 (PTPNS1); syndecan 4 (SDC4); and seven in absentia homolog 2 (SIAH2). Each of these genes had more than one statistically significant SNP (p-value < 0.001) in genotype and allele frequency between the two representative populations. A total of 83 SNPs among the 17 genes were identified.
The genes with the most statistically significant SNPs between European Americans and Chinese are DPP6 (19 SNPs), followed by HSD17B4 (13 SNPs), ABCD3 (10 SNPs), and FUT8 (9 SNPs). The remaining genes have fewer than five significant SNPs each. Table 3 presents the differential genes with their possibly related SNPs.
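The SNP screen described above can be illustrated with a small Pearson chi-square sketch. The study used R for this analysis (where `chisq.test` would be the natural counterpart); the Python below is only a stand-in using the standard library. The function names and the genotype and allele counts are hypothetical, and the closed-form p-values cover only the 1 and 2 degrees of freedom needed for 2 x 2 allele tables and 2 x 3 genotype tables.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence; assumed non-zero in this sketch.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi_square_p_value(stat, dof):
    """Upper-tail p-value, using the closed forms for 1 and 2 degrees of freedom."""
    if dof == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if dof == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")


def is_population_differentiated(genotype_table, allele_table, alpha=0.001):
    """True if both the genotype and the allele test fall below alpha."""
    p_genotype = chi_square_p_value(*pearson_chi_square(genotype_table))
    p_allele = chi_square_p_value(*pearson_chi_square(allele_table))
    return p_genotype < alpha and p_allele < alpha


# Hypothetical counts for one SNP (rows: European American, Chinese).
genotypes = [[17, 6, 1], [2, 8, 14]]  # columns: AA, AG, GG
alleles = [[40, 8], [12, 36]]         # columns: A, G (derived from the genotypes)

print(is_population_differentiated(genotypes, alleles))  # True
```

A gene would then be retained when more than one of its SNPs passes this test. No multiple-testing correction is applied here; the text specifies only the 0.001 cutoff.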

Functional study of identified genes
In order to understand the functions of the 17 identified genes and their possible relationships with ER expression, we used the DAVID database (Database for Annotation, Visualization and Integrated Discovery) (Dennis et al. 2003). These genes were functionally classified into the following six categories: alternative splicing (DPP6, PTPNS1, NFIB, MYO6, SKIP, IRAK1, DDB2, and FUT8), signal-anchor (DPP6, ATP1B3, FUT8), sh3-binding (PTPNS1, FUT8), disease mutation (HSD17B4, MYO6, ABCD3, DDB2), peroxisome (HSD17B4, ABCD3), and nucleotide-binding (MYO6, ABCD3, IRAK1, DRG1). As expected, all of the observed functions are either directly or indirectly involved in the regulation of estrogen receptors, in particular the genes closely associated with alternative splicing, sh3-binding, disease mutation, and peroxisome (Bonofiglio et al. 2005; Pfeffer et al. 1996; Troester et al. 2006; Zhou et al. 2006).

Figure legend: In order to compare the differentially expressed genes selected in the NTUH and NCI datasets, we focus only on the gene probes common to both datasets. After filtering out the genes present in only the NTUH or the NCI dataset, a total of 3197 genes were left and were considered as our common candidate gene pool for further identification. In the NCI dataset, 68 of the 88 differentially expressed genes (refer to Fig. 1) were in the pool, while the NTUH dataset has 9 genes left in the pool. The 67 mutually exclusive genes were selected for further SNP analysis.

Discussion
According to the results of the SNP analysis, 83 SNPs were identified in the 17 genes differentially expressed in either the Taiwanese or the U.K. ER+/ER− breast cancers (Table 3). The findings indicate that these 17 genes and their related SNPs may be important in ER regulation in breast cancer of different populations. Since many reports have shown an association between genetic variations of the estrogen receptor and breast cancer (Iwase H, 1996; Kang et al. 2002; Roodi et al. 1995), it is reasonable to presume that the 83 identified SNPs in ER-related genes may have either a direct or an indirect influence on the ER status in breast cancer of different ethnic populations. Here we used SNPs to represent the polymorphisms between the Taiwanese and U.K. populations since they are good indicators of the genetic differences between two ethnic origins. Moreover, among the 17 significant genes mentioned above, only DDB2 came from our data, while the other 16 genes came from the NCI data. DDB2 is known for its DNA-binding function and also acts as a tumor suppressor. A recent study provides further evidence that the rs830083 polymorphism in DDB2 may contribute to the etiology of lung cancer in the Chinese population. This is consistent with our study in that the SNPs we identified in DDB2 may play a role in the specific regulation of the estrogen receptor in Chinese breast cancer patients. Apart from DDB2, the expression of the other 16 genes (ABCD3, ATP1B3, SKIP, DRG1, IRAK1, KRT7, DPP6, E2F3, FUT8, HSD17B4, LPIN1, MYO6, NFIB, PTPNS1, SDC4, and SIAH2) differs between ER-positive and ER-negative U.K. patients. ATP1B3 is derived from the primary differentiation event during mammalian development (Adjaye et al. 2005).
For IRAK1, it has been proposed that one SNP within the gene, when combined with a high-risk genotype at TLR6-1-10, confers a significant increase in the risk for prostate cancer, suggesting synergistic effects between sequence variants in IRAK1 and the TLR6-1-10 gene cluster (Sun et al. 2006). LPIN1 is reported to be a candidate gene for human lipodystrophy syndromes, as common SNPs in LPIN1 have been identified in lipodystrophy patients (Cao and Hegele, 2002). Limited research has been done on the genetic variants in these genes, but the correlations between these ER-related genes and tumorigenesis are likely to correspond to an increase in susceptibility to breast cancer. Cytokeratin 7 (encoded by KRT7/CK7) is found in the majority of type 1 papillary renal cell carcinomas and chromophobe renal cell carcinomas, and alteration of its expression profile is particularly associated with tumorigenesis of primary adenocarcinoma of the small intestine (Chen and Wang, 2004; Mazal et al. 2005). SDC4 is a cell-adhesion molecule related to the enhanced adhesion of cancer cells to fibronectin (Koike et al. 2004), and functions as a receptor in intracellular signaling. A study has shown SKIP to be a protein likely to participate in the modulation of SPHK1 activity, but much about its functions remains unknown. The association between estrogen and SIAH2 has been illustrated by a mechanism in which the estrogen-ER complex markedly reduces the level of N-CoR through a process related to the up-regulation of SIAH2 and the subsequent targeting of N-CoR for proteasomal degradation (Frasor et al. 2005).
The genes BTG2, ISL1, MCP (also known as CD46), SIAH2, and XBP1, in the mutually exclusive subset shown in Table 2, were previously known to be associated with ER status (Frasor et al. 2005; Gay et al. 2000; Kawakubo et al. 2006; Lacroix and Leclercq, 2004; Rushmere et al. 2004). These five ER-regulated genes were observed to be differentially expressed only in the NCI dataset, not in our dataset. Intriguingly, the majority of published reports used breast cancer cell lines derived from Caucasian patients. One possible explanation for this phenomenon is that the association of these five genes with estrogen is limited to Caucasian breast cancer patients.
In addition, five genes, BTF3, CDKN2A, ESR1, GATA3, and TFF3, fell in the overlapping subset shown in Figure 4, meaning that they were differentially expressed in both our dataset and the NCI dataset. All of these genes have been identified and studied as playing roles in ER regulation (Doane et al. 2006; Green et al. 2007; Milde-Langosch et al. 2001; Oh et al. 2006). According to one study, a specific interaction between BTF3 and ERalpha has been verified in vivo and in vitro; moreover, BTF3 may influence the mechanism by which the AF-1 (transcriptional activation function) of ERalpha stimulates gene expression (Green et al. 2007). GATA3, TFF3, and ESR1 are three estrogen-regulated genes known for their over-expression in luminal subtype A, which is primarily composed of ER+ tumors (Chen and Wang, 2004; Doane et al. 2006). Meanwhile, GATA3 is also a transcription factor that serves as a crucial component in the tumorigenesis of ER+ breast cancer and is involved in growth control and maintenance of the differentiated state in epithelial cells (Usary et al. 2004). The SNPs identified in our selected genes may be involved in determining whether ER expression causes disparities between Chinese and Caucasian breast cancer patients. Our work provides possible SNPs associated with ER status in breast cancer of different ethnic origins and a set of potential gene expression signatures for novel targeted therapeutic strategies.