Genetic Variants in TGF-β Pathway Are Associated with Ovarian Cancer Risk

The transforming growth factor-β (TGF-β) signaling pathway is involved in a diverse array of cellular processes responsible for tumorigenesis. In this case-control study, we applied a pathway-based approach to evaluate single-nucleotide polymorphisms (SNPs) in the TGF-β signaling pathway as predictors of ovarian cancer risk. We systematically genotyped 218 SNPs from 21 genes in the TGF-β signaling pathway in 417 ovarian cancer cases and 417 matched control subjects. We analyzed the associations of these SNPs with ovarian cancer risk, performed haplotype analysis and identified potential cumulative effects of genetic variants. We also performed analysis to identify higher-order gene-gene interactions influencing ovarian cancer risk. Individual SNP analysis showed that the most significant SNP was SMAD6: rs4147407, with an adjusted odds ratio (OR) of 1.60 (95% confidence interval [CI], 1.14–2.24, P = 0.0066). Cumulative genotype analysis of 13 SNPs with significant main effects exhibited a clear dose-response trend of escalating risk with increasing number of unfavorable genotypes. In gene-based analysis, SMAD6 was identified as the most significant gene associated with ovarian cancer risk. Haplotype analysis further revealed that two haplotype blocks within SMAD6 were significantly associated with decreased ovarian cancer risk, as compared to the most common haplotype. Gene-gene interaction analysis further categorized the study population into subgroups with different ovarian cancer risk. Our findings suggest that genetic variants in the TGF-β signaling pathway are associated with ovarian cancer risk and may facilitate the identification of high-risk subgroups in the general population.


Introduction
Ovarian cancer is the leading cause of death from gynecologic cancer among women in the United States, with an estimated 21,880 new cases and 13,850 deaths in 2010 [1]. Because the disease is mostly symptomless in early stages and there are currently no effective screening methods, 75% of women present with advanced-stage disease (stage III or IV). The 5-year survival rate of advanced-stage disease is only around 30% [2]. The etiology of ovarian cancer remains largely unknown, although hormonal factors, inflammation, and wound healing are thought to play important roles [3].
Ovarian cancer is a multifactorial disease, and genetic susceptibility has been suggested in previous studies. For example, mutations in BRCA1, BRCA2, MLH1, and MSH2 were found to account for approximately 50% of familial ovarian cancers [4,5]. However, there is compelling evidence that common genetic variants also contribute to ovarian cancer susceptibility [6,7]. Recently, genome-wide association studies (GWAS) have identified several common susceptibility alleles in four loci showing strong associations, but, as with most SNPs identified in GWAS, the associations are usually low in magnitude, with most ORs less than 1.3 [8,9,10].
Due to the heterogeneous and multigenic nature of ovarian cancer, it is unlikely that any single SNP will be sufficient to confer disease risk. A comprehensive pathway-based analysis that focuses on evaluating the cumulative effects of a panel of SNPs would be more powerful to pinpoint the susceptibility genes and polymorphisms.
The transforming growth factor-β (TGF-β) pathway, including TGF-βs, bone morphogenetic proteins (BMPs), activins, and related proteins, is involved in a diverse array of cellular processes, including cell proliferation, morphogenesis, migration, extracellular matrix production, and apoptosis. Alteration of TGF-β superfamily signaling has been implicated in various human pathologies, including cancer, developmental disorders, cardiovascular and autoimmune diseases [11,12,13]. Experimental data have shown that more than 75% of human ovarian cancers exhibit resistance to TGF-β signaling [14,15], suggesting that diminished TGF-β responsiveness is a key event in this disease. In normal ovarian surface epithelial cells, autocrine growth inhibition is maintained by TGF-β [16], but tumor cells escape the antiproliferative effects of TGF-β by acquiring mutations in the components of the signaling pathways or by selectively disrupting TGF-β. Mutations and deletions of Smad genes in the TGF-β signaling pathway often lead to unstable protein products that are rapidly degraded after ubiquitination and shift the equilibrium of the signaling cascade, resulting in tumorigenesis [11]. Studies have reported the presence of some common genetic variations in the TGF-β signaling pathway to be related to ovarian carcinogenesis, such as TGFB1: rs56361919 in 23% of ovarian cancer cases [17]. In addition, mutations and/or alterations in the expression of TGF-β receptors and loss of SMAD4 are frequently detected in human ovarian tumors [18].
Given the critical role of the TGF-β pathway in maintaining proper cellular function and the disruption of this pathway in ovarian cancer, it is possible that common genetic variations in this pathway may affect the risk of ovarian cancer. To our knowledge, no molecular epidemiologic studies have been performed to comprehensively evaluate the association of genetic variants in this pathway with ovarian cancer risk. In this study, we aimed to test the hypothesis that common germline genetic variants in the TGF-β pathway are associated with ovarian cancer risk.

Study population and data collection
The patient population has been described previously [19]. Briefly, 417 newly diagnosed and histologically confirmed ovarian cancer patients with primary malignancy were recruited at The University of Texas MD Anderson Cancer Center. Cases had not received any chemotherapy or radiotherapy prior to recruitment. There were no restrictions on recruitment in terms of age, ethnicity, or clinical stage of disease. The 417 controls were healthy women without a prior history of cancer (except nonmelanoma skin cancer) and were identified from a large pool of control subjects enrolled in ongoing case-control studies of cancer. Control subjects were individuals seeing a physician for routine health checkups or addressing health concerns at the Kelsey-Seybold Clinic. Cases and controls were matched by age (±5 years) and ethnicity.
Demographic characteristics (age and ethnicity), occupational history, tobacco use history, medical history, family history of cancer, and other epidemiologic data were collected for all patients and controls. For each participant, a blood sample was collected into heparinized tubes for lymphocyte isolation and DNA extraction. Written informed consent was obtained from all cases and controls prior to participation and the donation of blood samples. The study was approved by the Institutional Review Boards of MD Anderson and the Kelsey-Seybold Clinic.

SNP Selection and Genotyping
The procedures used to select SNPs of the TGF-β pathway have been described previously [20]. Briefly, we compiled data from Gene Ontology (http://www.geneontology.org) and a systematic literature review to refine the gene list in the TGF-β signaling pathway. Tagging SNPs were identified from the HapMap database (http://www.hapmap.org) and selected using the LDSelect program (http://droog.gs.washington.edu/ldSelect.html), which separates SNPs into bins on the basis of linkage disequilibrium. Selected tagging SNPs had an r² threshold of 0.8 and a minor allele frequency (MAF) greater than 0.01 in the Caucasian population, and were located between 10 kb upstream of the transcriptional start site and 10 kb downstream of the transcriptional end site. Potentially functional SNPs (e.g., coding SNPs and SNPs in untranslated regions, promoters, and splicing sites) were also included. Overall, 218 SNPs in 21 genes of the TGF-β pathway were selected along with SNPs from other cancer-related pathways. The complete set of SNPs was sent to Illumina technical support for custom iSelect Infinium II BeadChip design using a proprietary program developed by Illumina. Genomic DNA was isolated from peripheral blood lymphocytes using the QIAamp DNA Blood Maxi kit (QIAGEN, Valencia, CA). Genotyping followed the standard protocol of Illumina's Infinium iSelect HD Custom Genotyping BeadChip provided by Illumina (San Diego, CA). BeadStudio software was used to call genotypes. All laboratory personnel were blinded to the case-control status of the study subjects.
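As an illustration of the r² criterion used for binning, the following minimal Python sketch computes the pairwise r² between two biallelic SNPs from phased haplotype counts and checks it against the 0.8 threshold. It is not the LDSelect algorithm itself; the function names `ld_r_squared` and `same_bin` and the example counts are hypothetical.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Pairwise r^2 between two biallelic SNPs from phased haplotype counts.

    n_AB, n_Ab, n_aB, n_ab are counts of the four possible haplotypes
    (allele A/a at the first SNP, allele B/b at the second SNP).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n          # frequency of allele A
    p_B = (n_AB + n_aB) / n          # frequency of allele B
    d = n_AB / n - p_A * p_B         # linkage disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


def same_bin(r2, threshold=0.8):
    """True if two SNPs are correlated strongly enough to share an LD bin."""
    return r2 >= threshold


# Hypothetical counts: the two SNPs are perfectly correlated.
print(same_bin(ld_r_squared(50, 0, 0, 50)))
```

Two SNPs whose r² meets the threshold carry largely redundant information, so a single tagging SNP per bin is sufficient to capture the common variation in that bin.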

Statistical Analyses
The distributions of categorical and continuous variables between cases and control subjects were compared by Pearson's χ² test and Student's t test, respectively. For each SNP in this study, we tested Hardy-Weinberg equilibrium using the goodness-of-fit χ² test to compare the observed with the expected frequency of genotypes in control subjects. For SNP analysis, we tested three genetic models (dominant, recessive, and additive) and took the model with the smallest P value as the best-fitting model. If the percentage of homozygous variant genotypes was less than 5% in cases or controls, we considered only the dominant model, which has the highest statistical power. Multiple logistic regression analysis was used to estimate the odds ratios (ORs) and 95% confidence intervals (CIs) while adjusting for age and ethnicity where appropriate. For internal validation, a bootstrap resampling method was performed 100 times on samples randomly drawn from the original data set, and a P value was obtained for the best-fitting model in each bootstrapped sample. Cumulative effects of SNPs were assessed by summing the putative unfavorable genotypes showing significant association with risk (P < 0.05) in single-SNP analysis; subjects were then grouped into four categories based on the distribution of ORs. A gene-based analysis was used to explore the associations between genes and ovarian cancer risk using the likelihood-ratio test (LRT) as described previously [21]. Haplotype analysis was performed using the expectation-maximization algorithm implemented in the HelixTree software (Golden Helix, Bozeman, MT). Classification and regression tree (CART) analysis was used to explore higher-order gene-gene interactions. We also performed 10,000 bootstrap runs to construct 95% CIs for the ORs in the cumulative genotype analysis and CART analysis. All statistical analyses were adjusted for age and ethnicity.
Statistical analysis was performed using STATA 10.0 (College Station, TX).
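Two of the steps above can be sketched in a few lines of Python: the goodness-of-fit χ² test for Hardy-Weinberg equilibrium, and the coding of a genotype under the dominant, recessive, and additive models. This is an illustrative sketch using only the standard library, not the STATA code used in the study; the function names and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Goodness-of-fit chi-square test for Hardy-Weinberg equilibrium.

    Takes the observed genotype counts (e.g. in control subjects) and
    returns (chi-square statistic, P value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of the first allele
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def code_genotype(n_variant_alleles, model):
    """Code a genotype (0, 1, or 2 variant alleles) under a genetic model."""
    if model == "dominant":       # carriers of at least one variant allele
        return 1 if n_variant_alleles >= 1 else 0
    if model == "recessive":      # homozygous variant only
        return 1 if n_variant_alleles == 2 else 0
    if model == "additive":       # number of variant alleles
        return n_variant_alleles
    raise ValueError(f"unknown model: {model}")


# Hypothetical control genotype counts.
chi2, p_value = hwe_chi_square(30, 40, 30)
print(round(chi2, 3), round(p_value, 4))
```

The coded genotype would then enter a logistic regression as the predictor, with age and ethnicity as covariates, once per model; the model giving the smallest P value is taken as the best fit.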

Subject characteristics
In this study, there were 417 cases and 417 age- and ethnicity-matched control subjects. The mean age was 60.73 (SD: 10.36) in cases and 60.30 (SD: 10.71) in control subjects (P = 0.554). The majority of the cases (n = 339, 81.29%) and controls (n = 349, 83.69%) were Caucasians. Among the cases, the majority were diagnosed at stage III (66.5%) and had tumors of the serous subtype (61.3%) (Table S1).

Association between individual SNP and risk
A total of 218 SNPs from 21 genes in the TGF-β pathway were analyzed (Table S2). Twenty-three SNPs from ten genes showed significant associations with ovarian cancer risk at P < 0.05 (Table 1). Internal validation by the bootstrapping method identified 13 SNPs from eight genes showing consistent associations (i.e., P < 0.05 in 80 or more of the 100 bootstrapped samples). The most significant SNP was SMAD6: rs4147407, with subjects carrying at least one variant allele exhibiting a 1.60-fold increased risk (95% CI, 1.14-2.24). For another SNP in SMAD6, the variant allele of rs4075546 was associated with decreased risk (OR, 0.77; 95% CI, 0.63-0.94; P = 0.0099).
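The bootstrap consistency criterion (P < 0.05 in at least 80 of 100 resamples) can be illustrated with a minimal Python sketch. To stay self-contained it substitutes an unadjusted 2x2 χ² test on carrier status for the adjusted logistic regression used in the study, and it assumes that cases and controls are resampled separately with replacement; the function names and the data are hypothetical.

```python
import math
import random


def chi2_p_2x2(a, b, c, d):
    """P value (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 1.0  # degenerate table: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return math.erfc(math.sqrt(chi2 / 2))


def bootstrap_hits(case_carrier, control_carrier, n_boot=100, alpha=0.05, seed=0):
    """Count bootstrap resamples in which the SNP reaches P < alpha.

    case_carrier and control_carrier are lists of 0/1 carrier indicators
    (dominant coding) for cases and controls, respectively.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        cases = rng.choices(case_carrier, k=len(case_carrier))
        controls = rng.choices(control_carrier, k=len(control_carrier))
        a = sum(cases)              # carrier cases
        b = len(cases) - a          # non-carrier cases
        c = sum(controls)           # carrier controls
        d = len(controls) - c       # non-carrier controls
        if chi2_p_2x2(a, b, c, d) < alpha:
            hits += 1
    return hits


# Hypothetical SNP: 150 of 200 cases and 50 of 200 controls are carriers.
cases = [1] * 150 + [0] * 50
controls = [1] * 50 + [0] * 150
print(bootstrap_hits(cases, controls) >= 80)
```

A SNP whose association depends on a handful of subjects will fall below the 80-of-100 threshold, which is why this filter reduces the 23 nominally significant SNPs to a more stable subset.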

Gene based analysis for ovarian cancer risk
Gene-based analysis identified SMAD6 and TGFB1 (P < 0.05 for all SNPs examined in each gene using the dominant or additive model; Table 3) as genes associated with ovarian cancer risk. SMAD6 showed the most significant association (P = .034), suggesting that, of the genes examined, genetic variations in this gene had the strongest influence on ovarian cancer risk.

Haplotype analysis of SMAD6 SNPs
As multiple SNPs in the SMAD6 gene showed significant associations, we performed haplotype analysis for the 29 SNPs genotyped in SMAD6. Five haplotype blocks were defined by local linkage disequilibrium (LD) according to HaploView [22] (Figure 1, Table 4). The definition of "blocks" was described previously by Gabriel et al. [23]. We observed significant associations between SMAD6 haplotypes and risk for ovarian cancer in two LD blocks, block 1 in the 5′ flanking region and block 2 in the intron 5 region (Figure 1 and Table 4). Haplotype H2 of block 1 was composed of SNPs rs11857194-rs1470123-rs2053424, and subjects carrying only one variant allele of rs1470123 showed a significantly decreased risk of ovarian cancer (OR, 0.72; 95% CI, 0.55-0.95; P = .018) compared with those carrying the most common haplotype, which has only one variant allele of rs2053424. Haplotype H1 of block 2 was composed of SNPs rs16953584-rs7182227, and subjects carrying only one variant allele of rs16953584 showed a significant 36% reduction in risk (OR, 0.64; 95% CI, 0.44-0.92; P = .016) compared with the most common haplotype of two wildtype alleles (Table 4).

Higher-order gene-gene interactions
CART analysis was applied to explore the higher-order interactions between the 13 significant SNPs. As shown in Figure 2, the tree model resulted in four terminal nodes with different risks for ovarian cancer. The initial split was defined by INHBC: rs2228225, indicating that this SNP is the primary factor contributing to variations in ovarian cancer risk in the study population. The reference node of the tree structure was composed of INHBC: rs2228225 AA, SMAD6: rs4147407 CC, and BMP2: rs235757 AA+AG genotypes. Subjects in this node showed the lowest risk of ovarian cancer. The highest-risk node was composed of INHBC: rs2228225 AA and SMAD6: rs4147407 CT+TT, with an OR of 6.33 (95% CI, 2.32-17.28; P = .0003), suggesting that the genetic variant of SMAD6: rs4147407 was a major determinant that shifted subjects from the lowest to the highest risk of ovarian cancer (Figure 2 and Table 5).
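To make the node-versus-reference comparison concrete, the sketch below computes an unadjusted odds ratio from a 2x2 table together with a Wald 95% CI. This is a simplification: the study adjusted for age and ethnicity and built its CIs from 10,000 bootstrap runs, so the numbers here are not expected to reproduce Table 5. The function name and the counts are hypothetical.

```python
import math


def odds_ratio_wald_ci(case_exposed, control_exposed,
                       case_reference, control_reference, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from 2x2 counts.

    "Exposed" denotes subjects in the node of interest and "reference"
    denotes subjects in the reference node. All four counts must be > 0.
    """
    a, b, c, d = case_exposed, control_exposed, case_reference, control_reference
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 cases and 10 controls in the node of interest,
# 10 cases and 20 controls in the reference node.
print(odds_ratio_wald_ci(20, 10, 10, 20))
```

Terminal nodes with few subjects give wide intervals, as seen in the reported 2.32-17.28 range for the highest-risk node.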

Discussion
In this study, we systematically evaluated the associations between a comprehensive panel of genetic variants in the TGF-β pathway genes and ovarian cancer risk. Our results suggested that multiple SNPs in the pathway were associated with ovarian cancer risk. In particular, SNPs in SMAD6 showed the most significant associations in single SNP and haplotype analyses. Further, there was a cumulative effect of SNPs in the pathway that conferred a significant dose-response trend with subjects carrying the highest number of unfavorable genotypes exhibiting the greatest risk. Results from CART analysis suggested higher-order gene-gene interactions that further defined high vs. low risk subgroups in the study population.
One of the major findings was the significant association of SMAD6 polymorphisms with ovarian cancer risk. SMAD6 encodes a protein that is localized in both nuclei and cytoplasm [24] and works as an inhibitory Smad induced by BMPs and TGF-β signals for an auto-inhibitory feedback mechanism in the TGF-β pathway [25,26]. The SMAD6 gene is expressed in most human tissues, including the ovary (Figure S1). Moreover, SMAD6 was reported to be overexpressed in ovarian adenocarcinoma compared to normal ovarian tissue [27], and expression of BMP-2 protein has been shown to induce SMAD6 expression in ovarian cancer cells and was associated with poor prognosis [28]. The function of SMAD6 in tumorigenesis has not been well established. However, mutations in SMAD6 have been reported in human ovarian cancer [29]. Since TGF-β signals may function in potent tumor suppression in normal epithelial cells and in early-stage tumors [11], we speculated that genetic variations in SMAD6 may result in altered gene expression or regulation of signaling function. In this study, four polymorphisms (rs4147407, rs4075546, rs16953584, and rs4776318) in SMAD6 were found to be significantly associated with ovarian cancer risk. Among these polymorphisms, rs4147407 was associated with increased risk, whereas rs4075546, rs16953584, and rs4776318 were associated with decreased risk. Haplotype analysis further identified two candidate loci of SMAD6.
Haplotype blocks located in the 5′ flanking region and intron 5 of the SMAD6 gene, respectively, were both associated with decreased risk in this study. However, none of these SNPs is located in the coding region of SMAD6, which suggests that these significant SNPs or the identified loci may not directly alter SMAD6 function but may change the level of gene expression, either by lying in regulatory regions or by being linked to other causal SNPs that affect gene activity. Further in vitro and in vivo functional studies are needed to characterize the functional significance of the SMAD6 SNPs identified.
CART analysis revealed gene-gene interactions among INHBC, SMAD6, and BMP2. In the tree model, INHBC: rs2228225 was at the initial split, suggesting that this variant functions as the primary risk factor for ovarian cancer. SMAD6: rs4147407 was located in the second level of the tree structure and was shown to interact with INHBC: rs2228225 to influence cancer risk. Specifically, the variant alleles of rs4147407, in combination with the common allele of INHBC: rs2228225, were associated with a 6-fold increase in risk. INHBC encodes the beta C chain of inhibin, a hormone that can regulate cell growth and differentiation [30]. The result of CART analysis further strengthened the crucial role of SMAD6 in influencing the risk of ovarian cancer in the study population.
Our study has some limitations. Chance findings are possible due to the small sample size of subgroups. However, we used various statistical methods to control for false positives. For example, we performed bootstrapping analysis for internal validation of the significant SNPs. Another potential limitation is that ovarian cancer risk factors not measured in this study (e.g., hormone replacement use) may confound the overall association. Given that we tested a genetic-driven hypothesis rather than an environmental-driven hypothesis, this limitation may be less of a concern. As with all case-control studies, selection bias may also confound the identified associations. Nevertheless, MD Anderson serves as a referral center for many cancer patients from the Kelsey-Seybold Clinics in the Houston metropolitan area; therefore, our controls are likely to represent the base population that gives rise to the cancer cases.
In conclusion, our study is the first to apply a pathway-based approach to evaluate germline genetic variations in the TGF-β pathway and their associations with ovarian cancer risk. We have identified 13 polymorphisms in the TGF-β pathway significantly associated with ovarian cancer risk. In particular, SNPs in SMAD6 showed the most significant associations. Our data also suggested a cumulative effect of SNPs in the pathway that jointly influenced ovarian cancer risk, and identified higher-order interactions that further define high vs. low risk subgroups in the study population. Future studies are necessary to characterize the functional significance of the genetic variants we have identified, as well as to confirm or externally validate the associations in independent populations.

Figure S1 Expression of SMAD6 transcript in human cells and tissues. SMAD6 tissue expression was referenced in the T1Dbase database, a web-based source for genetic and genomic information on Type I Diabetes (www.t1dbase.org). In this resource, human tissue and cell-type specific gene expression data were obtained from the Novartis GNF SymAtlas database. The expression data were generated on the Affymetrix HGU133A chip and a custom microarray. (DOC)