MicroRNA Related Polymorphisms and Breast Cancer Risk

Genetic variations, such as single nucleotide polymorphisms (SNPs) in microRNAs (miRNAs) or in miRNA binding sites, may affect miRNA-dependent gene expression regulation, which has been implicated in various cancers, including breast cancer, and may alter individual susceptibility to cancer. We investigated associations between miRNA-related SNPs and breast cancer risk. First, we evaluated 2,196 SNPs in a case-control study combining nine genome-wide association studies (GWAS). Second, we further investigated 42 SNPs with suggestive evidence for association using 41,785 cases and 41,880 controls from 41 studies included in the Breast Cancer Association Consortium (BCAC). Combining the GWAS and BCAC data within a meta-analysis, we estimated main effects on breast cancer risk as well as risks for estrogen receptor (ER) and age-defined subgroups. Five miRNA binding site SNPs were significantly associated with breast cancer risk: rs1045494 (odds ratio (OR) 0.92; 95% confidence interval (CI): 0.88–0.96), rs1052532 (OR 0.97; 95% CI: 0.95–0.99), rs10719 (OR 0.97; 95% CI: 0.94–0.99), rs4687554 (OR 0.97; 95% CI: 0.95–0.99), and rs3134615 (OR 1.03; 95% CI: 1.01–1.05), located in the 3′ UTR of CASP8, HDDC3, DROSHA, MUSTN1, and MYCL1, respectively. DROSHA belongs to the miRNA machinery genes and has a central role in initial miRNA processing. The remaining genes are involved in different molecular functions, including apoptosis and gene expression regulation. Further studies are warranted to elucidate whether the miRNA binding site SNPs are the causative variants for the observed risk effects.

supported by the German Cancer Aid (grant no. 107352). The GENICA was funded by the Federal Ministry of Education and Research (BMBF) Germany grants 01KW9975/5, 01KW9976/8, 01KW9977/0 and 01KW0114, the Robert Bosch Foundation, Stuttgart, Deutsches Krebsforschungszentrum (DKFZ), Heidelberg, German Social Accident Insurance, Institute of the Ruhr University Bochum (IPA), Germany, as well as the Department of Internal Medicine, Evangelische Kliniken Bonn gGmbH, Johanniter Krankenhaus, Bonn, Germany. The KBCP was financially supported by the special Government Funding (EVO) of Kuopio University Hospital grants, Cancer Fund of North Savo, the Finnish Cancer Organizations, and by the strategic funding of the University of Eastern Finland. kConFab is supported by a grant from the National Breast Cancer Foundation, and previously by the National Health and Medical Research Council (NHMRC), the Queensland Cancer Fund, the Cancer Councils of New South Wales, Victoria, Tasmania and South Australia, and the Cancer Foundation of Western Australia. The kConFab Clinical Follow Up Study was funded by the NHMRC [145684, 288704, 454508]. Financial support for the AOCS was provided by the United States Army Medical Research and Materiel Command [DAMD17-01-1-0729], the Cancer Council of Tasmania and Cancer Foundation of Western Australia and the NHMRC [199600]. GCT is supported by the NHMRC. LMBC is supported by the 'Stichting tegen Kanker' (232-2008 and 196-2010). Diether Lambrechts is supported by the FWO and the KULPFV/10/016-SymBioSysII. The MARIE study was supported by the Deutsche Krebshilfe e.V. [70-2892-BR I], the Hamburg Cancer Society, the German Cancer Research Center and the genotype work in part by the Federal Ministry of Education and Research (BMBF) Germany [01KH0402].
MBCSG is supported by grants from the Italian Association for Cancer Research (AIRC) and by funds from the Italian citizens who allocated the 5/1000 share of their tax payment in support of the Fondazione IRCCS Istituto Nazionale Tumori, according to Italian laws (INT-Institutional strategic projects "5×1000"). The

Introduction
Breast cancer is the most common cancer in women and is a leading cause of cancer mortality [1]. Inherited genetic variation has been associated with the initiation, development and progression of breast cancer. Studies on twins have suggested that hereditary predisposing factors are involved in up to one third of all breast cancers [2]. Many genetic loci have been associated with breast cancer risk and collectively explain approximately 35% of the familial risk [3,4]. The largest genetic association study of breast cancer to date identified 41 novel low penetrance susceptibility loci [4] by selecting nearly 30,000 SNPs from a meta-analysis of nine genome-wide association (GWA) studies and genotyping them using 41,785 cases and 41,880 controls of European ancestry from studies in the Breast Cancer Association Consortium (BCAC). These 41 susceptibility loci probably represent the tip of the iceberg, and additional SNPs from the combined GWAS might explain a similar fraction of familial risk to that attributed to the already identified loci [4].
Mature miRNAs are 20–23-nucleotide, single-stranded RNA molecules that play a crucial role in gene expression regulation for many cellular processes, including differentiation potential and development pattern. MiRNAs undergo a stepwise maturation process involving an array of miRNA machinery components. Drosha and DGCR8 mediate the cleavage of long primary miRNA transcripts (pri-miRNAs) into shorter pre-miRNAs in the nucleus [5,6]. The pre-miRNAs are then transported to the cytoplasm, where they are further cleaved by Dicer to produce mature miRNAs [7]. MiRNAs interact by pairing with the 3′ untranslated region (UTR), and also within the coding region and 5′ UTR, of the corresponding mRNAs, leading to mRNA destabilization, cleavage or translation repression. More effective mRNA destabilization is achieved when the miRNA targets the 3′ UTR rather than other mRNA regions [8][9][10]. An individual miRNA may regulate approximately 100 distinct mRNAs, and together more than 1000 human miRNAs are believed to modulate more than half of the mRNA species encoded in the genome [11,12]. Additionally, most mRNAs possess binding sites for miRNAs [13]. MiRNAs are involved in tumorigenesis in that they can be either oncogenic when tumor suppressor genes are targeted, or genomic guardians (tumor suppressor miRNAs) when oncogenes are targeted [14]. Additionally, it has been suggested that they may modulate both metastasis [15] and chemotherapy resistance [16]. MiRNAs have also been shown to have altered expression levels in tumors compared to normal tissue and between tumor subtypes in breast cancer, among other carcinoma types [17][18][19]. SNPs may affect miRNA machinery genes or miRNA activity; however, SNPs can also create, abolish or modify miRNA binding sites in their binding regions. Polymorphisms in miRNA binding sites have been studied in regard to the risk of several cancers [20], including breast cancer [21][22][23].
These studies have found evidence for association of miRNA related SNPs and cancer risk, but the study sample sizes have been relatively small.
In this study, we investigate associations between miRNA-related polymorphisms and breast cancer risk by using a meta-analysis of nine GWAS and subsequent genotyping of top hits using 41,785 cases and 41,880 controls of European ancestry from the BCAC. To our knowledge, this is thus far the largest investigation of associations between miRNA-related polymorphisms and breast cancer susceptibility.

SNP selection and genotyping
SNPs in mature or pre-miRNAs, in genes of the miRNA machinery and in 3′ UTR regions of protein coding genes with a potential effect on miRNA binding were systematically searched from the Ensembl (hg18/build36) and Patrocles databases [24]. Additionally, tagging SNPs for these with r² ≥ 0.8 were identified using the public HapMap SNP database. By this in silico approach we identified altogether 147,801 candidate SNPs and 12,550 tagging SNPs. These SNPs were then overlaid with those from the combined GWAS from the BCAC [4], and altogether 2,196 SNPs were present (either genotyped or imputed) in the combined GWAS. These SNPs were genotyped with Illumina or Affymetrix arrays, as described previously [25][26][27][28][29][30][31][32]. The combined GWAS data were imputed for all scans using HapMap version 2 CEU as a reference in similar fashion to that presented by Michailidou and colleagues [4], with the exception that HapMap version 2 release 21 was used at the time the overlay was performed. Analysis using a 1-degree-of-freedom trend test of these 2,196 SNPs in the combined GWAS indicated some evidence of association with breast cancer risk for 44 SNPs (p < 0.09). Notably, the combined GWAS included imputed data generated using HapMap version 2 release 21 (based on NCBI build 35 (dbSNP b125)), whereas the results presented here for the combined GWAS are based on imputation using HapMap version 2 release 22 (based on NCBI build 36 (dbSNP b126)). In release 22, a number of SNPs were excluded due to mapping inconsistencies in build 35 relative to build 36. Hence, the estimates from the combined GWAS may differ slightly from the initial association analysis. The 44 SNPs (including 30 candidate and 14 tagging SNPs) were genotyped on additional samples in the BCAC using the custom Illumina Infinium array (iCOGS), which included a total of 211,155 SNPs, as described previously.
The detailed description of the quality control process for the combined GWAS and iCOGS genotyping data was presented in [4].
Of the 42 SNPs that passed quality control [4], two were located in miRNA genes (one candidate SNP located in pre-miRNA hsa-miR-2110 and one tag SNP tagging a mature hsa-mir-548l variant), and four SNPs were located in miRNA machinery genes (SMAD5, SND1, CNOT4 and DROSHA). The genotyped DROSHA SNP tags the 3′ UTR miRNA binding site variant in the DROSHA gene. The remaining 38 candidate or tag SNPs were located in, or tagged, a predicted miRNA binding site in the 3′ UTR of protein coding genes. All 42 SNPs are described in Table 1. The workflow of the SNP selection in different stages is illustrated in Figure 1.
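The tagging criterion used in the SNP selection (r² ≥ 0.8) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study, which relied on the public HapMap SNP database. The function names `ld_r_squared` and `is_tag` and all frequencies below are hypothetical and chosen only for illustration. The sketch assumes two biallelic SNPs with known allele frequencies and a known frequency of the haplotype carrying both alleles of interest.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation (r^2) between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A (SNP 1) and allele B (SNP 2)
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    # D is the deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


def is_tag(p_ab: float, p_a: float, p_b: float, threshold: float = 0.8) -> bool:
    """True if one SNP tags the other at the given r^2 threshold."""
    return ld_r_squared(p_ab, p_a, p_b) >= threshold


# Hypothetical frequencies, for illustration only.
print(ld_r_squared(0.30, 0.30, 0.30))  # alleles always co-occur: r^2 close to 1
print(is_tag(0.28, 0.30, 0.32))        # weaker correlation: checked against 0.8
```

In practice r² is estimated from phased reference haplotypes for a given population, so a SNP that tags a variant in one population need not do so in another.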

Study sample
The combined GWAS included nine breast cancer studies totalling 10,052 cases and 12,575 controls of European ethnic background. Details and study-specific subject numbers are presented in Table S1. Since the GWAS were limited to patients of European ethnic background we further utilized 41,785 cases ascertained for their first primary, invasive breast cancer and 41,880 controls of European ancestry from 41 BCAC studies genotyped using the iCOGS array (Table S2). For a subgroup analysis of ER negative and ER positive cases, as well as cases aged less than 50 years at diagnosis, we included all the cases for which the respective data were available. The ER subgroup analysis was based on 702 ER negative cases and 2,019 ER positive cases from five GWAS studies and 7,200 ER negative cases from 40 BCAC studies and 26,302 ER positive cases from 34 BCAC studies. The analysis of cases aged less than 50 years at diagnosis was based on 3,470 cases from three GWAS studies and 9,483 cases from 35 BCAC studies. All participating studies conform to the Declaration of Helsinki and were approved by the respective ethical review boards and ethics committees (Tables S1 and S2), and all participants in these studies had provided written consent for the research.

Statistical methods
We used logistic regression to estimate per-allele log-odds ratios and standard errors, including the study as a covariate. We also included principal components as covariates in order to correct for potential hidden population structure. In the GWAS, for two studies (UK2 and HEBCS) the estimates were adjusted for the first three principal components, and in the iCOGS analysis we used the first six principal components and an additional component to reduce inflation for the LMBC study, as described previously [4]. Subgroup analyses were carried out for ER negative and positive subgroups and for the group aged less than 50 years at diagnosis. For meta-analysis, we combined the estimates from the combined GWAS and iCOGS with a fixed effects model using the inverse variance weighted method. In the meta-analysis, the subjects involved in both the combined GWAS and iCOGS (1,880) were only taken into account once. To adjust the P-values for multiple testing, we used the Benjamini-Hochberg correction. The adjusted P-values are shown in Table 2 along with the nominal P-values. In the text we report the nominal P-values. The statistical analyses were conducted using the R 2.14.0 statistical computing environment (http://www.r-project.org/).
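The two combining steps described above, inverse-variance-weighted fixed-effects meta-analysis and Benjamini-Hochberg adjustment, can be sketched as follows. This is an illustrative Python sketch, not the authors' R code. The function names and all numeric inputs are hypothetical, and it does not model the handling of subjects present in both the combined GWAS and iCOGS.

```python
import math


def fixed_effect_meta(estimates):
    """Pool (log_or, standard_error) pairs with inverse-variance weights.

    Returns the pooled log-odds ratio and its standard error.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def two_sided_p(beta, se):
    """Two-sided P-value of a Wald test under a standard normal null."""
    z = abs(beta / se)
    return math.erfc(z / math.sqrt(2.0))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-allele log-ORs and standard errors for one SNP
# (stage 1: combined GWAS, stage 2: iCOGS); not values from this study.
beta, se = fixed_effect_meta([(-0.05, 0.030), (-0.03, 0.012)])
low, high = beta - 1.96 * se, beta + 1.96 * se
print("OR", math.exp(beta), "95% CI", math.exp(low), math.exp(high))
print("nominal P", two_sided_p(beta, se))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Because the weights are the inverse variances, the larger stage dominates the pooled estimate, which is why the iCOGS data largely determine the combined odds ratios in a design of this kind.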

Results
For the 42 SNPs we successfully genotyped, estimates of association from the combined GWAS and from iCOGS analysis are shown in Table S3. Twenty-one SNPs showed consistent    Table 2). SNP rs1045494 tags the hsa-miR-938 binding site SNP rs1045487 (r² = 1.0) of CASP8, and the SNP rs1052532 in HDDC3 is predicted to abolish the binding site for hsa-miR-1224-3p. The SNP rs10719 is predicted to abolish the hsa-miR-1298 binding site in the 3′ UTR of DROSHA. SNP rs4687554 tags the hsa-miR-891b binding site SNP rs6445538 (r² = 1.0) of MUSTN1, and rs3134615 is located at the binding site of hsa-miR-1827 of MYCL1. There was no evidence for heterogeneity in the per-allele OR for any SNP. The per-study per-allele ORs for these five miRNA binding site SNPs from the combined GWAS, along with per-SNP heterogeneity variance P-values, are shown in Figure S1 and those from the iCOGS in Figure S2. Next we analysed the SNPs by ER status-defined subtype, and for cases aged less than 50 years at diagnosis, for risk associations in the meta-analysis of combined GWAS and iCOGS (Tables S4, S5 and S6). These analyses did not reveal any additional significant results. For rs1045494 in CASP8, rs4687554 in MUSTN1 and rs3134615 in MYCL1 (OR 1.03 [95% CI 1.01–1.05]; P = 7.75 × 10⁻⁴) a more significant association with breast cancer risk was found for the ER positive subgroup than in the main analysis, but the result from the test for heterogeneity by ER status was not significant (data not shown). All associations were estimated using an additive inheritance model. Dominant and recessive models did not improve the estimates (data not shown).

Discussion
We investigated associations between genetic variation in miRNAs, in the genes of the miRNA machinery and in the miRNA binding sites and the risk of breast cancer. We identified several SNPs that are predicted to abolish an miRNA binding site and that are significantly associated with breast cancer risk. Previous studies investigating miRNA-related SNPs, especially in miRNA binding sites, have included predefined sets of genes. Nicoloso and colleagues investigated 38 previously identified breast cancer risk SNPs and found two to modify miRNA binding sites in TGFB1 and XRCC1 in vitro [23]. Neither of these was included in our data set. Liang and colleagues investigated 134 potential miRNA binding sites in cancer-related genes and found six miRNA binding site SNPs that were associated with ovarian cancer risk [34]. In the meta-analysis of combined GWAS and iCOGS for main effects, for four of the five most significant miRNA binding site SNPs, the minor allele was associated with a decreased breast cancer risk. The minor allele of SNP rs3134615 in the 3′ UTR of MYCL1 was associated with an increased breast cancer risk. All of the five most significant miRNA binding site SNPs are located in a 3′ UTR and have been predicted to abolish the miRNA binding site. The defect in miRNA-mediated regulation would be expected to lead to an increase in the translation of the corresponding encoded protein. The five genes whose regulation may be affected by the miRNA-associated SNPs include the pre-apoptotic gene CASP8, HDDC3, the miRNA biogenesis master regulator DROSHA, the MYC family member MYCL1 and MUSTN1. CASP8 is involved in apoptosis in breast cancer cells [35], and many studies have reported polymorphisms in this gene to be associated with risks for several cancers [36,37] including breast cancer [38,39], indicating the importance of CASP8 in tumor development.
SNP rs1045494 studied here is located close to the coding region SNP rs1045485 that has been previously shown to have a stronger protective effect [38,40,41]. Interestingly, Michailidou and colleagues reported this SNP as having only weak evidence for an association (P = 0.0013 in combined GWAS and iCOGS) [4], but these two SNPs (rs1045485 and rs1045494) are not correlated (r² = 0.001 in Caucasian population). Neither is rs1045494 correlated with the more strongly associated rs1830298 SNP, identified through fine-mapping of the region (r² = 0.02) [42]. SNP rs1045494 tags SNP rs1045487 (r² = 1.0), which is predicted to abolish the hsa-miR-938 binding site and thus may affect CASP8 expression. There is very little reported evidence on the involvement of HDDC3 or hsa-miR-1224-3p in cancer, indicating a novel association with risk. HDDC3 has been suggested to be involved in the starvation response [43]. The HDDC3 gene is expressed at higher levels in several different tumor types, including breast tumors, than in normal tissue [44]. DROSHA is a miRNA master regulator. It is a member of the RNase III enzyme family, belongs to the miRNA biogenesis pathway and is the core nuclease that processes pri-miRNAs into pre-miRNAs in the nucleus [5,6]. The SNP rs10719 in the 3′ UTR of DROSHA is predicted to abolish the hsa-miR-1298 binding site. Hsa-miR-1298 is predicted to target DROSHA by the Patrocles prediction as well as by the TargetScan [45] and PITA [46] prediction algorithms. Recently a small Korean study reported another SNP, rs644236, tagging the SNP rs10719 (r² = 0.955 in CEU population and r² = 0.876 in Asian population (combined CHB and JPT)), to be associated with elevated breast cancer risk [47].
When taking into account the opposite major and minor alleles in the Asian and European populations for SNPs rs644236 and rs10719, this result is in concordance with our results, where both the combined GWAS and the iCOGS analysis consistently indicated an association of the minor allele of SNP rs10719 with reduced breast cancer risk. We also found the minor allele of SNP rs3134615 in the 3′ UTR of MYCL1 to be associated with an increased risk. MYCL1 (L-MYC) belongs to the same family of transcription factors as the known proto-oncogene MYC (C-MYC) and they share a high degree of structural similarity [48]. The MYCL1 gene has previously been reported to be amplified and overexpressed in ovarian cancer [49]. A case-control study by Xiong and colleagues reported SNP rs3134615 to be significantly associated with increased risk of small cell lung cancer [50]. SNP rs3134615 was predicted by Patrocles to abolish the hsa-miR-1827 binding site. This has also been suggested by functional studies, where MYCL1 was found to be the target of hsa-miR-1827 and the SNP rs3134615 was also found to increase MYCL1 expression [50]. The evidence from functional studies is consistent with our finding that SNP rs3134615 might increase breast cancer risk. MUSTN1 has been shown to be involved in the development and regeneration of the musculoskeletal system [51]. Thus far no evidence of association between MUSTN1 and breast cancer has been reported, but the MUSTN1 gene is expressed in the mammary glands [52].
Since only a small fraction of miRNA binding sites has been experimentally validated, we selected SNPs that had been computationally predicted to affect miRNA binding sites. For our original SNP selection we used the Patrocles database that contains predicted miRNA binding sites and also compiles perturbation prediction of SNP effects. There are a multitude of prediction programs and their performance has been evaluated [53]. Witkos and colleagues find target prediction algorithms that utilize orthologous sequence alignment, like Patrocles, to be the most reliable.
The follow-up of the 42 miRNA-related SNPs identified five significant associations with breast cancer risk. Although the individual risk effects were subtle, we could investigate only a small proportion of our initial in silico data set of miRNA-related SNPs (over 140,000 SNPs), which may suggest that genetic polymorphisms affecting miRNA regulation could have a considerable combined effect on breast cancer risk.
It should be noted that, until fine mapping studies are carried out for these loci, it is not clear whether these miRNA-related SNPs are the variants responsible for the observed associations.
This comprehensive analysis of miRNA-related polymorphisms using a large two-stage study of women with European ancestry provides evidence for miRNA-related SNPs being potential modulators of breast cancer risk.

Figure S1 Forest plots for the five most significant miRNA binding site SNPs from the combined GWAS. Squares indicate the estimated per-allele OR for the minor allele in Europeans. The horizontal lines indicate 95% confidence limits. The vertical blue dashed lines indicate clipping of the confidence intervals for presentation purposes. The area of the square is inversely proportional to the variance of the estimate. The diamond indicates the estimated per-allele OR from the combined analysis. (PDF)

Figure S2 Forest plots for the five most significant miRNA binding site SNPs from the iCOGS. Squares indicate the estimated per-allele OR for the minor allele in Europeans. The horizontal lines indicate 95% confidence limits. The vertical blue dashed lines indicate clipping of the confidence intervals for presentation purposes. The area of the square is inversely proportional to the variance of the estimate. The diamond indicates the estimated per-allele OR from the combined analysis. (PDF)