Evaluation of significant genome-wide association studies risk-SNPs in young breast cancer patients

Purpose Genome-wide association studies (GWAS) have identified numerous single nucleotide polymorphisms (SNPs) that are associated with an increased risk of breast cancer. Most of these studies were conducted primarily in postmenopausal breast cancer patients. Therefore, we set out to assess whether these breast cancer variants are also associated with an elevated risk of breast cancer in young premenopausal patients. Methods In 451 women of European ancestry who had prospectively enrolled in a longitudinal cohort study for women diagnosed with breast cancer at or under age 40, we genotyped 44 SNPs that were previously associated with breast cancer risk. The control group comprised 1142 postmenopausal healthy women from the Nurses’ Health Study (NHS). We assessed whether the frequencies of the adequately genotyped SNPs differed significantly (p≤0.05) between the cohort of young breast cancer patients and postmenopausal controls, and then we corrected for multiple testing. Results Genotyping of the controls or cases was inadequate for comparisons between the groups for seven of the 44 SNPs. Nine of the remaining 37 were associated with breast cancer risk in young women with a p-value <0.05: rs10510102, rs1219648, rs13387042, rs1876206, rs2936870, rs2981579, rs3734805, rs3803662 and rs4973768. The directions of these associations were consistent with those in postmenopausal women. However, after correction for multiple testing (Benjamini-Hochberg), none of the results remained statistically significant. Conclusion After correction for multiple testing, none of the alleles for postmenopausal breast cancer were clearly associated with risk of premenopausal breast cancer in this relatively small study.

Introduction with breast cancer at or under 40 years of age (NCT01468246). Between November 2006 and December 13, 2012, 1463 women were invited to participate from eleven sites in Massachusetts, one site in Denver, Colorado, and one site in Toronto, Canada. Eligibility requirements included age 40 years or younger and diagnosis with stage 0-4 breast cancer less than six months prior to enrollment.
After 915 patients signed informed consent in person or by mail between 11/1/2006 and 12/13/2012, they received mailed surveys that included questions about sociodemographic information and medical history. In addition, medical record and central pathology review were used to obtain data on tumor stage, grade, ER/PR expression, and Her2/neu (human epidermal growth factor receptor 2) overexpression. Blood samples were collected at enrollment, one year after diagnosis, and four years after diagnosis. For the present study, only one blood sample per patient was genotyped.
Participants with non-invasive breast cancer (n = 32), as well as those with missing participant information (n = 31), were excluded from the study. In total, 451 patients with stage 1-4 breast cancer were eligible for inclusion in our analysis. This study was approved by the Institutional Review Board at Dana-Farber/Harvard Cancer Center as well as at other study sites.

Controls
Comparator genotype data came from 1142 controls who had no history of cancer and were participants in the Nurses' Health Study (NHS). These were the same controls who were used in the "Cancer Genetic Markers of Susceptibility (CGEMS) Project" [15]. All CGEMS data are available at dbGaP (Study accession number phs000147.v3.p1) [29]. Control blood samples were provided between 1989 and 1990, and only those women not diagnosed with breast cancer during follow-up (until June 1, 2004) were included as controls. In the CGEMS project, 528,173 SNPs were genotyped with the Illumina HumanHap550 array and then imputed to HapMap 2. In the control data, the SNPs analyzed here were either genotyped directly (n = 22) or imputed (n = 15) (Table 1). All controls self-reported as postmenopausal and Caucasian (Southern European/Mediterranean, Scandinavian, other Caucasian); their inferred ancestry through genetic markers was consistent with this self-report.

DNA-Extraction
DNA was extracted from patient whole blood samples using a Qiagen DNA extraction kit (QIAamp DNA Blood Mini kit) according to manufacturer instructions on a Qiacube instrument at the Dana-Farber Cancer Institute Breast Cancer SPORE CORE Laboratory.

Genotyping
All DNA samples were genotyped using the Sequenom platform. Sequenom probes and primers were purchased via Mass Array Typer 4.0.20 and MySequenom (www.mysequenom.com, Sequenom Inc). Quality control standards were followed for genotyping: 10% of the samples were genotyped twice and showed a concordance rate of 99.9%. Three SNPs were excluded from the analysis due to call rates of <95% and were replaced by proxies (proxy rs11041665 to rs3817198; proxy rs62391594 to rs6556756; proxy rs11628293 to rs999737).
Seven additional SNPs had to be excluded: rs418470 (proxy to rs1926657) and rs62391594 (proxy to rs6556756) because of missing control data; rs930395 (proxy to rs7716600) due to technical problems; rs1154865 because it had strand-ambiguous alleles (C/G); and rs10995194, rs11041665 and rs2269336 due to a low imputation quality score (r-sq<0.95) in the CGEMS control data. At three loci, SNPs were in linkage disequilibrium (LD) with each other (R²>0.9; R²>0.8; R²>0.68); these are highlighted in Table 2. The final data set consisted of 37 SNPs tested in 451 cases and 1142 controls. We limited the analysis to European-ancestry patients (n = 451). All SNPs (cases and controls) were in Hardy-Weinberg equilibrium (p-value ≤ 0.01).
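The Hardy-Weinberg check mentioned above can be illustrated with a short sketch. The text does not state which test was used, so the Pearson goodness-of-fit test with one degree of freedom is an assumption here, and the function name and the genotype counts in the usage lines are hypothetical. The function returns the statistic and p-value and leaves the choice of threshold to the reader.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson goodness-of-fit test for Hardy-Weinberg proportions (1 df).

    Arguments are observed genotype counts for AA, AB and BB.
    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Cells with zero expected count (monomorphic SNP) are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 df:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts, for illustration only (not study data).
chi2, p_value = hwe_chi_square(100, 200, 100)
```

Counts that match the expected proportions exactly, as in the usage line, give a statistic of 0 and a p-value of 1; a strong deficit of heterozygotes drives the p-value toward 0.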

Statistical analysis
For comparison of genotype frequencies, a chi-square test was used. Statistical analyses were performed using R (R 3.0.1 GUI, The R Foundation for Statistical Computing, S. Urbanek & H.-J. Bibiko). All results from the analysis with a p-value <0.05 were considered statistically significant. Correction for multiple testing was performed using the Benjamini-Hochberg method. The threshold for statistical significance based on the multiple tests performed was FDR<0.1. The chi-square test was able to detect a minimal difference in allele frequency of 8.3% with 80% power. The case-control comparison was allele-based. The association was evaluated using multivariate logistic regression, which corresponds to the "additive model" (odds ratio associated with a per-allele increase).
We assessed three tumor biology-derived subgroups in our analysis: 1) patients with ER-positive, PR-positive and Her2-negative breast cancers (n = 193); 2) patients with Her2-positive breast cancers (n = 134); and 3) patients with ER-negative, PR-negative and Her2-negative (triple negative) breast cancers (n = 90). The 34 patients who had other types of tumors (20 with ER-positive, PR-negative, Her2-negative tumors, 6 with ER-negative, PR-positive, Her2-negative tumors, and 8 with missing receptor information) were not included in any subgroup analysis.
We also performed subgroup analyses focusing on patients with a known deleterious BRCA mutation (BRCA+; BRCA1 n = 27, BRCA2 n = 11) and patients known not to carry a deleterious BRCA mutation (BRCA-; n = 223). Nine patients were known to have an unclassified variant in one of the BRCA genes, 79 patients were not tested, and for 102 the BRCA status was unknown. Other subpopulations were defined by age [≤25 yrs (n = 10), 26-30 yrs (n = 42), 31-35 yrs (n = 113) and 36-40 yrs (n = 286)]. We compared each subgroup to the same large control population (n = 1142).
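As an illustration of the allele-based case-control comparison described in this section, the following minimal sketch collapses genotype counts into a 2x2 allele table and computes the Pearson chi-square statistic (1 df) and the allelic odds ratio. It is not the authors' R code: the function name and the counts in the usage lines are hypothetical, no continuity correction is applied (R's `chisq.test` applies one to 2x2 tables by default), and the logistic regression step is not reproduced.

```python
import math

def allelic_test(case_counts, control_counts):
    """Allele-based chi-square test for one biallelic SNP.

    Each argument is a tuple of genotype counts (n_AA, n_AB, n_BB),
    where B is the allele being tested.
    Returns (odds_ratio, chi2, p_value).
    Assumes every allele count is non-zero (no zero-cell handling).
    """
    def allele_counts(genotypes):
        n_aa, n_ab, n_bb = genotypes
        return 2 * n_aa + n_ab, 2 * n_bb + n_ab  # (A alleles, B alleles)

    a_case, b_case = allele_counts(case_counts)
    a_ctrl, b_ctrl = allele_counts(control_counts)
    observed = ((a_case, b_case), (a_ctrl, b_ctrl))
    row_totals = (a_case + b_case, a_ctrl + b_ctrl)
    col_totals = (a_case + a_ctrl, b_case + b_ctrl)
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    # Odds of carrying allele B in cases relative to controls.
    odds_ratio = (b_case * a_ctrl) / (a_case * b_ctrl)
    return odds_ratio, chi2, p_value

# Hypothetical genotype counts, for illustration only (not study data).
odds_ratio, chi2, p_value = allelic_test((50, 200, 150), (200, 400, 200))
```

An odds ratio above 1 means the tested allele is more frequent among cases than among controls; the p-value from each SNP would then enter the multiple-testing correction.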

Clinical and pathological data
DNA samples from 451 breast cancer patients with a median age at diagnosis of 37 years were genotyped. Table 3 summarizes the clinical and pathological characteristics of the patient cohort. In the control population of participants from the Nurses' Health Study, the median age at the time of DNA collection was 66 years (mean: 65.7 years, range: 44-83 years).

SNP frequencies
The evaluation of 37 GWAS-SNPs revealed nine variants that differed significantly between the whole cohort of young breast cancer patients and postmenopausal controls: rs10510102, rs1219648, rs13387042, rs1876206, rs2936870 (proxy for rs2981575), rs2981579, rs3734805, rs3803662 and rs4973768. The directions of these associations were consistent with those in postmenopausal women.
In subgroup analyses, rs13387042 and rs2936870 were only associated with ER-positive, PR-positive, Her2-negative cancers, while rs10510102, rs2981579 and rs3734805 were only associated with Her2-positive breast cancers. rs4973768 was associated with both ER-positive, PR-positive, Her2-negative breast cancers and Her2-positive breast cancers. Three SNPs did not appear to be associated with premenopausal breast cancer in any of the subgroups. Results from SNP analyses in the whole cohort as well as in the subgroups are shown in Tables 2 and 4.
After correction for multiple testing by Benjamini-Hochberg, none of the SNPs were found to be statistically significantly associated with breast cancer risk.
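To see why nominal p-values just under 0.05 may not survive the correction when 37 SNPs are tested, the Benjamini-Hochberg adjustment can be sketched as follows. This is a minimal illustration rather than the authors' code; the function name and the p-values in the usage lines are hypothetical and are not the study's results.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the rank decreases (monotonicity step).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical input: 9 SNPs at p = 0.04 and 28 SNPs at p = 0.5.
hypothetical = [0.04] * 9 + [0.5] * 28
adjusted = benjamini_hochberg(hypothetical)
# The smallest adjusted value is 0.04 * 37 / 9, about 0.164,
# which is above an FDR threshold of 0.1.
```

In this hypothetical case, nine nominally significant SNPs all receive the same adjusted value of about 0.164, so none would pass an FDR threshold of 0.1.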

Discussion
In order to elucidate whether the allelic architecture in young women with breast cancer is similar to that in postmenopausal breast cancer, we tested the frequency of previously identified postmenopausal breast cancer risk-associated SNPs in a large cohort of young breast cancer patients. When our initially planned p-value threshold of <0.05 was used, nine SNPs (rs10510102, rs1219648, rs13387042, rs1876206, rs2981579, rs3734805, rs3803662, rs4973768 and the proxy rs2936870) associated with breast cancer in postmenopausal women [11,12,16,20,22,23] also appeared to be associated with breast cancer in young women, in the same direction as that observed in postmenopausal women. However, after correction for multiple testing (Benjamini-Hochberg), none of the results remained statistically significant.
In the current study, the strongest association with breast cancer in young women was found for rs4973768. Importantly, this SNP was significantly associated with breast cancer in the overall patient cohort as well as in smaller subgroups based on tumor subtype (ER-positive/PR-positive/Her2-negative and Her2-positive breast cancers), BRCA status (BRCA negative), and age (31-40yrs).
rs4973768 lies in the 3' untranslated region (3'UTR) of the sodium bicarbonate (Na+HCO3-) cotransporter NBCn1 (SLC4A7) [30]. The Nashville Breast Health Study evaluated this SNP in 1511 cases with a mean age of 53.3 years (range 25-75 years) and identified an association with ER-positive breast cancer [31]. Further, a significant association of rs4973768 with increased breast cancer risk was replicated among different ethnicities in mixed-age patients [32,33]. Andersen et al. analyzed whether different SNPs previously identified in GWAS interact with one another and with reproductive and menstrual risk factors in association with breast cancer risk. Including over 1400 European-ancestry women with a median age of 54.5 years, this study confirmed the association of rs4973768 with breast cancer; however, no modification of the associations between menstrual and reproductive risk factors and breast cancer risk by a polygenic score was observed [33]. In addition, rs4973768 was significantly associated with breast cancer in 477 Chinese participants, further corroborating the association of rs4973768 in different ethnicities [32]. In contrast to our study, Antoniou and co-workers found an association of rs4973768 with an increased breast cancer risk in BRCA2 carriers [34], but we likely did not have enough BRCA2 carriers in our study (n = 11) to find such a link.
rs10510102, rs2981579 and rs3734805 were significantly associated with premenopausal breast cancer in the overall group and in the subgroup with Her2-positive breast cancers. Prior studies have described associations of rs10510102 [12] and rs2981579 (FGFR2) [35-37] with breast cancer risk, but not specifically with Her2-positive breast cancers. FGFR2 (fibroblast growth factor receptor 2) is a member of the tyrosine kinase gene superfamily and is involved in cell growth, invasiveness, motility and angiogenesis [38]. Both rs13387042 and rs2936870 (proxy for rs2981575) were significantly associated with risk of premenopausal breast cancer in the overall group and in the Her2-negative subgroup.
rs3803662 on 16q12, located close to TOX3 (TNRC9, CAGF9) and LOC643714 [39], has been identified as a breast cancer risk allele in various GWAS [11,16,22-24] and linked to both ER-positive [22,40-43] and ER-negative, primarily postmenopausal, breast cancer [44]. Previous studies suggest that the rs3803662 risk allele may most strongly increase the likelihood of luminal A tumors in postmenopausal breast cancer, and that expression levels of TOX3 and/or LOC643714 might influence the progression of breast cancer [39]. Within the current study of younger women, we now report an association of rs3803662 with breast cancer risk in the whole premenopausal patient cohort as well as in the subgroup of patients aged 36-40 yrs, but not in the other subgroups. This is consistent with the prior findings of Tapper et al. [45].
Although the directions of these associations in our cohort were consistent with those observed in postmenopausal women, it has to be emphasized that none of the above-mentioned SNPs remained significant after correction for multiple testing, perhaps due to our relatively small sample size. A larger study is needed to more definitively assess the relevance of these SNPs as causal factors in premenopausal breast cancer.
In addition to this limited power, which may have contributed to false-negative results, our conclusions are limited by the fact that we did not evaluate all of the breast cancer predisposing variants that have now been discovered. For example, Michailidou et al. [19] identified SNPs at 41 new breast cancer susceptibility loci at genome-wide significance in 2013. In cases diagnosed at a young age (<40 years), two loci (rs2588809 at 14q24.1 (P = 0.001) and rs941764 at 14q32.12 (P = 0.007)) showed higher per-allele ORs. Both SNPs were newly published after we began our work and were therefore not included in our study. The newest breast cancer GWAS was recently published by Michailidou et al. [46].
In addition, earlier this year, Shi et al. [35] used a family-based design to analyze the relationship between breast cancer before age 50 and 77 GWAS-identified risk SNPs. They found 4 SNPs associated with a higher breast cancer risk, two of which, rs3803662 in TOX3 and rs2981579-A (FGFR2), are consistent with our findings. In our study, one of the SNPs identified to be important by Shi and colleagues, rs999737, had to be excluded due to call rates of <95%. It was replaced by its proxy, rs11628293, which did not appear to be significantly associated with risk of premenopausal breast cancer in our study. We did not assess the fourth SNP found by Shi et al. (rs12662670) because our analysis finished prior to the publication of their work.
A third limitation of this study is that different genotyping platforms were used for controls and patients, potentially introducing bias. In addition, it is possible that allele frequency differences between the control cohort and the premenopausal breast cancer cohort may have been related to age differences between the cohorts (i.e., if a gene predisposes to longevity for reasons other than a reduced risk of breast cancer, we might see a difference in the frequency of a SNP in or around that gene between the cases and controls in our study).
In conclusion, we found that nine SNPs previously associated with postmenopausal breast cancer risk might be associated with breast cancer risk in premenopausal women to some degree, but none of the results remained statistically significant after correction for multiple testing. This adds to the relatively scarce literature evaluating genetic predisposition to young-onset breast cancer [19,35,45,47-50]. Future functional genomics analyses may help us better understand the causes of premenopausal breast cancer, and it will also be important to investigate potential interactions between genetic and environmental risk factors.