Introduction

Endometrial cancer (EC), a neoplasm of the uterine epithelial lining, is the most common gynecological malignancy in developed countries and the fourth most common cancer among US women (www.cancer.org 2013). This disease primarily affects postmenopausal women and is more common in women of European ancestry. In the USA in 2013, an estimated 49,560 women may develop EC and 8,190 may die from the disease, a case fatality similar to that of breast cancer. The estimated lifetime risk of women developing the disease in the USA is 1 in 38 (www.cancer.org 2013). EC is categorized into two distinct subtypes based on histologic and clinical characteristics. Type I ECs, the most common in women of European ancestry (80–90 %), are mostly endometrioid adenocarcinomas (EA). The remaining 10–20 % of ECs are Type II, which predominantly consist of serous and clear cell carcinomas.

EC risk is strongly increased by a Western lifestyle, with up to tenfold higher incidence rates in Western, industrialized countries than in Asia or rural Africa (Pisani et al. 1993). Major risk factors include obesity and use of postmenopausal estrogen-only hormone therapy (ET). Excess body weight has been associated with a two- to fivefold increase in EC risk in both pre- and postmenopausal women, and has been estimated to account for about 40–50 % of EC incidence in affluent societies (Bergstrom et al. 2001). Epidemiological evidence also suggests increased risks in association with early age of menarche, late age of menopause, nulliparity and infertility. Furthermore, a family history of EC is associated with a nearly twofold increase in risk (Gruber and Thompson 1996; Lucenteforte et al. 2009), and risk is even greater in rare family cancer syndromes such as Lynch syndrome (also termed hereditary nonpolyposis colorectal cancer, HNPCC) (Papadopoulos et al. 1994; Nicolaides et al. 1994; Risinger et al. 1993), suggesting that inherited genetic factors increase susceptibility to EC. Though these studies support an inherited genetic component to risk (Vasen et al. 1994; Schildkraut et al. 1989; Gruber and Thompson 1996; Seger et al. 2011), twin studies suggest that the familial aggregation in risk may be mostly due to shared environmental factors and not shared genetics (Lichtenstein et al. 2000).

The predominant mechanistic hypothesis describing Type I endometrial carcinogenesis is known as the “unopposed estrogen” hypothesis (Key and Pike 1988). This theory states that EC risk is increased among women who have high circulating levels of bioavailable estrogens and low levels of progesterone, so that the mitogenic effect of estrogens is insufficiently counterbalanced by the opposing effect of progesterone. The unopposed estrogen hypothesis is supported by observations that the use of ET (Herrinton and Weiss 1993; Persson et al. 1989) and of Oracon (a sequential oral contraceptive (OC) characterized by an unusually high ratio of estrogenic to progestogenic activity) (Weiss and Sayvetz 1980) greatly increase EC risk, while use of combined OCs (i.e., containing progestins as well as estrogen throughout the treatment period) is associated with a reduced risk (Henderson et al. 1983). A further observation that led to the unopposed estrogen hypothesis is that mitotic rates of endometrial tissue are higher during the follicular phase of the menstrual cycle, when progesterone levels are low and the uterine lining undergoes proliferation, than during the luteal phase (Ferenczy et al. 1979). Progesterone counteracts the growth-stimulatory effects of estrogen by inducing glandular and stromal differentiation (Clarke and Sutherland 1990; Ace and Okulicz 1995) and endometrial hyperplasia can be reversed by progestin therapy (Ehrlich et al. 1981). Many of the genes in the sex steroid hormone metabolism pathway have served as “candidates” in search of polymorphic variants that predispose to EC. Although some studies suggest that SNPs in these genes, for example, the CYP19A1 (aromatase) gene, are associated with EC risk (Setiawan et al. 2009), very little of the genetic risk can be explained by these SNPs.

To this end, efforts have been undertaken to identify genes involved in EC causation. Recently, two genome-wide association studies (GWAS) of EC have been conducted (Spurdle et al. 2011; Long et al. 2012). However, only one study identified a novel genome-wide significant association (P = 7.1 × 10−10) between EC and a susceptibility marker located at 17q12 (rs4430796), near the HNF1 homeobox B (HNF1B) gene. Though originally identified in women of European ancestry, this locus has been replicated in other ethnicities (Setiawan et al. 2012). This marker has also been associated with prostate cancer (Thomas et al. 2008), diabetes (Winckler et al. 2007; Gudmundsson et al. 2007) and certain subtypes of ovarian cancer (Shen et al. 2013). In search of additional common genetic variants, we conducted a two-stage GWAS of EC among women participating in studies that are part of the Epidemiology of Endometrial Cancer Consortium (E2C2, details in Supplementary Table 1).

Results

We conducted a GWAS within the E2C2 to identify genetic loci that predispose to EC. Details on the 15 participating studies are provided in Supplementary Table 1. The discovery phase of the GWAS (Stage 1) was conducted among women of European ancestry and was restricted to Type I EC, the most common subtype, accounting for 80–90 % of all cases in women of European descent. Seven participating studies, including four cohort studies [California Teacher’s Study (CTS), Nurses’ Health Study (NHS), Multiethnic Cohort (MEC), Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO)] and three case–control studies [Connecticut Endometrial Cancer (CONN), Fred Hutchinson Cancer Research Center (FHCRC), Polish Endometrial Cancer Study (PECS)], were genotyped in Stage 1 (2,695 cases, 2,777 controls). Study-specific population characteristics are summarized in Table 1. The mean age at diagnosis for cases in Stage 1 ranged from 59.6 in FHCRC to 67.7 in PLCO.

Table 1 Studies participating in the genome-wide association study (GWAS)

After quality control metrics were applied (see Materials and methods), over 524K genotyped SNPs remained in each study, for a combined total of more than 873K unique SNPs for analysis. The genomic control lambda for the study was 1.008, indicating little evidence of population substructure, relatedness or differential genotyping between cases and controls (Fig. 1). No SNP association reached genome-wide significance (P < 5 × 10−8) (Fig. 2). In particular, we did not replicate rs1202524 (P = 0.39), a reported EC susceptibility locus in Asian women (Long et al. 2012), in our Stage 1 population of women of European ancestry.
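The genomic control lambda reported above is conventionally computed as the median observed 1-df chi-square statistic divided by the median expected under the null (about 0.455). The following is a minimal sketch of that calculation, assuming two-sided P values from 1-df tests; the function name and the toy input are illustrative and are not taken from the study's analysis pipeline.

```python
from statistics import NormalDist, median


def genomic_control_lambda(p_values):
    """Median observed 1-df chi-square divided by its expected null median."""
    norm = NormalDist()
    # A two-sided P value p corresponds to a 1-df chi-square of z**2,
    # where z is the normal quantile at p / 2.
    chi2 = [norm.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # Median of a 1-df chi-square distribution (about 0.455).
    expected_median = norm.inv_cdf(0.25) ** 2
    return median(chi2) / expected_median


# Toy input (hypothetical): evenly spaced P values mimic the uniform
# distribution expected under the null, so lambda should be close to 1.
null_like_p = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_control_lambda(null_like_p), 3))
```

Values well above 1 would suggest inflation of the test statistics, for example from population stratification or differential genotyping.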

Fig. 1 Log quantile–quantile (Q–Q) plot. The observed –log10 P values (Y-axis) of 873,935 SNPs from a meta-analysis of seven studies included in the discovery phase of the endometrial cancer GWAS adjusted for the principal components of genetic variation plotted against the expected –log10 quantile (X-axis). The genomic control lambda is 1.008. Imputed P values are represented by the dashed line

Fig. 2 Manhattan plot of the association results. The –log10 P values from the meta-analysis of seven studies in the discovery phase of the endometrial cancer GWAS adjusted for principal components of genetic variation plotted against chromosomal base pair position. Chromosomes are color coded

Among SNPs associated with the smallest ranked P values, rs9344 and rs1352075 at the 11q13.3 locus caught our attention because of significant associations between this locus and breast cancer (Turnbull et al. 2010) and renal cancer (Purdue et al. 2011) in prior GWAS. Thus, we initially pursued a fast-track replication for seven SNPs independently associated with EC (r² < 0.2) at P < 1 × 10−5 from Stage 1, as well as two HNF1B SNPs (rs4430796 and rs11651755) identified by Spurdle et al. (2011) (Supplementary Table 2). The fast-track replication was conducted in a multiethnic sample of 2,294 cases and 3,395 controls from two cohorts [MEC and the Cancer Prevention Study II (CPSII) Nutrition cohort] and five case–control studies [the Alberta Health Services (AHS) study, FHCRC, Estrogen, Diet, Genetics, and Endometrial Cancer (EDGE) study, Turin, and Women’s Insights and Shared Experiences (WISE)] (Table 1). Among women of European ancestry, we replicated EC associations with SNPs at the HNF1B locus (P < 0.005), but did not replicate any of the other seven SNPs (Supplementary Table 2a): the lowest P value for the seven SNPs in fast-track replication was 0.18. No statistically significant associations were observed when we examined other ethnicities (Supplementary Table 2b).

We selected 2,129 SNPs with P < 0.0037 in Stage 1 for follow-up in a subset of the fast-track replication studies and two previously conducted GWAS (ANECS/SEARCH and SECGS) for Stage 2 (Supplementary Tables 3 and 4). DNA samples from a multiethnic sample of women in AHS, FHCRC, MEC and EDGE (Supplementary Table 5) were genotyped for 1,818 of these SNPs as custom content on Illumina’s Human Exome 12v1 chip; the remaining SNPs failed design or quality control. After pooled analysis, no SNP association reached genome-wide significance in women of European ancestry or in women of multiple ethnicities combined, either among Type I EC cases (Table 2) or among those with endometrioid subtype (Table 3). In addition, we further adjusted for BMI; results did not change qualitatively (data not shown).

Table 2 Results for all Type I endometrial cancer cases and controls in Stage 1 GWAS and Stage 2 replication (Illumina Exome 12v with custom content plus in silico validation) where P < 1 × 10−4
Table 3 Results for SNPs shown in Table 2, endometrioid cases only and controls in Stage 1 GWAS and Stage 2 replication

Discussion

Our present study reports results from a new independent GWAS of EC based on a total of 7,077 cases and 16,343 controls from the E2C2 (Table 1). We did not identify any novel loci associated with EC that reached genome-wide significance (P < 5 × 10−8).

In a joint analysis of the GWAS and replication populations, the variant most significantly associated with EC was rs9459805 on chromosome 6 at the RNASET2 locus (OR = 1.19, 95 % CI 1.10–1.29; P = 1.11 × 10−5, Table 2). Of potential interest, two variants suggestively associated with EC (rs12514742, joint P = 5.78 × 10−5; rs12521272, joint P = 7.37 × 10−5) are located at the prolactin receptor (PRLR) gene locus on chromosome 5. Circulating levels of prolactin, a polypeptide hormone involved in numerous physiological processes including reproduction, are higher among EC patients compared to healthy controls (Levina et al. 2009; Yurkovetsky et al. 2007; Kanat-Pektas et al. 2010), and increased PRLR expression has been noted for endometrial tumors compared to non-cancerous endometrial tissue. Prolactin signaling via PRLR has also been shown to potentiate proliferation and inhibit chemotherapy-induced apoptosis of EC cell lines (Levina et al. 2009). Additional studies in independent populations are required to confirm whether variants at the PRLR locus influence EC risk.

To date, only one locus associated with EC at the genome-wide significance level has been identified by GWAS (Spurdle et al. 2011). Located within the HNF1B gene on chromosome 17, the common variant most significantly associated with EC (rs4430796; OR per G allele = 0.84, 95 % CI 0.79–0.89; P = 7.1 × 10−10) in the GWAS by Spurdle et al. (2011) was nominally associated with EC in our discovery (Stage 1) population in the expected direction (OR per G allele = 0.92, P = 0.03; Supplementary Table 2a). This effect estimate is consistent with a winner’s-curse adjustment of the original GWAS effect estimate, which also yields a per G allele OR of 0.92 (Zhong and Prentice 2008). Further genotyping within fast-track replication studies confirmed the association of the rs4430796 G allele with reduced EC risk among women of European ancestry (joint OR = 0.90, 95 % CI 0.85–0.96; P = 5.2 × 10−4) with no evidence of heterogeneity between studies (P = 0.50). In the earlier GWAS by Spurdle et al. (2011), the discovery phase was restricted to patients with the endometrioid histologic subtype of EC. In that study, additionally restricting the replication stage to cases with endometrioid histology (~77 % of cases) slightly strengthened the association between rs4430796 and EC risk (joint OR = 0.82, 95 % CI 0.77–0.87; P = 4.3 × 10−11).

Our GWAS included all EC cases diagnosed with Type I tumors, a group consisting of the following histologic subtypes: endometrioid adenocarcinoma (ICD-O-3 codes 8380, 8381, 8382, 8383), tubular adenocarcinoma (8210, 8211), papillary adenocarcinoma (8260, 8262, 8263), adenocarcinoma with squamous metaplasia (8570), mucinous adenocarcinoma (8480, 8481) and adenocarcinoma NOS (8140) (Kim et al. 2008). Even though the endometrioid adenocarcinoma subtypes represent the majority of Type I tumors (60 %) (Robboy et al. 2009), the inclusion of the less common Type I histologic subtypes may have introduced sufficient heterogeneity to reduce power to detect genome-wide significant associations. However, when we restricted our analysis to Stage 1 and Stage 2 cases with known endometrioid histology, the overall association of rs4430796 with EC risk remained the same, while the significance weakened, most likely due to a loss of power from the reduced sample size. This is consistent with results from the PAGE study, which found that HNF1B may be a general susceptibility locus for EC, as risk associated with rs4430796 [G] was similar for Type I and Type II tumors (Setiawan et al. 2012). Most of the suggestive SNP associations in our study (Table 2) were slightly weakened when the analysis was restricted to cases with known endometrioid histology (Table 3).

Endometrial cancer is part of Lynch syndrome, which is attributable to the inheritance of rare, highly penetrant mutations in DNA mismatch repair genes (Nicolaides et al. 1994; Peltomaki et al. 1993; Aaltonen et al. 1993). The lifetime risk of EC among women with HNPCC is 50–60 %, whereas that of the general population is 2–3 % (Seger et al. 2011). Women with this inherited predisposition to endometrial neoplasm tend to develop the disease 15 years earlier than the general population (Vasen et al. 1994). Studies on estimates of heritability for EC suggested a high genetic component for younger women (Schildkraut et al. 1989; Gruber and Thompson 1996; Parslov et al. 2000). In addition, a record linkage study in Utah (Seger et al. 2011) indicated that there was considerable clustering of EC in families, even accounting for obesity. On the other hand, a twin study of sporadic cancers (i.e., not attributable to family cancer syndromes), which account for 98 % of EC cases, suggests a low genetic contribution (Lichtenstein et al. 2000).

Based on the results of this study and the previous GWAS in European ancestry women (Spurdle et al. 2011), it is unlikely that there exist any common variants with large effects on the risk of EC, although there may be many markers with smaller effects. For example, the probability that at least one of these GWAS would identify a genome-wide significant association with a marker that had a per-allele odds ratio of 1.2 and a risk allele frequency of 0.30 is over 80 %. Conversely, the power of this study to identify a marker like rs4430796 with a per-allele odds ratio of 1.08 and risk allele frequency of 0.52 is 5 %; the power of the Spurdle et al. GWAS was under 1 %. This suggests that circa 18 additional markers with HNF1B-like effects on EC exist, but have not yet been identified due to low power (Park et al. 2010). Consequently, a GWAS with 12,000 cases and 24,000 controls—triple the sample size of the two European ancestry GWAS conducted to date—should identify three or more markers with HNF1B-like effect sizes with 85 % probability, as well as other markers with smaller effects. We caution that these projections are based on only one known GWAS-identified risk marker; we cannot rule out a larger number of HNF1B-like risk markers and can say little about markers with subtler effects.
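To illustrate how power depends on effect size, allele frequency and sample size, the sketch below uses a common normal approximation for a single-stage, per-allele (log-additive) test. It is an assumption-laden illustration, not the method used to derive the figures above: it assumes Hardy–Weinberg proportions, a small effect, and no covariates, and it ignores the two-stage design. The function name and parameters are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def approx_power(odds_ratio, raf, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided per-allele test at level alpha."""
    norm = NormalDist()
    beta = log(odds_ratio)
    # Variance of the 0/1/2 allele count under Hardy-Weinberg proportions.
    allele_var = 2.0 * raf * (1.0 - raf)
    # Approximate standard error of the per-allele log odds ratio.
    se = sqrt((1.0 / n_cases + 1.0 / n_controls) / allele_var)
    shift = abs(beta) / se                   # expected |Z| under the alternative
    z_crit = -norm.inv_cdf(alpha / 2.0)      # about 5.45 for alpha = 5e-8
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


# Example call with the odds ratio and allele frequency quoted in the text
# and the Stage 1 sample size (2,695 cases, 2,777 controls).
print(approx_power(1.2, 0.30, 2695, 2777))
```

Under this approximation, power rises steeply with sample size for modest odds ratios, which is the qualitative point made in the paragraph above.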

In conclusion, we did not identify any novel loci associated with EC susceptibility. Taken together, a low inherited genetic component, tumor heterogeneity and the small expected effects of genetic variants could explain the apparent lack of association. Therefore, larger studies with specific tumor classification (Kandoth et al. 2013) are necessary to identify novel genetic polymorphisms associated with EC susceptibility.

Materials and methods

Study participants

Participating studies are described in Table 1 and comprise a total of 7,077 EC cases and 16,343 controls from 15 studies (ten case–control and five cohort, which were analyzed as nested case–control). Cases in Stage 1 were diagnosed with Type I EC. In cohort studies, controls were cancer free at the time of case diagnosis. In case–control studies, controls had not had hysterectomies. Cases of European descent from CTS, CONN, FHCRC, MEC, NHS and PLCO were scanned using Illumina Omniexpress. PLCO controls were scanned using Illumina Omni 2.5 and the PECS cases and controls were scanned using Illumina Human 660 W. With the exception of PLCO, all controls were matched to cases on age within each study site. Each participating study obtained informed consent from study participants and approval from its institutional review board (IRB) for this study and obtained IRB certification permitting data sharing in accordance with the NIH Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide Association studies (GWAS).

Participating studies in Stage 2 are described in Table 1. We did not restrict to European ancestry in this stage; a multiethnic population was included (Supplementary Table 5), although we also conducted sensitivity analyses restricted to women of European ancestry. We conducted two replications. In the fast-track replication, nine SNPs were genotyped using the TaqMan assay in all studies except ANECS, SEARCH and SECGS. Stage 2 was conducted using Illumina’s Human Exome 12v1 chip with custom content in the following studies: AHS, FHCRC, MEC and EDGE.

GWAS Genotyping

DNA was isolated from peripheral blood following the manufacturer’s recommended protocol. Genotyping was performed at two centers. At least 625 ng of each DNA sample from NHS, CONN, MEC, CTS and FHCRC was sent to USC for genotyping using the HumanOmniExpress BeadChips (Illumina Inc, San Diego, CA). The BeadChips were run on an Illumina iScan system using the Infinium HD Assay Super Automated Protocol. The GenomeStudio Genotyping (GT) Module (Illumina Inc, San Diego, CA) was used for data normalization and genotype calling. The following studies were genotyped at the Core Genotyping Facility (CGF) at the National Cancer Institute: PLCO cases were genotyped using the Illumina Omni Express chip, PECS controls were previously genotyped on the Illumina Human 660 W chip, and PLCO controls were genotyped on the Omni 2.5 M chip.

Replication genotyping

Fast-track replication was performed at the Dana Farber/Harvard Cancer Center High-Throughput Genotyping Core on the ABI PRISM 7900HT Sequence Detection System (Applied Biosystems, Foster City, CA) according to the manufacturer’s instructions. TaqMan® assays were ordered using either Assays-on-Demand or using the ABI Assays-By-Design service. All Stage 2 replication samples were genotyped using Illumina Exome 12v with custom content (N = 1818 SNPs) (Table 1).

Genome-wide association analysis

In total, 5,806 women with genotypes were available for Stage 1 analysis. To minimize bias due to population stratification, we used ~7,600 ancestry informative markers to identify and exclude women with <80 % European ancestry (N = 146). An additional four participants were excluded because they self-reported non-European descent. We also identified four unexpected inter-study duplicate pairs (all EC cases) and removed one subject from each pair. Because the scan was based on women of European descent with Type I EC, 180 cases of Type II EC were excluded, for a final sample size of 5,472 women (2,695 cases, 2,777 controls) eligible for Stage 1. After removing SNPs with completion rates <90 %, minor allele frequencies <1 %, or deviation from Hardy–Weinberg equilibrium (P < 0.0001), we had >524K genotyped SNPs in each Stage 1 study, for a combined total of >873K unique SNPs across all studies. Concordance between known duplicates was >99.9 %.
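The SNP-level filters described above can be sketched as follows. This is a minimal illustration using the Stage 1 thresholds from the text (completion rate, minor allele frequency, Hardy–Weinberg P value). The text does not state which Hardy–Weinberg test was used or in which samples it was applied, so the 1-df chi-square test here is an assumption, and the function names are hypothetical.

```python
from math import erfc, sqrt


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """P value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return erfc(sqrt(chi2 / 2.0))            # upper tail of a 1-df chi-square


def passes_qc(n_aa, n_ab, n_bb, n_missing,
              min_call=0.90, min_maf=0.01, hwe_p=1e-4):
    """Apply completion-rate, MAF and HWE filters to one SNP's genotype counts."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1.0 - freq_a)
    if call_rate < min_call or maf < min_maf:
        return False
    return hwe_chi2_p(n_aa, n_ab, n_bb) >= hwe_p


# Hypothetical genotype counts: a well-behaved SNP and one with no heterozygotes.
print(passes_qc(250, 500, 250, n_missing=10))
print(passes_qc(500, 0, 500, n_missing=10))
```

An exact test is often preferred over the chi-square approximation when genotype counts are small; the thresholds are exposed as parameters so the Stage 2 cutoff (P < 10−5) could be substituted.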

We applied similar filters to the newly genotyped Stage 2 samples. Four pairs of unexpected duplicates (eight total samples) and 30 samples with <90 % SNP completion rate were removed. One genetically male sample and seven samples that did not cluster with other samples from their self-reported ancestry group were also excluded, leaving 2,975 samples for analysis. SNPs with <90 % completion rate were removed from analysis, as were SNPs that showed deviation from HWE at P < 10−5 in any ethnic group.

Genotyping procedures, quality control and analysis procedures for the ANECS/SEARCH and SECGS GWAS have been reported previously (Spurdle et al. 2011; Long et al. 2012).

In all analyses, genotypes were coded log additively (0, 1, 2 copies of the minor allele) and logistic regression was used to model associations. Stage 1 analyses were adjusted for study and the first two principal components. Analyses of the newly genotyped Stage 2 data (i.e., all Stage 2 studies except ANECS/SEARCH and SECGS) were adjusted for study and the first four principal components. Principal components for Stage 1 were calculated using ~7,600 independent markers (Yu et al. 2008); principal components for Stage 2 were calculated using 47,097 common SNPs on the exome chip. Of the 1,818 SNPs selected for replication in Stage 2, 1,371 loci included additional in silico data from two previously reported GWAS (Spurdle et al. 2011; Long et al. 2012) in a total of 2,121 cases and 10,209 controls from the SEARCH/ANECS and SECGS studies. Study populations were analyzed separately and results were combined using fixed effects meta-analysis. Association analyses of SNPs selected for fast-track replication were conducted in SAS Version 9.2 (SAS Institute, Cary, NC, USA). All other analyses were performed using the PLINK software package (v 1.07, October 2009).
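The combination of study-specific results can be sketched as a fixed effects meta-analysis of per-allele log odds ratios. The sketch assumes inverse-variance weighting, which is the usual fixed effects approach but is not stated explicitly in the text; the function name and the input values are hypothetical and are not results from the study.

```python
from math import erfc, exp, sqrt


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted pooling of per-study log odds ratios."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    z = beta / se
    p = erfc(abs(z) / sqrt(2.0))             # two-sided normal P value
    return beta, se, p


# Hypothetical per-study estimates (log odds ratio per minor allele, SE).
study_betas = [-0.10, -0.07, -0.12]
study_ses = [0.05, 0.06, 0.08]

beta, se, p = fixed_effects_meta(study_betas, study_ses)
lower, upper = exp(beta - 1.96 * se), exp(beta + 1.96 * se)
print(f"OR = {exp(beta):.2f}, 95% CI {lower:.2f}-{upper:.2f}, P = {p:.2g}")
```

Each study contributes in proportion to the precision of its estimate, so larger studies dominate the pooled odds ratio.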