Leveraging Ethnic Group Incidence Variation to Investigate Genetic Susceptibility to Glioma: A Novel Candidate SNP Approach

Objectives: Using a novel candidate SNP approach, we aimed to identify a possible genetic basis for the higher glioma incidence in Whites relative to East Asians and African-Americans. Methods: We hypothesized that genetic regions containing SNPs with extreme differences in allele frequencies across ethnicities are the most likely to harbor susceptibility variants. We used International HapMap Project data to identify 3,961 candidate SNPs with the largest allele frequency differences in Whites compared to East Asians and Africans, and tested these SNPs for association with glioma risk in a set of White cases and controls. Top SNPs identified in the discovery dataset were tested for association with glioma in five independent replication datasets. Results: No SNP achieved statistical significance in either the discovery or replication datasets after accounting for multiple testing, and none was significant in meta-analysis. However, the most strongly associated SNP, rs879471, was in linkage disequilibrium with a previously identified risk SNP, rs6010620, in RTEL1. We estimate that rs6010620 could account for a glioma incidence rate ratio of 1.34 for Whites relative to East Asians. Conclusion: We explored genetic susceptibility to glioma using a novel candidate SNP method that may be applicable to other diseases with appropriate epidemiologic patterns.


INTRODUCTION
Incidence rates of adult primary malignant brain tumors (PMBT), most of which are gliomas (Kohler et al., 2011), vary among ethnic groups (Darefsky and Dubrow, 2009; Dubrow and Darefsky, 2011). The age-standardized incidence rate for North American non-Hispanic Whites is 2.5-3.0 times the rate among East Asians and around twice the rate among African-Americans. The latter ratio is likely to be higher for comparisons of Whites with African populations, given the ∼20% European content of the African-American genome (Patterson et al., 2004); however, no data are presently available to evaluate this (Darefsky and Dubrow, 2009). These ethnic differences in PMBT incidence are unlikely to be attributable solely to factors such as access to care or diagnostic facilities; in particular, the White-East Asian difference is observed in comparisons among different countries as well as within the United States, where both groups have similar access (Darefsky and Dubrow, 2009; Dubrow and Darefsky, 2011). Notably, ethnic incidence variation has been observed for both grade IV glioma (glioblastoma, or GBM) and non-GBM tumors (Dubrow and Darefsky, 2011).
The only established environmental risk factor for glioma is exposure to high-dose ionizing radiation (Bondy et al., 2008; Ostrom and Barnholtz-Sloan, 2011), which accounts for a small number of cases. In addition, studies have demonstrated a consistent inverse association with history of allergy (Schoemaker et al., 2010; Lachance et al., 2011), as well as evidence of interaction between history of allergy and several established glioma risk alleles (Schoemaker et al., 2010). However, epidemiologic studies have provided no conclusive evidence that diagnostic radiation (Davis et al., 2011), electromagnetic field exposure from residential power lines (Wrensch et al., 1999), smoking (Mandelzweig et al., 2009), alcohol consumption (Efird et al., 2004), nutritional factors (Bondy et al., 2008), or cell phone use (Cardis et al., 2010) are risk factors. Collectively, these observations suggest that genetic variants whose frequencies differ among ethnic groups, rather than environmental factors, underlie the variation in glioma incidence among ethnic groups. This assertion is supported by several studies suggesting that genetic pathways to glioma may differ across ethnicities (Mochizuki et al., 1999; Chen et al., 2001; Das et al., 2002; Wiencke et al., 2005). It follows that the frequencies of haplotypes associated with glioma susceptibility may differ between Whites and East Asians/Africans, such that haplotypes harboring alleles associated with increased glioma risk would be more prevalent among Whites and, conversely, haplotypes associated with decreased glioma risk would be more prevalent among East Asians and Africans. Identifying these haplotypes could provide valuable insight into genes influencing glioma risk.
Here we employed a candidate SNP approach to identify previously unknown genetic variants associated with glioma risk through the identification of SNPs that may tag glioma-related haplotypes. Our primary hypothesis is based on the premise that the same alleles confer protection against glioma in both East Asians and Africans. Consequently, we propose that these alleles are carried at a greater frequency by both East Asians and Africans than by Whites and that genetic regions (i.e., haplotypes) containing SNPs with the greatest allele frequency differences between Whites and both East Asians and Africans (with the same direction of difference) are particularly likely to harbor these alleles. To take into account the possibility that alleles that confer protection in East Asians differ from alleles that confer protection in Africans, we also propose a secondary hypothesis that genetic regions containing SNPs with the greatest allele frequency differences between Whites and either East Asians or Africans, but not both, are likely to harbor protective alleles which are distinct from those identified under the primary hypothesis.
Given that ethnic incidence differences are broadly similar for GBM and non-GBM glioma (Dubrow and Darefsky, 2011), we postulate that the polymorphisms driving these differences are shared across glioma subtypes, and we therefore analyze all gliomas combined without stratification. Because large differences in allele frequency are needed to account for even a relatively small portion of the White/East Asian or White/African incidence rate ratio (Figure 1), we restrict our analyses to SNPs showing the largest frequency differences.
FIGURE 1 | White/East Asian incidence rate ratios for varying allele distributions and genotypic relative risks. Plots were generated by calculating incidence rate ratios (IRRs) for varying genotypic relative risks (GRRs) and ethnic group allele frequencies. For example, suppose the GRR for glioma for persons with one B allele is 2.00 and the GRR for persons with two B alleles is 3.00 (relative to those homozygous for the A allele). If the frequency of allele A in Whites is 0.20 (p = 0.20, q = 0.80), the proportions of AA (p²), AB (2pq), and BB (q²) genotypes are 0.04, 0.32, and 0.64, respectively, assuming Hardy-Weinberg equilibrium. A normalized incidence rate is obtained by multiplying each genotype proportion by its GRR and summing: 0.04 (1.00) + 0.32 (2.00) + 0.64 (3.00) = 2.60. Given an East Asian allele A frequency of 0.80, the East Asian normalized incidence rate is 0.64 (1.00) + 0.32 (2.00) + 0.04 (3.00) = 1.40. The White/East Asian IRR is therefore 1.86 (2.60/1.40) in this scenario. The same calculations apply to White/African IRRs.
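The caption's arithmetic can be reproduced in a few lines. This is a minimal Python sketch assuming a single biallelic locus in Hardy-Weinberg equilibrium, with AA as the reference genotype; the function names are illustrative and are not taken from the study's own code.

```python
def genotype_freqs(p_a: float):
    """Hardy-Weinberg proportions (AA, AB, BB) given the frequency p of allele A."""
    q_b = 1.0 - p_a
    return p_a ** 2, 2 * p_a * q_b, q_b ** 2


def normalized_rate(p_a: float, grr_ab: float, grr_bb: float) -> float:
    """Population-average relative incidence, with AA as reference (GRR = 1)."""
    aa, ab, bb = genotype_freqs(p_a)
    return aa * 1.0 + ab * grr_ab + bb * grr_bb


def incidence_rate_ratio(p_a_group1: float, p_a_group2: float,
                         grr_ab: float, grr_bb: float) -> float:
    """IRR of group 1 relative to group 2 attributable to this one locus."""
    return (normalized_rate(p_a_group1, grr_ab, grr_bb)
            / normalized_rate(p_a_group2, grr_ab, grr_bb))


# Worked example from the Figure 1 caption (GRR 2.00 for AB, 3.00 for BB)
white = normalized_rate(0.20, 2.0, 3.0)       # 0.04*1 + 0.32*2 + 0.64*3
east_asian = normalized_rate(0.80, 2.0, 3.0)  # 0.64*1 + 0.32*2 + 0.04*3
print(round(white, 2), round(east_asian, 2), round(white / east_asian, 2))  # 2.6 1.4 1.86
```

Sweeping `p_a_group1`, `p_a_group2`, and the two GRRs over a grid is one way to generate curves like those in Figure 1.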

SELECTION OF CANDIDATE SNPs
To select candidate SNPs, we used allele frequency data on unrelated individuals from six populations included in the International HapMap Project Phase III (Altshuler et al., 2010): 113 Utah residents with Northern and Western European ethnicity (CEU); 102 Toscans from Italy (TSI); 137 Han Chinese from Beijing, China (CHB); 113 Japanese from Tokyo, Japan (JPT); 147 Yoruba from Ibadan, Nigeria (YRI); and 110 Luhya from Webuye, Kenya (LWK). We grouped CEU and TSI together as Whites, CHB and JPT as East Asians, and YRI and LWK as Africans. Data from Phase III, release 28 were downloaded from the HapMap Project File Transfer Protocol (FTP) site¹. In this release, genotype missingness per SNP was required to be <0.05 in each population, and SNPs with Hardy-Weinberg P < 10⁻⁷ were excluded. We used in-house Perl scripts to calculate allele frequencies and call rates, by ethnic group, for SNPs previously genotyped in our discovery genome-wide association study (GWAS) of high-grade adult glioma (Wrensch et al., 2009). SNPs with call rates <95% among Whites, East Asians, or Africans were excluded. For each SNP, differences in reference allele frequencies were calculated for Whites vs. East Asians and Whites vs. Africans, as well as the average of the two differences when both were in the same direction (e.g., the frequency of allele A is low in Whites but high in both East Asians and Africans).
To test our primary hypothesis, we selected SNPs with the greatest average allele frequency differences (provided both differences were in the same direction), grouped into three categories: "Highest" (≥0.70), "High" (0.60 to <0.70), and "Moderate" (0.40 to <0.60). To test our secondary hypothesis, we selected SNPs for which the allele frequency difference was Highest (≥0.70) in one population comparison but Low (<0.40) in the other ("Highest/Low").
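The selection rules above can be sketched as a single classification function. This is a minimal Python illustration, assuming the inputs are frequencies of the same reference allele already pooled within each grouped population (CEU+TSI, CHB+JPT, YRI+LWK); the function name is hypothetical. The text does not say how a SNP meeting both a primary category and the Highest/Low criterion was assigned, so checking the primary categories first is an assumption.

```python
from typing import Optional


def classify_snp(f_white: float, f_east_asian: float, f_african: float) -> Optional[str]:
    """Assign a candidate-SNP category from reference-allele frequencies in [0, 1].

    Returns "Highest", "High", "Moderate" (primary hypothesis),
    "Highest/Low" (secondary hypothesis), or None if no criterion is met.
    """
    d_ea = f_east_asian - f_white  # Whites vs. East Asians
    d_af = f_african - f_white     # Whites vs. Africans

    # Primary hypothesis: both differences point the same way; use their average size.
    if d_ea * d_af > 0:
        avg = (abs(d_ea) + abs(d_af)) / 2
        if avg >= 0.70:
            return "Highest"
        if avg >= 0.60:
            return "High"
        if avg >= 0.40:
            return "Moderate"

    # Secondary hypothesis: one comparison >= 0.70, the other < 0.40 (assumed checked second).
    larger, smaller = max(abs(d_ea), abs(d_af)), min(abs(d_ea), abs(d_af))
    if larger >= 0.70 and smaller < 0.40:
        return "Highest/Low"
    return None


print(classify_snp(0.10, 0.90, 0.85))  # differences +0.80 and +0.75 -> Highest
print(classify_snp(0.15, 0.90, 0.05))  # +0.75 vs. -0.10 -> Highest/Low
```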

DISCOVERY DATASET
Descriptive characteristics of discovery set cases and controls are presented in Table 1. Subjects providing genotype data for the discovery phase comprised 692 high-grade glioma cases and 3,992 controls originally assembled for the 2009 glioma GWAS by Wrensch et al. (2009). Briefly, cases included 622 individuals of European ethnicity from the San Francisco Adult Glioma Study (AGS) and 70 from The Cancer Genome Atlas (TCGA; McLendon et al., 2008), aged 20 or older with incident, histologically confirmed anaplastic astrocytoma (n = 97) or GBM (n = 595; International Classification of Diseases for Oncology, morphology codes 9380-9481). Controls included 602 subjects from AGS identified by random digit dialing and frequency matched to cases on age, sex, and ethnicity, as well as 3,390 subjects from the Illumina iControl Database². All subjects were confirmed to be unrelated and of European ethnicity by multidimensional scaling analysis.
¹ ftp://ftp.ncbi.nlm.nih.gov/hapmap

DISCOVERY DATASET GENOTYPING
Details of sample preparation and genotyping have been provided previously (Wrensch et al., 2009). Briefly, DNA from all AGS cases and controls was isolated from whole blood using Qiagen's Gentra Puregene DNA isolation kit, and genotyping was conducted using Illumina's HumanCNV370-Duo BeadChip. AGS samples were required to have a call rate of at least 98%. SNPs deviating from Hardy-Weinberg equilibrium in AGS or Illumina controls (P < 10⁻⁵) were excluded from further analysis, as were SNPs with more than 5% missing data in any of the four subject groups (AGS cases, AGS controls, TCGA cases, and Illumina controls).

REPLICATION DATASETS
We investigated our top candidate SNPs from the discovery dataset in five independent sets of cases and controls. Detailed procedures of subject selection and genotyping have been described previously (Shete et al., 2009; Wrensch et al., 2009; Sanson et al., 2011). Mayo Clinic cases (n = 176), 65% with GBM and 35% with grade III glioma, were diagnosed in Rochester, Minnesota between 2005 and 2008. Controls (n = 174) were identified from among individuals who had a general medical exam at the Mayo Clinic, and were matched to cases on sex, age, race, and residence. All cases and controls were genotyped using Illumina Human 610Quad arrays.
² http://www.illumina.com/science/icontroldb.ilmn
The four other replication datasets (UK, US, French, and German) were previously included in a pooled GWAS of glioma (Shete et al., 2009; Sanson et al., 2011). Briefly, the UK GWAS comprised 631 cases ascertained through the INTERPHONE study (Cardis et al., 2010) and 2,699 controls from the 1958 Birth Cohort (Power and Elliott, 2006). The US GWAS comprised 1,247 cases recruited through MD Anderson Cancer Center in Houston, Texas and 2,236 controls from the Cancer Genetic Markers of Susceptibility study (Hunter et al., 2007). The French GWAS comprised 1,423 cases from the Service de Neurologie Mazarin, Groupe Hospitalier Pitié-Salpêtrière Paris, and 1,190 controls from the SU.VI.MAX study (Hercberg et al., 2004). The German GWAS comprised 846 cases recruited from the University of Bonn Medical Center, and 1,310 controls from the KORA (Holle et al., 2005; Wichmann et al., 2005), POPGEN (Krawczak et al., 2006), and Heinz Nixdorf RECALL studies (Schmermund et al., 2002). Cases in the UK and US GWAS were genotyped using Illumina Human 610Quad arrays, and cases from the French and German GWAS were genotyped using Illumina HumanHap660 arrays. Controls in the UK GWAS were genotyped using Illumina Human 1M Duo arrays; the US controls on Illumina HumanHap240, 300, and 500 arrays; the French controls on Illumina HumanHap660 arrays; and the German controls on Illumina HumanHap550 arrays.

STATISTICAL ANALYSES
Odds ratios (ORs) and 95% confidence intervals for the association of candidate SNPs with glioma in the discovery and replication sets were calculated using unconditional logistic regression under an additive model (0, 1, or 2 copies of the minor allele). To adjust for potential population stratification, principal components derived with the EIGENSTRAT method were included as covariates in the logistic regression model (Price et al., 2006).
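The text does not name the software used for these models. As an illustrative sketch only, the additive-model test with principal-component covariates can be written with NumPy and a Newton-Raphson logistic fit; the function names, the Wald 95% interval, and the simulated data are assumptions, not the study's pipeline. The one-sided P-value is taken in the direction predicted by the hypothesis.

```python
import numpy as np
from math import erfc, exp, sqrt


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression fitted by Newton-Raphson.

    X: (n, k) design matrix including an intercept column; y: (n,) 0/1 outcomes.
    Returns the coefficients and their asymptotic covariance matrix.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (prob * (1.0 - prob))[:, None])  # Fisher information
        step = np.linalg.solve(info, X.T @ (y - prob))      # Newton update
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (prob * (1.0 - prob))[:, None])
    return beta, np.linalg.inv(info)


def additive_snp_test(dosage, is_case, pcs, direction=1):
    """Per-allele OR, 95% Wald CI, and one-sided P for a single SNP.

    dosage: copies of the minor allele (0, 1, or 2) per subject.
    pcs: (n, m) principal components included as covariates.
    direction: +1 if the hypothesis predicts OR > 1, -1 if it predicts OR < 1.
    """
    X = np.column_stack([np.ones(len(dosage)), dosage, pcs])
    beta, cov = fit_logistic(X, np.asarray(is_case, dtype=float))
    b, se = beta[1], sqrt(cov[1, 1])
    z = b / se
    p_one_sided = 0.5 * erfc(direction * z / sqrt(2))  # normal tail in the predicted direction
    return exp(b), (exp(b - 1.96 * se), exp(b + 1.96 * se)), p_one_sided


# Usage on simulated data (true per-allele OR = 1.5; not study data).
rng = np.random.default_rng(0)
n = 20_000
dosage = rng.binomial(2, 0.3, size=n)
pcs = rng.normal(size=(n, 2))
logit = -1.0 + np.log(1.5) * dosage + 0.2 * pcs[:, 0]
is_case = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))
odds_ratio, ci, p = additive_snp_test(dosage, is_case, pcs)
print(f"OR = {odds_ratio:.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}, one-sided P = {p:.1e}")
```

In practice a tested library (for example a GLM routine in R or statsmodels) would normally be preferred; the hand-written fit only makes the additive coding and covariate adjustment explicit.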
Discovery set results were evaluated against Bonferroni-adjusted significance thresholds. A study-wide significance level of 0.05 was divided equally among the four subgroups of candidate SNPs [Highest allele frequency difference (≥0.70), High (0.60 to <0.70), Moderate (0.40 to <0.60), and Highest/Low (≥0.70 in one population comparison, <0.40 in the other)], giving P = 0.0125 per subgroup, and then divided by the number of SNPs in each subgroup, so that the thresholds reflected the prior probability of association with glioma under our hypotheses. With 38 SNPs in the Highest allele frequency difference category, the threshold for this category was 0.0125/38 = 3.29 × 10⁻⁴. Thresholds for the High, Moderate, and Highest/Low subgroups were 4.70 × 10⁻⁵, 4.34 × 10⁻⁶, and 1.61 × 10⁻⁵, respectively (Table 2). For replication set analyses we used a nominal significance level of 0.05. All reported P-values (discovery and replication) are one-sided because of the directionality inherent in the hypotheses being tested. Meta-analysis results for the combined discovery and replication data were obtained with the generic inverse variance method under a fixed effects model.
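The threshold arithmetic and the pooling step can be checked with a short Python sketch. The subgroup sizes are those reported under Selected Candidate SNPs; the helper names are illustrative, and the meta-analysis function is a generic inverse-variance fixed-effect pooling of log odds ratios, assumed rather than confirmed to match the software the authors used.

```python
from math import erfc, exp, log, sqrt

STUDY_ALPHA = 0.05
SUBGROUP_SIZES = {"Highest": 38, "High": 266, "Moderate": 2883, "Highest/Low": 774}


def bonferroni_thresholds(alpha=STUDY_ALPHA, sizes=SUBGROUP_SIZES):
    """Split alpha equally across subgroups, then Bonferroni-correct within each."""
    per_subgroup = alpha / len(sizes)  # 0.05 / 4 = 0.0125
    return {name: per_subgroup / m for name, m in sizes.items()}


def fixed_effect_meta(odds_ratios, std_errors, direction=1):
    """Inverse-variance fixed-effect pooling; std_errors are SEs of the log ORs."""
    weights = [1.0 / se ** 2 for se in std_errors]
    log_ors = [log(o) for o in odds_ratios]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p_one_sided = 0.5 * erfc(direction * z / sqrt(2))
    return exp(pooled), pooled_se, p_one_sided


for name, threshold in bonferroni_thresholds().items():
    print(f"{name}: {threshold:.2e}")
# Highest: 3.29e-04, High: 4.70e-05, Moderate: 4.34e-06, Highest/Low: 1.61e-05
```

Allocating 0.0125 to each subgroup means the small Highest subgroup receives a far less stringent per-SNP threshold than the large Moderate subgroup, which is how the scheme encodes the higher prior probability.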

SELECTED CANDIDATE SNPs
Of 275,895 SNPs for which genotype data on discovery glioma cases and controls were previously available, HapMap data were unavailable or of insufficient quality for 1,188 (0.43%). Our predetermined allele frequency difference criteria were met by 3,961 of the remaining 274,707 SNPs (Figure 2): 2,883 in the Moderate (0.40 to <0.60) category, 266 in the High (0.60 to <0.70) category, 38 in the Highest (≥0.70) category, and 774 in the Highest/Low category (≥0.70 in one population comparison but <0.40 in the other; Table 2; Figure 3).

DISCOVERY SET
A Manhattan plot of the 3,961 SNP-glioma associations is shown in Figure 4. The most strongly associated SNP, rs879471 in STMN3 on chromosome 20q13 (P = 1.72 × 10⁻⁴), maps 39.9 kb from rs6010620, a SNP intronic to RTEL1 that was previously identified as a top hit in the GWAS conducted by Wrensch et al. (2009). However, after conditioning on rs6010620, rs879471 showed no evidence of an independent signal (P = 0.22), so rs879471 was excluded from further analysis.
Although no SNP association attained our predetermined Bonferroni-adjusted significance levels (Table 2), we identified 10 genes (SMARCA2, BRE, SLCO3A1, MORN5, C10orf11, RBM27, PTPRJ, SPIB, NMNAT1, and RPUSD3) containing at least one SNP with P < 0.01. To investigate SNPs that may be markers of glioma risk but were excluded by our strict allele frequency criteria, we tested 260 additional SNPs located in these 10 genes or within 5 kb upstream or downstream of them for association with glioma risk, regardless of ethnic allele frequency differences. We excluded four genes (SLCO3A1, C10orf11, PTPRJ, and NMNAT1) from further analysis because the direction of association of the additional SNPs with glioma risk was inconsistent with our hypothesis. All SNPs with P < 0.01 across the six remaining genes (SMARCA2, BRE, MORN5, RBM27, SPIB, and RPUSD3) were selected for replication (n = 20 SNPs, nine of which were identified in this secondary discovery analysis).
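Selecting the additional SNPs amounts to an interval filter around each gene. The Python sketch below assumes gene boundaries are given as inclusive genomic coordinates; the genome build and gene annotation used in the study are not stated, and the gene name, SNP IDs, and positions here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

FLANK_BP = 5_000  # 5 kb on each side of the gene, per the text


@dataclass
class Gene:
    name: str
    chrom: str
    start: int  # lowest genomic coordinate of the gene (inclusive)
    end: int    # highest genomic coordinate of the gene (inclusive)


def snps_in_gene_window(gene: Gene, snps: List[Tuple[str, str, int]],
                        flank: int = FLANK_BP) -> List[str]:
    """Return IDs of SNPs (rsid, chrom, position) in the gene or within `flank` bp of it."""
    lo, hi = gene.start - flank, gene.end + flank
    return [rsid for rsid, chrom, pos in snps if chrom == gene.chrom and lo <= pos <= hi]


# Hypothetical coordinates for illustration only (not real gene or SNP positions).
gene_x = Gene("GENE_X", "9", 100_000, 150_000)
snp_table = [("rsA", "9", 94_999), ("rsB", "9", 95_000), ("rsC", "9", 120_000),
             ("rsD", "9", 155_000), ("rsE", "9", 155_001), ("rsF", "10", 120_000)]
print(snps_in_gene_window(gene_x, snp_table))  # ['rsB', 'rsC', 'rsD']
```

Because the flank is symmetric, the window is the same whichever strand the gene lies on, so "upstream" and "downstream" need not be resolved.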

REPLICATION SETS
Discovery and replication set results for the 20 selected SNPs are presented in Table 3. Genotype data were not available for rs4464229 or rs3863 in the UK, US, French, or German sets, or for rs4666022 in any of the replication sets. Of the remaining SNPs, four achieved nominal significance (P < 0.05), with an odds ratio in the same direction as in the discovery set, in one of the five replication sets, and one achieved nominal significance in two replication sets. However, none achieved statistical significance in a replication set after correction for multiple comparisons, and none achieved nominal significance in the meta-analysis.

DISCUSSION
Here we have applied a novel candidate SNP method to identify glioma risk alleles, taking advantage of ethnic group differences in glioma incidence. Specifically, we tested the hypothesis that genetic regions containing SNPs with extreme differences in allele frequencies across ethnicities harbor variants that drive the ethnic group variation in glioma incidence. Although in the present study no SNPs from our discovery set reached our predetermined significance thresholds, we identified 10 genes containing one or more SNPs with P < 0.01, and selected 20 SNPs across six genes for replication. However, no SNP was statistically significant in any of the replication sets after accounting for multiple comparisons, and no SNP was nominally significant in the meta-analysis.
This study had several limitations that may have prevented the detection of glioma-associated SNPs. Although our allele frequency difference criteria were designed to include SNPs that could be responsible for a meaningful risk difference across ethnic groups, they may have excluded SNPs that are in fact associated with glioma risk. Additionally, with 692 cases and 3,992 controls in our discovery set, our power calculations indicate that for a moderately common risk allele (allele frequency 0.20) we had 80% power to detect an odds ratio as low as 1.46, whereas for a relatively rare risk allele (allele frequency 0.05) we had 80% power only for odds ratios of 1.88 or greater. Thus, our set of candidate SNPs may include one or more variants with a weak to moderate association with glioma risk that we were underpowered to detect. Furthermore, although we postulated that polymorphisms driving ethnic group incidence differences are common across glioma subtypes, differences in the glioma subtype distribution between the discovery and replication sets may have affected the replicability of our findings. The ability of our approach to detect SNPs that tag glioma-related haplotypes may also have been degraded by heterogeneity across ethnic groups in the haplotype that a given tagging SNP represents. Finally, our study was unable to assess potential interactions between risk loci, gene-environment interactions, or the role of rare variants.
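The text does not state which test, genetic model, or significance level underlies the power figures above, so the following is only an approximate Python sketch of why power falls for rarer alleles: a one-sided two-proportion z-test on allele counts (two alleles per subject), with the Moderate-category threshold of 4.34 × 10⁻⁶ as an assumed default alpha. Its outputs will not necessarily match the reported 1.46 and 1.88 detectable odds ratios.

```python
from math import sqrt
from statistics import NormalDist


def case_allele_freq(p0: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by control frequency p0 and an allelic OR."""
    odds = odds_ratio * p0 / (1.0 - p0)
    return odds / (1.0 + odds)


def allelic_power(p0: float, odds_ratio: float, n_cases: int = 692,
                  n_controls: int = 3992, alpha: float = 4.34e-6) -> float:
    """Approximate power of a one-sided allele-count z-test for a risk allele (OR > 1).

    The alpha default is an assumption; the text does not say which level was used.
    """
    p1 = case_allele_freq(p0, odds_ratio)
    se = sqrt(p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls))
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return NormalDist().cdf((p1 - p0) / se - z_alpha)


for p0, odds_ratio in [(0.20, 1.46), (0.05, 1.88)]:
    print(p0, odds_ratio, round(allelic_power(p0, odds_ratio), 2))
```

Holding the odds ratio fixed, a lower allele frequency shrinks the absolute frequency difference between cases and controls, which is the mechanism behind the higher detectable odds ratio for rare alleles.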
Frontiers in Genetics | Applied Genetic Epidemiology
Although this study did not lead to the discovery of novel glioma-associated SNPs, it is noteworthy that our most strongly associated candidate SNP, rs879471, was in strong linkage disequilibrium with rs6010620 (D′ = 0.78 in HapMap CEU + TSI; data from Haploview version 4.2), a top hit from the Wrensch et al. (2009) GWAS. This suggests that our approach successfully identified a haplotype that differs in frequency across ethnic groups and is related to glioma risk. Consistent with this, the A allele of rs6010620, which is protective against glioma (OR = 0.68, 95% CI: 0.58-0.79), is considerably more common in East Asians than in Whites (frequency 0.697 vs. 0.228, respectively), but the SNP did not meet our strict allele frequency difference criteria because the A allele frequency in Africans was only 0.019. On the basis of a calculation similar to that presented in Figure 1, we estimate that this SNP alone could account for an incidence rate ratio of 1.34 for Whites relative to East Asians. Notably, another top hit from the same GWAS, rs1412829, has a risk allele C (OR = 1.39, 95% CI: 1.24-1.57) that is more common in Whites than in East Asians or Africans (frequency 0.402, 0.104, and 0.009, respectively) but also did not meet our allele frequency criteria. Based on our calculations, this SNP could account for a White to East Asian incidence rate ratio of 1.22 and a White to African incidence rate ratio of 1.30. Thus, these two SNPs alone may account for a meaningful proportion of the observed inter-ethnic incidence rate ratios. None of the other five established glioma susceptibility loci contribute to the inter-ethnic incidence rate differences (Table 4).
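The genetic model behind these estimates is not spelled out. One reconstruction that reproduces them applies the additive genotype scheme of the Figure 1 example (relative risk 1, r, and 2r − 1 for zero, one, and two risk alleles) to the risk allele, with r = 1/0.68 for rs6010620 (whose reported OR refers to the protective A allele) and r = 1.39 for rs1412829. Under Hardy-Weinberg equilibrium the average risk then simplifies to 1 + 2q(r − 1). This Python sketch is our assumption, not the authors' stated method; applying the same additive scheme to the protective allele instead does not reproduce 1.34.

```python
def mean_relative_risk(q: float, r: float) -> float:
    """Population-average relative risk at one locus under Hardy-Weinberg equilibrium.

    q: risk-allele frequency; r: relative risk for one copy of the risk allele.
    Additive genotype model: 1 (no copies), r (one copy), 2r - 1 (two copies), so
    (1-q)^2 + 2q(1-q)r + q^2(2r-1) simplifies to 1 + 2q(r-1).
    """
    return 1.0 + 2.0 * q * (r - 1.0)


def irr(q_group1: float, q_group2: float, r: float) -> float:
    """Incidence rate ratio of group 1 relative to group 2 attributable to this locus."""
    return mean_relative_risk(q_group1, r) / mean_relative_risk(q_group2, r)


# rs6010620: protective A allele (OR 0.68); risk-allele frequency = 1 - f(A).
r_rtel1 = 1 / 0.68
print(round(irr(1 - 0.228, 1 - 0.697, r_rtel1), 2))  # White vs. East Asian: 1.34

# rs1412829: risk allele C (OR 1.39).
print(round(irr(0.402, 0.104, 1.39), 3))  # White vs. East Asian: 1.215 (reported as 1.22)
print(round(irr(0.402, 0.009, 1.39), 2))  # White vs. African: 1.3
```

The small gap between 1.215 and the reported 1.22 presumably reflects rounding of the published frequencies or odds ratios.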
The candidate SNP approach used in this study provides a viable alternative to admixture mapping, which investigates the genetic makeup of recently admixed groups to localize disease-related variants (Patterson et al., 2004; Smith and O'Brien, 2005). In the context of glioma research, admixture mapping would ideally be applied to a set of African-American cases and controls, in which excess European ancestry shared among African-American cases would point to genetic regions that may play a role in glioma risk. For conditions such as hypertension, where prevalence differs widely between ethnicities and samples from an admixed group are available, pairing admixture mapping with GWAS has proven an effective means of identifying disease-related variants (Levy et al., 2009; Zhu et al., 2011). However, when samples from an admixed group are not available, we present our approach as a complement to GWAS. Specifically, this approach to identifying disease-causing variants is attractive where (1) disease incidence differs substantially across two or more ethnic groups; (2) ethnic group differences tend to persist independent of geographic location, suggesting a genetic etiology; (3) cases and controls of uniform ethnicity are available; and (4) an admixture approach is not feasible because a sufficient number of appropriate admixed cases is unavailable.
While we did not identify novel glioma susceptibility variants in this analysis, we conclude that the additional risk in White populations conferred by rs6010620 and rs1412829 lends support to our initial hypothesis and provides impetus for using a larger discovery set and/or pursuing admixture mapping. Given the rarity of glioma among African-Americans and the resulting difficulty of collecting enough African-American cases for an admixture mapping study, further application of our method may be the preferred approach.