Introduction

Leukemia, a heterogeneous group of hematological malignancies, is the most common cancer in children. Nearly all pediatric cases of leukemia are acute in nature, and the majority, ~80%, are acute lymphoblastic leukemia (ALL) [1]. ALL is characterized by specific chromosomal abnormalities, such as translocations and changes in ploidy, that may result from unrepaired DNA damage such as double-strand breaks (DSBs) [2, 3]. The few known risk factors for childhood leukemia include exposure to ionizing radiation and some chemotherapeutic agents [4], which are known to cause cellular DNA damage [5]. Repair of damage resulting from exposure to such agents is a critical cellular maintenance function. Therefore, alterations in innate DNA repair, cell cycle, or genomic maintenance processes may play a role in leukemia development [6–8].

In addition to altered DNA repair processes, DNA DSBs can also arise via the inhibition of DNA topoisomerases [9], or immune-cell specific processes such as somatic hypermutation and class-switch recombination [10, 11]. Germline single nucleotide polymorphisms (SNPs) in genes that control these processes may alter their function, resulting in the accumulation of DNA damage and chromosomal instability.

Cancers at numerous sites have been linked to SNPs in specific DNA repair pathways, including nucleotide excision repair (NER), mismatch repair (MMR), and DSB repair, the last of these notably in breast cancer [12, 13]. In addition, SNPs in genes that control cell cycling and checkpoints may determine whether cells with substantial DNA damage and instability progress to replication or undergo apoptosis [5, 14].

To explore the association between germline genetic variants and ALL, we examined 21 genes in base excision repair (BER), NER, and DSB repair pathways, and 11 genes in cell cycle and genomic maintenance pathways, among 377 childhood ALL cases and 448 controls from a population-based study of childhood leukemia in Northern and Central California. In addition, while the association between high doses of ionizing radiation and leukemia is well established [15, 16], the evidence for low-dose sources such as diagnostic X-rays is evolving. As we previously found postnatal exposure to diagnostic X-rays to be a risk factor for childhood leukemia [17], we examined whether the risk of childhood ALL associated with diagnostic irradiation is modified by variants in DNA repair genes.

Materials and methods

Study subjects

The study was conducted among children participating in the Northern California Childhood Leukemia Study (NCCLS), an ongoing population-based case–control study. The study enrollment and recruitment procedures have been described in detail previously [18]. Briefly, case children under 15 years of age with incident leukemia were ascertained within 72 h of diagnosis via a rapid reporting system established with the diagnosing hospitals. Control children selected from California birth certificates were matched to case children on date of birth, sex, maternal race, and child’s Hispanic ethnicity (having at least one parent reporting Hispanic ethnicity). Participation rates among eligible cases and controls were 87 and 86%, respectively. Data on potential risk factors were elicited from an English- or Spanish-speaking parent (usually the mother) by trained interviewers using a structured questionnaire.

This study was reviewed and approved by institutional review committees at the University of California Berkeley, the California Department of Public Health, and the participating hospitals. Written informed consent was obtained from all parent respondents.

Biospecimen collection and DNA processing

Buccal cytobrush specimens, successfully collected at interview from 95% of participating children, were processed by heating in the presence of 0.5 N NaOH. Isolated DNA was later repurified using an automated DNA extraction system (AutoGen, Holliston, MA) and whole-genome amplified using GenomePlex reagents (Rubicon Genomics, Ann Arbor, MI). When buccal cytobrush DNA was inadequate (26.6% of subjects), DNA was isolated from dried bloodspots (DBS) collected at birth and archived by the Genetic Diseases Screening Program of the California Department of Public Health. After extraction (QIAamp 96 DNA Blood Kit, QIAGEN, Germany), these DNA samples were whole-genome amplified using REPLI-g reagents (QIAGEN). Regardless of source, DNA specimens were quantitated using human-specific Alu-PCR to confirm a minimum level of amplifiable human DNA [19]. We have previously shown that highly multiplexed GoldenGate genotyping (Illumina, San Diego, CA) of whole-genome amplified buccal cell DNA yields genotypes that are highly concordant with those from genomic DNA from peripheral blood [20]. We genotyped DNA specimens from both buccal cells and DBS for 9 subjects; genotype concordance between paired samples was 98.9%.

Selection of variants

This analysis included SNPs in 32 genes involved in DNA repair, cell cycle and genomic stability processes. These genes were selected by consensus of our investigative team following a review of the literature. They include genes involved in base excision repair (APEX1, MUTYH, UNG2, XRCC1), nucleotide excision repair (ERCC2), double-strand break repair by nonhomologous end-joining (LIG4, PRKDC, XRCC4, XRCC5, XRCC6), double-strand break repair by homologous recombination (BRCA1, BRCA2, MRE11, NBN, RAD50, RAD51, RAD54B, RAD54L, XRCC2, XRCC3), direct reversal of damage (MGMT), cell cycle control (TP53, TP53BP1, CCND1, CDKN2A (p16), CDKN2B (p15)), topoisomerase activity (TOP1, TOP2A, TOP2B), and recombination of immunoglobulin and T-cell receptor genes (AICDA, RAG1, RAG2).

Using HaploView [21] in conjunction with SNP data from the 30 Caucasian trios in the HapMap project (Release 19, Build 34, www.hapmap.org) and the 23 Hispanics in the SNP500Cancer project (www.snp500cancer.nci.nih.gov), we selected haplotype-tagging SNPs (htSNPs) that captured at least 80% of the haplotype diversity for common haplotypes (>5% frequency) in either the Caucasian or Hispanic populations [22]. As Hispanics are a recently admixed ethnic group and comprise 42% of our study population, we placed special emphasis on capturing haplotype structures in this population. To maximize capture of potential regulatory regions, we included 10 kb stretches both up- and downstream from the gene boundaries reported in the UCSC Genome Browser. In addition, we identified non-synonymous SNPs within these genes. Finally, a set of ancestry informative markers (AIMs) was included for SNP genotyping; these had been previously identified to distinguish Amerindian, African, and European populations [23], three populations that make up the genetic ancestry of US Hispanics.

Genotyping

Although the study recruitment was originally of an individually matched design, we performed a frequency-matched analysis by including all subjects with available biospecimens from 1995 through 2002. Accordingly, we genotyped whole-genome amplified DNAs of 385 available ALL cases and 456 available controls enrolled from 1995 to 2002 using a custom Illumina GoldenGate assay panel with a GenCall threshold of 0.25. Fifty-nine duplicate samples that were processed in the same manner and genotyped on the same plate showed >99% concordance in genotype. For these duplicates, the sample with the higher SNP call rate within each pair was included in the final data. SNPs were excluded if they had a call rate below 90% (29 SNPs), had minor allele frequencies below 5% in both Hispanics and non-Hispanics (6 SNPs), or failed Hardy–Weinberg equilibrium (p < 0.01) in both Hispanic and non-Hispanic controls (1 SNP). After applying these data quality thresholds, data for 238 SNPs in the 32 selected genes were available for 377 ALL cases and 448 controls. A total of 80 AIMs were successfully typed in these subjects.
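The three SNP exclusion rules above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, genotypes are assumed to be coded as minor-allele counts (0, 1, 2) with None for a failed call, and Hardy–Weinberg equilibrium is assumed to be assessed with a 1-degree-of-freedom chi-square goodness-of-fit test, since the text does not state which test was used.

```python
import math

def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call."""
    return sum(g is not None for g in genotypes) / len(genotypes)

def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)

def hwe_p_value(genotypes):
    """Chi-square (1 df) test of Hardy-Weinberg proportions (assumed test)."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    p = sum(called) / (2 * n)          # frequency of the counted allele
    q = 1 - p
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    if min(expected) == 0:             # monomorphic SNP: nothing to test
        return 1.0
    observed = [called.count(k) for k in (0, 1, 2)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df
    return math.erfc(math.sqrt(chi2 / 2))

def passes_qc(hispanic, non_hispanic, hispanic_controls, non_hispanic_controls):
    """Apply the thresholds quoted in the text: 90% call rate, 5% MAF, HWE p < 0.01."""
    if call_rate(hispanic + non_hispanic) < 0.90:
        return False
    # Excluded only if rare in BOTH ethnic groups
    if (minor_allele_frequency(hispanic) < 0.05
            and minor_allele_frequency(non_hispanic) < 0.05):
        return False
    # Excluded only if HWE fails in BOTH control groups
    if (hwe_p_value(hispanic_controls) < 0.01
            and hwe_p_value(non_hispanic_controls) < 0.01):
        return False
    return True
```

Because the frequency and equilibrium criteria require failure in both ethnic groups, a SNP that is informative or well behaved in either group is retained under this sketch.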

Cytogenetic characterization

The cytogenetic classification methods used in this analysis have been described in detail elsewhere [24]. Briefly, pretreatment diagnostic karyotype and fluorescence in situ hybridization (FISH) data were abstracted from leukemia patient records shortly after diagnosis. Additional FISH analyses were conducted at the University of California, Berkeley, to identify ETV6-RUNX1 [t(12;21)] translocations and hyperdiploidy when not done at hospitals. As shown in Table 1, the presence of any structural or numerical change was common among the NCCLS case population. The most frequent structural abnormality, t(12;21), was more common among non-Hispanics than Hispanics. Among cases with a numerical chromosomal change of any kind, high hyperdiploidy (>51 chromosomes) was most commonly observed. Because of the low prevalence for most other individual structural and numeric changes, cytogenetic subgroup analyses were limited to t(12;21), high hyperdiploidy, and the broader categories “any structural change” and “any numerical change”.

Table 1 Demographic and cytogenetic characteristics of Hispanic and Non-Hispanic children, the Northern California Childhood Leukemia Study, 1995–2002

Diagnostic X-ray exposure assessment

Information on the child’s X-rays received prior to the date of diagnosis for cases or the corresponding reference date for matched controls (hereafter called “postnatal X-rays”) was collected during the in-person interview as described previously [17]. All postnatal diagnostic X-ray exposures, with the exception of dental X-rays, were reported by the following broadly defined regions of the body: chest, skull, broken bone, and “other”. Respondents also reported the number of postnatal X-rays received and the age at first X-ray. Postnatal X-ray exposure information was available for 746 (90.4%) of the genotyped study participants. In our genotyped study sample, having 3+ diagnostic X-rays postnatally was associated with a significantly increased risk of childhood ALL (OR = 2.49, 95% CI 1.55–3.98), similar to our previous report on a larger sample size [25].
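As a consistency check on the odds ratio quoted above, the following Python sketch back-calculates the standard error of the log odds ratio from the reported interval. It assumes the interval is a symmetric Wald interval on the log scale with a critical value of 1.96, which the text does not state; the function name is hypothetical.

```python
import math

def wald_summary(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover SE, z statistic, and two-sided p from an OR and its 95% CI.

    Assumes the CI was built as exp(log(OR) +/- z_crit * SE).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # standard normal tails
    # Under the Wald assumption the OR is the geometric mean of the limits
    midpoint = math.sqrt(ci_low * ci_high)
    return se, z, p_two_sided, midpoint

se, z, p, midpoint = wald_summary(2.49, 1.55, 3.98)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.5f}, midpoint = {midpoint:.2f}")
```

If the geometric midpoint of the limits sits close to the reported odds ratio, the estimate and interval are internally consistent under the Wald assumption.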

Statistical analysis

Based on the AIMs, individual estimates of genetic ancestry, i.e., percent contribution of each of the three ancestral populations per person, were obtained from maximum likelihood estimation as described previously [26]. Using the confounding relative risk (CRR) [27], we found no evidence of major confounding by estimated genetic ancestry (>10%) over and above adjustment for self-identified race and ethnicity. As a result, our analysis proceeded with stratification by or adjustment for self-identified race and ethnicity.

As a preliminary step prior to haplotype analysis, we tested for potential interactions of individual SNPs with Hispanic ethnicity using the likelihood ratio test at the 0.05 significance level. We used unconditional logistic regression to estimate odds ratios (ORs) for the log-additive associations of individual SNPs, both overall and by cytogenetic subtype, after adjusting for age at diagnosis, sex, and child’s race, plus child’s Hispanic ethnicity in analyses combining Hispanics and non-Hispanics.
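To make the log-additive coding concrete, the sketch below fits an unconditional logistic regression with a single predictor, the minor-allele count (0, 1, or 2), using Newton-Raphson in plain Python. It is a simplified illustration only: the adjustment covariates described above (age, sex, race, Hispanic ethnicity) are omitted, the function name is hypothetical, and the example genotype counts are invented for demonstration.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Newton-Raphson fit of logit P(y = 1) = b0 + b1 * x; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0              # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0      # information matrix entries
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system: beta <- beta + H^{-1} g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Invented data: (minor-allele count, cases, controls)
counts = [(0, 40, 60), (1, 45, 40), (2, 15, 8)]
genotype, status = [], []
for dosage, n_cases, n_controls in counts:
    genotype += [dosage] * (n_cases + n_controls)
    status += [1] * n_cases + [0] * n_controls

b0, b1 = fit_logistic(genotype, status)
print(f"per-allele OR = {math.exp(b1):.2f}")
```

Under this coding, exp(b1) is the odds ratio per additional copy of the minor allele, so the model implies an odds ratio of exp(2 * b1) for minor-allele homozygotes relative to major-allele homozygotes.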

For haplotype analyses, we applied a sliding window approach for each gene, as implemented in the haplo.stats package for R [28], using window sizes of 2–5 SNPs. This approach examines sub-haplotypes using the full set of SNP data, with differently sized “windows” of adjacent alleles. This is an effective means of combining multi-locus data for Hispanics and non-Hispanics, as it is agnostic to differences in haplotype structure, provided no individual SNPs for a given gene show significant effect heterogeneity by Hispanic ethnicity (p interaction ≤ 0.05). If none of the individual SNPs in a given gene showed such heterogeneity, data for both ethnic groups were combined for sliding window analyses; for genes in which significant single-SNP heterogeneity by ethnicity existed, the sliding window analysis for that gene was conducted separately for Hispanics and non-Hispanics. We utilized GrASP, a graphical tool [29], to display and visualize sliding window results. We used haplotype trend regression [30] to estimate the magnitude of effect associated with risk haplotypes of the windows with the smallest global p values.
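The window enumeration underlying this approach can be sketched in a few lines of Python. This shows only how windows of 2–5 adjacent SNPs are generated for one gene; haplotype inference and association testing, which the study performed with haplo.stats in R, are not reproduced here. The function name and the SNP labels are hypothetical.

```python
def sliding_windows(snps, min_size=2, max_size=5):
    """Return every run of adjacent SNPs with length min_size..max_size.

    `snps` must be ordered by chromosomal position within the gene.
    """
    windows = []
    for size in range(min_size, max_size + 1):
        for start in range(len(snps) - size + 1):
            windows.append(tuple(snps[start:start + size]))
    return windows

# Hypothetical gene with six genotyped SNPs, in positional order
gene_snps = ["snp1", "snp2", "snp3", "snp4", "snp5", "snp6"]
for window in sliding_windows(gene_snps):
    print("-".join(window))
```

A gene with n genotyped SNPs yields n - w + 1 windows of size w, so the number of haplotype tests grows with both gene coverage and the range of window sizes, which is relevant when interpreting nominal p values.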

Assessment of potential interactions with exposure to ionizing radiation was limited to single SNPs or haplotypes with significant main effects; the significance of these was assessed in logistic regression models using the likelihood ratio test. The number of postnatal X-rays received by the child was modeled as a dichotomous variable (0–2 vs. 3+ X-rays). Postnatal X-rays that occurred less than a year prior to the date of diagnosis for cases or corresponding reference date for controls were excluded from analysis. Maternal X-rays received during the year prior to conception and during pregnancy were excluded from analysis due to low prevalence of exposure in the study population.
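The exposure coding and the likelihood ratio test described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the interaction is assumed to be a single product term (1 degree of freedom) between the dichotomous X-ray variable and the haplotype, the function names are hypothetical, and the log-likelihood values in the example are invented.

```python
import math

def xray_category(n_xrays):
    """Dichotomize postnatal X-ray count: 0 for 0-2 X-rays, 1 for 3 or more."""
    return 1 if n_xrays >= 3 else 0

def lrt_p_value_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test comparing nested models that differ by one term.

    loglik_reduced: maximized log-likelihood without the interaction term.
    loglik_full:    maximized log-likelihood with the interaction term.
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # Survival function of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Invented log-likelihoods for illustration only
stat, p = lrt_p_value_1df(loglik_reduced=-512.4, loglik_full=-509.9)
print(f"LRT statistic = {stat:.2f}, p = {p:.3f}")
```

The closed-form p value used here holds only for 1 degree of freedom; a test involving several interaction terms would need the general chi-square distribution.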

Results

Characteristics of the study population are presented in Table 1 stratified by Hispanic ethnicity. Cases and controls were similar with respect to age, sex, and maternal race.

Of the 32 genes examined, 6 included one or more individual SNPs with effects that differed significantly between Hispanics and non-Hispanics (APEX1, BRCA1, BRCA2, MGMT, RAD51, and RAG2). Accordingly, haplotype sliding window analyses for these 6 genes were stratified by Hispanic ethnicity, while those for the other 26 genes were conducted with both ethnic groups combined. Total ALL and subtype-specific ALL results for genes with significant (p ≤ 0.05) haplotype effects that persisted through increasingly larger windows are presented in the Supplemental Material. Haplotype trend regression results estimating the magnitudes of effect for haplotypes with the lowest multi-SNP p value in sliding window analyses are shown in Table 2 for total ALL, and in Table 3 for specific ALL subtypes.

Table 2 Significant haplotype trend regression results (global p ≤ 0.05), for total childhood ALL, by ethnicity, NCCLS, 1995–2002
Table 3 Significant haplotype trend regression results (global p ≤ 0.05), by major childhood ALL subtype, both ethnicities combined, NCCLS, 1995–2002

For total ALL, among both ethnicities combined, ERCC2 showed a significant haplotype association that persisted through progressively larger SNP windows, with haplotype G-A-A showing a significantly reduced risk (OR = 0.59, p = 0.018). Among Hispanics, haplotypes A-A-A and A-G-A of RAD51 were significantly associated with increased risks of total ALL (OR = 1.55 and p = 0.05, OR = 1.51 and p = 0.04, respectively). Among non-Hispanics, APEX1 and BRCA2 showed significant haplotype associations with total ALL: haplotype A-A of APEX1 was significantly associated with an increased risk (OR = 1.90, p = 0.003), and haplotype G-A of BRCA2 was significantly associated with an increased risk (OR = 1.77, p = 0.02).

We performed haplotype analyses for specific ALL subtypes for genes with nominally significant associations among both ethnicities combined (Table 3). Two genes (NBN and XRCC4) showed significant associations with t(12;21) translocation-positive childhood ALL (global p ≤ 0.05). For NBN, haplotype A-A-G-A-G was borderline significantly associated with a reduced risk (OR = 0.55, p = 0.057). For XRCC4, haplotypes C-G-G-G-A and C-G-A-G-A were significantly associated with reduced risk (OR = 0.39 and p = 0.039, OR = 0.56 and p = 0.05, respectively). XRCC4 was also significantly associated with childhood ALL with any structural abnormalities: haplotypes G-A-G and G-G-G were both significantly associated with a reduced risk (OR = 0.60 and p = 0.006, and OR = 0.55 and p = 0.012, respectively). Of particular interest, the two 3-SNP risk haplotypes of XRCC4 are part of the two significant 5-SNP risk haplotypes associated with t(12;21) translocation-positive childhood ALL, and have a similar effect.

For high-hyperdiploid-positive childhood ALL, and more broadly for childhood ALL with any numerical ploidy changes, there were significant haplotype associations with just one gene: CDKN2A. The G-G haplotype was significantly associated with a reduced risk of hyperdiploid-positive disease (OR = 0.30, p = 0.002), while risk reduction associated with the A-A haplotype was marginally significant (OR = 0.73, p = 0.09). Looking more broadly at childhood ALL with any numerical changes, these same haplotypes showed significant associations (OR = 0.44 and p = 0.01, OR = 0.67 and p = 0.008, respectively).

In testing gene-environment interactions, we investigated interactions of postnatal X-ray exposures (0–2 vs. 3 or more) with those haplotypes that showed nominally significant main effects in both ethnicities combined, both for total ALL and by major ALL subtype. At the p ≤ 0.05 level, we found that the main effect of the G-G-G haplotype of XRCC4 on ALL with any structural abnormalities was significantly modified by the number of postnatal X-rays (p = 0.027). No such interactions were observed for NBN or ERCC2.

Discussion

In this study, we utilized a haplotype-tagging approach to examine the role of variants in genes involved in DNA repair and genomic maintenance in risk of childhood ALL. We found significant haplotype associations with total ALL for ERCC2, RAD51, APEX1, and BRCA2; and with specific ALL subtypes for NBN, XRCC4, and CDKN2A. In addition, we observed significant gene-environment interactions between XRCC4 variants and exposure to diagnostic X-rays modulating the risk of structural abnormality-positive childhood ALL. Our results provide strong support for a role of the DNA repair and cell cycle control pathways in risk of childhood ALL.

APEX1 encodes the major apurinic/apyrimidinic (AP) endonuclease in human cells. AP sites occur frequently in DNA as a result of spontaneous hydrolysis, damaging agents, or glycosylases that remove specific abnormal bases; therefore, APEX1 plays an important role in base excision repair. Gene expression analyses of APEX1 have shown high levels of expression in many different types of tumors, including osteosarcoma, ovarian, and digestive cancers [31, 32]. The two SNPs involved in the observed APEX1 risk haplotype were borderline significantly associated with total childhood ALL after adjustment for the false discovery rate (p FDR = 0.06 for both) [33]. Of these, rs3120073, whose variant allele was associated with decreased risk of ALL among both ethnicities combined, is located 6.5 kb upstream of APEX1, in intron 6 of OSGEP, a probable endopeptidase. Analysis of HapMap CEU population data shows that this SNP is not in strong linkage disequilibrium (r² > 0.80) with any nearby SNPs (<10 kb), though HapMap genotyping data for this variant were available for less than 50% of the CEU population. The other involved SNP, rs11160711, whose variant allele was associated with an increased risk of total childhood ALL among non-Hispanics, is 10 kb upstream of APEX1. Neither SNP has known function. Additional studies are warranted to replicate these findings and identify functional variants that may be in strong linkage disequilibrium with these SNPs.

The ERCC2 gene product is involved in nucleotide excision repair. Defects in this gene can cause rare genetic syndromes, including the cancer-prone syndrome xeroderma pigmentosum complementation group D, which necessitates protection from ultraviolet light [34]. We found a significant haplotype association for this gene with total childhood ALL risk. Separately, the BRCA2 gene product, a tumor suppressor, interacts with the RAD51 gene product, a recombinase, to effect homologous repair of double-strand breaks [35]. We observed significant haplotype associations for BRCA2 and RAD51, albeit in different ethnic groups. These results must be confirmed, but support a role for DSB repair pathways in risk of childhood ALL.

The X-ray repair cross-complementing protein encoded by XRCC4 is involved in nonhomologous end-joining repair of double-strand DNA breaks and the completion of V(D)J recombination events [36, 37]. In addition to the significant haplotype associations with this gene for ALL with any structural changes, we observed a significant interaction between the XRCC4 risk haplotype and exposure to postnatal X-rays on risk of the same disease subtype. These effects appear to be driven largely by intronic SNP rs1193695, which showed nominally significant main effects and interactions with postnatal X-ray exposure for both total ALL as well as ALL subtypes defined by any structural changes and any numerical changes. Our results provide compelling support for a role of the nonhomologous end-joining repair pathway in risk of ALL, particularly ALL with structural abnormalities, both alone and in conjunction with exposure to ionizing radiation.

Other genes showed significant subtype-specific haplotype associations as well. Haplotypes of NBN were associated with t(12;21) positive childhood ALL. NBN is part of the MRE11/RAD50 DSB repair complex, involved in homologous DSB repair. Mutations in this gene can lead to Nijmegen breakage syndrome, a chromosomal instability syndrome that predisposes to cancer, among other diseases [38]. These results for NBN did not extend more broadly to ALL with any structural changes.

In contrast, haplotypes of CDKN2A were associated with both hyperdiploid ALL and ALL with any numerical ploidy changes. CDKN2A is a cell cycle control gene recognized as a tumor suppressor for its role in stabilizing p53. A study of secondary hits from a genome-wide association study found CDKN2A SNP rs3731217, which was not included for genotyping in our study, to be associated with childhood ALL, specifically B-cell precursor disease [39]. Our results support these previous findings for a role of CDKN2A in childhood ALL risk, and suggest that further studies should examine effects by ploidy.

Few previous candidate gene studies have investigated the main effects of DNA repair pathway gene variants in the etiology of childhood ALL, and those that have were typically of smaller sample size (less than 200 cases) and focused on analyses of individual SNPs with putative function (i.e., inducing amino acid changes or located in potential regulatory regions). One study reported significant association of a promoter variant of CDKN2A with childhood pre-B-cell ALL [40], while another reported an association of a functional ERCC1 variant with total childhood ALL [41]. Other studies have found significant main effects of functional SNPs in XRCC1 [42, 43], and a significant haplotype combining three of these XRCC1 functional variants has also been observed [42]. In our study, we did not observe a significant haplotype association for XRCC1 or ERCC1, though we did observe significant effects for CDKN2A. Inconsistencies between our study and previous candidate gene studies may be attributable to a number of factors, including differences in approach (haplotype vs. individual SNP), population and/or allele frequency differences and sample size differences that impact power to detect effects, as well as chance.

To our knowledge, this study is the first to examine the joint effects of genes in the DNA repair and cell cycle control pathways with ionizing radiation on risk of childhood ALL using a haplotype approach. We focused on interactions involving haplotypes with significant main effects, as the biological basis is unclear for interactions involving significant subgroup variation in exposure–disease associations with no main effects when subgroups are combined [44]. We also limited our investigation of potential interactions to those haplotypes with significant main effects in both Hispanics and non-Hispanics combined, as the sizes of the individual ethnic groups were considered too small to permit adequately powered examinations of interactions. A previous study focusing on single SNPs using a case-only interaction study design reported a suggestive interaction between an APEX1 SNP and postnatal X-ray exposure [45]. Per our a priori analysis plan, we did not examine APEX1 haplotype interactions since this gene showed significant main effects in only non-Hispanics. In this regard, we recognize that our total sample size (343 cases, 406 controls with both genetic and exposure data) may be insufficient to observe modest interaction effects with adequate statistical power. In addition, although the role of postnatal diagnostic X-rays in childhood ALL risk is not entirely clear [16], we previously found exposure to postnatal X-rays to be significantly associated with childhood ALL (OR = 1.85, 95% CI 1.22–2.79 for 3+ vs. 0–2 X-rays) in a larger sample size (n = 711 cases), of which the current study population is a subset [17]. However, these measures are derived from maternal reports of the child’s exposures and exposure periods and are potentially subject to reporting errors. Further studies with improved measures of ionizing radiation exposure, perhaps via review of medical records, as well as data on other potential DNA damaging exposures and larger sample sizes, are needed to confirm the associations observed.

One of the strengths of our study is the inclusion of U.S. Hispanics, an understudied population whose childhood leukemia incidence rates are the highest reported in California [46]. We selected SNPs in a manner to maximize capture of genetic variation in Hispanics, and examined Hispanics separately from non-Hispanics where there was significant heterogeneity in between-group effects of individual SNPs. Although this approach may have limited our ability to detect associations in the population as a whole, we believe it was necessary given that genetic susceptibility and/or patterns of linkage disequilibrium may be different in Hispanics versus non-Hispanics due to the Hispanic population’s relatively recent genetic admixture [23]. Results that differ between Hispanics and non-Hispanics may be due to differences in allele frequency and/or haplotype structure, or may reflect underlying differences in exposures that modulate the effects of genes. Regardless, if the results are not spurious, they represent potential risk loci, and we present them in either or both ethnic groups for replication and further follow-up. A final point is that the limited size of certain racial/ethnic sub-populations among non-Hispanics precluded further stratification of this group; therefore, heterogeneity among non-Hispanics might have obscured results.

In gauging these results, consideration must be given to several factors. First, despite this study’s relatively large sample size compared to those of most previous candidate gene studies, the presence of genetic heterogeneity due to the ethnic and racial diversity of the California population may have influenced our ability to detect associations. However, in this study population, we found no evidence of strong confounding due to estimated genetic ancestry, minimizing concerns about the impact of population stratification on the results. In addition, as noted above, our findings for CDKN2A confirm those from analyses of secondary hits from a recent genome-wide association study [39]. However, associations identified in our study for other genes, including APEX1, ERCC2, and others, were not observed in the primary or secondary hits of the large genome-wide studies conducted to date [39, 47, 48]. This may be due to stringent multiple testing adjustment (at the p ≤ 1 × 10⁻⁷ level) to account for the large number of individual variants investigated in the genome-wide studies. In contrast to the agnostic approach to discovery used in genome-wide studies, our study focused on relatively few genes representing key elements of the DNA repair and cell cycle control pathways. Other DNA repair genes not included in our study may also be associated with disease; we are unable to comment on these. We concede that results of our study may be due to chance, and therefore must be replicated. However, the haplotype-tagging approach we adopted maximizes capture of total variation within each candidate gene and the haplotype analysis increases statistical power to detect associations over analyses of individual variants. Finally, differences in genetic risk factors by cytogenetic subtypes may be obscured in studies that do not stratify analyses by these subtypes.

In summary, our results indicate that elements of the DNA repair and cell cycle pathways are likely to be associated with childhood ALL, and that some of these elements may interact with ionizing radiation exposures to modulate risk. The associations and interactions identified should be considered targets for further analysis in studies with larger sample sizes, high quality environmental exposure data, and finer coverage of SNPs in the identified associated regions.