Age-related differences of genetic susceptibility to patients with acute lymphoblastic leukemia

Inherited predispositions to acute lymphoblastic leukemia (ALL) have been well investigated in pediatric patients, but studies on adults, particularly Chinese patients, are limited. In this study, we conducted a genome-wide association study in 466 all-age Chinese patients with ALL and 1,466 non-ALL controls to estimate the impact of age on ALL susceptibility in the Chinese population. Among the 17 reported loci, 8 were validated in pediatric and 1 in adult patients. The strongest association signal was identified at the ARID5B locus and gradually decreased with age, while the signal at GATA3 exhibited the opposite trend and had a significant impact in adult patients. With genome-wide approaches, germline variants at 2q14.3 ranked as the top inherited predisposition in adult patients (e.g., rs73956024, P = 4.3 × 10⁻⁵) and separated the genetic risk of pediatric vs. adult patients (P = 3.6 × 10⁻⁶), whereas variants at 15q25.3 (e.g., rs11638062) had a similar impact on patients in different age groups (overall P = 2.9 × 10⁻⁷). Our analysis highlights the impact of age on genetic susceptibility to ALL in Chinese patients.


Subjects and genotyping
Peripheral blood was obtained from 1,466 non-ALL controls and 466 B-lineage ALL patients (381 childhood [0-14 yrs] and 85 adult patients [14-68 yrs]) who were treated with standard protocols in West China Hospital of West China Second Hospital (e.g., CCGC-ALL2015, registered at http://www.chictr.org.cn/ with ID: ChiCTR-IPR-14005706). Clinical information, including gender, age at diagnosis, and molecular subtype, was obtained from the record system at our hospitals. Fusion-based molecular subtypes were determined by fluorescence in situ hybridization.
A total of 811,852 SNPs were genotyped with the Precision Medicine Research Array (ThermoFisher) and filtered based on minor allele frequency, call rate, Hardy-Weinberg equilibrium, etc., following previously described standard steps [8]. Subsequently, imputation was conducted with the filtered SNPs using a well-established method (i.e., Michigan Imputation Server [32]). After setting an imputation r² of 0.5 as the cutoff threshold, 9,466,286 SNPs were retained for subsequent association analysis.
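The filtering thresholds are not stated in this section, so the following is only a minimal sketch of per-SNP quality control. It assumes hypothetical cutoffs (MIN_MAF, MIN_CALL_RATE, MIN_HWE_P) and a 1-df chi-square approximation to the Hardy-Weinberg test; the function names are illustrative, and the actual pipeline follows the steps cited in [8] and may differ.

```python
import math

# Hypothetical cutoffs for illustration only; the study's actual values are not given here.
MIN_MAF = 0.01
MIN_CALL_RATE = 0.95
MIN_HWE_P = 1e-6


def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:            # monomorphic SNP: no deviation can be tested
        return 1.0
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa, n_ab, n_bb, n_missing):
    """Return True if a SNP passes the MAF, call-rate, and HWE filters."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1.0 - freq_a)
    return (maf >= MIN_MAF
            and call_rate >= MIN_CALL_RATE
            and hwe_p_value(n_aa, n_ab, n_bb) >= MIN_HWE_P)


print(passes_qc(360, 480, 160, 0))   # common SNP in equilibrium
print(passes_qc(500, 0, 500, 0))     # no heterozygotes: strong HWE deviation
```

In practice such filters are usually applied to controls (for HWE) and to the whole cohort (for call rate and MAF) before imputation.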

Statistical analysis
Four GWAS comparisons were conducted: all patients vs. non-ALL controls, pediatric patients (< 14 yrs) vs. non-ALL controls, adult patients vs. non-ALL controls, and pediatric patients vs. adult patients. For the reported loci, subtype-specific associations were also evaluated. The association of SNP genotypes with the indicated phenotypes (e.g., ALL susceptibility of all-age patients) was estimated by comparing genotype frequencies between ALL cases and non-ALL controls, or between age groups, with a logistic regression model after adjusting for gender and the top three principal components. P values, odds ratios (ORs), and 95% confidence intervals (95% CIs) were estimated using PLINK (version 1.90) [33].
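To illustrate how the reported OR, 95% CI, and P value relate to one another, the sketch below assumes Wald-type statistics on the log-odds scale (OR = exp(β), CI = exp(β ± 1.96 × SE), two-sided normal P value). This is an assumption about how the summary statistics were derived, not a description of the exact PLINK computation, and the function names are illustrative. As a consistency check, it back-calculates β and SE from the adult rs73956024 estimate reported in the Results (OR = 2.31 [1.55-3.45]).

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def wald_summary(beta, se):
    """OR, 95% CI, and two-sided Wald P value from a log-odds coefficient."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - Z_95 * se)
    upper = math.exp(beta + Z_95 * se)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return odds_ratio, lower, upper, p_value


def coefficient_from_interval(odds_ratio, lower, upper):
    """Back-calculate beta and SE from a reported OR and 95% CI (Wald assumption)."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * Z_95)
    return beta, se


# Values as reported in the Results for rs73956024 in adult patients.
beta, se = coefficient_from_interval(2.31, 1.55, 3.45)
odds_ratio, lower, upper, p_value = wald_summary(beta, se)
print(f"beta={beta:.3f} se={se:.3f} OR={odds_ratio:.2f} "
      f"CI=[{lower:.2f}, {upper:.2f}] P={p_value:.1e}")
```

Under this assumption, the recovered P value is of the same order of magnitude as the reported one; small differences are expected from rounding of the published OR and CI.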

RESULTS
To investigate the inherited predispositions to ALL in the Chinese population, GWAS was performed with all imputed SNPs after stringent filtering. A total of 1,466 non-ALL controls and 466 B-lineage ALL patients were included in this study, including 381 pediatric (< 14 yrs) and 85 adult patients (≥ 14 yrs); baseline characteristics are shown in Table 1. For childhood ALL (< 14 yrs), only one locus (i.e., ARID5B) reached genome-wide significance (P < 5 × 10⁻⁸) (Figure 1), suggesting no novel strong genetic predisposition to ALL in the Chinese population at the current sample size. Subsequently, we retrieved the association results for all 21 SNPs at 17 reported loci from ALL patients and non-ALL controls. In pediatric patients, a total of 7 SNPs at 6 loci were significantly associated with ALL susceptibility regardless of molecular subtype, including SNPs at ARID5B, IKZF1, BMI1-PIP4K2A, CEBPE, CDKN2B-AS1, and BAK1.
Since the significance of some GWAS signals is greatly affected by clinical features, we estimated their associations with ALL susceptibility in Chinese patients considering ethnicity, age, and molecular subtype. The best example of ethnic specificity is the causal missense variant (i.e., rs3731249) in CDKN2A. The risk allele of rs3731249 is absent (risk allele frequency [RAF] = 0%) in our cohort, which is consistent with public data (i.e., 0% in East Asians vs. 3.3% in Caucasians according to gnomAD [34]) (Figure 2A), and thus explains the racial difference at this locus. The lack of significance of rs17481869 at the 2q22.3 locus and rs76925697 at the 9q21.31 locus can also be explained by ethnicity-specific risk allele frequencies.
In pediatric ALL patients, the novel Hispanic-specific ALL risk signal at the ERG locus exhibited a marginally significant association with ALL susceptibility in Chinese patients (P = 0.07, OR = 1.18 [0.99-1.41]) and reached statistical significance in patients without common fusions (P = 0.02, OR = 1.29 [1.04-1.61]), consistent with a previous study [28]. However, rs1121404 at WWOX (P = 0.34, OR = 1.13 [0.88-1.45]), identified in a Chinese-specific GWAS, could not be validated in our cohort. Association analysis was next performed in different genetic subtypes. Although the association of rs7088318 at the BMI1-PIP4K2A locus with ALL susceptibility did not reach statistical significance in the whole patient cohort (P = 0.21, OR = 1.11 [0.94-1.31]), the risk allele of this SNP was enriched in [...] was observed in the TCF3-PBX1 subtype at rs2836365. In contrast, the ETV6-RUNX1 subtype-specific SNP rs10853104 at the IGF2BP1 locus at 17q21.32 (P = 0.53, OR = 1.08 [0.84-1.39]) could not be validated in our cohort even after considering different subtypes.
Figure 1. GWAS results of ALL susceptibility in all-age Chinese patients. Association between SNPs and ALL was evaluated in 466 ALL cases and 1,466 non-ALL controls. P values were estimated by logistic regression, and -log10 P (y-axis) was plotted against the chromosomal position of each SNP (x-axis). Only genotyped, not imputed, SNPs are shown.

AGING
Next, we evaluated the impact of age on genetic predisposition. Among all the reported GWAS loci, four were significantly associated with age at diagnosis, namely ARID5B, GATA3, BAK1, and CEBPE (Table 2). Notably, all signals lost their association with ALL susceptibility in adults except rs3824662 at the GATA3 locus.
As [...] Figure 2B. To investigate the impact of age on ALL susceptibility, we compared allele frequencies of each SNP between pediatric and adult patients with a GWAS approach using a logistic regression model (λ = 1.03 for the quantile-quantile plot). Although no locus reached genome-wide significance, one novel locus was identified at 2q14.3, with the top signal at rs73956024 (P = 3.6 × 10⁻⁶) (Figure 3A and Table 3). Additionally, to identify novel ALL susceptibility loci in different age groups of Chinese patients, we also performed GWASs in pediatric and adult patients separately. Only the ARID5B locus reached genome-wide significance in pediatric patients, and no locus did so in adult patients, probably because of the small sample size. Interestingly, signals at the 2q14.3 locus described above also ranked at the top in adult patients (e.g., rs73956024, P = 4.5 × 10⁻⁵, OR = 2.31 [1.55-3.45]) but were not significant in pediatric patients (Figure 3B, 3C and Table 3). However, validation in independent patient cohorts is needed due to the small sample size of our study.
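The inflation factor λ quoted for the quantile-quantile plot is conventionally the median of the observed 1-df chi-square statistics divided by the median expected under the null (about 0.455). The following is a minimal sketch of that calculation from a list of P values, assuming this conventional definition was used; the function name and the synthetic P values are illustrative and are not data from this study.

```python
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def genomic_inflation(p_values):
    """Genomic inflation factor from two-sided P values (each in (0, 1])."""
    norm = NormalDist()
    # Convert each P value back to a 1-df chi-square statistic: chi2 = z^2,
    # where z is the normal quantile at p/2 (lower tail, same square by symmetry).
    chi2 = [norm.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Synthetic, evenly spaced P values mimic a well-calibrated null distribution.
null_like = [(i + 0.5) / 1001 for i in range(1001)]
# Squaring shifts them toward zero, mimicking systematic inflation.
inflated = [p * p for p in null_like]

lambda_null = genomic_inflation(null_like)
lambda_inflated = genomic_inflation(inflated)
print(round(lambda_null, 3), round(lambda_inflated, 3))
```

A value close to 1, such as the 1.03 reported here, suggests little residual confounding from population stratification after adjustment for principal components.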

DISCUSSION
Most inherited predispositions to ALL have been revealed in Caucasians through genome-wide approaches, whereas studies in Chinese patients are limited. Although subsequent validations have been performed for the early identified loci (e.g., ARID5B), the impact of the novel loci recently identified with large sample sizes has not been evaluated, particularly the age-, ethnicity-, and subtype-specific variants. In this study, we systematically investigated the reported GWAS signals for ALL susceptibility in all-age Chinese patients and estimated the impact of clinical features, particularly age at diagnosis, on ALL susceptibility. A total of 11 SNPs located at 9 of the 17 loci could be validated in Chinese patients in the whole cohort or in a specific subgroup. The lack of validation for the remaining loci is probably due to racial differences in inherited predispositions to ALL and the small sample size (particularly for some molecular subtypes). Apart from the absence of the missense variant in CDKN2A (i.e., rs3731249), causal variants at the 2q22.3 and 9q21.31 loci may be tagged by variants other than rs17481869 and rs76925697, the risk alleles of which are absent in East Asians. Therefore, we checked the linkage disequilibrium (LD) blocks of these two SNPs in our cohort. No statistically significant signal was identified, suggesting racial specificity of these two loci. For the ETV6-RUNX1 subtype-specific locus in IGF2BP1, although only 77 patients carried the ETV6-RUNX1 fusion, the risk allele frequency of rs10853104 showed no obvious difference between patients and non-ALL controls (12.3% vs. 12.4%), suggesting that the lack of significance at this locus is probably due to ethnic specificity rather than the small sample size. Moreover, similar trends were observed for the rest of the non-significant variants, although enlarging the sample size may increase the statistical power for evaluating some subgroup-specific loci, such as ERG and LHPP (P = 0.07 and 0.08 in all pediatric patients).

With the GWAS approach, no novel locus reached genome-wide significance, arguing for a larger sample size to identify potential novel susceptibility loci in Chinese patients in the future. However, we identified a potential novel locus (i.e., 15q25.3) that may affect the susceptibility of all-age patients. Since we do not have an independent replication cohort, we checked the association of this locus in a previously reported GWAS of multi-ethnic populations to validate this signal [8]. Marginal significance was observed for the top signal at this locus (i.e., rs11638062), with P = 0.09. Interestingly, another SNP at this locus (i.e., rs16977928, with P = 2.3 × 10⁻⁵ in our all-age GWAS of ALL susceptibility), which is in moderate LD with rs11638062 (r² = 0.27, D' = 0.66) in Caucasians, exhibits statistical significance in the multi-ethnic population (P = 0.007). After considering ethnicity, despite the association trend in Caucasians and blacks, rs16977928 is significant only in Hispanics (P = 0.01), who are an admixture of Native Americans and Caucasians. Since the ancestors of Native Americans are considered to descend from East Asians [35], the causal variant at this locus may act in an ethnicity-specific manner in the East Asian population. Moreover, rs11638062 is located in the AGBL1 gene, polymorphisms in which have also been associated with lung cancer risk in the Chinese population, suggesting a potential role in tumorigenesis [36]. On the other hand, rs73956024 is located in an enhancer region upstream of HS6ST1 in B cells according to a public resource [37], and thus could be a possible eQTL affecting the expression level of adjacent genes, including HS6ST1.
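For readers less familiar with the two LD measures quoted above, the sketch below shows the standard definitions of D, D', and r² computed from allele and haplotype frequencies. The input frequencies are hypothetical and chosen only to show that D' can be fairly high while r² stays modest when the two allele frequencies differ; they are not the frequencies of rs11638062 and rs16977928.

```python
def ld_stats(p_a, p_b, p_ab):
    """Return (D, D', r^2) for two biallelic loci.

    p_a, p_b: frequencies of allele A at locus 1 and allele B at locus 2.
    p_ab: frequency of the haplotype carrying both A and B.
    """
    d = p_ab - p_a * p_b
    # D is normalized by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical frequencies for illustration only.
d, d_prime, r2 = ld_stats(p_a=0.2, p_b=0.5, p_ab=0.17)
print(f"D={d:.3f} D'={d_prime:.2f} r2={r2:.4f}")
```

Because r² also depends on how well the allele frequencies match, two SNPs in moderate r² can still tag the same underlying causal variant to different degrees across populations.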
Regarding the impact of age on ALL, the difference in risk allele frequencies between patients and non-ALL controls for the reported GWAS loci was smaller in adults than in children, except for signals at the GATA3 locus, suggesting that the majority of the known GWAS signals are specific to pediatric patients. Therefore, we conducted the first GWAS to screen for ALL susceptibility loci in adult patients. rs3824662 at the GATA3 locus exhibits association at the candidate level rather than genome-wide significance. On the other hand, a novel locus at 2q14.3 not only exhibits the most significant association with susceptibility in adult patients but also has the strongest impact on age at diagnosis, suggesting a potential role in age-specific leukemogenesis. Although no known gene is located in the LD region of these SNPs, multiple ENCODE candidate cis-regulatory elements were identified, indicating a possible epigenetic effect of the causal variant in this region. Moreover, given the limited sample size and the limited previous research on adult patients, validation of this locus is needed in independent cohorts with large sample sizes in the future.

Data availability
The datasets generated for this study can be found in Array Express in the Genome Variation Map (GVM) database (http://bigd.big.ac.cn/gvm) with the accession number of GVM000060.

Ethics statement
This study was approved by the Ethics Committee of West China Second Hospital, Sichuan University, and informed consent was obtained from patients or their guardians, as appropriate.

AUTHOR CONTRIBUTIONS
JZ, YaZ, and XL designed and supervised this study. QH, MC, CZ, SZ, DY, YgZ, and YuW conducted the experiments and data analyses, and interpreted the data. YxY, YeW, YpZ, BY, LW, KC, YfY, CX, and JG collected the clinical information. XL contributed to the conception of the study and drafted the manuscript. All authors contributed to writing of the manuscript and approved the final manuscript.