Assessing the impact of arsenic metabolism efficiency on DNA methylation using Mendelian randomization

Supplemental Digital Content is available in the text.


Background: Arsenic exposure affects >100 million people globally and increases risk for chronic diseases. One possible toxicity mechanism is epigenetic modification. Previous epigenome-wide association studies (EWAS) have identified associations between arsenic exposure and CpG-specific DNA methylation. To provide additional evidence that observed associations represent causal relationships, we examine the association between genetic determinants of arsenic metabolism efficiency (percent dimethylarsinic acid, DMA%, in urine) and DNA methylation among individuals from the Health Effects of Arsenic Longitudinal Study (n = 379) and Bangladesh Vitamin E and Selenium Trial (n = 393).
Methods: We used multivariate linear models to assess the association of methylation at 221 arsenic-associated CpGs with DMA% and measures of genetically predicted DMA% derived from three SNPs (rs9527, rs11191527, and rs61735836). We also conducted two-sample Mendelian randomization analyses to estimate the association between arsenic metabolism efficiency and CpG methylation.
Results: Among the associations between DMA% and methylation at each of 221 CpGs, 64% were directionally consistent with associations observed between arsenic exposure and the 221 CpGs from a prior EWAS. Similarly, among the associations between genetically predicted DMA% and each CpG, 62% were directionally consistent with the prior EWAS results. Two-sample Mendelian randomization analyses produced similar conclusions.
Conclusion: Our findings support the hypothesis that arsenic exposure affects DNA methylation at specific CpGs in whole blood. Our novel approach for assessing the impact of arsenic exposure on DNA methylation requires larger samples in order to draw more robust conclusions for specific CpG sites.

Introduction
Exposure to arsenic from consumption of naturally contaminated drinking water impacts >100 million people globally, including ~50 million in Bangladesh. 1 Arsenic is a toxic and carcinogenic metal, and consumption of drinking water with arsenic concentrations above 50-100 µg/L is associated with risk for several cancer types in multiple populations. 2,3 Arsenic exposure is also associated with increased risk of cardiovascular diseases, 4,5 respiratory disease, 6 diabetes, 7 nonmalignant lung disease, 8 and increased overall mortality. 9 Chronically exposed individuals maintain high risks of arsenic-associated diseases and mortality for several decades. 10 Thus, an understanding of arsenic toxicity mechanisms is needed to better assess risk and develop effective prevention and treatment strategies for arsenic-associated diseases and health effects.
The mechanisms of arsenic toxicity are complex and remain to be fully elucidated. Prior studies suggest that arsenic alters DNA methylation, and arsenic-associated DNA methylation alterations may be an important mechanism of arsenic toxicity. 11 DNA methylation is an epigenetic modification that primarily occurs at cytosine nucleotides within CpG sites (CpGs) and reflects chromatin conformation and regulation of gene expression. Prenatal arsenic exposure is associated with CpG methylation in cord blood according to numerous studies, 12-16 and differential methylation in blood has been found in adult Bangladeshi, American, and Argentinian populations with a wide range of exposure levels (measured in urine, from ~10 μg/g creatinine to >1600 μg/g creatinine). 17-20 A recent epigenome-wide association study (EWAS) of Bangladeshi adults identified associations between urinary arsenic and methylation at >200 CpGs in whole blood. 21 These CpGs are enriched in genes belonging to cancer, aging, inflammation, and toxicant response pathways. 21 While a mechanism of arsenic-induced carcinogenesis involving DNA methylation changes is plausible, it is not known whether the associations observed in EWAS represent causal effects of arsenic on the epigenome.
Arsenic undergoes biotransformation by arsenite methyltransferase (AS3MT), and arsenic metabolism efficiency modulates internal dose of arsenic by facilitating excretion of arsenic in urine. AS3MT converts inorganic arsenic (iAs) via methylation reactions into monomethylarsonic acid (MMA) and then dimethylarsinic acid (DMA). 22,23 The proportion of DMA among all arsenic species in urine (DMA%) is a measure of arsenic metabolism efficiency. 24 These methylated species, especially DMA, are more readily excreted in urine than iAs, and higher DMA relative to MMA in urine is associated with lower arsenic toxicity risk. 25-28 Therefore, arsenic metabolism efficiency reflects variability in the internal arsenic dose for individuals at a given level of exposure. Because DMA% has known genetic determinants, it is possible to use a Mendelian randomization (MR) approach to estimate associations between DMA% and methylation at specific CpGs that are not biased due to confounding by environmental or lifestyle factors. Genetic determinants of arsenic metabolism can be used as instrumental variables (IVs); provided that these IVs (1) affect arsenic metabolism and (2) are associated with CpG methylation exclusively through their effect on arsenic metabolism, MR provides valid effect estimates. 29 If these conditions are satisfied, MR estimates avoid biases that may affect associations with directly measured arsenic metabolism, such as confounding or reverse causation. Genome-wide association (GWA) and candidate gene studies have identified independent associations of variants in the 10q24.32/AS3MT region, rs9527 and rs11191527, with DMA% among arsenic-exposed Bangladeshi individuals. 24,30,31 DMA% is also associated with rs61735836 in exon 3 of FTCD (formiminotransferase cyclodeaminase), which has a catalytic role in the one-carbon/folate cycle, a source of methyl groups for arsenic methylation. 32
In this study, we use an MR approach to evaluate the relationship between arsenic metabolism efficiency and methylation of arsenic-associated CpGs identified in prior EWAS. 21 Efficient metabolism of arsenic (i.e., high DMA%) should reduce the internal dose of arsenic; thus, we hypothesize that DMA% will be associated with the CpGs discovered in EWAS in a direction opposite to that of arsenic exposure. Using data on 379 Bangladeshi participants in the Health Effects of Arsenic Longitudinal Study (HEALS) and 393 participants in the Bangladesh Vitamin E and Selenium Trial (BEST), 33,34 we obtain MR-based estimates (Figure 1) of the association between arsenic metabolism efficiency and each arsenic-associated CpG, using genetic determinants of DMA% as IVs. 35 Our approach represents a novel strategy to examine the toxicological relevance of molecular features showing association with environmental exposures, such as DNA methylation, in observational studies.

Study participants
Subjects analyzed in this article were participants in one of the two following studies: the HEALS and the BEST. DNA samples were obtained at baseline from participants in both cohorts. HEALS (described previously in detail) is a prospective longitudinal study of health outcomes associated with chronic arsenic exposure in Araihazar, Bangladesh, with 11,746 adults (11,224 with arsenic measurements) enrolled at the original baseline visit (age 18-75 years). 33 Trained study physicians (blinded to arsenic measurements) conducted in-person interviews, clinical evaluations, and collection of urine and blood samples using structured protocols. 36 BEST is a 2 × 2 factorial randomized chemoprevention trial assessing the effects of vitamin E and selenium dietary supplements on skin cancer risk among Bangladeshi individuals (n = 7,000, age 25-65 years) with arsenical skin lesions. 37 Many of the study protocols in BEST, including sample collection and exposure assessment, were identical to those in HEALS. 24 This article uses DNA methylation data from a recent EWAS of 396 randomly selected adults from HEALS, and from a prior study of 400 BEST participants. 19

Figure 1. Causal diagram depicting the relationships among 10q24.32 and FTCD genetic variants, arsenic methylation efficiency, and methylation levels of CpG sites in an arsenic-exposed population. The causal relationship between the 10q24.32 and FTCD variants and arsenic methylation permits the use of these variants as potential IVs in our MR analysis.

Exposure assessment
Arsenic was measured in urine at baseline for HEALS and BEST participants and in drinking water for HEALS participants. At baseline, each HEALS participant identified the well used as their primary source of drinking water. Urinary and water arsenic were measured using graphite furnace atomic absorption spectrometry in a single laboratory. 38 Total urinary arsenic concentration was divided by creatinine to compute creatinine-adjusted arsenic (micrograms/gram creatinine). 39 In HEALS participants, urinary arsenic metabolites were distinguished via high-performance liquid chromatography, followed by detection using inductively coupled plasma-mass spectrometry with dynamic reaction cell. 26 The percentages of iAs, MMA, and DMA among all arsenic species were calculated after subtracting arsenobetaine and arsenocholine (nontoxic organic arsenic from dietary sources) from total arsenic.
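The two derived exposure measures described above can be sketched as follows. This is a minimal illustration, not the study's actual processing code: the function names, the argument units (µg/L for arsenic, g/L for creatinine), and the example concentrations are all assumptions made for the sake of the example.

```python
def creatinine_adjusted_arsenic(total_as_ug_per_l, creatinine_g_per_l):
    """Total urinary arsenic divided by creatinine, in ug/g creatinine.

    Units are an assumption of this sketch (ug/L and g/L).
    """
    if creatinine_g_per_l <= 0:
        raise ValueError("creatinine must be positive")
    return total_as_ug_per_l / creatinine_g_per_l


def species_percentages(total_as, arsenobetaine_arsenocholine, ias, mma, dma):
    """Percentages of iAs, MMA, and DMA among arsenic species.

    The denominator is total arsenic minus the nontoxic dietary organic
    species (arsenobetaine and arsenocholine), as described in the text.
    """
    denominator = total_as - arsenobetaine_arsenocholine
    if denominator <= 0:
        raise ValueError("adjusted total arsenic must be positive")
    return {
        "iAs%": 100.0 * ias / denominator,
        "MMA%": 100.0 * mma / denominator,
        "DMA%": 100.0 * dma / denominator,
    }


# Hypothetical concentrations (ug/L), chosen only to make the arithmetic easy.
example = species_percentages(
    total_as=110.0, arsenobetaine_arsenocholine=10.0,
    ias=15.0, mma=15.0, dma=70.0,
)
```

With these made-up inputs the adjusted total is 100 µg/L, so each percentage equals the species concentration itself.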

Genotyping
DNA extraction and quality control for HEALS and BEST blood samples have been described in the respective studies in which the 10q24.32 and FTCD variant associations with arsenic metabolism efficiency were discovered. 24,32 Genotypes for the 10q24.32 and FTCD variants were obtained from Illumina HumanCytoSNP-12 v2.1 chips and Illumina's exome array v1.1, respectively. Genotypes for rs9527 and rs11191527 were available for 379 individuals in HEALS and 393 in BEST, and genotypes for rs61735836 were available for 340 individuals in HEALS and 383 in BEST (Table 1).

DNA methylation
DNA methylation was measured on the Illumina EPIC array for 396 HEALS participants and 450K array for 400 BEST participants. 21,37 Preprocessing and normalization of these arrays have been described previously. 21 Briefly, beta mixture quantile (BMIQ) normalization was applied to reduce type I/II probe bias, and an EWAS of log2-transformed urinary arsenic was conducted in both HEALS and BEST using models adjusted for age, sex, smoking status, BMI, and surrogate variables. 40 The METAL software was used to conduct a meta-analysis using inputted summary statistics from 390,810 CpGs measured in both HEALS and BEST. 41 Analyses of the EPIC array data from HEALS participants identified associations between log2-transformed urinary arsenic and 34 CpGs at FDR = 0.05 (eTable 1; http://links.lww.com/EE/A77). From the meta-analysis of both HEALS and BEST, we identified 221 CpGs associated with log2-transformed urinary arsenic at FDR = 0.05 (eTable 2; http://links.lww.com/EE/A77).

Estimation of GP-DMA%
We generated a weighted SNP score representing genetically predicted DMA% (GP-DMA%) as a measure of arsenic metabolism efficiency. β coefficients for associations of rs9527, rs11191527, and rs61735836 with DMA% were obtained from prior GWA studies in HEALS. 24,32 The β coefficient for the association of each SNP with DMA% (Table 2) was multiplied by its high-efficiency allele (effect allele) count, and the score was computed as the sum of these products. GP-DMA% captured the increase in DMA% relative to an arbitrary baseline DMA%. Let X_j be the high-efficiency allele count of the jth SNP, and β_Xj be the effect size of the association between SNP j and DMA%. Then:

$$\mathrm{GP\text{-}DMA\%} = \sum_{j} \beta_{Xj} X_j$$

In a supplementary analysis, we defined a binary score indicating whether each individual carried only high-efficiency alleles at both rs9527 and rs61735836 (versus carrying at least one low-efficiency allele at one or both of these SNPs). This strategy avoided specifying weights, relying instead on the two SNPs with (1) large effect sizes (compared with rs11191527) and (2) low frequency of the low-efficiency allele (Table 2).
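As a rough illustration of the weighted and binary scores described in this section, the sketch below sums per-allele effects over the three SNPs. The β values are hypothetical placeholders, not the Table 2 estimates, and the function names and dictionary layout are assumptions of this example.

```python
# Hypothetical per-allele effects on DMA% (placeholders, NOT the Table 2 values).
HYPOTHETICAL_BETAS = {"rs9527": 3.0, "rs11191527": 1.0, "rs61735836": 5.0}


def gp_dma_score(allele_counts, betas=HYPOTHETICAL_BETAS):
    """Weighted score: sum over SNPs of beta * high-efficiency allele count."""
    score = 0.0
    for snp, beta in betas.items():
        count = allele_counts[snp]  # KeyError if a genotype is missing
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {snp} must be 0, 1, or 2")
        score += beta * count
    return score


def binary_high_efficiency(allele_counts):
    """1 if only high-efficiency alleles are carried at rs9527 and rs61735836."""
    return int(allele_counts["rs9527"] == 2 and allele_counts["rs61735836"] == 2)


# Hypothetical individual: homozygous high-efficiency at two SNPs,
# heterozygous at rs11191527.
person = {"rs9527": 2, "rs11191527": 1, "rs61735836": 2}
weighted = gp_dma_score(person)          # 3*2 + 1*1 + 5*2 with the placeholder weights
binary = binary_high_efficiency(person)
```

Because the score is relative to an arbitrary baseline, only differences in the score between individuals are interpretable.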

Statistical methods
Associations of CpG methylation with DMA%, 10q24.32 and FTCD SNPs, and GP-DMA% were estimated by linear regression. Because of small sample sizes of minor allele homozygotes among our SNPs (at most 2.3% of the sample), these individuals were combined with heterozygotes and classified as minor allele carriers for analyses of single SNPs. Each SNP genotype was coded as a binary variable for higher effect allele count. Because of moderate LD between rs9527 and rs11191527, these SNPs were included together as covariates in individual SNP analyses. All regressions included age, sex, methylation batch, and cigarette smoking status (never, former, and current smoker) as covariates. Analyses in which HEALS and BEST were combined were adjusted for cohort (HEALS or BEST). Regressions with DMA% as a predictor (restricted to HEALS) also included BMI, years of education, and log-transformed water arsenic as covariates, to adjust for potential confounders of this non-genetic variable. As an additional adjustment for confounding due to arsenic exposure levels, associations between log-transformed DMA and CpG methylation were adjusted for log-transformed urinary arsenic and log-transformed urinary creatinine in addition to the same covariates as in DMA% regressions.
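The covariate-adjusted regressions described above can be sketched in a few lines. This is a dependency-free illustration of ordinary least squares via the normal equations, not the software used in the study; in practice a statistics library would also supply standard errors. The function name and the toy data are assumptions, and categorical covariates (sex, batch, smoking status, cohort) would need to be passed in as indicator columns.

```python
def ols_coefficients(rows, y):
    """Fit y = b0 + b1*x1 + ... by solving the normal equations (X'X) b = X'y.

    rows: one list of predictor values per participant, e.g.
          [GP-DMA% score, age, ...]; an intercept column is added here.
    y:    methylation values at a single CpG.
    Returns [intercept, b1, b2, ...].
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(p)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Toy data: methylation generated exactly as 0.5 - 0.02*score + 0.001*age.
predictors = [(0, 30), (1, 40), (2, 35), (3, 50), (1, 25), (0, 45)]
methylation = [0.5 - 0.02 * s + 0.001 * a for s, a in predictors]
intercept, beta_score, beta_age = ols_coefficients(predictors, methylation)
```

The coefficient on the first predictor (here `beta_score`) plays the role of the per-CpG association estimate whose sign is compared with the prior EWAS.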
Due to low power for testing the association of SNPs (or DMA%) with any specific CpG site, we tested the hypothesis that the set of arsenic-associated CpGs tended to associate with DMA% (and with genotypes that impact DMA%) in directions consistent with the CpGs' associations with arsenic. We applied binomial tests to binary variables representing the directional consistency of association for each CpG site. For each CpG, the estimate of association with DMA% or GP-DMA% (or the MR estimate) was considered consistent with prior EWAS if the β coefficient had a direction opposite to that of the previously reported association between arsenic exposure and CpG methylation. In other words, we hypothesized that if arsenic metabolism efficiency impacts DNA methylation, increased DMA% (or GP-DMA%) would lower the internal arsenic dose, and thus the direction of association for each arsenic-associated CpG site would be opposite to that of arsenic exposure (Figure 2). Under the null hypothesis, the number of CpGs showing a direction of association consistent with the prior EWAS is equal to (or less than) the number of CpGs with a direction of association that is inconsistent with the prior EWAS. We report one-sided P-values (from a binomial test) that correspond to the alternative hypothesis that the number of CpGs showing an association directionally consistent with prior EWAS is greater than the number of CpGs showing a directionally inconsistent association.
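The directional-consistency test described above amounts to a one-sided binomial test with success probability 0.5. The sketch below uses only the standard library; the function names are illustrative, and the exact tail sum is one reasonable way to compute the P-value rather than a statement of the software the authors used.

```python
from math import comb


def is_consistent(beta_dma, beta_arsenic):
    """A CpG is 'consistent' when the two estimates have opposite signs."""
    return beta_dma * beta_arsenic < 0


def one_sided_binomial_p(k, n, p=0.5):
    """Upper-tail probability P(X >= k) for X ~ Binomial(n, p)."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def consistency_test(betas_dma, betas_arsenic):
    """Count consistent CpGs and return (count, total, one-sided P-value)."""
    flags = [is_consistent(d, a) for d, a in zip(betas_dma, betas_arsenic)]
    k, n = sum(flags), len(flags)
    return k, n, one_sided_binomial_p(k, n)
```

For example, `one_sided_binomial_p(134, 221)` evaluates the upper tail for 134 consistent CpGs out of 221.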

Mendelian randomization
MR estimates were computed using the "MendelianRandomization" R package, 42 using rs9527, rs11191527, and rs61735836 as IVs in the inverse-variance weighted (IVW) MR method. Maximum likelihood MR was also conducted as a sensitivity analysis. Effect sizes (β_Xj) and standard errors (se(β_Xj)) for associations between each jth SNP and DMA% were derived from prior publications from HEALS (Table 2). 24,32 For SNP-CpG associations, effect estimates (β_Yj) and standard errors (se(β_Yj)) were obtained from the regression analyses conducted in this study.
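For readers unfamiliar with the IVW estimator, the sketch below shows the standard fixed-effect form computed from summary statistics. It is an illustration under simplifying assumptions, not a reimplementation of the R package: it treats the variants as uncorrelated, ignores uncertainty in the SNP-DMA% estimates, and the function name and example numbers are hypothetical.

```python
from math import sqrt


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance weighted MR estimate.

    beta_x: SNP-exposure effects (SNP on DMA%).
    beta_y: SNP-outcome effects (SNP on CpG methylation).
    se_y:   standard errors of the SNP-outcome effects.
    Returns (estimate, standard error).
    """
    if not (len(beta_x) == len(beta_y) == len(se_y)) or not beta_x:
        raise ValueError("inputs must be non-empty and of equal length")
    numerator = sum(bx * by / sy**2 for bx, by, sy in zip(beta_x, beta_y, se_y))
    denominator = sum(bx**2 / sy**2 for bx, sy in zip(beta_x, se_y))
    return numerator / denominator, sqrt(1.0 / denominator)


# Hypothetical summary statistics in which every ratio beta_y / beta_x equals 2.
estimate, std_error = ivw_estimate(
    beta_x=[1.0, 2.0, 4.0],
    beta_y=[2.0, 4.0, 8.0],
    se_y=[1.0, 1.0, 2.0],
)
```

The estimate is a weighted average of the per-SNP ratios β_Yj/β_Xj, so when all ratios agree, as in this toy input, the IVW estimate equals that common ratio.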

Participant characteristics
Genotyping and methylation data were available for 379 HEALS participants and 393 BEST participants. The median age was 37 years in HEALS (IQR: 30, 43) and 44 years in BEST (IQR: 35, 51), with 57.8% women in HEALS and 46.8% women in BEST (Table 1). In HEALS, 33.8% were current smokers and 5.5% former smokers; in BEST, 27.2% were current smokers and 10.2% former smokers. The median urinary arsenic (adjusted for creatinine) was 200 µg/g (IQR: 109.5, 344) in HEALS and 137 µg/g (IQR: 76, 395) in BEST. Water arsenic and DMA% data are not available for BEST participants.

Mendelian randomization
We used a two-sample MR approach based on summary statistics to obtain an MR-based estimate of the association between DMA% and CpG methylation using the 10q24.32 and FTCD variants as IVs for DMA%, in the combined HEALS and BEST cohort (n = 772). IVW-MR association estimates for 134 out of 221 CpGs were consistent in direction with arsenic exposure associations (binomial P = 0.0010) (Table 3; Figure 3C). When we used the maximum-likelihood MR method as a sensitivity analysis, the effect estimates for 134 CpGs were consistent in direction with arsenic associations (P = 0.0010) (eTable 3; http://links.lww.com/EE/A77). Plots for CpGs with the strongest MR effect estimates are included in the Supplement (eFigure 4; http://links.lww.com/EE/A77). When analyses were restricted to 41 CpGs whose associations passed a Bonferroni P-value threshold in EWAS, 27 were consistent (P = 0.03) (eTable 5; http://links.lww.com/EE/A77). To assess the sensitivity of our MR results to the method of analysis via summary statistics, we also tested associations of CpG methylation levels with individual effect allele counts of 10q24.32 and FTCD SNPs (see Methods). Among 221 CpGs, associations between allele scores and CpG methylation were consistent with arsenic exposure associations in 132 CpGs for rs9527 (P = 0.0023), 120 for rs11191527 (P = 0.11), and 131 for rs61735836 (P = 0.0035) (eTable 6; http://links.lww.com/EE/A77, eFigure 5; http://links.lww.com/EE/A77).

Discussion
In this study, we used an MR approach to provide evidence that supports the hypothesis that arsenic metabolism efficiency is causally related to DNA methylation measured in whole blood at specific CpG sites previously reported to be associated with urinary arsenic. Using a sample of 379 Bangladeshi adults from the HEALS cohort, we demonstrate that the directions of association between DMA% and DNA methylation levels at many arsenic-associated CpG sites tend to be in the opposite direction of the association between urinary arsenic and the same CpGs. Additionally, genetically predicted DMA% (combining information across multiple SNPs) tends to be associated with these CpGs in directions opposite to that of arsenic exposure. Finally, the MR estimates for arsenic-associated CpGs tend to be consistent with the associations between these CpGs and arsenic exposure. These observations support the hypothesis that efficient metabolism of arsenic (i.e., high DMA%) reduces an individual's internal arsenic dose and should reduce the impact of arsenic on CpGs to which arsenic is causally related. While the genetics of arsenic metabolism and the associations of arsenic exposure with DNA methylation have been studied previously, in this study, we use the unique approach of applying MR to estimate the association between arsenic metabolism efficiency and methylation at specific CpG sites. The random assignment of study participants' genotypes minimizes the risk of bias in the association of arsenic metabolism with CpG methylation due to confounding (under the assumption that AS3MT and FTCD genotypes are associated with CpG methylation only through their effect on arsenic metabolism). This article does not provide strong evidence that arsenic metabolism efficiency affects methylation at any specific CpG site (largely due to lack of statistical power for tests of individual CpGs). These results may not generalize to tissue types other than blood or to other arsenic-exposed populations.

Figure 3. Associations of DMA% and genetically predicted DMA% with DNA methylation at 221 arsenic-associated CpGs discovered in meta-analysis. β coefficients for associations of (A) DMA% and (B) weighted GP-DMA% with CpG sites, as well as (C) MR-based association estimates for each CpG site, were considered consistent with associations between ln(creatinine-adjusted urinary As) and CpG sites (from prior EWAS) if their signs were opposite. Study populations were HEALS (n = 379) for DMA% analyses, a subset of the combined cohort with full genotypic data (n = 723) for weighted GP-DMA%, and a combined cohort of HEALS and BEST (n = 772) for MR.
Although the binomial tests of directional consistency are consistent with a causal influence of arsenic metabolism on some fraction of the CpG sites tested, it is unclear if the assumptions of MR are fully satisfied for any individual CpG. MR provides valid evidence supporting causal effects if IVs are (1) associated with the risk factor (DMA%), (2) not associated with confounders of the association between the risk factor and outcome (CpG methylation), and (3) independent of the outcome conditional on the risk factor and confounders. 29 The third assumption (exclusion restriction) requires that the impact of 10q24.32 and FTCD variants on methylation levels of the CpG site be completely mediated by arsenic metabolism efficiency, and this assumption cannot be proven. 29 Because only three genetic variants are known to be independently associated with DMA%, the number of IVs is insufficient to conduct Egger regression and test for violations of the exclusion restriction. However, the role of AS3MT in arsenic metabolism is well-established, providing strong biologic plausibility for a specific effect of these SNPs on arsenic metabolism, even though the SNPs' biologic mechanisms are not completely understood. The mechanism linking FTCD to arsenic metabolism is less clear, but it is potentially related to folate metabolism and the availability of methyl groups for arsenic metabolism. 32 Given the biologic plausibility of these IVs with respect to arsenic metabolism and the consistency of our results between AS3MT SNP rs9527 and FTCD SNP rs61735836, we feel that violations of the MR assumptions in this work are unlikely. 31 Finally, the directions of association for rs11191527 with CpGs were not skewed either positively or negatively. This result may call into question the validity of rs11191527 as an IV. However, the power of our study for detecting associations between rs11191527 and each CpG is limited by the relatively weak association of rs11191527 with DMA%.
An additional limitation of our MR approach is that it cannot identify critical periods of the life course in which arsenic exposure affects DNA methylation, nor can it estimate the latency period of such effects. Applying MR only allows us to estimate the impact of lifelong differences in arsenic metabolism efficiency on DNA methylation.
The results of this article build upon those of the EWAS in which these arsenic-associated CpGs were identified, providing evidence for a potential mechanism for arsenic-induced carcinogenesis. Enrichment analyses in the prior EWAS have shown that CpGs annotated to CpG shores, DNase I hypersensitive sites, and enhancers are over-represented among arsenic-associated CpGs identified in the meta-analysis of HEALS and BEST. 21 These genomic regions have important roles in transcription regulation, and DNA methylation in these regions is more variable and may be more sensitive to environmental exposures. 43 The results of this article provide additional evidence supporting further investigation of arsenic-associated DNA methylation as a mediator between arsenic exposure and associated diseases and health effects.
This study provides evidence suggesting that a substantial fraction of the associations between arsenic exposure and whole blood DNA methylation observed in prior EWAS represent a causal impact of arsenic, mediated by arsenic metabolism efficiency, on the epigenome. Our MR approach in the context of EWAS is particularly novel, providing an example that can inform future attempts at causal inference in this field and studies of potential molecular effects of environmental exposures. Future studies using larger sample sizes obtained from multiple arsenic-exposed populations should provide more robust MR evidence supporting causal effects of arsenic on specific CpG sites and a more comprehensive understanding of the mechanisms of arsenic-associated disease risk.