Methylation Analysis Reveals Fundamental Differences Between Ethnicity and Genetic Ancestry

Joshua M. Galanter, MD, MAS1,2,3, Christopher R. Gignoux, PhD4, Sam S. Oh, PhD1, Dara Torgerson, PhD2, Maria Pino-Yanes, PhD5,6, Neeta Thakur, MD, MPH1, Celeste Eng, BS1, Donglei Hu, PhD1, Scott Huntsman, MS1, Harold J. Farber, MD7, Pedro C Avila, MD8, Emerita Brigino-Buenaventura, MD9, Michael A LeNoir, MD10, Kelly Meade, MD11, Denise Serebrisky, MD12, William Rodríguez-Cintrón, MD13, Raj Kumar, MD14, Jose R Rodríguez-Santana, MD15, Max A. Seibold, PhD17, Luisa N. Borrell, DDS, PhD16, Esteban G. Burchard, MD, MPH1,2*, Noah Zaitlen, PhD1*


Introduction
Race, ethnicity, and genetic ancestry have had a complex and often controversial history within biomedical research and clinical practice 1,2. For example, race- and ethnicity-specific clinical reference standards are based on an average derived from statistical modeling applied to population-based sampling on a given physical trait such as pulmonary function 3,4. However, because race and ethnicity are social constructs, they ignore the heterogeneity within the categories 5. To account for these heterogeneities and avoid social and political controversies, the genetics community has integrated the use of genetic ancestry as a proxy for race and ethnicity because genetic sequence is not altered by environmental or social factors, such as those related to racial or ethnic identity. Indeed, recent work from our group and others has demonstrated that genetic ancestry improves diagnostic precision compared to crude categorizations of racial/ethnic assignment for specific medical conditions and clinical decisions [6][7][8].
Herein, we propose that ethnicity, above and beyond genome-wide ancestry, could be correlated to variation in methylation, a fundamental biological process.
Epigenetic modification of the genome through methylation plays a key role in the regulation of diverse cellular processes 9. Changes in DNA methylation patterns have been associated with complex diseases, including various cancers 10, cardiovascular disease 11,12, obesity 13, diabetes 14, autoimmune and inflammatory diseases 15 [...]

bioRxiv preprint doi: https://doi.org/10.1101/036822; this version posted January 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.

[...] factors have also been shown to affect methylation levels, including endocrine disruptors, tobacco smoke 19, polycyclic aromatic hydrocarbons, infectious pathogens, particulate matter, diesel exhaust particles 20, allergens, heavy metals, and other indoor and outdoor pollutants 21. Psychosocial factors, including measures of traumatic experiences 22-24, socioeconomic status 25,26, and general perceived stress 27, also affect methylation levels. Given the roles of both genetic and environmental influences upon methylation, we leveraged genome-wide methylation data as an intermediate phenotype to examine the degree to which self-identified ethnicity and genetic ancestry are reflected in differences in methylation. We hypothesized that while genetic ancestry can explain many of the differences in methylation between these groups, some ethnic-specific methylation differences, reflecting social and environmental differences between groups, would remain. We further examined the relationship between genome-wide (global) estimates of ancestry and locus-specific (local) ancestry to determine the extent to which associations between global ancestry and methylation are reflective of genetic factors acting in cis.
Finally, by using dense genotyping arrays, we queried whether methylation differences associated with ancestry can be traced back to meQTLs whose allele frequencies differ by ancestry. To address these aims, we analyzed data from 573 Latino children according to their national origin identity or ethnic subgroup (such as Puerto Rican and Mexican), enrolled in the Genes-Environments and Admixture in Latino Americans (GALA II) study of childhood asthma 28 .

Results
The study included 573 participants, the majority of whom self-identified as being either of Puerto Rican (n = 220) or Mexican origin (n = 276). Table 1 displays baseline characteristics of the GALA II study participants with methylation data included in this study, stratified by ethnic subgroups (Puerto Rican, Mexican, Other Latino, and Mixed Latinos who had grandparents of more than one national origin). Among the 524 participants with genomic ancestry estimates, European ancestry represented slightly over 50% of the average participant's ancestry, while Native American ancestry had the largest interquartile range. There were significant differences in ancestry between ethnic subgroups; Mexicans as a whole had a greater proportion of Native American ancestry, while Puerto Ricans had a significantly greater proportion of European and African ancestry [Table 1 and Supplementary Figure 1].

Global patterns of methylation
We first examined whether differences in ethnicity and ancestry resulted in discernible patterns in the global methylation profile by performing multidimensional scaling analysis (Supplementary Figure 2A). We tested for association of each of the first ten [...] To determine the extent to which the PC-ethnicity associations were driven by genetic ancestry, we performed a mediation analysis. The associations between ethnicity and PCs 3, 7, and 8 were significantly mediated by Native American ancestry (mediation p = 0.01, <0.001, and <0.001, respectively), and inclusion of Native American ancestry in the regression model of PCs 3, 7, and 8 caused the ethnicity associations to be nonsignificant. However, the associations of ethnicity with PCs 2 and 6 were not explained by Native American, African, or European ancestry (mediation p > 0.05), suggesting that ethnic differences are associated with global methylation patterns beyond genetic differences between ethnic groups. When genetic ancestry was regressed on the methylation data, and the principal coordinates were recalculated using the residuals of the regression, there was an association between ethnicity and PC6 (p-ANOVA = 0.003). However, there was no association with any of the other principal coordinates. These observations suggest that while genetic ancestry can explain some of the association between ethnicity and global methylation patterns, other non-genetic factors, such as environmental and social exposure differences associated with ethnicity, influence methylation independent of genetic ancestry.

Differences in methylation by ethnicity
We next investigated associations between ethnicity and individual loci by performing an epigenome-wide association study of self-identified ethnicity (see methods for details of ascertainment of ethnicity) and methylation. We identified a significant difference in methylation M-values between ethnic groups at 916 CpG sites at a Bonferroni-corrected significance level of less than 1.6 × 10^-7 [Figure 1A and Supplementary Table 1]. The most significant association with ethnicity occurred at cg12321355 in the ABO blood group gene (ABO) on chromosome 9 (p-ANOVA 6.7 × 10^-22) [Figure 1B]. A two-degree-of-freedom ANOVA test for genomic ancestry was also significantly associated with methylation level at this site (p = 2.3 × 10^-5), and when the analysis was stratified by ethnic subgroup, it showed an association in both Puerto Ricans and Mexicans (p = 0.001 for Puerto Ricans, p = 0.003 for Mexicans). Although adjusting for genomic ancestry attenuated the effect of ethnicity, a significant association between ethnicity and methylation remained (p = 0.04). Recruitment site, an environmental exposure proxy, was not significantly associated with methylation at this locus (p = 0.5), suggesting that ethnic differences beyond geography and ancestry are driving the association. Therefore, genetic ancestry explains much of the association between ethnicity and methylation, but other non-genetic factors associated with ethnicity could explain the ethnicity-associated methylation changes that cannot be accounted for by genomic ancestry alone.
Environmental differences between geographic locations or recruitment sites are a potential non-genetic explanation for ethnic differences in methylation. We investigated the independent effect of recruitment site on methylation by analyzing the associations between recruitment site and individual methylation loci after adjusting for ethnicity. We did not find any loci significantly associated with recruitment site at a significance threshold of 1.6 × 10^-7. We then performed an analysis to assess the effect of recruitment site on methylation stratified by ethnicity. We did not find any loci at which methylation was significantly associated with recruitment site among Mexican participants. We were underpowered to perform a similar analysis for Puerto Ricans because there were only 27 Puerto Rican participants recruited outside of Puerto Rico.
To ensure that the absence of association in Mexicans was not due to the loss of power from the smaller sample size, we repeated our analysis of the association between ancestry and methylation, randomly down-sampling to 276 participants to match the sample size in the analysis of geography in Mexicans. While down-sampling the study to this degree resulted in a loss of power, 128 methylation sites were still associated with ancestry. We conclude that recruitment site was unlikely to be a significant confounder of our associations between ethnicity and methylation and was not a significant independent predictor of methylation.

Ethnic differences in environmentally-associated methylation sites
Differences in environmental exposures between ethnic subgroups may explain some of the observed differences in methylation. To investigate this possibility, we identified CpG loci that had previously been reported to be associated with environmental exposures and whose exposure prevalence differs between ethnic groups. We then tested whether methylation at these loci was associated with ethnicity in this study. We have reported that maternal smoking during pregnancy varies significantly by ethnicity 28.
Maternal smoking during pregnancy has also been associated with statistically significant differences in methylation at 26 CpG loci in Norwegian newborns 19 . Of these 26 loci, 19 passed quality control (QC) in our own analysis, and the association between methylation and ethnicity was found to be nominally significant at 6 CpG loci. At a more stringent Bonferroni correction adjusting for 19 tests, cg23067299 in the aryl hydrocarbon receptor repressor (AHRR) gene on chromosome 5 remained statistically significant [ Table 2]. The protein encoded by AHRR participates in the aryl hydrocarbon receptor (AhR) signaling cascade, which mediates dioxin toxicity, and is involved in regulation of cell growth and differentiation. These results suggest that ethnic differences in methylation at loci known to be responsive to tobacco smoke exposure in utero may be explained in part by ethnic-specific differences in the prevalence of maternal smoking during pregnancy.
We also found that CpG loci previously reported to be associated with diesel-exhaust particle (DEP) exposure 20 were significantly enriched among the set of loci whose methylation levels varied between ethnic groups. Specifically, of the 101 CpG sites that were significantly associated with exposure to DEP and passed QC in our dataset, 31 were nominally associated with ethnicity (p < 0.05), and 5 were associated with ethnicity after adjusting for 101 comparisons (p < 0.0005) [Table 2]. Finally, we found that methylation levels at cg11218385 in the pituitary adenylate cyclase-activating polypeptide type I receptor gene (ADCYAP1R1), which had been associated with exposure to violence in Puerto Ricans 22 and with heavy trauma exposure in adults 23, were significantly associated with ethnicity (p = 0.02).

Differences in methylation by ancestry
Our epigenome-wide association study found 194 loci with a significant association between global genetic ancestry and methylation levels at a Bonferroni-corrected association p-value of less than 1.6 × 10^-7 [Figure 2A and Supplementary Table 2]. Of these significant associations, 55 were driven primarily by differences in African ancestry, 94 by differences in European ancestry, and 45 by differences in Native American ancestry. The most significant association between methylation and ancestry occurred at cg04922029 in the Duffy antigen receptor chemokine gene (DARC) on chromosome 1 (ANOVA p-value 3.1 × 10^-24; Bonferroni-corrected p-value 9.9 × 10^-19) [Figure 2B]. This finding was driven by a strong association between methylation level and global African ancestry; each 25 percentage point increase in African ancestry was associated with an increase in M-value of 0.98, which corresponds to an almost doubling in the ratio of methylated to unmethylated DNA at the site (95% CI 0.72 to 1.06 per 25% increase in African ancestry, p = 1.1 × 10^-21). There was no significant heterogeneity in the association between genetic ancestry and methylation between Puerto Ricans and Mexicans (p-het = 0.5). Mexicans had a mean unadjusted methylation M-value 0.48 units lower than Puerto Ricans (95% CI 0.35 to 0.62 units, p = 1.1 × 10^-11). However, adjusting for African ancestry accounted for the differences in methylation level between the two subgroups (p-adjusted = 0.4), demonstrating that ethnic differences in methylation at this site are due to differences in African ancestry.
A substantial proportion of the effect of global ancestry on local methylation levels is due to local ancestry acting in cis. Among the 194 CpG sites associated with global ancestry, local ancestry at the CpG site explained a median of 10.4% (IQR 3.0% to 19.4%) of the variance in methylation at these sites, accounting for a median of 52.8% (IQR 20.3% to 84.9%) of the total variance explained jointly by local and global ancestry [Supplementary Figure 5].

Admixture mapping of methylation
We next performed an admixture mapping study, examining the association between methylation levels at each CpG site and ancestry at the same locus. Of the 321,503 CpGs examined, methylation at 3,694 (1.1%) was significantly associated with ancestry at the CpG site at a Bonferroni-corrected association p-value of less than 1.6 × 10^-7 [Figure 3A and Supplementary Table 3]. This included 118 of the 194 loci identified above (61%), where global ancestry was associated with methylation. The most significant CpG site was again cg04922029, which was almost perfectly correlated with African ancestry at the locus (p = 6 × 10^-162) [Figure 3B]. Each African haplotype at the CpG site was associated with an increase in methylation M-value of 2.7, corresponding to a 6.5-fold increase in the ratio of methylated to unmethylated DNA per African haplotype at that locus. The second most significant association occurred at cg06957310 on chromosome 17; each additional African haplotype at the locus was associated with a decrease in M-value of 1.7 (a 3.2-fold decrease in the ratio of methylated to unmethylated DNA; p = 3.7 × 10^-75).
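The fold changes quoted above follow from the usual definition of the M-value as the base-2 logarithm of the ratio of methylated to unmethylated signal, so a shift of ΔM multiplies that ratio by 2^ΔM. The following is a minimal sketch of that arithmetic, not the authors' code; the function name is hypothetical.

```python
def fold_change(delta_m: float) -> float:
    """Multiplicative change in the methylated/unmethylated ratio
    implied by a shift of delta_m in M-value (M is a log2 ratio)."""
    return 2.0 ** delta_m


# Shifts in M-value reported in the text for the two top admixture mapping loci.
for delta in (2.7, -1.7):
    print(f"delta M = {delta:+.1f} -> ratio multiplied by {fold_change(delta):.2f}")
```

A negative ΔM gives a factor below one; taking its reciprocal expresses it as a fold decrease, as in the text.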
Finally, we explored whether our admixture mapping results were indicative of the presence of a meQTL. For each of the admixture mapping loci, we tested whether a single nucleotide polymorphism (SNP) within 1 Mb of the CpG was associated with methylation. We found that 3,637 of the 3,694 admixture mapping loci (98.5%) had at least one SNP within 1 Mb that was significantly associated with methylation levels (after adjustment for the number of SNPs in cis). The SNP/CpG pairs were separated by a median distance of 10.9 kb (interquartile range 2.9 kb to 35.1 kb). The furthest SNP/CpG pair were 998 kb apart. The most significant SNP/CpG pair was cg25134647/rs4963867, on chromosome 12, which are separated by 412 base pairs. Each copy of the T allele was associated with a decrease in M-value of 3.58, corresponding to a nearly 12-fold decrease in the ratio of methylated to unmethylated DNA at the site. We found that CpG cg04922029 (our top admixture mapping association) was significantly correlated with SNP rs2814778 [Figure 3C], the Duffy null mutation, 212 base pairs away; each copy of the C allele was associated with an increase in M-value of 1.5, or a 2.9-fold increase in the ratio of methylated to unmethylated DNA (p = 3.8 × 10^-90) [Figure 3D].

Discussion
We have shown that both genomic ancestry and self-described ethnicity independently influence methylation levels throughout the genome. While genomic ancestry can explain a portion of the association between ethnicity and methylation, genomic ancestry inadequately accounts for the association between ethnicity and methylation at 34% (314/916) of loci. These results suggest that other non-genetic factors associated with self-identified ethnicity may influence differences in methylation patterns between Latino subgroups. These factors may include social, economic, cultural, and environmental exposures.
We conclude that systematic environmental differences between ethnic subgroups likely play an important role in shaping the methylome for both individuals and populations.
Loci previously associated with diverse environmental exposures such as in utero exposure to tobacco smoke 19, diesel exhaust particles 20, and psychosocial stress 22 were enriched in our set of loci where methylation was associated with ethnicity. Thus, inclusion of relevant social and environmental exposures in studies of methylation may help elucidate racial/ethnic disparities in disease prevalence, health outcomes, and therapeutic response.
Our comprehensive analysis of high-density methyl- and genotyping from genomic DNA allowed us to investigate the genetic control of methylation in great detail and without the potential destabilizing effects of EBV transformation and culture in cell lines 30. The strongest patterns of methylation are associated with cell composition in whole blood 25. However, the specific Latino ethnic subgroup (Puerto Rican, Mexican, other, or mixed) is also associated with principal coordinates of genome-wide methylation.
Our approach has some potential limitations. It is possible that fine-scale population structure (sub-continental ancestry) within European, African, and Native American populations may contribute to ethnic differences in methylation, as we had previously reported in the case of lung function 31 . Also, our models of genetic ancestry assumed a linear effect of ancestry on methylation. Although we would expect the effects to be small, a nonlinear association or other model misspecification could have led to incomplete adjustment for genetic ancestry, and thus, led to a residual association between ethnicity and methylation. To rule out any residual confounding due to recruitment sites, we conducted an additional analysis on the effect of recruitment site on methylation both for the overall study and for the Mexican participants (the largest study population in this analysis). We observed no significant independent effect of recruitment site suggesting that confounding due to recruitment region was limited, at least within the United States. We were unable to test for the effect of geographic differences between the United States and Puerto Rico because our study included relatively few Puerto Ricans recruited outside of Puerto Rico.
The presence of a strong association between genetic ancestry and methylation raises the possibility that epigenetic studies can be confounded by population stratification, similar to genetic association studies, and that adjustment for either genetic ancestry or selected principal components is warranted. This possibility was first demonstrated in a previous analysis of the association between self-described race and methylation 32 .
However, the study only evaluated two distinct racial groups (African Americans and Whites), while the present study demonstrates the possibility of population stratification in an admixed and heterogeneous population with participants from diverse Latino national origins. The tendency to consider Latinos as a homogenous or monolithic ethnic group makes any analysis of this population particularly challenging.
Our finding of loci whose methylation patterns differed between Latino ethnic subgroups, even after adjusting for genetic ancestry, suggests that any analysis of these populations in disease-association studies that does not adjust for ethnic heterogeneity is likely to result in spurious associations, even if genomic ancestry is included as a covariate.
Our analysis of local genetic ancestry and methylation demonstrates that loci associated with genome-wide ancestry are driven primarily by allele frequency differences between ancestral populations in 118 out of 194 loci, suggesting that in most cases global ancestry is acting in cis. In addition, methylation QTLs whose allele frequencies differ between ethnic groups are found in 98.5% (3637/3694) of loci associated with local ancestry. Of particular interest, the most significant ancestry-associated locus, the DARC gene, harbors an association between ancestry and methylation at cg04922029, which can be entirely explained by the genotype at rs2814778, the Duffy null mutation. This mutation, which confers resistance to P. vivax malaria, has an allele frequency of 100% in [...]
In summary, the present study provides a framework for understanding how genetic, social, and environmental factors can contribute to systematic differences in methylation patterns between ethnic subgroups, even between presumably closely related populations such as Puerto Ricans and Mexicans. Methylation QTLs whose allele frequency varies by ancestry lead to an association between local ancestry and methylation level. This, in turn, leads to systematic variation in methylation patterns by ancestry, which then contributes to ethnic differences in genome-wide patterns of methylation. However, although genetic ancestry has been used to adjust for confounding in genetic studies, and can account for some of the ethnic differences in methylation in this study, ethnic identity is associated with methylation independent of genetic ancestry. This may be due to social and environmental effects captured by ethnicity. Indeed, we find that CpG sites known to be influenced by social and environmental exposures are also differentially methylated between ethnic subgroups.
These findings call attention to the need for a more complete understanding of the effect of social and environmental variables on methylation in the context of race and ethnicity.
Our findings have profound implications for the independent and joint effects of race, ethnicity, and genetic ancestry in biomedical research and clinical practice, especially in studies conducted in diverse or admixed populations. Our conclusions may be generalizable to any population that is racially mixed such as those from South Africa, India, and Brazil, though we would encourage further study in diverse populations. As the National Institutes of Health (NIH) embarks on a precision medicine initiative, this research underscores the importance of including diverse populations and studying factors capturing the influence of social, cultural, and environmental factors, in addition to genetic ones, upon disparities in disease and drug response.

Participants
Institutional review boards at University of California, San Francisco and recruitment sites approved the study, and all participants/parents provided age-appropriate written assent/consent. Latino children were enrolled as a part of the ongoing GALA II case-control study 28. A total of 4,702 children (2,374 participants with asthma and 2,328 healthy controls) were recruited from five centers (Chicago, Bronx, Houston, San Francisco Bay Area, and Puerto Rico) using a combination of community- and clinic-based recruitment. Participants were eligible if they were 8-21 years of age, self-identified as a specific Latino ethnicity, and had four Latino grandparents. Asthma cases were defined as participants with a history of physician-diagnosed asthma and the presence of two or more symptoms of coughing, wheezing, or shortness of breath in the 2 years preceding enrollment. Participants were excluded if they reported any of the following: (1) 10 or more pack-years of smoking; (2) any smoking within 1 year of recruitment date; (3) history of lung diseases other than asthma (cases) or chronic illness (cases and controls); or (4) pregnancy in the third trimester. Further details of recruitment are described elsewhere 28. Latino sub-ethnicity was determined by self-identification and the ethnicity of the participants' four grandparents. Due to small numbers, ethnicities other than Puerto Rican and Mexican were collapsed into a single category, "other Latino". Participants whose four grandparents were of discordant ethnicity were considered to be of "mixed Latino" ethnicity.
Trained interviewers, proficient in both English and Spanish, administered questionnaires to gather baseline demographic data, as well as information on general health, asthma status, acculturation, social, and environmental exposures.

Statistical Analysis
Unless otherwise noted, all regression models were adjusted for case status, age, sex, estimated cell counts, and plate and position. To account for possible heterogeneity in the cell type makeup of whole blood we inferred white cell counts using the method by Houseman et al 29 . Indicator variables were used to code categorical variables with more than two categories, such as ethnicity. In these cases, a nested analysis of variance (ANOVA) was used to compare models with and without the variables to obtain an omnibus p-value for the association between the categorical variable and the outcome.
For analyses of dependent beta-distributed variables (such as African, European, and Native American ancestries), or cell proportion, n-1 variables were included in the analysis, and a nested analysis of variance (ANOVA) was used to compare models with and without the variables to obtain an n-1 degree of freedom omnibus p-value for the association between predictor (such as ancestry) and the outcome variable.
The Bonferroni method was used to adjust for multiple comparisons. For methylome-wide associations, the significance threshold was adjusted for 321,503 probes, resulting in a Bonferroni threshold of 1.6 × 10^-7. Analyses were performed using R version 3.

We also sought to establish the extent to which global differences in methylation between Puerto Ricans and Mexicans could be explained by differences in ancestry between the two groups. For the methylation principal coordinates that demonstrated a significant association with ethnicity, we estimated the proportion of the ethnicity association that was mediated by genomic ancestry using the R package "mediation" 42.
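The methylome-wide threshold stated above can be reproduced with one division. The sketch below assumes a family-wise error rate of 0.05, which the text does not state explicitly but which is consistent with the reported 1.6 × 10^-7; the function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni method."""
    return alpha / n_tests


def bonferroni_adjust(p_value: float, n_tests: int) -> float:
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


N_PROBES = 321_503  # number of probes tested, from the text
ALPHA = 0.05        # assumed family-wise error rate (not stated in the text)

threshold = bonferroni_threshold(ALPHA, N_PROBES)
print(f"per-probe threshold: {threshold:.1e}")
```

Applying `bonferroni_adjust` to the top ancestry association (p = 3.1 × 10^-24) gives a value of roughly 1 × 10^-18, in line with the corrected p-value reported in the Results.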

Local patterns of methylation
We also sought to correlate ethnicity and methylation at a locus-specific level. We thus performed a linear regression between methylation at each CpG site and self-reported ethnicity (Mexican, Puerto Rican, Mixed Latino, and Other Latino), followed by a three-degree-of-freedom analysis of variance to determine the overall effect of ethnicity on methylation. We calculated the proportion of variance in methylation explained by ethnicity and genomic ancestry at each site where ethnicity was significantly associated with methylation. To do this, we fit a model that included both ethnicity and global ancestry as well as the confounders described above and calculated the proportion of variance explained by multiplying the ratio of the variance between predictors (ethnicity and genomic ancestry) and outcome (methylation) by the square of the effect magnitude (β).
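The variance-explained calculation described above amounts to β² · Var(predictor) / Var(outcome). The sketch below illustrates it for a single predictor with no covariates, which is a simplification of the joint model in the text; the data are made-up toy values and the function names are hypothetical.

```python
from statistics import mean, variance


def ols_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def variance_explained(x, y):
    """Proportion of variance in y explained by x: beta^2 * Var(x) / Var(y)."""
    beta = ols_slope(x, y)
    return beta ** 2 * variance(x) / variance(y)


# Hypothetical toy data: ancestry proportions and M-values for six people.
african_ancestry = [0.05, 0.10, 0.20, 0.25, 0.40, 0.55]
m_values = [-1.1, -0.7, -0.6, 0.1, 0.4, 1.2]

print(round(variance_explained(african_ancestry, m_values), 3))
```

With one predictor this quantity equals the squared correlation between predictor and outcome; in a joint model the same expression is evaluated per predictor using its adjusted coefficient.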
We also examined whether differences in methylation patterns by ethnicity could be associated with known loci previously reported to vary with common environmental exposures, including maternal smoking during pregnancy 19, diesel exhaust particles (DEP) 20, and exposure to violence 22. We have previously shown that these or similar exposures vary by ethnicity within our own GALA II study populations 28,43,44.
In addition, we examined the association between global ancestry and methylation across all CpG loci using a two-degree-of-freedom likelihood ratio test as well as by examining the association between individual ancestral components (African, European, and Native American) and methylation at each CpG site. At each site where methylation was significantly associated with genomic ancestry proportions, we determined the relative effect of global ancestry (θ) and local ancestry (γ) in a joint model by calculating the proportion of variance explained as above.

cis-Admixture mapping
To determine whether ancestry associations with methylation were due to variation in local ancestry, we performed a cis-admixture mapping study, comparing estimates of local ancestry at each CpG site with methylation at the site. Because ancestry LD is much stronger than genotypic LD, it is possible to accurately interpolate ancestry at each CpG site based on the ancestry estimated at the nearest SNPs 37,45 . Measures of locus-specific ancestry were correlated with local methylation using linear regression.
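One simple way to assign local ancestry to a CpG site is to copy the ancestry call from the closest genotyped SNP. The Python sketch below illustrates that idea only; nearest-SNP assignment is an assumption, the text does not specify the interpolation rule used, and the function name, positions, and ancestry tuples are hypothetical.

```python
from bisect import bisect_left


def nearest_snp_ancestry(cpg_pos, snp_positions, snp_ancestry):
    """Return the local ancestry call of the SNP closest to cpg_pos.

    snp_positions must be sorted in ascending order; snp_ancestry[i] is the
    call at snp_positions[i], here a tuple of (African, European,
    Native American) chromosome counts that sums to 2.
    """
    if not snp_positions:
        raise ValueError("no SNPs supplied")
    i = bisect_left(snp_positions, cpg_pos)
    if i == 0:
        return snp_ancestry[0]        # CpG lies before the first SNP
    if i == len(snp_positions):
        return snp_ancestry[-1]       # CpG lies after the last SNP
    left, right = snp_positions[i - 1], snp_positions[i]
    # Choose the closer flanking SNP; ties go to the left-hand SNP.
    j = i if (right - cpg_pos) < (cpg_pos - left) else i - 1
    return snp_ancestry[j]


# Hypothetical positions (bp) and ancestry calls on one chromosome.
positions = [1000, 5000, 9000]
calls = [(0, 1, 1), (0, 1, 1), (1, 1, 0)]

print(nearest_snp_ancestry(5200, positions, calls))  # prints (0, 1, 1)
print(nearest_snp_ancestry(8000, positions, calls))  # prints (1, 1, 0)
```

Because ancestry tracts in recently admixed populations span long stretches of the chromosome, neighbouring SNPs usually carry the same call, which is why a coarse rule like this can work in practice.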
We performed a two-degree of freedom analysis of variance test evaluating the overall effect of all three ancestries as well as single-ancestry associations comparing methylation at a given locus with the number of African, European and Native American chromosomes at that CpG site.

Allelic associations
To determine the extent to which admixture mapping results could be explained by allelic associations, we performed a meQTL analysis at all Bonferroni-significant admixture mapping associations (p < 1.6 × 10⁻⁷), comparing methylation at a given locus with the genotype of SNPs within 1 Mb of the CpG site using an additive genotypic model, adjusted for both global and local genomic ancestry, demographic

Mexicans are relatively hypermethylated compared to Puerto Ricans (p = 1.4 × 10⁻¹⁹).

Figure Legends
[C] Plot showing the association between Native American ancestry at the locus and methylation levels at the locus, colored by ethnicity; Native American ancestry accounts for 58% of the association between ethnicity and methylation at the locus.

GALA II Individual Ancestry Estimates