Mixed-model admixture mapping identifies smoking-dependent loci of lung function in African Americans

Ziyatdinov, Andrey; Parker, Margaret M.; Vaysse, Amaury; Beaty, Terri H.; Kraft, Peter; Cho, Michael H.; Aschard, Hugues

doi:10.1038/s41431-019-0545-8

Article
Published: 13 December 2019

Mixed-model admixture mapping identifies smoking-dependent loci of lung function in African Americans

European Journal of Human Genetics volume 28, pages 656–668 (2020)Cite this article

877 Accesses
5 Citations
3 Altmetric
Metrics details

Subjects

Abstract

Admixture mapping has led to the discovery of many genes associated with differential disease risk by ancestry, highlighting the importance of ancestry-based approaches to association studies. However, the potential of admixture mapping in deciphering the interplay between genes and environment exposures has been seldom explored. Here we performed a genome-wide screening of local ancestry–smoking interactions for five spirometric lung function phenotypes in 3300 African Americans from the COPDGene study. To account for population structure and outcome heterogeneity across exposure groups, we developed a multi-component linear mixed model for mapping gene–environment interactions and empirically showed its robustness and increased power. When applied to the COPDGene study, our approach identified two 11p15.2-3 and 2q37 loci, exhibiting local ancestry–smoking interactions at genome-wide significant level, which would have been missed by standard single-nucleotide polymorphism analyses. These two loci harbor the PARVA and RAB17 genes previously recognized to be involved in smoking behavior. Overall, our study provides the first evidence for potential synergistic effects between African ancestry and smoking on pulmonary function, and underlines the importance of ethnic diversity in genetic studies.

You have full access to this article via your institution.

Download PDF

A genome-wide association study identifies distinct variants associated with pulmonary function among European and African ancestries from the UK Biobank

Article Open access 14 January 2023

Multi-ancestry genome-wide association analyses improve resolution of genes and pathways influencing lung function and chronic obstructive pulmonary disease risk

Article Open access 13 March 2023

Unappreciated subcontinental admixture in Europeans and European Americans and implications for genetic epidemiology studies

Article Open access 07 November 2023

Introduction

The study of genetically diverse populations has become a priority in public health research. Several major initiatives started in the past few years, including the National Institute of Health (NIH) Trans-Omics in Precision Medicine Program, which aims to sequence over a hundred of thousands of whole genomes from a variety of ancestries. These initiatives compensate for the lack of participants of non-European ancestries in genetic studies [1,2,3]. Besides addressing health disparities among ethnic groups, studies of multi-ethnic cohorts and admixed populations can provide important information about the biology of complex diseases and help to identify associated genes [4]. Recently, admixed populations, such as African Americans, represent a special case of multi-ethnic cohorts with mosaic chromosomes derived from several ancestral populations. Admixture mapping, often applied to recently admixed populations, searches for genomic loci of unusual local ancestry at a putative disease risk locus compared with the genome-wide average [5]. Findings for respiratory disease, chronic renal disease, prostate cancer, and systemic lupus erythematosus have been reported as results of admixture mapping [6,7,8,9,10,11].

The data analysis in admixture mapping consists of two main steps: inferring local ancestry and testing for association between every local ancestry segment and an observed phenotype. Given unobservable ancestry information, current methods on ancestry inference probabilistically define the location of every ancestral switch using genotyping array data, reference haplotypes, and algorithms, based on hidden Markov models (HMM) [12, 13]. These methods were empirically shown to produce reliable results on African Americans [14], as they represent a relatively simple two-way admixture and are well-modeled by available reference panels [15]. The standard approach for association testing is similar to genome-wide association studies (GWAS) on single-nucleotide polymorphisms (SNPs) and runs linear regression to estimate the correlation between local ancestry and phenotype. To avoid confounding due to population structure that is inherently present in admixed individuals, most studies also included global ancestry components (i.e., the genome-wide proportions of ancestry derived from each ancestral population) as covariates. Going beyond linear regression, the framework of linear mixed models (LMMs) was applied to quantify individual similarities by ancestry, showing how the phenotypic variance is explained by local ancestry [16] and linking it to the heritability of complex traits estimated from SNP data [17]. Notably, several admixture mapping studies applied LMMs to control for family structure (kinship), household groups, and census blocks [18, 19], but modeling individual relationships through local ancestry genome-wide has not be explored yet, despite potential mapping improvements [13].

Recently, we found that the correlation between local ancestry and untyped causal variants can be leveraged to detect distant gene–gene interactions in admixed populations through local ancestry–local ancestry screening [20]. That work also demonstrated that the power of such admixture mapping increases with the number of causal variants within local ancestry tested and with the degree of differentiation of variants between the ancestral populations. Here we suggest that the same principle can be applied to search for gene–environment interactions. Regarding previous studies of gene–environment interactions, most works focused on interactions between the global ancestry and environmental factors using linear regression [17, 21,22,23]. As here we sought to screen for local ancestry–environment interactions, a type of admixture mapping seldom explored, we argue that such application might face multiple methodological challenges due to both population stratification and outcome heterogeneity among individual groups stratified by environmental exposure. Hence, the use of LMMs will be particularly appropriate to account for complex genetic and environmental relationships in admixed individuals.

Application of admixture mapping to lung function phenotypes in African Americans is especially relevant [24]. European Americans and African Americans are well known to show differences in spirometric measures of lung function such as forced expiratory volume in one second (FEV₁) and forced vital capacity (FVC) with higher values for these two traits for European Americans. Factors partially responsible for these differences include body habitus, early-life development conditions, socioeconomic status, and other environmental factors [25, 26]. Previous studies showed strong evidence that the proportion of African global ancestry is associated with lower lung function for a given ranges of height and age [21, 24]. Also, the higher proportion of African ancestry in African Americans was associated with an additional decrease in lung function for smokers [27].

To address the aforementioned methodological challenges, we used real data and proceed in a stepwise assessment of multi-component LMM, to define a robust interaction test of association. More precisely, we conducted a genome-wide scan of local ancestry–smoking interactions for five spirometric lung function phenotypes available in 3300 African Americans from the Genetic Epidemiology of COPD (COPDGene) study [28]. The search for gene–environment interactions related to pulmonary function phenotypes in the COPDGene study is markedly relevant, as it is one of the largest studies of African American smokers. In result, our application of the proposed LMM identified two genome-wide significant and five suggestive loci that would have been missed in standard single SNP-based approaches.

Materials and methods

The COPDGene dataset

In the analysis of COPDGene study (dbGaP phs000179.v5.p2), we focused on five correlated quantitative pulmonary phenotypes: FEV₁; FEV₁ as a percent of predicted (FEV₁ % predicted); FVC; FVC as a percent of predicted (FVC % predicted); and the ratio of FEV₁ to FVC (FEV₁/FVC). We derived two binary smoking exposures from the number of cigarettes smoked per day for gene–environment interactions: current smoker exposure (current smokers vs. former smokers) and heavy smoker exposure (current heavy smokers vs. current moderate smokers). We defined moderate current smokers with 1–14 cigarettes per day on average, whereas heavy current smokers were defined as with >14 cigarettes per day on average. When using heavy smoker exposure in the analysis, we excluded all subjects who were former smokers.

Locus-specific ancestry or local ancestry was inferred from 684,187 autosomal genotyped SNPs as previously described [28]. Briefly, we used the LAMP-LD program [29] to estimates local ancestry per individual using a HMM algorithm comparing observed genotypes and haplotypes from reference ancestral populations. We parameterized the algorithm with 15 HMM states, a window size of 50 SNPs, and used 99 CEU and 108 YRI unrelated individuals from the 1000 Genomes Project (Phase III, hg19) [30] as reference panels. Per SNP estimates of African ancestry were further used in two post-processing steps. First, we averaged the local ancestry per individual to derive the genome-wide proportion of African ancestry (i.e., the global ancestry). Second, we estimated local ancestry segments by merging neighboring SNPs with identical values. We next filtered out short ancestry segments of length <10,000 bases to mitigate possible artifacts of the inference procedure that might affect further admixture mapping analysis (LAMP-LD authors, personal communication; 2017).

Step-wise model selection for admixture mapping

Consider a quantitative trait stored in a n-dimensional vector y and a n × m data matrix Z of local ancestry segments, where n is the number of individuals and m is the number of local ancestry segments. Let z_l be a column of matrix Z corresponding to a single local ancestry segment and x_e being a n-dimensional column vector of a binary environmental exposure. We aim at testing the statistical interaction between the local ancestry segments z_l and the exposure x_e on the phenotype y. As discussed in the Supplementary Material, we are interested to assess a fixed effect of interaction z_l × x_e using the standard Wald’s test, while controlling for variance related to local ancestry, global ancestry, and exposure that might confound the estimate of effect. However, in regards to potentially non-trivial structure in the data, due to both ancestry admixture and potential heterogeneity of outcome across exposure groups, we defined our association model using a stepwise approach, where the model complexity was incremental until reaching the desired criteria of validity. Following standard practices [31], these criteria were designed (1) to reach a genomic inflation parameter (λ close to 1) and (2) to achieve an overall shape of the standard quantile–quantile plot (Q–Q plot) of the −log₁₀(P) matching the expected uniform distribution of p-values for the majority of the tests.

In practice, before evaluating interaction effects, we first assessed the robustness of a standard LMM when testing for the marginal effect of local ancestry z_l only:

$$y = C\beta _{\mathrm{{C}}} + \beta _{\mathrm{{e}}}x_{\mathrm{{e}}} + \beta _{\mathrm{{g}}}z_{\mathrm{{g}}} + \beta _{\mathrm{{l}}}z_{\mathrm{{l}}} + u_{\mathrm{{c}}} + e$$

(1)

where C is a matrix of trait-specific covariates and β_C is a vector of their fixed effects, z_g is a vector of global, and β_e, β_g, β₁ are fixed effects of exposure, global ancestry, and local ancestry, respectively. The random effects include a vector of the residual errors e and a vector of random-effect u_c encoding whether two given individuals belong to the same medical center. We further added additional random-effect components (described below) in the LMM, for which the importance was assessed incrementally (Supplementary Table S1). We next included our parameter of interest, the local ancestry–exposure interaction effect, on top of random-effect components selected at the previous step and continued our assessment of additional components until reaching the desired characteristics. Our full and final LMM was defined as follows:

$$y = \; C\beta _{\mathrm{{C}}} + \beta _{\mathrm{{e}}}x_{\mathrm{{e}}} + \left[ {\beta _{\mathrm{{g}}}z_{\mathrm{{g}}} + \delta _{\mathrm{{g}}}z_{\mathrm{{g}}}x_{\mathrm{{e}}}} \right] \\ + \left[ {\beta _{\mathrm{{l}}}z_{\mathrm{{l}}} + \delta _{\mathrm{{l}}}z_{\mathrm{{l}}}x_{\mathrm{{e}}}} \right] + u_{\mathrm{{m}}} + u_{\mathrm{{i}}} + u_{\mathrm{{h}}} + u_{\mathrm{{c}}} + e$$

(2)

where, in addition to notation in Eq. (1), z_g × x_e and z_l × x_e represent interactions between global and local ancestries and exposure, respectively; δ_g and δ_l are fixed effects of interactions between global ancestry and local ancestry and exposure, respectively. The first vector of additional random effects u_m captures the variance of local ancestry remaining after taking into account the global ancestry as a fixed effect (z_g) [16, 32]. The variance–covariance matrix of u_m is the ancestral relationship matrix (ARM) derived by the cross-product operation on column-wise centered and scaled Z matrix [32]. The second vector u_i complements the previous vector u_m and comes out due to testing the interaction effect (δ_l) rather than the marginal effect (β_l) [33]. The variance–covariance matrix of u_i is derived from ARM based on stratification by binary environmental exposure status (x_e) [33]. We refer to this matrix as environmental ARM or EARM. The third vector u_h models the heterogeneity of phenotypic variance across the three smoking groups. The variance–covariance matrix of u_h is a diagonal matrix, where entries are the same if they correspond to the same group.

To make the genome-wide screenings computationally efficient, we followed the standard two-step approach in GWAS [34,35,36,37]. First, we estimated the variance–covariance matrix of a trait by a LMM only once in the absence of local-ancestry fixed effects. Second, we applied generalized least squares to derive association test statistic for each local ancestry segment [34]. The test of association was the Wald’s test, which statistic is computed as ratio of estimated effect size (the marginal effect size β_l in Eq. 1 or the interaction effect size δ_l in Eq. 2) to its SE.

Multi-phenotype analysis

We first conducted single-trait admixture mapping of gene–environment interactions for each of the five pulmonary traits considered and each of the two exposures (current smoker and current heavy smoker). However, these five spirometric lung function measurements are all highly correlated and likely share genetic association signals. To reduce the penalty for multiple testing and potentially increase power, we focused our primary screening on a multivariate association tests combining single-trait signals separately for each exposure without assuming any prior on the direction of the single-trait effects [36, 38,39,40]. In practice, given five Z-scores for a local ancestry segment, stored in a vector S_5×1, we estimated the multivariate Z-score S_joint in two steps. We first estimated $\Sigma _{5 \times 5}$, the variance–covariance matrix among Z-scores under the null hypothesis of no association and then, for each local ancestry segment, we derived the multivariate statistics using the Mahalanobis distance, defined as:

$$S_{\mathrm{{joint}}} = S^{\mathrm{{T}}}\hat{\Sigma}^{-1}S$$

(3)

which squared statistics $T = S_{\mathrm{{joint}}}^2$ follows a χ² distribution with 5 degrees of freedom under the null composite hypothesis of no interaction effect on any of these five traits. For the estimation of $\Sigma$, previous works suggested using the complete Z-score data from the genome-wide scan, i.e., $\hat{\Sigma} = S^{\mathrm{{T}}}S$, assuming that the vast majority of Z-scores are distributed under the null [38, 40]. Here we made the same assumption and additionally discarded large single-trait Z-scores above a given threshold to reduce the risk of bias. Further, we approximated $\hat{\Sigma}$ from the resulting truncated multivariate normal distribution by the maximum likelihood estimation. We empirically found the best threshold value equals 3 for our admixture mapping Z-scores (Supplementary Material).

Follow-up analysis

We performed association and further fine-mapping analysis using genotyped data, following up regions identified by admixture mapping. Considering x_g is a vector of genotypes for a single SNP, we extended the interaction model (Eq. 2) by adding two terms for marginal genetic (x_g) and interaction (x_g × x_e) effects. It is noteworthy that the exposure term (x_e) was already included in this model. We performed a univariate Wald’s test with one degree of freedom to derive the p-value for interaction effect between genotype and exposure. By including a local ancestry term when testing for the interaction effect of genotype, we accounted for possible different Linkage Disequilibrium (LD) patterns for European and African ancestral backgrounds [41]. Such conditional analysis can reduce power but assures that the interaction effect of genotype is driven by a biological mechanism rather than a better SNP tagging in a particular ancestral population.

As discussed in our previous work [20], we expected SNPs in regions of local ancestry–smoking interactions to show multiple-SNP effects on the trait as well as high allelic frequency differentiation at SNPs between ancestral populations. Hence, we performed a comparative study of allelic frequencies between the two ancestral populations and fine-mapping analysis to assess the potential presence of multiple causal variants. First, we computed allele frequency differences (ΔDAF) [42] at all SNPs to measure the allelic heterogeneity between European and African population groups from the 1000 Genomes Project (Phase III) [30]. The ΔDAF measures were previously found to be highly correlated with Weir and Cockerham’s F_ST in the 1000 Genomes sample [43]. We assessed cases of extreme ΔDAF in the regions of interest by comparing the observed value against the threshold proposed by Colona et al. [44] In brief, that study grouped 36.8 million variants in African and European populations from the 1000 Genomes Project [30] into bins of non-overlapping sets of 5000 variants and derived the distribution of the maximum ΔDAF for further comparisons. Second, we applied a Bayesian method implemented in the software package FINEMAP [45] to estimate the posterior probability of each single SNP interaction to be causal, conditionally on in-sample linkage disequilibrium pattern. FINEMAP implements a shotgun stochastic search algorithm to efficiently explore the most likely causal configurations and we ran FINEMAP with the default parameters on the maximum number of causal SNPs (5), prior probabilities on the number of causal SNPs, and the prior probabilities of a single SNP to be causal.

Results

Participants characteristics and description of traits

We used a dataset of 3300 African American research volunteers from the COPDGene study [46]. Self-identified non-Hispanic African Americans and non-Hispanic European Americans between 45 and 80 years of age with a history of at least 10 pack-years of smoking were enrolled from 21 medical centers across the United States. Details on phenotyping, genome-wide genotyping of the cohort, and inference of local ancestry are provided in Methods and Supplementary Material.

Individual characteristics of the COPDGene dataset are described in Table 1 for the whole sample and stratified by current smoking status: non-smokers, moderate smokers (1–14 cigarettes per day), and heavy smokers (>14 cigarettes per day). The three smoker groups have similar overall characteristics, although there are some differences due to the COPDGene study enrollment protocol: the current smokers are younger and have higher proportions of males; non-smokers have higher numbers of accumulated pack-years. After a stringent quality-control procedure (Supplementary Figs. S1 and S2, and Supplementary Table S2), we observed a total of 30,043 local ancestry segments. The distribution of proportions of local African ancestry (averaged across individuals) is roughly uniform along the genome (Fig. 1a), confirming that the local ancestry data were free from artifacts. The proportion of global African ancestry ranged between 26.3% and 99.8% across individuals with an average of 80.3% (Fig. 1b). We confirmed that individuals with higher proportions of African ancestry tend to have lower pulmonary function at a highly significant level (P < 0.001; Supplementary Table S3) for all the traits, except FEV₁/FVC. We also assessed the variance heterogeneity among the smoking groups (not current smokers, moderate current smokers, and heavy current smokers) by its explicit modeling as random effects (Eq. 2). Both exploratory data analysis (Supplementary Fig. S3) and formal statistical tests (P < 0.0001 for all traits; Supplementary Table S4) highlighted differences among these three groups not only by mean, but also by variance (the group of non-smokers has the largest variance).

Table 1 Characteristics of African American participants in the COPDGene project.

Full size table

**Fig. 1: The African ancestry of African American participants of the COPDGene project.**

Mixed model accounts for population structure and variance heterogeneity

To address the methodological question of performing a test of local ancestry–exposure interaction in admixed population, we chose a data-driven approach where the robustness of candidate models was assessed using the genomic control parameter (λ) and the overall shape of the standard Q–Q plot of the −log₁₀(p-value). We started with the simplest model testing the marginal effect of local ancestry and incrementally added terms necessary for robust testing of local ancestry–exposure interaction, our main parameter of interest.

We proposed to use a LMM including up to four random effects: u_m, u_i, u_h, and u_c, capturing structure due to shared local ancestry, local ancestry–smoking interaction, smoking status, and recruitment medical center, respectively. We added each of these components into the model through a stepwise procedure, assessing their relative contribution in various combinations. When all components are present, the model has the form given in Eq. 2. All variance–covariance matrices for the random effects observed in real data are illustrated in Supplementary Fig. S4, whereas additional details of the model selection are provided in Supplementary Material and are summarized in Supplementary Table S1.

We used the FEV₁% predicted phenotype as an illustrative scenario and started with a marginal association model for single-trait admixture mapping (Fig. 2a) (i.e., without ancestry–environment interaction) including only u_c, as there was strong heterogeneity in the distribution of traits by medical center (Supplementary Table S4). This initial model is given in Eq. 1. Marginal local ancestry test statistics in the initial model showed substantial inflation (λ = 1.19). We further accounted for correlation by genome-wide local ancestry across individuals, which was done by modeling the ARM (the u_m random effect as in Eq. 2). The addition of the ARM component substantially mitigated the inflation (λ = 1.04). We next considered adding the heterogeneity component (the u_h random effect as in Eq. 2) that captured substantial portion of variance of modeled traits (Supplementary Table S4). The last additional component did not impact the overall distribution of p-values (λ = 1.03), but resulted in an improved power of admixture mapping because of reduced amount of the residual variance after modeling the heterogeneity. When comparing two models with and without the heterogeneity component, the average test statistic at associated ancestry segments (P < 3.06 × 10⁻⁵) increased from 11.17 to 15.01, which is equivalent to boosting the effective sample size by 34% [37].

**Fig. 2: Robustness of mixed-model admixture mapping assessed by quantile–quantile (Q–Q) plots.**

We then expanded the LMM to address the model components related to testing local ancestry–smoking interactions (Eq. 2 and Supplementary Table S1). Following the exploration of the marginal model, we examined the role of four random effects by conducting admixture mapping for a single trait, the FEV₁% predicted phenotype, and the current smoker exposure in different model configurations (Fig. 2b). Overall, the association test statistic was substantially inflated (λ = 1.61) for the initial model with only the u_c random effect and without any of the other three components u_m, u_i, and u_h in Eq. 2. Including the components separately in the model, either the heterogeneity component (u_h), one genetic component with ARM (u_m), or two genetic components (u_m and u_i), was not sufficient to fix the inflation (λ = 1.16, 1.6, and 1.39, respectively). The association test statistic was well-behaved only when all three components were added (λ = 1.06). Hence, our final model for admixture mapping of local ancestry–smoking (gene–environment) interaction (Eq. 2) has three additional random effects u_m, u_i, and u_h, which capture structures due to shared local ancestry, local ancestry–smoking interaction, and outcome heterogeneity across smoking groups.

Local ancestry–smoking interactions detected by admixture mapping

We conducted admixture mapping of local ancestry–environment interaction for all five spirometric traits considering the two binary exposures independently (Supplementary Figs. S5 and S6). Single-trait statistics showed limited inflation for all analyses (λ = 0.93–1.09), except for FEV₁/FVC and current smoker exposure (λ = 1.12; Supplementary Figs. S7 and S8). For each exposure, single trait results were combined to form a multi-trait test (Fig. 3). The two multi-trait analyses show a well-controlled type I error rate (λ = 0.98, 1.02; Supplementary Fig. S9). Following the eigenMT approach [20], we estimated the number of effective tests to be 1635 (Supplementary Material) and thus reduced the multiple-testing burden from the Bonferroni threshold (0.05/30,043 = 1.51 × 10⁻⁶) to the effective Bonferroni threshold (0.05/1635 = 3.06 × 10⁻⁵). We also considered the suggestive Bonferroni threshold (0.5/1635 = 3.06 × 10⁻⁴) to select additional candidate loci of interest.

**Fig. 3: Admixture mapping identifies two genome-wide significant and five suggestive loci of local ancestry–smoking interactions.**

We identified two genome-wide significant and five suggestive interaction association signals (Table 2). The first genome-wide significant locus in the region chromosome 11p15.2-3 spanned 11 ancestry segments of average length 37 Kb and 17 SNPs per segment. The top ancestry segment Chr11: 12,341,061–12,373,680 of length 32,619 kb and 21 SNPs had a multi-trait positive interaction effect with current smoker exposure with multi-trait Z = 5.34 and P = 2.79 × 10⁻⁵. The signal was driven by all single-trait associations, where all p-values passed the suggestive Bonferroni threshold. The top single-trait association for FEV₁ % predicted showed Z = 4.53 and P = 5.86 × 10⁻⁶. That top single-trait ancestry segment was 47 kb away from the top multi-trait one, but the two segments were highly correlated after adjusting for the global ancestry (the Pearson’s correlation coefficient, 0.98). The second genome-wide significant locus in chromosome 2q37.3 included 11 ancestry segments of length 32 kb and 9 SNPs on average. The strongest multi-trait and top single-trait (FEV₁) association signals were both localized in the same ancestry segment Chr2: 238,430,224–238,486,767 of length 56,543 kb and 21 SNPs. The positive interaction effects with the current heavy smoker exposure showed Z = 5.34 and P = 2.90 × 10⁻⁶, and Z = 4.53 and P = 2.50 × 10⁻⁶ for multi-trait and top single-trait associations, respectively.

Table 2 Top local ancestry segments-smoking interactions.

Full size table

The other five suggestive loci (Table 2) were mostly detected in the interaction admixture mapping with current heavy smoker exposure (13q12.3-13.1, 11q21, 7p15.2-3, and 8q21.13), except one locus 1q44 from the mapping with current smoking exposure. In contrast to the genome-wide significant loci, multi-trait association signals were much stronger than top single-trait signals for all suggestive loci (the difference in p-values was several orders of magnitude for some tests).

Genotype-smoking interactions reveal differentiated genetic variants

For each region showing at least suggestive significance in the multi-trait admixture mapping analysis, we assessed potential interactions of single SNPs available in the region around the top admixture signal. We conducted association analysis for a total of 888 SNPs available across the seven regions lying within ancestry segments (the average number of SNPs per region was 126.9 and the average number of SNPs per segment was 13.7). Here we focused on the single trait showing the largest association signal in admixture mapping. Overall, none of these SNPs passed a stringent Bonferroni correction threshold accounting for all SNPs tested in each region. However, the top SNPs especially in the genome-wide significant loci helped to localize the association signal (Table 3). In the first genome-wide significant region 11p15.2-3 (Fig. 4), the top SNP rs933920 (hg19 chr11:g.12481110C> T) (P = 0.0036) is an intronic variant in the PARVA gene (MIM 608120). In the second genome-wide significant region 2q37.3 (Fig. 4), the first top SNP rs7569427 (hg19 chr2:g.238413338A> G) (P = 0.02) was an intronic variant in the MLPH gene (MIM 606526) and the second top SNP rs2280289 (hg19 chr2:g.238483729A> G) (P = 0.036) was a missense variant in the RAB17 gene (MIM 602206).

Table 3 Top SNP-smoking interactions in regions identified by admixture mapping.

Full size table

**Fig. 4: Detailed regional association plots for three selected loci 11p15.2-3, 2q37.3, and 11q21.**

To assess the level of allelic heterogeneity of SNPs in these identified regions, we computed the allele frequency differences (defined as ΔDAF in Materials and Methods). The SNPs exhibited high levels of heterogeneity for all regions (ΔDAF in 0.62 and 0.78). Overall, three out of our seven loci (1q44, 2q37.3, and 8q21.13) matched the 0.7 threshold proposed by Colonna et al. [44], defining the 1% of the genome displaying the most extreme differentiation across populations. These differences in minimum allele frequency between European and African ancestries can also be visually assessed on Fig. 4 and Supplementary Figs. S10–13. Finally, we also evaluated the hypothesis of multiple causal SNPS per region using the FINEMAP software [23], but we were not able to find any strong evidence for multiple causal SNPs (Supplementary Table S5).

Replication of association signals in European GWASs

We performed replication analyses of association signals detected at individual SNP level and region level, for the seven loci reported in Tables 2 and 3. We considered two large studies of pulmonary phenotypes and COPD conducted in individuals of European ancestry: the CHARGE consortium (N = 50,047), which had genome-wide summary results for SNP-by-smoking interaction [47], and the most recent and largest meta-analysis of UK Biobank and SpiroMeta cohorts (COPD cases = 35,735, COPD controls = 222,076; N = 400,102 for pulmonary function phenotypes), which provides an up-to-date list of variants with genome-wide significant marginal genetic effect [48, 49].

Matching our nine top SNPs with the CHARGE consortium results [47], we found that three were missing, being rare or monomorphic in European population. Interaction effects of three out of the six remaining SNPs were replicated at the nominal significance level (P < 0.05) (Supplementary Table S6) in SNP–smoking interaction screening of FEV₁/FVC, where packs-years was used as a proxy for heavy smoking.

We next assessed the presence of marginal genetic effect at our seven loci using the aforementioned meta-analysis of pulmonary phenotypes [48] and COPD [49]. Although our marginal signals of top local ancestry segments and SNPs were weak (Supplementary Tables S7 and S8), we observed that nine genome-wide significant SNPs from GWASs [48, 49] were located <1 Mb away from the three loci 2q37.3, 11p15.2-3, and 7p15.2-3 (Supplementary Table S9). In particular, two SNPs rs80145403 (hg19 chr11:g.12493292G> A) and rs7114698 (hg19 chr11:g.12707876C> T) were within the same PARVA and TEAD1 genes in Chromosome 11 as in our SNP–smoking interaction analysis (Table 3). Notably, five SNPs come from two loci that each has three distinct signals estimated by the conditional analysis [49]. Such a scenario with multiple SNPs driving either marginal or interaction association is beneficial for our ancestry-based approach to detect gene–environment interactions, and thus may explain the signal overlap at the region level for two 11p15.2-3 and 2q37.3 loci.

Discussion

Broadening the diversity of ethnicities in genetic analysis can provide important information for disease pathogenesis. Leveraging local ancestry through admixture mapping could improve power to discover marginal genetic and gene–environment effects, although the technical and statistical challenges still remain. To address these challenges, we introduced a multi-component LMM and empirically demonstrated its robustness in admixture mapping on real data in 3300 African American participants in the COPDGene study. We detected two genome-wide significant and five suggestive loci showing smoking-dependent effects of local ancestry on spirometric lung function phenotypes.

Although the functional effects of variants in the identified genomic regions is unknown, these regions harbor genes previously known for traits related to smoking. The top SNP rs933920 in the first genome-wide significant locus 11p15.2-3 (P = 2.79 × 10⁻⁵) is located within the PARVA gene, which produces a focal adhesion protein [50]. Two previous studies reported that this gene was differentially methylated in small airway epithelium [51] and buccal mucosa [52] when stratified by current smoking status. The second top SNP rs2280289 for the second genome-wide significant region on 2q37.3 (P = 2.90 × 10⁻⁵) is a missense variant in RAB17, which was previously associated with a smoking cessation genotype success score [53, 54]. As for the most relevant result found among the five suggestive loci, the top SNP rs11020968 (hg19 chr11:g.94602414C> T) in the locus 11q21 (P = 5.0 × 10⁻⁵) is a missense variant for angiomotin-like protein 1 (AMOTL1; MIM 614657), a tight junction protein hypothesized to play a role in COPD through endothelial tight junction permeability and whose expression is affected by cigarette smoking [55]. Additional genes around other loci may warrant further investigation.

We further attempted to evaluate whether SNPs within these identified regions show multiple-SNP effects and exhibit high allelic differentiation, as our previous work on gene–gene interaction admixture mapping suggested this kind of genetic architecture [20]. Overall, allele frequency heterogeneity between European and African ancestries was very strong and persistent in the identified regions. Although our fine-mapping analysis did not show evidence for multiple causal variants, SNP–smoking interaction analysis is known to have limited power [56]; thus, we cannot rule out the possibility of multiple causal variants. Indeed, the conditioning on the primary signal of local ancestry is likely to decrease the statistical power to detect interactions at the SNP level even in larger samples [56]. Alternative methods, such as jointly modeling ancestry and genotype association signals, might help to overcome this limitation [57].

Our methodological contributions to admixture mapping are multiple. First, we extended the concept of genetic relationship matrix originally proposed to control population structure in GWAS [34]: the ARM was similarly computed on local ancestry data and further used in association tests. Second, we adopted the population stratification approach recently designed specifically for GWAS of gene–environment interactions [33]: two matrices, the standard ARM, but also a second environmental ARM or EARM, were essential to control for spurious association results when testing local ancestry–environment interactions. Finally, we modeled the outcome heterogeneity among groups stratified by environmental exposure. Modeling this heterogeneity increased the power because of reduced residual phenotypic variance (up to 65%) and substantially decreased the inflation of interaction test statistic. Admixture mapping has a lower computational complexity compared with GWAS and our complete single-trait analysis took up to 4 min on the standard desktop computer.

Our study also has limitations. COPDGene is one of the largest studies of African American smokers, with a high proportion of subjects with COPD, which makes suitable replication cohorts challenging. Nevertheless, we were able to reproduce some of the top SNP–smoking interactions in the CHARGE consortium [47] at nominal significance level. More importantly, our study identified three loci 2q37.3, 11p15.2-3, and 7p15.2-3 using a dataset of only 3300 African American individuals, whereas the same loci only passed the genome-wide significance threshold of standard univariate association in an independent replication cohort including up to 400,000 individuals [48, 49]. We also note that our screenings for local ancestry–smoking interactions were limited to binary smoking exposures. We explored a three-level exposure model; however, we found that the parameters increase made the estimation computationally unstable [32]. We believe that such complex gene–environment LMMs would require datasets with larger sample sizes [58]. Further, the method for admixture mapping can also be optimized. When conducting association analysis, excluding the local ancestry segment under testing from the ARM construction will be able improve power [13], but is computationally more burdensome. We attempted a more efficient out-of-chromosome strategy commonly applied in GWAS [35], but we observed fairly inflated test statistics (data not shown).

In conclusion, our study reports a powerful approach for gene–environment interaction association studies, leveraging the unique genetic architecture of complex traits measured in recently admixed populations. The proposed statistical model has shown to be robust to population structure and outcome variance heterogeneity. In our application to the COPDGene study, we have found two genome-wide significant local ancestry–smoking interactions of lung function phenotypes that would have been missed in standard single SNP interaction analyses. Overall, our findings provide additional evidence of the importance of ethnic diversity in genetic clinical studies.

Code and data availability

The COPDGene study, the dbGaP database, study accession phs000179.v5.p2, https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000179.v5.p2

Code used here for admixture mapping, https://gist.github.com/variani/a28c18797c39a62bacab587e6e708529

The 1000 Genomes Project (Phase III, version 5, hg19), http://csg.sph.umich.edu/abecasis/mach/download/1000G.Phase3.v5.html

LAMP-LD software for ancestry inference, http://bogdan.bioinformatics.ucla.edu/software/lamp/

Gaston R package for mixed models, https://cran.r-project.org/package=gaston

Public GWAS summary-statistics from the CHARGE consortium, the dbGaP database, study accession phs000930.v3.p1, https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000930.v3.p1

OMIM, http://www.omim.org.

References

Kessler MD, Yerges-Armstrong L, Taub MA, Shetty AC, Maloney K, Jeng LJB, et al. Challenges and disparities in the application of personalized genomic medicine to populations with African ancestry. Nat Commun. 2016;7:12521.
Popejoy AB, Fullerton SM. Genomics is failing on diversity. Nature 2016;538:161–4.
CAS PubMed PubMed Central Google Scholar
Bustamante CD, De La Vega FM, Burchard EG. Genomics for the world. Nature 2011;475:163–5.
CAS PubMed PubMed Central Google Scholar
Hindorff LA, Bonham VL, Brody LC, Ginoza MEC, Hutter CM, Manolio TA, et al. Prioritizing diversity in human genomics research. Nat Rev Genet. 2018;19:175–85. https://doi.org/10.1038/nrg.2017.89.
Article CAS PubMed Google Scholar
McKeigue PM. Prospects for admixture mapping of complex traits. Am J Hum Genet. 2005;76:1–7. http://linkinghub.elsevier.com/retrieve/pii/S0002929707625397.
CAS PubMed Google Scholar
Amundadottir LT, Sulem P, Gudmundsson J, Helgason A, Baker A, Ag- narsson BA, et al. A common variant associated with prostate cancer in European and African populations. Nat Genet. 2006;38:652–8.
CAS PubMed Google Scholar
Freedman ML, Haiman CA, Patterson N, McDonald GJ, Tandon A, Wal- iszewska A, et al. Admixture mapping identifies 8q24 as a prostate cancer risk locus in African-American men. Proc Natl Acad Sci USA 2006;103:14068–73. https://doi.org/10.1073/pnas.0605832103.
CAS PubMed PubMed Central Google Scholar
Gudmundsson J, Sulem P, Manolescu A, Amundadottir LT, Gudbjarts- son D, Helgason A, et al. Genome-wide association study identifies a second prostate cancer susceptibility variant at 8q24. Nat Genet. 2007;39:631–7.
CAS PubMed Google Scholar
Kao WHL, Klag MJ, Meoni LA, Reich D, Berthier-Schaad Y, Li M, et al. MYH9 is associated with nondiabetic end-stage renal disease in African Americans. Nat Genet 2008;40:1185–92. https://doi.org/10.1038/ng.232.
Article CAS PubMed Google Scholar
Molineros JE, Maiti AK, Sun C, Looger LL, Han S, Kim-Howard X, et al. Admixture mapping in lupus identifies multiple functional variants within IFIH1 associated with apoptosis, inflammation, and autoantibody production. PLoS Genet. 2013;9:e1003222.
Kopp JB, Smith MW, Nelson GW, Johnson RC, Freedman BI, Bowden DW, et al. MYH9 is a major-effect risk gene for focal segmental glomerulosclerosis. Nat Genet. 2008;40:1175–84.
CAS PubMed PubMed Central Google Scholar
Seldin MF, Pasaniuc B, Price AL. New approaches to disease mapping in admixed populations. Nat Rev Genet. 2011;12:523–8. http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC3142784.
CAS PubMed PubMed Central Google Scholar
Shriner D. Overview of admixture mapping. Curr Protoc Hum Genet. 2017;2017(July):1.23.1–8.
Google Scholar
Pasaniuc B, Zaitlen N, Lettre G, Chen GK, Tandon A, Kao WHL, et al. Enhanced statistical tests for GWAS in admixed populations: Assessment using african americans from CARe and a breast cancer consortium. PLoS Genet. 2011;7:e1001371.
Price AL, Tandon A, Patterson N, Barnes KC, Rafaels N, Ruczinski I, et al. Sensitive detection of chromosomal segments of distinct ancestry in admixed populations. PLoS Genet. 2009;5:e1000519.
Shriner D, Bentley AR, Doumatey AP, Chen G, Zhou J, Adeyemo A, et al. Phenotypic variance explained by local ancestry in admixed African Americans. Front Genet. 2015;6(Oct):1–8.
Google Scholar
Reiner AP, Carlson CS, Ziv E, Iribarren C, Jaquish CE, Nickerson DA. Ge- netic ancestry, population sub-structure, and cardiovascular disease-related traits among African-American participants in the CARDIA Study. Hum Genet. 2007;121:565–75.
PubMed Google Scholar
Wang H, Cade BE, Sofer T, Sands SA, Chen H, Browning SR, et al. Admixture mapping identifies novel loci for obstructive sleep apnea in His- panic/Latino Americans. Hum Mol Genet. 2019;28:675–87.
CAS PubMed Google Scholar
Sofer T, Baier LJ, Browning SR, Thornton TA, Talavera GA, Wassertheil- Smoller S, et al. Admixture mapping in the Hispanic Community Health Study/Study of Latinos reveals regions of genetic associations with blood pressure traits. PLoS ONE 2017;12:1–15.
Google Scholar
Aschard H, Gusev A, Brown R, Pasaniuc B. Leveraging local ances- try to detect gene-gene interactions in genome-wide data. BMC ge- netics 2015;16:124 http://www.biomedcentral.com/1471-2156/16/124.
Google Scholar
Kumar R, Seibold MA, Aldrich MC, Williams LK, Reiner AP, Colangelo L, et al. Genetic ancestry in lung-function predictions. N Engl J Med. 2010;363:321–30. https://doi.org/10.1056/NEJMoa0907897.
CAS PubMed PubMed Central Google Scholar
Burchard EG, Ziv E, Coyle N, Gomez SL, Tang H, Karter AJ, et al. The importance of race and ethnic background in biomedical research and clinical practice. N Engl J Med. 2003;348:1170–5. https://doi.org/10.1056/NEJMsb025007.
PubMed Google Scholar
Shaffer JR, Kammerer CM, Reich D, McDonald G, Patterson N, Good- paster B, et al. Genetic markers for ancestry are correlated with body composition traits in older African Americans. Osteoporos Int. 2007;18:733–41.
CAS PubMed Google Scholar
Parker MM, Foreman MG, Abel HJ, Mathias RA, Hetmanski JB, Crapo JD, et al. Admixture mapping identifies a quantitative trait locus asso- ciated with FEV1/FVC in the COPDGene Study. Genet Epidemiol. 2014;38:652–9.
PubMed PubMed Central Google Scholar
Rona RJ, Gulliford MC, Chinn S. Effects of prematurity and intrauterine growth on respiratory health and lung function in childhood. BMJ (Clin Res ed) 1993;306:817–20.
CAS Google Scholar
Raby BA. Low-normal gestational age as a predictor of asthma at 6 years of age. Pediatrics 2004;114:e327–e332. https://doi.org/10.1542/peds.2003-0838-L.
PubMed Google Scholar
Aldrich MC, Kumar R, Colangelo LA, Williams LK, Sen S, Kritchevsky SB, et al. Genetic ancestry-smoking interactions and lung function in African Americans: a cohort study. PLoS ONE. 2012;7:e39541.
Parker MM, Foreman MG, Abel HJ, Mathias RA, Hetmanski JB, Crapo JD, et al. Admixture mapping identifies a quantitative trait locus asso- ciated with FEV1/FVC in the COPDGene Study. Genet Epidemiol. 2014;38:652–9.
PubMed PubMed Central Google Scholar
Baran Y, Pasaniuc B, Sankararaman S, Torgerson DG, Gignoux C, Eng C, et al. Fast and accurate inference of local ancestry in Latino populations. Bioinformatics 2012;28:1359–67.
CAS PubMed PubMed Central Google Scholar
Auton A, Abecasis GR, Altshuler DM, Durbin RM, Bentley DR, Chakravarti A, et al. A global reference for human genetic variation. Nature 2015;526:68–74.
PubMed Google Scholar
Devlin B, Roeder K. Genomic control for association studies. Am J Hum Genet. 1999;65:A83–A83.
Google Scholar
Zaitlen N, Pasaniuc B, Sankararaman S, Bhatia G, Zhang J, Gusev A, et al. Leveraging population admixture to characterize the heritability of complex traits. Nat Genet. 2014;46:1356–62. https://doi.org/10.1038/ng.3139.
CAS PubMed PubMed Central Google Scholar
Sul JH, Bilow M, Yang WY, Kostem E, Furlotte N, He D, et al. Ac- counting for population structure in gene-by-environment interactions in genome-wide association studies using mixed models. PLoS Genet. 2016;12:e1005849. http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1005849.
PubMed PubMed Central Google Scholar
Kang HM, Sul JH, Service SK, Zaitlen NA, Kong SY, Freimer NB, et al. Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 2010;42:348–54. https://doi.org/10.1038/ng.548.
Article CAS PubMed PubMed Central Google Scholar
Listgarten J, Lippert C, Kadie CM, Davidson RI, Eskin E, Heckerman D. Improved linear mixed models for genome-wide association studies. Nat Methods 2013;9:525–6.
Google Scholar
Zhou X, Stephens M. Efficient algorithms for multivariate linear mixed models in genome-wide association studies. Nat Methods 2014;11:407–9.
CAS PubMed PubMed Central Google Scholar
Loh PR, Kichaev G, Gazal S, Schoech AP, Price AL. Mixed-model associ- ation for biobank-scale datasets. Nat Genet 2018;50(July):906–8. https://doi.org/10.1038/s41588-018-0144-6.
Article CAS PubMed PubMed Central Google Scholar
Province MA, Borecki IB. A correlated meta-analysis strategy for data mining “OMIC” scans. Pac Symp Biocomput. 2013:236–46. http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC3773990.
Aschard H, Vilhj ́almsson BJ, Greliche N, Morange PE, Tr ́egou ̈et DA, Kraft P. Maximizing the power of principal-component analysis of correlated phe- notypes in genome-wide association studies. Am J Hum Genet. 2014;94:662–76.
CAS PubMed PubMed Central Google Scholar
Zhu X, Feng T, Tayo BO, Liang J, Young JH, Franceschini N, et al. Meta-analysis of correlated traits via summary statistics from GWASs with an application in hypertension. Am J Hum Genet. 2015;96:21–36. https://doi.org/10.1016/j.ajhg.2014.11.011.
CAS PubMed PubMed Central Google Scholar
Park DS, Eskin I, Kang EY, Gamazon ER, Eng C, Gignoux CR, et al. An ancestry-based approach for detecting interactions. Genet Epidemiol. 2018;42:49–63.
PubMed Google Scholar
Altshuler DM, Durbin RM, Abecasis GR, Bentley DR, Chakravarti A, Clark AG, et al. An integrated map of genetic variation from 1,092 human genomes. Nature 2012;491:56–65.
Google Scholar
Morelli G, Song Y, Mazzoni CJ, Eppinger M, Roumagnac P, Wagner DM, et al. Yersinia pestis genome sequencing identifies patterns of global phylo- genetic diversity. Nat Genet. 2010;42:1140–3. https://doi.org/10.1038/ng.705.
Article CAS PubMed PubMed Central Google Scholar
Colonna V, Ayub Q, Chen Y, Pagani L, Luisi P, Pybus M, et al. Human genomic regions with exceptionally high levels of population differentiation identified from 911 whole-genome sequences. Genome Biol. 2014;15:R88.
Benner C, Spencer CCA, Havulinna AS, Salomaa V, Ripatti S, Pirinen M. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 2016;32:1493–501.
CAS PubMed PubMed Central Google Scholar
Regan Ea, Hokanson JE, Murphy JR, Lynch Da, Beaty TH, Curran-everett D, et al. Genetic epidemiology of COPD (COPDGene) study design. Epi demiology 2011;7:1–10.
Google Scholar
Hancock DB, Artigas MS, Gharib SA, Henry A, Manichaikul A, Ramasamy A, et al. Genome-wide joint meta-analysis of SNP and SNP-by-smoking interaction identifies novel loci for pulmonary function. PLoS Genet. 2012;8:e1003098.
Shrine N, Guyatt AL, Erzurumluoglu AM, Jackson VE, Hobbs BD, Mel- bourne CA, et al. New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries. Nat Genet. 2019;51:481.
CAS PubMed PubMed Central Google Scholar
Sakornsakolpat P, Prokopenko D, Lamontagne M, Reeve NF, Guyatt AL, Jackson VE, et al. Genetic landscape of chronic obstructive pulmonary disease identifies heterogeneous cell-type and phenotype associations. Nat Genet. 2019;51:494.
CAS PubMed PubMed Central Google Scholar
Korenbaum E, Olski TM, Noegel AA. Genomic organization and expression profile of the parvin family of focal adhesion proteins in mice and humans. Gene 2001;279:69–79. http://www.ncbi.nlm.nih.gov/pubmed/11722847.
CAS PubMed Google Scholar
Buro-Auriemma LJ, Salit J, Hackett NR, Walters MS, Strulovici-Barel Y, Staudt MR, et al. Cigarette smoking induces small airway epithelial epi- genetic changes with corresponding modulation of gene expression. Hum Mol Genet. 2013;22:4726–38.
CAS PubMed PubMed Central Google Scholar
Qiu W, Wan E, Morrow J, Cho MH, Crapo JD, Silverman EK, et al. The impact of genetic variation and cigarette smoke on DNA methylation in current and former smokers from the COPDGene study. Epigenetics 2015;10:1064–73.
PubMed PubMed Central Google Scholar
Uhl GR, Liu QR, Drgon T, Johnson C, Walther D, Rose JE, et al. Molecular genetics of successful smoking cessation: convergent genome-wide association study results. Arch Gen Psychiatry 2008;65:683–93. http://archpsyc.jamanetwork.com/article.aspx?articleid=482738.
CAS PubMed PubMed Central Google Scholar
Rose JE, Behm FM, Drgon T, Johnson C, Uhl GR. Personalized smoking cessation: interactions between nicotine dose, dependence and quit success genotype score. Mol Med. 2010;16:1. http://www.molmed.org/content/pdfstore/09{\}159{\}Rose.pdf.
Google Scholar
Suzuki R, Nakamura Y, Chiba S, Mizuno T, Abe K, Horii Y, et al. Mitigation of tight junction protein dysfunction in lung microvascular endothelial cells with pitavastatin. Pulm Pharmacol Ther. 2016;38:27–35. https://doi.org/10.1016/j.pupt.2016.04.003.
Article CAS PubMed Google Scholar
Aschard H. A perspective on interaction effects in genetic asso- ciation studies. Genet Epidemiol. 2016;40:678–88. http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC5132101.
PubMed PubMed Central Google Scholar
Szulc P, Bogdan M, Frommlet F, Tang H. Joint genotype- and ancestry- based genome-wide association studies in admixed populations. Genet Epidemiol. 2017;41:555–66.
PubMed Google Scholar
Robinson MR, English G, Moser G, Lloyd-Jones LR, Triplett MA, Zhu Z, et al. Genotype-covariate interaction effects and the heritability of adult body mass index. Nat Genet. 2017;49:1174.
CAS PubMed Google Scholar

Download references

Acknowledgements

This project was supported by NHGRI R21HG007687 (AZ, HA, and MHC), and NHLBI R01HL113264 and R01HL137927 (MHC) The COPDGene study (NCT00608764) is supported by NHLBI R01 HL089897 and R01 HL089856, and the COPD Foundation through contributions made to an Industry Advisory Board composed of AstraZeneca, Boehringer Ingelheim, Novartis, Pfizer, GlaxoSmithKline, Siemens, and Sunovion. MHC has received grant support from GSK. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Heart, Lung, and Blood Institute or the National Institutes of Health. Additional information can be found in the Supplementary Material.

Author information

Authors and Affiliations

Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
Andrey Ziyatdinov, Peter Kraft & Hugues Aschard
Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
Margaret M. Parker & Michael H. Cho
Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI), Institut Pasteur, Paris, France
Amaury Vaysse & Hugues Aschard
Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
Terri H. Beaty
Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, MA, USA
Michael H. Cho

Authors

Andrey Ziyatdinov
View author publications
You can also search for this author in PubMed Google Scholar
Margaret M. Parker
View author publications
You can also search for this author in PubMed Google Scholar
Amaury Vaysse
View author publications
You can also search for this author in PubMed Google Scholar
Terri H. Beaty
View author publications
You can also search for this author in PubMed Google Scholar
Peter Kraft
View author publications
You can also search for this author in PubMed Google Scholar
Michael H. Cho
View author publications
You can also search for this author in PubMed Google Scholar
Hugues Aschard
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Andrey Ziyatdinov.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Material

Rights and permissions

Reprints and permissions

About this article

Cite this article

Ziyatdinov, A., Parker, M.M., Vaysse, A. et al. Mixed-model admixture mapping identifies smoking-dependent loci of lung function in African Americans. Eur J Hum Genet 28, 656–668 (2020). https://doi.org/10.1038/s41431-019-0545-8

Download citation

Received: 04 January 2019
Revised: 30 October 2019
Accepted: 01 November 2019
Published: 13 December 2019
Issue Date: May 2020
DOI: https://doi.org/10.1038/s41431-019-0545-8

This article is cited by

Admixture mapping reveals the association between Native American ancestry at 3q13.11 and reduced risk of Alzheimer’s disease in Caribbean Hispanics
- Andréa R. V. R. Horimoto
- Diane Xue
- Elizabeth E. Blue
Alzheimer's Research & Therapy (2021)

Mixed-model admixture mapping identifies smoking-dependent loci of lung function in African Americans

Subjects

Abstract

Similar content being viewed by others

A genome-wide association study identifies distinct variants associated with pulmonary function among European and African ancestries from the UK Biobank

Multi-ancestry genome-wide association analyses improve resolution of genes and pathways influencing lung function and chronic obstructive pulmonary disease risk

Unappreciated subcontinental admixture in Europeans and European Americans and implications for genetic epidemiology studies

Introduction

Materials and methods

The COPDGene dataset

Step-wise model selection for admixture mapping

Multi-phenotype analysis

Follow-up analysis

Results

Participants characteristics and description of traits

Mixed model accounts for population structure and variance heterogeneity

Local ancestry–smoking interactions detected by admixture mapping

Genotype-smoking interactions reveal differentiated genetic variants

Replication of association signals in European GWASs

Discussion

Code and data availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Supplementary information

Supplementary Material

Rights and permissions

About this article

Cite this article

This article is cited by

Admixture mapping reveals the association between Native American ancestry at 3q13.11 and reduced risk of Alzheimer’s disease in Caribbean Hispanics

Search

Quick links

Subjects

Abstract

Similar content being viewed by others

A genome-wide association study identifies distinct variants associated with pulmonary function among European and African ancestries from the UK Biobank

Multi-ancestry genome-wide association analyses improve resolution of genes and pathways influencing lung function and chronic obstructive pulmonary disease risk

Unappreciated subcontinental admixture in Europeans and European Americans and implications for genetic epidemiology studies

Introduction

Materials and methods

The COPDGene dataset

Step-wise model selection for admixture mapping

Multi-phenotype analysis

Follow-up analysis

Results

Participants characteristics and description of traits

Mixed model accounts for population structure and variance heterogeneity

Local ancestry–smoking interactions detected by admixture mapping

Genotype-smoking interactions reveal differentiated genetic variants

Replication of association signals in European GWASs

Discussion

Code and data availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Supplementary information

Supplementary Material

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Admixture mapping reveals the association between Native American ancestry at 3q13.11 and reduced risk of Alzheimer’s disease in Caribbean Hispanics

Search

Quick links