Main

Pulmonary function is an easily measurable and reliable index of the physiological state of the lungs and airways1. Pulmonary function also predicts mortality in the general population, even among people who have never smoked (never-smokers) who have only modestly reduced pulmonary function and no respiratory symptoms2,3. The peak level of pulmonary function attained in early adulthood and its subsequent decline with age are likely influenced by genetic and environmental factors. Tobacco smoking is a major environmental cause of accelerated decline in pulmonary function with age. Other inhaled pollutants also appear to contribute. Familial aggregation studies suggest a genetic contribution to lung function, with heritability estimates exceeding 40%4,5, but little is known about the specific genetic factors involved. A relatively uncommon deficiency of α1-antitrypsin is the only established genetic risk factor for accelerated decline in pulmonary function and for development of chronic obstructive pulmonary disease (COPD), especially in smokers4,6. However, α1-antitrypsin accounts for little of the population variability in pulmonary function4. Candidate gene studies suggest that other genetic variants may influence the time course of pulmonary function and its decline in relation to smoking, but these putative genetic risk factors remain unknown4.

Forced expiratory volume in the first second (FEV1) and its ratio to forced vital capacity (FEV1/FVC) are two clinically relevant pulmonary function measures. Although both FEV1 and FVC are influenced by lung size and can be reduced by restrictive lung diseases, obstructive lung disease leads to proportionately greater reduction in FEV1 than FVC. Therefore, reduced FEV1/FVC, an indicator of airflow obstruction that is independent of lung size, is the primary criterion for defining an obstructive ventilatory defect1. Whereas low FEV1/FVC indicates the presence of airflow obstruction, FEV1 is used to classify the severity and follow the progression of obstructive lung disease over time5,7,8.

The first genome-wide association study (GWAS) for pulmonary function evaluating 70,987 SNPs in about 1,220 Framingham Heart Study (FHS) participants revealed no genome-wide significant loci9. Recently, a GWAS of FEV1/FVC using 2,540,223 SNPs in 7,691 FHS participants identified several SNPs on chromosome 4q31 near HHIP with genome-wide significance10. A GWAS of COPD11 also implicated the HHIP region along with CHRNA3-CHRNA5 on chromosome 15, a region previously associated with nicotine dependence12,13.

We conducted meta-analyses of GWAS results for a cross-sectional analysis of pulmonary function (FEV1/FVC and FEV1) in 20,890 individuals of European ancestry from four Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium14 studies: Atherosclerosis Risk in Communities (ARIC), Cardiovascular Health Study (CHS), FHS and Rotterdam Study (RS-I and RS-II). Given that cigarette smoking is a major risk factor for pulmonary function decline, we conducted meta-analyses with adjustment for smoking status and quantity (pack-years), and in subgroups of those who have ever smoked (ever-smokers) and never-smokers. Loci meeting genome-wide significance and other selected high-signal hits were evaluated for replication with the SpiroMeta consortium, an independent consortium having a combined GWAS sample size of 20,288 participants of European ancestry as described in the companion paper15.

Results

Meta-analyses of CHARGE genome-wide association results

Meta-analyses for FEV1/FVC and FEV1 were conducted using approximately 2,534,500 SNPs in 20,890 CHARGE participants of European ancestry (n = 7,980 from ARIC, n = 3,140 from CHS, n = 7,694 from FHS, n = 1,224 from RS-I and n = 852 from RS-II) and in subgroups of ever- (n = 11,963) and never-smokers (n = 8,927). Characteristics of the cohort participants are presented in Table 1. We applied genomic control, although cohort-specific genomic inflation factors (λgc) were low (for FEV1/FVC ranging from 1.00 (RS-I and RS-II) to 1.05 (ARIC) and for FEV1 ranging from 1.01 (RS-II) to 1.05 (FHS)), suggesting minimal population stratification. The meta-analysis λgc was 1.04 for FEV1/FVC and 1.03 for FEV1 in all participants. Quantile-quantile (Q-Q) plots show large deviations between observed and expected P values for high-signal SNPs in analyses of FEV1/FVC and FEV1 in all participants (Supplementary Fig. 1a,b), FEV1/FVC in never-smokers (Supplementary Fig. 2a), and FEV1 in ever-smokers (Supplementary Fig. 3c). Genome-wide significant associations (P < 5 × 10−8) were found for multiple SNPs in each of these analyses (see Fig. 1a,b for overall analyses and Supplementary Fig. 2b,d and Supplementary Fig. 3b,d for analyses stratified by ever or never smoking). The top 2,000 SNPs associated with each measure, FEV1/FVC and FEV1, among those not reaching genome-wide significance (P > 5 × 10−8) are presented in Supplementary Table 1.
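As an illustration of the genomic control step described above, the following minimal sketch computes an inflation factor from a list of P values and rescales test statistics by it. It uses the common definition of λgc (median observed 1-degree-of-freedom chi-square statistic divided by its expected median under the null); the function names are hypothetical, and the text does not state that the cohorts computed λgc in exactly this way.

```python
from statistics import NormalDist, median

# Median of a 1-degree-of-freedom chi-square distribution (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """Convert a two-sided P value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(p / 2.0)  # lower-tail quantile; the sign is irrelevant
    return z * z


def genomic_inflation(p_values):
    """Return lambda_gc: median observed chi-square over its expected median."""
    return median(p_to_chi2(p) for p in p_values) / CHI2_1DF_MEDIAN


def apply_genomic_control(chi2_stats, lambda_gc):
    """Divide test statistics by lambda_gc; leave them unchanged if lambda_gc < 1."""
    scale = max(lambda_gc, 1.0)
    return [x / scale for x in chi2_stats]
```

A value near 1.00, as reported for the cohorts here, means the bulk of the test statistics follows the null distribution, so the correction changes the results only slightly.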

Table 1 Characteristics of cohort participants in the CHARGE consortium at the time of pulmonary function assessment
Figure 1: Meta-analyses of approximately 2,534,500 SNPs tested for association with pulmonary function measures in all participants from the CHARGE Consortium.

(a) FEV1/FVC. (b) FEV1. The Manhattan plots (also known as −log10 (P) association plots) show the chromosomal position of SNPs exceeding the genome-wide significance threshold (P < 5 × 10−8 as indicated by the solid black line).

For FEV1/FVC, genome-wide significant associations were seen for 119 SNPs at seven loci (Supplementary Table 2). The SNP with the smallest P value, rs1980057 (P = 4.90 × 10−11), is located on chromosome 4q31.22 81 kb away from the 5′ end of HHIP. There were 28 other genome-wide significant SNPs in the HHIP region (Fig. 2a). Additionally, 69 genome-wide significant SNPs were located in or near the 3′ end of GPR126 on chromosome 6q24.1, with the top SNP (rs3817928) having P = 2.60 × 10−10 (Fig. 2b). Fifty-nine of these 69 GPR126 SNPs were associated with FEV1/FVC at genome-wide significance among never-smokers (Supplementary Table 2). Seven SNPs on chromosome 5q33.3 located in ADAM19 (Fig. 2c), two correlated SNPs on chromosome 6p21.32 (r2 = 0.66, Fig. 2d) located in two genes (AGER and PPT2), four SNPs on chromosome 4q22.1 near the 5′ end of FAM13A (Fig. 2e), two SNPs on chromosome 9q22.32 in PTCH1 (Fig. 2f), and six SNPs on chromosome 2q36.3 near the 3′ end of PID1 (Fig. 2g) were also significantly associated with FEV1/FVC in all participants. SNPs in AGER, PPT2, PTCH1 and PID1 had minor allele frequencies (MAFs) between 4% and 10%, whereas all other significantly associated SNPs had MAFs exceeding 10%. Absolute values of β (per-allele change) for FEV1/FVC ranged from 0.44% to 1.14%. The directions of β were consistent across the CHARGE cohorts for all genome-wide significant SNPs except for the GPR126 SNPs noted in Supplementary Table 2. A borderline significant association (P = 5.37 × 10−8, MAF = 0.42, β = −0.43) with FEV1/FVC was noted for the chromosome 5q33.1 SNP rs11168048 in HTR4 (Fig. 2h). Cohort-specific association results for SNPs with the smallest P value from each locus implicated at or near genome-wide significance are shown in Supplementary Table 3.

Figure 2: Regional association plots for loci associated with FEV1/FVC in the CHARGE consortium at or near genome-wide significance.

(a-h) Loci included HHIP on chromosome 4q31.22 (a), GPR126 on chromosome 6q24.1 (b), ADAM19 on chromosome 5q33.3 (c), AGER-PPT2 on chromosome 6p21.32 (d), FAM13A on chromosome 4q22.1 (e), PTCH1 on chromosome 9q22.32 (f), PID1 on chromosome 2q36.3 (g) and HTR4 on chromosome 5q33.1 (h). For each locus, correlations between the target SNP (the SNP with the lowest P value, depicted in black) and other SNPs in the region are depicted in red when r2 = 1, blue when 0.8 ≤ r2 < 1, yellow when 0.5 ≤ r2 < 0.8, orange when 0.2 ≤ r2 < 0.5 and white when r2 < 0.2. The r2 values were based on the HapMap CEU population. Gene annotations are shown in green, and estimated recombination rates from HapMap are shown in light blue.

For FEV1, genome-wide significant associations were observed for 46 SNPs on chromosome 4q24 in or near four adjacent genes (Supplementary Table 4). The SNP with the smallest P value, rs17331332 (P = 4.00 × 10−10), is located near NPNT. The 45 other significantly associated SNPs include four SNPs located near the 5′ end of NPNT, five SNPs located in INTS12 or near its 3′ end, seven SNPs located in FLJ20184 or near its 3′ end and 29 SNPs located in GSTCD. FLJ20184 encodes a hypothetical protein, according to several genome browsers including the UCSC Genome Browser (see URLs)16, but there is no approved Human Gene Organization (HUGO) gene name for this locus17. The SNP rs17331332 is correlated at r2 > 0.5 with most other significantly associated SNPs in this region (Fig. 3), suggesting that the associations in the four adjacent genes represent one independent finding. The significantly associated SNPs had MAFs between 6% and 8%. The absolute β (per-allele change) values for FEV1 ranged from 55.92 to 71.43 mL (Supplementary Table 4), and the β directions were consistent across the CHARGE cohorts for all 46 genome-wide significant SNPs (Supplementary Table 3 for rs17331332). Among these 46 SNPs, 39 were associated with FEV1 at genome-wide significance among ever-smokers (Supplementary Table 4).

Figure 3: Regional association plot for the chromosome 4q24 locus associated with FEV1 in the CHARGE consortium at genome-wide significance, which includes FLJ20184, INTS12, GSTCD and NPNT.

Correlations between the target SNP (the SNP with the lowest P value, depicted in black) and other SNPs in the region are depicted in red when r2 = 1, blue when 0.8 ≤ r2 < 1, yellow when 0.5 ≤ r2 < 0.8, orange when 0.2 ≤ r2 < 0.5 and white when r2 < 0.2. The r2 values were based on the HapMap CEU population. Gene annotations are shown in green, and estimated recombination rates from HapMap are shown in light blue.

To evaluate whether other loci may also influence pulmonary function, we created Q-Q plots for FEV1/FVC and FEV1 among all participants after removing SNPs at or close to genome-wide significance and nearby SNPs correlated at r2 > 0.2 with the top SNP for each locus (totaling 1,862 SNPs removed for FEV1/FVC and 284 SNPs removed for FEV1). The resulting Q-Q plots show some excess of small P values for FEV1/FVC (Supplementary Fig. 4a) and FEV1 (Supplementary Fig. 4b) over expectation.
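The construction of such a pruned Q-Q plot can be sketched as follows. This is a minimal illustration only: the plotting position (i − 0.5)/n is one common convention rather than one stated in the text, and the function and argument names (including the lookup of each SNP's r2 with its locus's top SNP) are hypothetical.

```python
import math


def remove_locus_snps(p_by_snp, r2_to_top_snp, r2_cutoff=0.2):
    """Drop SNPs whose r2 with a locus's top SNP exceeds the cutoff.

    p_by_snp: dict mapping SNP id -> P value.
    r2_to_top_snp: dict mapping SNP id -> r2 with the top SNP of its locus
    (the top SNP itself should be listed with r2 = 1.0); SNPs absent from
    this dict are treated as uncorrelated with any top SNP.
    """
    return {
        snp: p
        for snp, p in p_by_snp.items()
        if r2_to_top_snp.get(snp, 0.0) <= r2_cutoff
    }


def qq_points(p_values):
    """Pair expected and observed -log10(P) values for a Q-Q plot.

    Under the null hypothesis, the i-th smallest of n P values is expected
    to lie near (i - 0.5) / n.
    """
    observed = sorted(p_values)
    n = len(observed)
    return [
        (-math.log10((i - 0.5) / n), -math.log10(p))
        for i, p in enumerate(observed, start=1)
    ]
```

Points lying above the diagonal after pruning correspond to the excess of small P values described above.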

Putative functional polymorphisms

Three SNPs among the 119 genome-wide significant SNPs for FEV1/FVC are nonsynonymous (missense) polymorphisms: rs11155242 (resulting in a lysine-to-glutamine substitution) in GPR126, rs1422795 (serine-to-glycine substitution) in ADAM19 and rs2070600 (glycine-to-serine substitution) in AGER. The Polymorphism Phenotyping (PolyPhen) program (see URLs)18 predicts that the amino acid substitutions resulting from rs11155242 and rs1422795 cause benign changes but that rs2070600 has a possibly damaging impact on the structure and function of AGER.

All other SNPs implicated for FEV1/FVC or FEV1 are intergenic, intronic or located in 3′ untranslated regions. Of these, three intronic GPR126 SNPs (rs9496346, rs1040525 and rs6929442) and one intergenic SNP near NPNT (rs10516529) are located in transcription factor binding sites, according to the UCSC Genome Browser (see URLs)16.

Replication with the SpiroMeta consortium

Thirty high-signal SNPs associated with FEV1/FVC (18 SNPs from eight loci) or FEV1 (12 SNPs from three loci) at or close to genome-wide significance were tested for replication in the SpiroMeta consortium, with results reported in a companion paper15. We evaluated these SNPs in 16,178 SpiroMeta participants of European ancestry with complete quantitative smoking data using the CHARGE analytic method (see statistical analysis section of Online Methods), which included adjustment for smoking status and pack-years, and performed joint meta-analyses of CHARGE GWAS and SpiroMeta replication results (Tables 2 and 3). P values below the significance threshold in SpiroMeta (P < 8.33 × 10−4 based on 60 tests) or below the genome-wide significance threshold in joint meta-analyses (P < 5 × 10−8) were considered significant evidence for replication at their corresponding SNPs.
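The replication thresholds can be reproduced with a short calculation. The sketch below assumes a family-wise α of 0.05 (implied by 8.33 × 10−4 × 60 ≈ 0.05, with 60 tests corresponding to 30 SNPs evaluated against two measures); the function names are hypothetical.

```python
GENOME_WIDE_ALPHA = 5e-8


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def is_replicated(p_spirometa, p_joint, n_tests=60, alpha=0.05):
    """Apply the two replication criteria described in the text.

    A SNP counts as replicated if its SpiroMeta P value is below the
    Bonferroni threshold, or if its joint meta-analysis P value is below
    the genome-wide significance threshold.
    """
    return (
        p_spirometa < bonferroni_threshold(alpha, n_tests)
        or p_joint < GENOME_WIDE_ALPHA
    )


print(f"{bonferroni_threshold(0.05, 60):.2e}")  # prints 8.33e-04
```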

Table 2 Joint meta-analysis of SNPs selected from the top eight loci implicated for FEV1/FVC in the CHARGE GWAS and tested for replication with FEV1/FVC in the SpiroMeta consortium15
Table 3 Joint meta-analysis of SNPs selected from the top three loci implicated for FEV1 in the CHARGE GWAS and tested for replication with FEV1 in the SpiroMeta consortium15

For FEV1/FVC, among 18 SNPs tested for replication, six SNPs in three loci were significantly associated with this measure in SpiroMeta: rs1980057 and rs1032295 near HHIP (r2 = 0.72), rs2070600 in AGER and rs10947233 in PPT2 (r2 = 0.66), and rs11168048 and rs7735184 in HTR4 (r2 = 0.93) (Table 2). Their joint meta-analysis P values ranged from 3.21 × 10−20 to 6.23 × 10−11 (Table 2). Five additional SNPs in GPR126 (rs3817928, rs7776375 and rs6937121) and ADAM19 (rs2277027 and rs1422795) were not significantly associated with FEV1/FVC at genome-wide significance in SpiroMeta alone, but these SNPs were associated at genome-wide significance in the joint meta-analysis, with P values ranging from 9.93 × 10−11 to 1.25 × 10−8 (Table 2). For replicated SNPs, the allele frequencies and the direction and magnitude of the associations with FEV1/FVC were similar between consortia (Table 2). Further, the HHIP, ADAM19 and HTR4 SNPs were significantly associated with FEV1 in SpiroMeta (Supplementary Table 5). The HHIP SNP rs1980057 and the HTR4 SNPs rs11168048 and rs7735184 were also associated with FEV1 at genome-wide significance in the joint meta-analysis (P ranging from 5.86 × 10−9 to 1.58 × 10−8, Supplementary Table 5). SNPs in FAM13A, PTCH1 and PID1 that gave genome-wide significance in CHARGE were not confirmed in analyses with SpiroMeta.

For FEV1, among the 12 SNPs tested for replication, eight SNPs from one locus with four adjacent genes were significantly associated with this measure in SpiroMeta, including rs17331332 and rs17036341 near NPNT, rs11727189 and rs17036090 in or near INTS12, rs17036052 and rs17035960 in or near FLJ20184, and rs11097901 and rs11728716 in GSTCD (Table 3). For replicated SNPs, the allele frequencies and the direction and magnitude of the associations with FEV1 were similar between consortia, and P values from joint meta-analysis ranged from 4.66 × 10−17 to 9.42 × 10−14 (Table 3). None of these SNPs were significantly associated with FEV1/FVC in CHARGE or SpiroMeta (Supplementary Table 5).

Associations in individuals with normal pulmonary function

To address whether the genetic associations hold even among people with normal pulmonary function, we repeated the meta-analyses after excluding individuals with asthma or COPD, leaving 17,855 individuals (n = 6,912 from ARIC, n = 2,634 from CHS, n = 6,371 from FHS, n = 1,126 from RS-I and n = 812 from RS-II). Asthma was defined by self-report of ever having asthma or self-report of ever having physician-diagnosed asthma. COPD was defined spirometrically as having both FEV1/FVC and FEV1 less than the lower limit of normal values using National Health and Nutrition Examination Survey III prediction equations19,20. Comparing the original meta-analyses to the meta-analyses with exclusions for asthma and COPD, β estimates were highly correlated for the high-signal SNPs tested for replication (Pearson's r > 0.99 for 18 FEV1/FVC SNPs and 12 FEV1 SNPs). β estimates remained highly correlated for SNPs with P values as high as 0.01 in the original meta-analyses (r = 0.92 for FEV1/FVC and r = 0.96 for FEV1). As expected, there was some attenuation in P values for many of the SNPs in our implicated loci given the substantial power loss due to both reduced sample size and the truncation of the FEV1/FVC and FEV1 distributions, but there was substantial overlap in the top-ranking SNPs between the two meta-analyses (data not shown). The P values for some top-ranking SNPs became smaller, including several ADAM19, FAM13A and HTR4 SNPs associated with FEV1/FVC. Of note, HTR4, a locus at which one SNP (rs11168048) showed borderline genome-wide significance in the original meta-analysis, had 12 SNPs reaching genome-wide significance in the subset of individuals without asthma or COPD (P = 6.93 × 10−9 for rs11168048).
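The exclusion rule used for this sensitivity analysis can be expressed as a small predicate. In this minimal sketch, the lower limits of normal are assumed to be supplied per individual (for example, precomputed from the NHANES III prediction equations, which are not reproduced here); all function names and dictionary keys are hypothetical.

```python
def has_spirometric_copd(fev1, fvc, fev1_lln, ratio_lln):
    """COPD as defined in the text: FEV1/FVC and FEV1 both below their
    lower limits of normal (LLN).

    fev1, fvc and fev1_lln share one volume unit; ratio_lln is a fraction.
    """
    return (fev1 / fvc) < ratio_lln and fev1 < fev1_lln


def eligible_for_sensitivity_analysis(person):
    """Keep only individuals with neither asthma nor spirometric COPD."""
    if person["self_reported_asthma"] or person["physician_diagnosed_asthma"]:
        return False
    return not has_spirometric_copd(
        person["fev1"], person["fvc"], person["fev1_lln"], person["ratio_lln"]
    )
```

Note that a low FEV1/FVC alone, or a low FEV1 alone, does not meet this definition; both must fall below their respective limits.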

Discussion

In meta-analyses of GWAS results in 20,890 CHARGE participants of European ancestry, we identified genome-wide significant associations with FEV1/FVC for SNPs in seven previously unrecognized independent loci (GPR126, ADAM19, AGER-PPT2, FAM13A, PTCH1, PID1 and HTR4) and with FEV1 for one previously unrecognized independent locus annotated by at least three genes (INTS12-GSTCD-NPNT). The SpiroMeta consortium independently reported genome-wide significant associations of GSTCD, HTR4, AGER, TNS1 and THSD4 with FEV1/FVC and FEV1 (companion paper in this issue, ref. 15). Both consortia confirm previous GWAS findings implicating the HHIP region for FEV1/FVC10.

Several SNPs near the gene encoding the hedgehog interacting protein (HHIP) were associated with FEV1/FVC at genome-wide significance in CHARGE and SpiroMeta, confirming earlier GWAS findings in FHS10. The hedgehog (Hh) signaling pathway is crucial in several embryonic development processes, including the branching morphogenesis of the lung21,22. Furthermore, several polymorphisms in three genes in the Hh signaling pathway (IHH, HHIP and PTCH1) were significantly associated with adult height in a previous GWAS23. Several PTCH1 SNPs were also significantly associated with FEV1/FVC in CHARGE, but these associations were not confirmed in SpiroMeta15. Epithelial cells produce Hh protein, which binds to its membrane receptor (encoded by PTCH1) on mesenchymal cells and orchestrates tissue and organ patterning. Hh pathway dysfunction during the fetal stage in humans is responsible for severe lung malformations24,25. In adults, the Hh-signaling pathway may participate in the response of the airway epithelium to injury, such as smoking and hyperoxia26,27.

A nonsynonymous AGER SNP (rs2070600) was associated with FEV1/FVC at genome-wide significance in our study and independently confirmed in SpiroMeta15. The AGER protein, a membrane-bound or soluble pattern-recognition receptor, belongs to the immunoglobulin superfamily of cell surface receptors. The SNP rs2070600 has functional significance: for example, it promotes higher ligand affinity and the production of proinflammatory proteins upon activation28. In healthy adult mice and humans, AGER is highly expressed in the lungs29, and its absence contributes to the pathogenesis of idiopathic pulmonary fibrosis30,31. AGER signaling is involved in host defense, inflammation and tissue remodeling, processes that are relevant to accelerated decline in pulmonary function with age.

Polymorphisms in HTR4 were associated with FEV1/FVC at genome-wide significance in the joint meta-analysis of CHARGE and SpiroMeta results. HTR4 encodes a G protein–coupled transmembrane receptor that regulates cyclic AMP production in response to 5-hydroxytryptamine (serotonin). Elevated levels of free serotonin have been found in the plasma of individuals with symptomatic asthma32, and serotonin signaling pathways involving HTR4 have been implicated in cholinergic and immune-mediated airway reactivity33,34. Upon activation by serotonin, HTR4 in human airway epithelial cells regulates the release of a proinflammatory cytokine, a signature characteristic of asthma35.

ADAM19 SNPs were associated with FEV1/FVC at genome-wide significance in CHARGE and in the joint meta-analysis with SpiroMeta. ADAM19 is a member of the 'a disintegrin and metalloprotease' (ADAM) family of membrane-anchored glycoproteins that control cell-matrix interactions and help regulate growth and morphogenesis. Polymorphisms in a gene that encodes another ADAM family member, ADAM33, have been associated with bronchial hyper-responsiveness and accelerated lung function decline in those with asthma and in the general population36,37,38. ADAM19 has not been previously implicated in human pulmonary disorders, but it is abundantly expressed in alveolar epithelial cells and bronchial smooth muscle tissue39.

GPR126 polymorphisms were associated with FEV1/FVC at genome-wide significance in CHARGE and in the joint meta-analysis with SpiroMeta. GPR126 belongs to a superfamily of G protein–coupled receptors involved in cell adhesion and signaling40. Although the precise function of this gene has not been elucidated, its expression in mice is temporally increased during embryonic organ development and is highest in the adult lungs41. In humans, recent GWAS have linked GPR126 variants with adult height, and more specifically with trunk height42,43,44. We adjusted all analyses for standing height. In addition, we repeated analyses after adjusting for sitting height (a more reliable indicator of trunk height) for GPR126 SNPs in ARIC, where both height variables were measured, and associations with FEV1/FVC remained significant (data not shown). Thus, these associations are not likely to be due to residual confounding by trunk height.

Genome-wide significant associations with FEV1 were observed in CHARGE for numerous SNPs spanning at least three genes on chromosome 4q24, and these associations were significant for all eight SNPs tested for replication in SpiroMeta (Table 3). There is moderate to strong linkage disequilibrium among the chromosome 4q24 SNPs, and the specific genes influencing FEV1 remain speculative. The genes are ordered INTS12-GSTCD-NPNT along chromosome 4q24, and joint meta-analysis with SpiroMeta showed that SNPs from the genes INTS12 and GSTCD had the most significant associations with FEV1 (Table 3). The product of INTS12 is a subunit of the integrator complex that associates with the C-terminal domain of RNA polymerase II and mediates 3′-end processing of small nuclear RNAs45. The glutathione S-transferase C-terminal domain (GSTCD) could influence lung function via mechanisms involving the detoxification by glutathione S-transferases of xenobiotics that might damage the lungs.

The most distal gene in the chromosome 4q24 region, NPNT, encodes nephronectin, which is expressed in fetal and adult lungs46,47. The NPNT SNP rs10516529 is located in a binding site for the transcription factor POU6F1 (also known as mPOU homeobox protein), which is known to be expressed in adult lungs and hypothesized to play a role in lung development48,49,50. A fourth predicted gene in the region, FLJ20184, is located proximal to the other three genes. Although FLJ20184 encodes a hypothetical protein of unknown function, FLJ20184 contains allelic variants associated with successful smoking cessation in a GWAS of patients in smoking cessation trials51.

The genetic factors identified here gave estimated effect sizes consistent with those for well-established risk factors for pulmonary function decline. Carrying one copy of an implicated reference allele resulted in a FEV1 difference ranging from 50 to 70 mL. These effect sizes correspond to approximately 2.8–3.9 years of age-related decline in pulmonary function based on a mean decline of about 18 mL/year and to approximately 1.7–2.3 years of active smoking-related decline based on a mean decline of about 30 mL/year52. Second-hand smoke exposure has also been associated with decline in FEV1 (15 mL decline for a 10-year exposure in the home and 41 mL decline for a 10-year workplace exposure)53. For FEV1/FVC, carrying one copy of an implicated reference allele resulted in a difference ranging from 0.30% to 1%. The lower effect-size estimates are comparable with the mean FEV1/FVC decline related to second-hand smoking (0.35 for a 10-year exposure in the home and 0.14 for a 10-year workplace exposure)53. These comparisons demonstrate that the identified genetic factors have a moderate impact on pulmonary function. Individuals carrying these polymorphisms will have lower pulmonary function than predicted at a given age, thus placing them at greater risk for developing COPD and at a greater risk of mortality2,3.
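The year-equivalents quoted above follow from dividing the per-allele FEV1 difference by the mean annual decline. A minimal sketch of that arithmetic, using only the figures given in the text (the function name is hypothetical):

```python
def years_equivalent(effect_ml, decline_ml_per_year):
    """Years of decline that would produce the same FEV1 difference."""
    return effect_ml / decline_ml_per_year


# Per-allele FEV1 differences (mL) and mean annual declines (mL/year) from the text.
for label, rate in (("age-related", 18.0), ("active smoking-related", 30.0)):
    low = years_equivalent(50.0, rate)
    high = years_equivalent(70.0, rate)
    print(f"{label}: {low:.1f}-{high:.1f} years")
# prints:
# age-related: 2.8-3.9 years
# active smoking-related: 1.7-2.3 years
```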

A GWAS of COPD identified CHRNA3-CHRNA5 on chromosome 15 as a susceptibility locus11. CHRNA3-CHRNA5 has also been associated with nicotine dependence12,13. In CHARGE, one identified SNP in this locus (rs1051730) was associated with FEV1/FVC (P = 0.00070) and FEV1 (P = 0.016), whereas the other identified SNP in this locus (rs8034191) was not associated with FEV1/FVC (P = 0.11) or FEV1 (P = 0.36). The nominal evidence for replication may reflect differences in study design and a potential gene-environment interaction involving smoking.

Our study has several important strengths. The CHARGE cohorts are well phenotyped with pulmonary function measures passing stringent quality control criteria, thus minimizing measurement error. Our large sample size of 20,890 participants offers a powerful resource to examine associations of common SNPs with modest to large effects14. Testing our most significant results in the SpiroMeta consortium provided independent replication of these associations. However, even with the large sample sizes in these combined consortia, we likely have insufficient power to detect associations of polymorphisms with small effect sizes or low frequencies.

Population-based cohorts are subject to population stratification, and analytic steps were taken to minimize this potential bias. Cohort-specific λgc values were low (1.00–1.05), and a genomic control adjustment was made in the meta-analyses to reduce inflation in the test statistics. The two largest cohorts, with the largest (albeit modest) λgc values (ARIC and FHS), incorporated principal components as potential confounders in their cohort-specific association tests. Although we cannot eliminate the possibility that some findings are subject to residual confounding by population stratification, the Q-Q plots showing deviations between observed and expected P values for many high- to moderate-signal SNPs and the replication of association for multiple top loci in SpiroMeta suggest a multifactorial influence on pulmonary function.

Our study identified several previously unrecognized loci related to two clinically important pulmonary function measures with evidence for replication, including GPR126, ADAM19, AGER-PPT2 and HTR4 for FEV1/FVC and INTS12-GSTCD-NPNT for FEV1, and confirmed previous reports of association with FEV1/FVC in the HHIP region. These loci include genes with biologically plausible functions, and their identification here encourages future investigations to examine the mechanisms underlying their influence on pulmonary function. Fine mapping of these regions is needed to identify and characterize functional variants. Understanding the genetic determinants of pulmonary function is paramount in identifying the biological mechanisms that lead to its decline and in ultimately lessening the mortality burden associated with reduced pulmonary function.

Methods

Pulmonary function measurements.

Study design details of the participating CHARGE cohorts are described elsewhere14,54,55,56,57,58,59. Study protocols were approved by the relevant institutional review boards, and all participants provided written informed consent.

Pulmonary function testing was conducted by trained spirometry technicians at a single visit for RS and at more than one visit for ARIC, CHS and FHS. FEV1/FVC and FEV1 measures meeting American Thoracic Society or European Respiratory Society criteria for acceptability were tested for association with SNPs in participants of European ancestry who were successfully genotyped and provided informed consent for genetic testing.

In ARIC and CHS, pulmonary function measures and questionnaire data from the baseline visit were analyzed. ARIC measurements were made with a Collins Survey II water-seal spirometer (Collins Medical, Inc.) and Pulmo-Screen II software (PDS Healthcare Products, Inc.)60. CHS measurements were made with a Collins Survey I water-seal spirometer (Collins Medical, Inc.) and software from S&M Instruments61,62.

In three generations of families participating in the FHS, data from the most recent examination were analyzed. Eligible examinations providing spirometry and questionnaire data included examinations 13, 16, 17 and 19 in the original cohort (in approximate 2-year intervals); examinations 3, 5, 6 and 7 in the offspring generation (in approximate 4-year intervals); and the one examination completed to date for the third generation. Equipment used in the standard protocol evolved as technology improved over the decades of study63. A Collins Survey water-filled spirometer (Collins Medical, Inc.) was used for most examinations, with measurements made by Eagle II microprocessor (Collins Medical, Inc.) or by software from S&M Instruments. In more recent examinations, a Collins Comprehensive Pulmonary Laboratory dry rolling-seal spirometer and Collins 2000 Plus/SQL Software (Collins Medical, Inc.) were used.

In RS, pulmonary function was measured at the fourth center visit of participants from the original cohort (RS-I) and the second center visit of participants from the first extension cohort (RS-II). Spirometry was performed using a SpiroPro portable spirometer (Erich Jaeger GmbH)64,65.

Genotyping, imputation and quality control.

Different genotyping platforms were used across the cohorts14 (Table 1). Imputation was conducted using either MACH66 or BIMBAM67 to generate approximately 2.5 million autosomal SNP genotype dosages for meta-analysis. The imputation methods perform similarly, although MACH generally produces higher accuracy rates than the imputation process used in BIMBAM (fastPHASE)68. The use of different imputation methods across cohorts is not a source of bias for meta-analysis because all comparisons using the imputed data are within-cohort comparisons.

ARIC. Among 8,861 self-identified white ARIC participants genotyped, 8,127 participants remained in the study after exclusions for call rate <95%, genotypic and phenotypic sex mismatch, discordances with previous genotype data, suspected first-degree relative of an included individual based on genotype data, more than 8 s.d. for any of the first ten principal components using EIGENSTRAT69, or outlying average identity-by-state estimates using PLINK70. Of these, 7,980 participants had available pulmonary function measures and complete covariate information.

A total of 704,588 autosomal genotyped SNPs remained after exclusions for call rate < 95%, MAF < 1%, Hardy-Weinberg equilibrium (HWE) P < 10−5 or lack of strand annotation. MACH (version 1.00.16)66 was used to impute all autosomal SNPs with reference to HapMap CEU (release 21, build 35)71 from these 704,588 SNPs. Imputed SNPs failing additional quality control criteria (monomorphism, HWE P < 10−6, or genotype frequencies between two genotyping phases differing by P < 10−6) were excluded, leaving 2,515,866 genotyped or imputed SNPs for analysis.
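The pre-imputation SNP exclusions listed for ARIC amount to a simple per-SNP filter. The sketch below is illustrative only: the record type, field names and function name are hypothetical, the default thresholds are those stated above for ARIC genotyped SNPs, and the quality metrics are assumed to have been computed beforehand (for example, with a tool such as PLINK).

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP quality metrics (hypothetical record layout)."""
    snp_id: str
    call_rate: float   # fraction of samples with a genotype call
    maf: float         # minor allele frequency
    hwe_p: float       # Hardy-Weinberg equilibrium test P value
    has_strand: bool   # whether strand annotation is available


def passes_genotype_qc(snp, min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-5):
    """Return True if the SNP survives all four exclusion criteria."""
    return (
        snp.call_rate >= min_call_rate
        and snp.maf >= min_maf
        and snp.hwe_p >= min_hwe_p
        and snp.has_strand
    )


def filter_snps(snps, **thresholds):
    """Keep only SNPs passing QC; thresholds may be overridden per cohort."""
    return [s for s in snps if passes_genotype_qc(s, **thresholds)]
```

Other cohorts applied different cut-offs (for example, a 97% call rate in CHS and FHS), which in this sketch would be passed as keyword arguments such as `min_call_rate=0.97`.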

CHS. CHS genotyped 3,980 participants free of cardiovascular disease at baseline with available DNA and consent to genetic testing. After exclusions for call rate < 95%, sex mismatch or discordance with prior genotyping, 3,291 self-identified white participants remained. Of these, 3,140 had pulmonary function measures and complete covariate information.

A set of 306,655 autosomal genotyped SNPs remained after exclusions for call rate <97%, HWE P < 10−5, more than two duplicate errors or mendelian inconsistency (for reference HapMap CEU trios)71, heterozygote frequency = 0 or no mapping in dbSNP. Imputation of autosomal SNPs was based on these 306,655 SNPs using BIMBAM (version 0.99)67 with reference to HapMap CEU (release 22, build 36)71. The analysis dataset included 2,543,887 genotyped or imputed SNPs.

FHS. A total of 8,481 participants remained after exclusions for call rate < 97%, heterozygosity >5 s.d. from the mean, or excessive non-inheritance. The analysis dataset included 7,694 participants with complete spirometry and covariate data.

MACH (version 1.00.15)66 was used for imputation based on 378,163 autosomal SNPs remaining after exclusions for HWE P < 10⁻⁶, call rate < 97%, differential missingness related to genotype (mishap procedure in PLINK70) with P < 10⁻⁹, mendelian errors > 100, MAF < 1% or absence from HapMap. Two hundred unrelated individuals with high call rates were used to infer model parameters, which were subsequently applied to all 8,481 individuals. Imputation, using HapMap CEU (release 22, build 36)71, produced genotype dosages on 2,543,887 genotyped or imputed SNPs.

RS. All RS participants with available DNA were genotyped; 5,974 RS-I participants and 2,157 RS-II participants remained after exclusion for call rate < 97.5%, excess autosomal heterozygosity, sex mismatch or outlying identity-by-state clustering estimates. Of these, 1,224 RS-I participants and 852 RS-II participants had pulmonary function measures and complete covariate information.

After exclusions for call rate < 98%, HWE P < 10⁻⁶, and MAF < 1%, 512,349 autosomal SNPs in RS-I and 466,389 autosomal SNPs in RS-II were used for imputation in MACH (version 1.00.15 for RS-I and 1.00.16 for RS-II)66 with reference to the 2,543,887 SNPs of the HapMap CEU (release 22, build 36)71.

Statistical analysis.

In cross-sectional analyses, FEV1/FVC and FEV1 were tested for association with SNP genotypes using a 1-degree-of-freedom additive model of the dosage value (estimated reference allele count with a fractional value ranging from 0 to 2.0) as a predictor in linear regression models. Associations were examined overall and stratified into ever- and never-smokers. Overall models were adjusted for age, sex, standing height, smoking status (current, past or never-smoker) and pack-years of smoking. Current, past or never smoking was based on questionnaire responses, and pack-years were calculated for current and past smokers by multiplying smoking dose (packs per day) and duration (years smoked). Stratified models used the same covariates as the overall models, except that the ever-smoker stratum included adjustment for smoking status as current or past and the never-smoker stratum included no smoking-related covariates. Additional study-specific covariates included recruitment cohort (FHS), recruitment center (ARIC and CHS) and principal component eigenvalues for population stratification adjustments (ten components for ARIC and statistically significant components for FHS). Models were implemented using ProbABEL72 in ARIC, R73 in CHS, linear mixed effects models with fixed effects for SNPs and random effects for individuals correlated within families74 in FHS, and MACH2QTL66 in RS as implemented in GRIMP75. In FHS, the kinship package in R generated a covariance matrix for each family based on the kinship coefficient for each relative pair. The kinship matrix, which includes the full set of family-specific covariance matrices, specified the covariance matrix for the random effects.
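
As an illustration of the 1-degree-of-freedom additive model, the following Python sketch regresses a simulated FEV1 value on allele dosage with two of the covariates named above (age and height). It is not the software used in any cohort (ProbABEL, R, mixed models or MACH2QTL); the `ols` helper, the toy data and the effect sizes are all assumptions, and the data are noise-free so that the fitted coefficients can be checked against the simulated ones. Standard errors and P values are omitted.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows, each starting with 1.0 for the intercept.
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        for r in range(p):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


# Hypothetical participants: (dosage in [0, 2], age in years, height in cm)
people = [(0.0, 45, 160), (0.3, 52, 175), (1.0, 60, 168), (1.2, 48, 182),
          (1.9, 71, 171), (2.0, 66, 165), (0.8, 55, 178), (1.5, 63, 158)]

# Simulated FEV1 in litres with an assumed per-allele effect of -0.10
fev1 = [-2.0 - 0.10 * d - 0.03 * a + 0.04 * h for d, a, h in people]

# Design matrix: intercept, dosage, age, height
X = [[1.0, d, float(a), float(h)] for d, a, h in people]
intercept, beta_snp, beta_age, beta_height = ols(X, fev1)
print(round(beta_snp, 6))  # per-allele change in FEV1
```

Because the dosage enters as a single continuous predictor, `beta_snp` is the estimated change in the trait per additional copy of the reference allele, which is what makes the test a 1-degree-of-freedom test.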

GWAS results from the four cohorts were combined using inverse variance weighted meta-analysis in METAL (see URLs). Meta-analysis was performed on approximately 2,534,500 SNPs after applying genomic control for each study and filtering out SNPs with extremely low imputation quality ratios (<0.01) and MAFs (<1%).
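
For a single SNP, inverse variance weighting with per-study genomic control can be sketched as follows. This is a simplified illustration rather than METAL itself: the function name `ivw_meta`, the effect estimates and the inflation factors are hypothetical, and the sketch assumes that genomic control is applied by multiplying each study's variance by its inflation factor λ when λ exceeds 1.

```python
import math


def ivw_meta(betas, ses, lambdas=None):
    """Fixed-effects inverse variance weighted meta-analysis of one SNP.

    betas, ses: per-study effect estimates and standard errors.
    lambdas: per-study genomic inflation factors (assumed; default 1.0).
    """
    if lambdas is None:
        lambdas = [1.0] * len(betas)
    # Genomic control: inflate each study's variance by lambda if lambda > 1
    variances = [se ** 2 * max(lam, 1.0) for se, lam in zip(ses, lambdas)]
    weights = [1.0 / v for v in variances]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return beta, se, p


# Made-up estimates for four cohorts with equal standard errors
beta, se, p = ivw_meta(betas=[-0.02, -0.03, -0.025, -0.025],
                       ses=[0.01, 0.01, 0.01, 0.01])
print(beta, se, p)
```

With equal standard errors the pooled estimate is the simple mean of the four effects, and the pooled standard error is half the per-study value, which shows how combining cohorts sharpens the estimate.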

The genome-wide significance threshold was defined a priori as P < 5 × 10⁻⁸, the Bonferroni adjustment for 1 million independent tests76. Information on SNP function and position relative to genes, microRNA and transcription factor binding sites was obtained using a Perl script (J.B.W.) that queries tables of the UCSC genome browser16 (hg18, March 2006 genome build, see URLs). Functional effects of nonsynonymous SNPs on protein structure and function were predicted using PolyPhen (see URLs)18.

Replication in the SpiroMeta consortium.

We exchanged 30 SNPs for replication testing with the SpiroMeta consortium (companion manuscript15). No additional genotyping was required, as these SNPs were available from the SpiroMeta GWAS. We aimed to select two SNPs from each of the top genes implicated for FEV1/FVC or FEV1, with nearly all the genes exceeding genome-wide significance. The SNP with the lowest P value in or near each gene was selected. A second SNP, genotyped (instead of imputed) in at least one cohort, was selected with preference for nonsynonymous SNPs and SNPs not in strong linkage disequilibrium with the first selected SNP. Only one SNP was available for AGER, PPT2, TSPYL4 and NT5DC1. Four SNPs were selected from two linkage disequilibrium blocks for the largest gene, GPR126. In total, 18 SNPs from 9 genes (8 independent loci) implicated for FEV1/FVC and 12 SNPs from 7 genes (3 independent loci) implicated for FEV1 were tested for replication.

Unlike CHARGE, SpiroMeta used normalized residuals as phenotypes, adjusted for age² rather than age, and did not adjust for smoking. For better comparison, SpiroMeta conducted modified analyses following the CHARGE analytic method described above in 16,178 participants from adult cohorts with complete quantitative smoking data available. Results from the CHARGE GWAS and SpiroMeta replication were combined in a joint meta-analysis using inverse variance weighting with METAL. SpiroMeta results with P < 8.33 × 10⁻⁴, based on an overly conservative Bonferroni correction for 60 tests (30 SNPs tested for association with two traits, FEV1/FVC and FEV1), or joint meta-analysis results with P < 5 × 10⁻⁸ (genome-wide significance threshold), were considered statistically significant.
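
Both significance thresholds follow from a Bonferroni correction, as the short Python check below shows. The family-wise error rate of 0.05 is an assumption; it is not stated explicitly in the text but is the value that reproduces both thresholds.

```python
def bonferroni(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


ALPHA = 0.05  # assumed family-wise error rate

# Genome-wide threshold: 1 million independent tests
genome_wide = bonferroni(ALPHA, 1_000_000)

# Replication threshold: 30 SNPs tested against 2 traits = 60 tests
replication = bonferroni(ALPHA, 30 * 2)

print(genome_wide, replication)
```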

URLs.

METAL, http://www.sph.umich.edu/csg/abecasis/metal/; UCSC Genome Browser, http://genome.ucsc.edu/; Polymorphism Phenotyping (PolyPhen), http://genetics.bwh.harvard.edu/pph/.