A Comprehensive Evaluation of Potential Lung Function Associated Genes in the SpiroMeta General Population Sample

Rationale: Lung function measures are heritable traits that predict population morbidity and mortality and are essential for the diagnosis of chronic obstructive pulmonary disease (COPD). Variations in many genes have been reported to affect these traits, but attempts at replication have provided conflicting results. Recently, we undertook a meta-analysis of genome-wide association study (GWAS) results for lung function measures in 20,288 individuals from the general population (the SpiroMeta consortium).
Objectives: To comprehensively analyse previously reported genetic associations with lung function measures, and to investigate whether single nucleotide polymorphisms (SNPs) in these genomic regions are associated with lung function in a large population sample.
Methods: We analysed association for SNPs tagging 130 genes and 48 intergenic regions (+/−10 kb), after conducting a systematic review of the literature in the PubMed database for genetic association studies reporting lung function associations.
Results: The analysis included 16,936 genotyped and imputed SNPs. No loci showed overall significant association for FEV1 or FEV1/FVC traits using a carefully defined significance threshold of 1.3×10−5. The most significant loci associated with FEV1 include SNPs tagging MACROD2 (P = 6.81×10−5), CNTN5 (P = 4.37×10−4), and TRPV4 (P = 1.58×10−3). Among ever-smokers, SERPINA1 showed the most significant association with FEV1 (P = 8.41×10−5), followed by PDE4D (P = 1.22×10−4). The strongest association with FEV1/FVC ratio was observed with ABCC1 (P = 4.38×10−4), and ESR1 (P = 5.42×10−4) among ever-smokers.
Conclusions: Polymorphisms spanning previously associated lung function genes did not show strong evidence for association with lung function measures in the SpiroMeta consortium population. Common SERPINA1 polymorphisms may affect FEV1 among smokers in the general population.


Introduction
Pulmonary function is usually assessed by measurement of forced expiratory volume in one second (FEV1), forced vital capacity (FVC), and the ratio of FEV1 to FVC. These measurements are integral to the diagnosis of chronic obstructive pulmonary disease (COPD), and are also important long-term predictors of population morbidity and mortality [1]. Reduced FEV1/FVC defines airway obstruction, whereas reduced FEV1 grades the severity of obstruction [2].
Pulmonary function is determined by both environmental and genetic factors. Tobacco smoking is the major environmental risk factor for the development of COPD. A genetic contribution to pulmonary function is well established, with heritability estimates reaching 77 percent for FEV1 [3]. Linkage analyses within families have previously identified multiple genomic regions associated with spirometry measures and respiratory diseases. In addition, candidate gene studies have identified more than 100 genes which have been suggested to contribute to variability in lung function. The majority have been studied because of their potential pathophysiological role in the development of COPD. Some genes have been examined for association with lung function measurements in individuals with other specific respiratory diseases (most commonly asthma) or, to a lesser extent, in the general population. With the exception of SERPINA1, which is the best documented genetic risk factor for the development of COPD [4], these genes have not shown consistent associations across different studies [5,6].
The identification of these genes offers potential insight into the pathophysiology of altered lung function. The SpiroMeta consortium provides a powerful resource in which to study genetic associations with lung function. We aimed to comprehensively evaluate whether genes studied in candidate gene or small genome-wide association studies, and reported to be associated with lung function or COPD in these studies, were associated with lung function measures in this large general population sample.

Literature search
The literature search identified 1719 publications. Of these, 104 reported one or more genetic associations; these are listed in Text S1 in the online supporting information. These publications varied in their study designs and in the populations studied. Forty-seven papers reported association with COPD using case-control or family-based designs. The remaining literature reported association with lung function traits within populations with specific respiratory diseases (asthma (26) and COPD (17)), or in general population cohorts (14). Nine publications studied other populations, which included patients with cystic fibrosis (2), SERPINA1 deficiency (2), cotton and grain workers (2), lung cancer (1), fire fighters (1) and post myocardial infarction (MI) patients (1). Some papers reported more than one endpoint.
These 104 relevant publications identified 130 genes and 48 intergenic SNPs. We investigated association between FEV1 and FEV1/FVC and each of the 16,936 genotyped and imputed SNPs spanning these regions in the SpiroMeta dataset.

Contribution of all tested genes to lung function measures in SpiroMeta
Quantile-quantile (Q-Q) plots did not show large deviations between observed and expected P values for FEV1 and FEV1/FVC in all participants and for FEV1/FVC in ever-smokers (Figure 1). The plot of FEV1 in ever-smokers, however, shows slight deviations for high-signal SNPs. The genomic inflation factor, λ, for FEV1 is 0.83 in all individuals and 1.05 in smokers; λ for FEV1/FVC is 0.92 in all individuals and 1.13 in smokers.
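As an illustration of how a genomic inflation factor of this kind is commonly obtained, the following minimal sketch computes λ as the ratio of the median observed 1-df chi-square statistic to its expected median under the null. The function names and the evenly spaced example P values are hypothetical; the exact procedure used in SpiroMeta is described in [7] and may differ in detail.

```python
from statistics import NormalDist, median

# Expected median of a 1-df chi-square statistic under the null (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def p_to_chi2(p):
    """Convert a two-sided P value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


def genomic_inflation(p_values):
    """Median observed chi-square divided by its expected median under the null."""
    return median(p_to_chi2(p) for p in p_values) / CHI2_1DF_MEDIAN


# Hypothetical example: evenly spaced P values mimic a null distribution,
# so the estimate should be close to 1.
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(uniform_p)
```

Values of λ below 1, as reported here for the analyses in all individuals, indicate test statistics that are on average smaller than expected under the null, whereas values above 1 indicate inflation.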
Using the Bonferroni-corrected P value threshold of 1.3×10−5, none of the tested SNPs demonstrated significant association with either FEV1 or FEV1/FVC.
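The comparison against the threshold can be sketched as follows. The threshold is the 5% Type 1 error rate divided by the 3,891 effective independent tests reported in the statistical analysis, and the P values are those quoted in the abstract; the variable names and dictionary labels are illustrative only.

```python
ALPHA = 0.05            # target Type 1 error rate (from the text)
EFFECTIVE_TESTS = 3891  # effective number of independent tests (from the text)

threshold = ALPHA / EFFECTIVE_TESTS  # approximately 1.3e-5

# Smallest P values quoted in the abstract (labels are illustrative).
top_hits = {
    "MACROD2, FEV1, all individuals": 6.81e-5,
    "SERPINA1, FEV1, ever-smokers": 8.41e-5,
    "ABCC1, FEV1/FVC": 4.38e-4,
}

# Keep only associations that pass the corrected threshold.
significant = {name: p for name, p in top_hits.items() if p < threshold}
```

Under these inputs `significant` is empty, which matches the statement that no SNP reached the corrected threshold.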

Association results in all individuals
In order to examine possible signals in greater detail, we also explored region plots for the top SNPs identified (SNPs with the lowest P values). The three top loci with the most significant P values for all regions tested in all individuals are presented in Table 1.
Among all individuals, the strongest association with FEV1 was with rs204652 in MACRO domain containing 2 (MACROD2) on chromosome 20. SNP rs17133553 in contactin 5 (CNTN5) on chromosome 11 was the second top locus for association with FEV1 and the third for FEV1/FVC ratio. SNP rs803450 in methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1-like (MTHFD1L) on chromosome 6 showed association with FEV1 in all individuals. For FEV1/FVC ratio in all individuals, the strongest association was with rs3887893 in ATP-binding cassette, sub-family C, member 1 (ABCC1) on chromosome 16, and the second strongest signal was for rs11155818 in estrogen receptor 1 (ESR1) on chromosome 6.
The region association plots around the most significant SNPs associated with FEV1 and FEV1/FVC in all individuals provide little evidence from supporting SNPs to suggest strong regions of association in MACROD2, CNTN5, MTHFD1L, ESR1, and ABCC1 in these data (see Figure S1 in the online supporting information).

Association results in ever-smokers
To study the impact of smoking on potential genetic associations with lung function, we repeated the analysis restricted to individuals who had ever smoked (ever-smokers). The most significant loci identified are shown in table 1.
Among ever-smokers, the strongest association with FEV1 was with SERPINA1 (P = 8.41×10−5), followed by PDE4D (P = 1.22×10−4). The strongest association with FEV1/FVC ratio among ever-smokers was observed with rs9322335 in estrogen receptor 1 (ESR1) on chromosome 6. The second strongest association was with rs1864271 in rhomboid domain containing 1 (RHBDD1) on chromosome 2, followed by rs1738567 in MTHFD1L on chromosome 6. The region association plots for SERPINA1 and PDE4D among ever-smokers (Figure 2) show some supportive evidence for the association of these two loci. The region association plots for the additional loci among ever-smokers reported in Table 1 are shown in Figure S2 in the online supporting information.

Association results excluding loci identified in previous GWAS
Because some of the regions identified were observed in the previously published small GWAS included in our literature search, we also present the top three genes for the relevant end points after excluding GWAS hits (Table 2). The additional genes identified in this analysis for association with FEV1 among all individuals were the transient receptor potential cation channel, subfamily V, member 4 (TRPV4) on chromosome 12, and N-acetyltransferase 2 (NAT2) on chromosome 8. Among ever-smokers, association results for FEV1 identified B-cell CLL/lymphoma 2 (BCL2) on chromosome 18. Association results for FEV1/FVC ratio identified allograft inflammatory factor 1 (AIF1) on chromosome 6 among all individuals, and CD22 molecule (CD22) on chromosome 19 among ever-smokers. The region association plots for the most significant loci in Table 2 that were not presented earlier are shown in Figure S3 in the online supporting information. The plots show some additional support for all presented loci except for ABCC1 among ever-smokers.

Discussion
In the SpiroMeta study, we generated a comprehensive dataset to analyse associations between genetic variants and lung function in the general population [7]. There have been many small previous studies, mostly of individual candidate genes, examining association with lung function, and these have produced conflicting results. Therefore, in this paper, we undertook a comprehensive literature review to identify relevant gene regions and analysed potential associations with FEV1 and FEV1/FVC ratio in all individuals within SpiroMeta. In addition, given the impact of smoking on lung function, we also analysed the associations separately in ever-smokers. There were no strong association signals in the never-smoker group (data available on request).
The main conclusion from this study is that, within 178 previously reported regions, we found no SNP associations which exceeded the significance threshold (P < 1.3×10−5) we employed after correction for multiple testing. Our results suggest these regions do not constitute major genetic determinants of lung function measures at the general population level. The lack of replication and the sometimes contradictory results in previous studies may reflect the fact that many previously reported associations came from studies with small sample sizes, possibly leading to false positive results.
Despite the failure to identify any overall significant contribution of a single SNP from previously reported genes to lung function, there are some potentially interesting signals apparent from the region plots suggesting that there may be a small signal from variants in some of the genes of interest.
SERPINA1 showed the strongest association with FEV1 among smokers (P = 8.41×10−5). It encodes the alpha-1 antitrypsin (AAT) protein, which is mainly produced in the liver and has the primary role of inhibiting neutrophil elastase in the lungs [11]. Protein variants of this gene have been classified from A to Z based on their migration in an isoelectric pH gradient. Among Caucasians, the M allele is the most common allele; it has six subtypes (M1-M6), with allele frequencies greater than 95 percent, and is associated with normal AAT levels. The common deficiency variants, S (frequency 0.02-0.03) and Z (frequency 0.01-0.03), are associated with mild and severe reductions in serum AAT levels, respectively [11,12]. The r² between the Z allele SNP rs28929474 and our top SNP, rs3748312, is 0.08 (based on 1000 Genomes Project pilot 1 data from 120 CEU individuals). rs3748312 is in LD (r² = 0.603) with the M1 allele SNP rs6647, but is in very weak LD with the M2 SNP rs709932 (r² = 0.033) and the M3 SNP rs1303 (r² = 0.051). The S allele SNP rs45551939 (merged into rs17580) was not found in HapMap (version 24). It is possible that the signal observed in our data is due to variants with effects on gene expression and/or protein levels, and this idea is supported by a previous study showing novel variants in SERPINA1 to be associated with increased susceptibility to COPD independently of the Z allele [13]. The relatively strong signal observed in our study suggests a possible role for variants in SERPINA1 in smokers at the general population level beyond that observed in carriers of known deficient alleles.
The PDE4D gene encodes the type 4D phosphodiesterase, which degrades cyclic adenosine monophosphate (cAMP), an important signal transduction molecule in all cell types. Polymorphisms within PDE4D have been associated with stroke [14] and bone mineral density [15]. PDE4D is the predominant phosphodiesterase in the lungs and plays an important role in regulating airway smooth muscle contractility [16], as demonstrated by PDE4D knockout mice lacking a response to methacholine [17]. A study in a Japanese population reported association of one PDE4D SNP (rs829259), and of a haplotype consisting of rs10075508 and one interleukin 13 (IL13) SNP, with COPD [18]. SNP rs829259 was not associated with FEV1 in all individuals (P = 0.68) or in smokers (P = 0.21) in our study, and SNP rs10075508 was not genotyped or imputed in SpiroMeta. A recent GWAS has also identified PDE4D as an asthma susceptibility gene [19]; however, none of the top 5 SNPs associated with asthma is present in our dataset, and the linkage disequilibrium (LD) with SNPs in SpiroMeta is low, so it is difficult to comment on their contribution to lung function measures in our study.
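For readers less familiar with the LD statistic used in this discussion, the sketch below shows the standard definition of pairwise r² from haplotype and allele frequencies. The function name and the example frequencies are hypothetical and are not taken from the 1000 Genomes or HapMap data cited above.

```python
def ld_r2(p_ab, p_a, p_b):
    """Pairwise r-squared between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom == 0.0:
        raise ValueError("r-squared is undefined for a monomorphic SNP")
    return d * d / denom


# Hypothetical frequencies: alleles that always occur together give r2 = 1,
# alleles that occur together only as often as chance predicts give r2 = 0.
complete_ld = ld_r2(0.5, 0.5, 0.5)
no_ld = ld_r2(0.25, 0.5, 0.5)
```

An r² close to 0, such as the 0.08 between rs3748312 and the Z allele SNP, means that one SNP carries little information about the other.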
Our study has a number of strengths. First, we have power to detect associations of small magnitude, with data on 20,288 individuals from 14 European studies with more than 2.5 million genotyped and imputed SNPs. Second, we aimed to minimise Type 1 error whilst taking appropriate account of the correlation between neighbouring SNPs. Finally, the literature search was designed to be comprehensive to include all reported genetic variants with effect on lung function irrespective of disease status or ethnicity. To our knowledge, this is the first study to comprehensively evaluate the role of previously associated genes in a large genome-wide association study.
However, it is important to recognise the limitations of our study. First, we have tested for association in a general population sample; the magnitude of effect of these genetic variants may be greater in populations enriched with individuals with respiratory diseases such as asthma and COPD. Second, we have tested cross-sectional lung function measures. Some of the variants tested might affect longitudinal changes by accelerating or decelerating the decline in lung function, although this would still be expected to result in effects evident in cross-sectional data. Third, the power of our study to detect associations of SNPs with modest effect sizes on lung function was limited given our relatively conservative approach to multiple testing; we therefore cannot rule out a real but modest effect of some of these loci on lung function and susceptibility to respiratory diseases in the general population. Alternative approaches could be to utilise a priori evidence about the reported direction of effect and a priori assumptions about the likely presence of multiple causal variants. Fourth, we tested for association with lung function measures among individuals of European ancestry, and the contribution of these variants to lung function in other populations may vary. Finally, the coverage of tested genetic regions varies depending on the genome-wide arrays used and imputation quality metrics.
In conclusion, we have shown that none of the SNPs tagging the genes previously reported to determine lung function were significantly associated with FEV1 or FEV1/FVC ratio in the SpiroMeta general population study. We found some evidence to suggest a possible contribution of the SERPINA1 and PDE4D loci to lung function in smokers, which warrants further study. As a resource to the scientific community, we have provided the complete association results (Dataset S1) in the online supporting information.

From the search results, we included relevant papers reporting only positive association results. For the three GWAS papers identified, we took a more inclusive approach and included all loci presented in the publication body, and not just those meeting genome-wide significance. We excluded papers reporting associations with respiratory diseases (e.g. asthma) without association with lung function measurements.

Statistical analysis
The genes and intergenic SNPs identified in the relevant literature were evaluated in the SpiroMeta dataset using an extended region of +/−10 kilobases (kb) from the gene coordinates downloaded from the UCSC genome browser (we used the SNP coordinate +/−10 kb for intergenic SNPs). Meta-analysis association results for SNPs in these (+/−10 kb extended) regions were extracted from the SpiroMeta dataset for both FEV1 and FEV1/FVC in all individuals and separately in ever-smokers. The complete cohort descriptions, study design and methods have been previously reported [7], but we provide here a brief summary. At study level, non-genotyped SNPs were imputed using standard approaches [18,20] to facilitate meta-analysis of studies employing different genotyping platforms. Thus up to 2,705,257 SNPs were tested for association with FEV1 and FEV1/FVC using additive models and adjusting for age, sex, height and ancestry principal components. Then, the results were meta-analysed across studies using inverse variance weighting. Genomic control was applied at the study level and after the meta-analysis to correct for test inflation due to population stratification [21]. We excluded SNPs which were not well measured or imputed in the study (identifiable by an "effective sample size" of <50% of the total sample size) [7]. In all, we identified 16,936 genotyped and imputed SNPs in the gene and intergenic regions described above which met our inclusion criteria.
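The inverse variance weighting step can be illustrated with a minimal sketch. It assumes a fixed-effect model in which each study contributes an effect estimate and its standard error for a given SNP; the function name and the example numbers are hypothetical, and the full SpiroMeta pipeline (including genomic control) is described in [7].

```python
from math import sqrt


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted pooling of per-study estimates.

    betas: per-study effect estimates for one SNP
    ses:   matching per-study standard errors
    Returns the pooled estimate, its standard error and the Z statistic.
    """
    weights = [1.0 / (se * se) for se in ses]  # weight = 1 / variance
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    return beta, se, beta / se


# Hypothetical example: two equally precise studies are simply averaged,
# and the pooled standard error shrinks by a factor of sqrt(2).
pooled_beta, pooled_se, pooled_z = inverse_variance_meta([0.1, 0.3], [0.1, 0.1])
```

More precise studies receive larger weights, so a large cohort with a small standard error dominates the pooled estimate.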
In order to correct for multiple testing of SNPs in linkage disequilibrium we used Li and Ji's [22] method for calculating the effective number of independent tests from pairwise SNP correlations. Pairwise SNP correlations were obtained from reference genotypes of 1468 subjects in the Busselton study [23]. We estimated that the association tests for the 16,936 highly correlated SNPs we selected in the regions of interest equated to 3,891 independent tests.
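The core of the Li and Ji calculation can be sketched as follows, assuming the eigenvalues of the pairwise SNP correlation matrix have already been computed with a linear algebra routine (not shown). Each eigenvalue contributes 1 if its absolute value is at least 1, plus its fractional part. The function name and the example eigenvalues are hypothetical and do not reproduce the Busselton reference data.

```python
from math import floor


def li_ji_effective_tests(eigenvalues):
    """Effective number of independent tests following Li and Ji.

    eigenvalues: eigenvalues of the SNP-by-SNP correlation matrix.
    """
    m_eff = 0.0
    for lam in eigenvalues:
        x = abs(lam)
        # Indicator that the eigenvalue is at least 1, plus its fractional part.
        m_eff += (1.0 if x >= 1.0 else 0.0) + (x - floor(x))
    return m_eff


# Hypothetical examples for four SNPs:
independent = li_ji_effective_tests([1.0, 1.0, 1.0, 1.0])  # uncorrelated SNPs
redundant = li_ji_effective_tests([4.0, 0.0, 0.0, 0.0])    # perfectly correlated SNPs
```

Four uncorrelated SNPs count as four tests and four perfectly correlated SNPs count as one, which is how 16,936 correlated SNPs can reduce to a much smaller number of independent tests.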
To maintain a Type 1 error rate of 5%, we adjusted the significance threshold using a Bonferroni correction (0.05/3891). Thus a threshold of 1.3×10−5 was used to determine statistical significance.

Dataset S1. Complete FEV1 and FEV1/FVC association results for all individuals and separately for ever-smokers. (XLS)