SNPs associated with HHIP expression have differential effects on lung function in males and females

Adult lung function is highly heritable and 279 genetic loci were recently reported as associated with spirometry-based measures of lung function. Though lung development and function differ between males and females throughout life, there has been no genome-wide study to identify genetic variants with differential effects on lung function in males and females. Here, we present the first genome-wide genotype-by-sex interaction study on four lung function traits in 303,612 participants from the UK Biobank. We detected five SNPs showing genome-wide significant (P<5 × 10−8) interactions with sex on lung function, as well as 21 suggestively significant interactions (P<1 × 10−6). The strongest sex interaction signal came from rs7697189 at 4:145436894 on forced expiratory volume in 1 second (FEV1) (P = 3.15 × 10−15), and was replicated (P = 0.016) in 75,696 individuals in the SpiroMeta consortium. Sex-stratified analyses demonstrated that the minor (C) allele of rs7697189 increased lung function to a greater extent in males than females (untransformed FEV1 β = 0.028 [SE 0.0022] litres in males vs β = 0.009 [SE 0.0014] litres in females), and this effect was not accounted for by differential effects on height, smoking or age at puberty. This SNP resides upstream of the gene encoding hedgehog-interacting protein (HHIP) and has previously been reported for association with lung function and HHIP expression in lung tissue. In our analyses, while HHIP expression in lung tissue was significantly different between the sexes with females having higher expression (most significant probeset P=6.90 × 10−6) after adjusting for age and smoking, rs7697189 did not demonstrate sex differential effects on expression. Establishing the mechanism by which HHIP SNPs have different effects on lung function in males and females will be important for our understanding of lung health and diseases, such as chronic obstructive pulmonary disease (COPD), in both sexes.


Introduction
Measures of lung function, including forced expiratory volume in 1 second (FEV1) and forced vital capacity (FVC), are used to determine diagnosis and severity of chronic obstructive pulmonary disease (COPD). COPD refers to a group of complex lung disorders characterised by irreversible (and usually progressive) airway obstruction, and is projected to be the third leading cause of death globally in 2020 (1). The major risk factor for COPD is smoking, but other environmental and genetic factors have been identified.
Physiological lung development and function differ throughout life between males and females.
Female foetuses have smaller airways and fewer bronchi, but their lungs mature faster and they produce surfactant earlier than lungs of male foetuses (2). During childhood, females have smaller lungs compared to height-matched males, but have a higher flow rate per lung volume, perhaps reflecting airway growth lagging behind lung growth in males (3). In adulthood, females have smaller diameter airways, fewer alveoli, and smaller lung volumes and diffusion surfaces compared to males (4). However, there is some evidence to suggest that age-related decline in lung function is slower in females (5). As well as these anatomical differences, it is known that sex hormones can influence lung structure and function throughout life but the mechanisms are not well understood (5,6).
The incidence and presentation of lung diseases such as COPD also exhibit sex disparity.
Traditionally viewed as a disease of older males, COPD has been increasing in prevalence amongst females over the last two decades. It has been reported that females are more vulnerable to environmental risk factors for COPD and are over-represented amongst sufferers of early-onset severe COPD (7,8). Females are also more likely to present with small airway disease whereas males are more likely to develop emphysematous phenotype. Moreover, females report more frequent and/or severe exacerbations of respiratory symptoms than males and higher levels of dyspnea and cough (7).
In a recent paper, 279 genetic loci were reported as associated with lung function traits, but these only explain a small proportion of the heritability (9). One possible source of hidden heritability is the interaction between genetic factors and biological sex on lung function traits. A genome-wide genotype-by-sex interaction study in three studies comprising 6260 COPD cases and 5269 smoking controls found a putative sex-specific risk factor for COPD in the CELSR1 gene, a region not previously implicated in COPD or lung function (10). However, having sufficient statistical power to reproducibly detect genotype-by-sex interactions will require much larger sample sizes. Statistical power can also be enhanced by using quantitative lung function traits as outcomes but we are not aware of any genome-wide genotype-by-sex interaction studies on lung function traits.
Understanding the role of sex in lung function and COPD will be important for developing therapeutics that work for both males and females (11).
In this study, we tested for an interaction effect of 7,745,864 variants and sex on FEV1, FEV1/FVC, FVC and PEF in 303,612 individuals from the UK Biobank resource. We sought replication of our findings in 75,696 independent individuals from the SpiroMeta consortium. To our knowledge this is the first genome-wide sex-by-genotype interaction study on lung function traits and the largest sexby-genotype interaction study to focus on COPD-related outcomes.

UK Biobank participants
The UK Biobank is described here: http://www.ukbiobank.ac.uk. It comprises over 500,000 volunteer participants aged 40-69 years at time of recruitment, with demographic, lifestyle, clinical and genetic data (12). Individuals were selected for inclusion in this study if (i) they had no missing data for sex, age, height, and smoking status, (ii) their spirometry data passed quality control, as described previously (9), (iii) they had genome-wide imputed genetic data, (iv) they were of genetically determined European ancestry, and (v) they were not first-or second-degree relatives of any other individual included in the study. In total, 303,612 individuals met these criteria (Supplementary Table 1).
Participants' DNA was genotyped using either the Affymetrix Axiom® UK BiLEVE array or the Affymetrix Axiom® UK Biobank array (12). Genotypes were imputed based on the Human Reference Consortium (HRC) panel, as described elsewhere (12). Variants with minor allele frequency (MAF) < 0.01 and imputation quality (info) scores < 0.3 were excluded from the analysis.

SpiroMeta consortium
The SpiroMeta consortium meta-analysis comprised 75,696 individuals from 20 studies. Ten studies (N = 17,280) were imputed using 1000 Genomes Phase 1 reference panel (13,14), nine studies (N = 37,919) were imputed using the Haplotype Reference Consortium (HRC) panel (12), and one study (N = 2077) was imputed using the HapMap CEU Build 36 Release 22. Two of the studies (ALSPAC and Raine) also provided data on children with an average age of 8 years (N = 5645). Supplementary Tables 2 and 3 show the definitions of all abbreviations, study characteristics, details of genotyping platforms and imputation panels and methods. Measurements of spirometry for each study are as previously described (9,15). Fourteen SpiroMeta studies had data on PEF so the sample size for replication of PEF signals was 51,555.

The lung eQTL study
The lung expression quantitative trait loci (eQTL) study database has been described previously (16)(17)(18). It consists of non-tumour lung tissue samples from 1,111 individuals who had undergone lung resection surgery, mainly current or former smokers, genotyped on the Illumina Human1M-Duo BeadChip array.

Statistical analysis
Spirometry-based lung function traits FEV1, FEV1/FVC, FVC, and PEF were pre-adjusted for age, age 2 , standing height (or sitting height in the sensitivity analysis) and smoking status and the residuals rank-transformed to normality using the rntransform function of the GenABEL package in R. To test each imputed variant for an interaction effect, a linear regression model with genotype (additive effect), sex, genotype-by-sex interaction, genotyping array and the first ten principal components included as covariates was implemented using Plink 2.0 software (https://www.coggenomics.org/plink/2.0/). Genome-wide results were visualised using the R packages qqman, manhattanly and Circos v0.65. Region plots were generated using LocusZoom.
Following the genome-wide interaction study, sentinel SNPs were identified by selecting the SNP with the strongest sex interaction P value, excluding all SNPs (irrespective of LD) +/-1Mb, and then selecting the SNP with the strongest sex interaction P value from the remaining signals. This process was repeated until no SNPs with P<1 x 10 -6 remained.
Step-wise conditional analyses to identify independently associated variants within the 2Mb region surrounding the sentinel SNPs were then undertaken using GCTA software (19,20). The SNPs with the strongest interaction P value at each independent locus were then selected for replication in SpiroMeta consortium studies.
Sentinel SNPs were also tested for association with lung function traits in males and females separately to provide sex-specific effect sizes.
Regression analysis to test genotype-by-sex interactions on height were conducted using a model including genotype (additive effect), age, age 2 , sex, genotyping array and the first ten principal components as covariates. Interactions between smoking status and genotype on lung function were tested using lung function traits transformed as described above (with sex included in the model instead of ever-smoking status). The linear regression model included genotype (additive effect), ever-smoking status, a genotype-by-smoking interaction term, genotyping array and the first ten principal components.
To test whether pubertal timing has differential effects on the association between SNPs and lung function in males and females, the regression model was adjusted for relative age at menarche in females and relative age at voice breaking in males. Relative age at voice breaking is categorised as earlier than average (1), around average (2) and later than average (3) in UK Biobank. Age at menarche is given as the participant's age at menarche in years. To make these variables comparable, age at menarche was categorised as early (<12 years old), average (12-14 years old) and late (>14 years old) as in a previous study (21). As in the lung function analyses, ancestry-based principal components and genotyping array were included in all the regression models.
For the SpiroMeta consortium, summary statistics were generated by each contributing cohort separately according to the same analysis plan as the UK Biobank data. That is, the lung function traits were transformed as above and then tested for association with SNP-by-sex interaction terms, adjusting for sex and ancestry-based principal components. Meta-analysis of SpiroMeta cohorts was conducted using inverse-variance weighted fixed effects meta-analysis the metagen function of the meta package in R.

Results
We tested 7,745,864 genome-wide variants with MAF ≥ 0.01 and imputation quality scores > 0.  Table 1). This SNP resides upstream of the HHIP gene and is in linkage disequilibrium with two previously reported lung function-associated sentinel SNPs, rs13141641 (27, 28) (r 2 = 0.91) and rs13116999 (28) (r 2 = 0.56). SNP rs7697189 was also genome-wide and suggestively significant for interactions with sex on PEF and FEV1/FVC respectively, but did not meet the threshold for suggestive significance in FVC (P = 8.71 x 10 -5 ) (Supplementary Table 4, Figure 2).

rs7697189 interacts with sex on lung function independently of height, smoking and pubertal timing
As SNPs in HHIP are also reported to be associated with height (29) and increased height is associated with increased lung function, it is possible that rs7697189 has differential effects on lung function in males and females through differential effects on height. However, the association of rs7697189 with standing height was not modified by sex in a combined analysis of UK Biobank males and females with a genotype-by-sex interaction term (interaction P = 0.806). We also conducted a sensitivity analysis showing that the rs7697189-by-sex interaction on FEV1 remained significant after adjustment for sitting height (β = -0.04 [SE = 0.005], P = 1.97 x 10 -15 ).
Amongst the 303,612 UK Biobank participants in this study, the proportion of ever-smokers was higher in males (52.8%) than females (40.3%) (Supplementary Table 1). A larger effect of rs7697189 on lung function in males compared to females could arise if there was an interaction effect with smoking. However, there was no interaction between rs7697189 and ever-smoking status on FEV1 in this study (interaction P = 0.63). SNP rs7697189, and correlated SNPs in the region, have been shown to be associated with expression levels of Hedgehog-interacting protein (HHIP) in lung tissue (30). HHIP is a critical protein during early development and HHIP variants have been associated with lung function in infancy (31).
We tested whether HHIP SNPs also have differential effects on lung function in females compared to males in childhood using data from children with an average age of 8 years in the ALSPAC and Raine studies (N = 5645). No significant association between rs7697189 and FEV1, and no significant interaction between rs7697189 and sex on FEV1 was detected, possibly due to the much smaller sample size of the childhood cohorts (Supplementary Figure 2). Finally, as pubertal timing has been associated with adult lung function (15), we tested for an effect of relative age at puberty on the association between rs7697189 and lung function in a sex-stratified analysis. The association between HHIP SNPs and lung function was adjusted for relative age at voice breaking in males and for age at menarche in females, but neither adjustment changed the effect estimate of the SNPs on lung function (Supplementary Table 5).

rs7697189 is associated with HHIP expression, but no interaction with sex
It is possible that rs7697189 interacts with sex on lung function through differential effects on HHIP expression. We confirm that rs7697189 is associated with HHIP expression in lung tissue but we do not detect an interaction with sex on HHIP expression (Supplementary Table 6). However, HHIP (in all samples irrespective of genotype at rs7697189) does show differential expression between males and females, with females showing higher expression (Supplementary Table 7).

Discussion
We identified a genome-wide significant genotype-by-sex interaction signal at a locus previously reported for association with lung function upstream of the HHIP gene (rs7697189, FEV1 interaction P = 3.15 x 10 -15 ). The signal was nominally significant in 75,696 individuals from 20 independent studies of the SpiroMeta consortium. We demonstrated that the differential effects of this SNP in males and females (untransformed FEV1 β = 0.028 [SE 0.0022] litres in males vs β = 0.009 [SE 0.0014] litres in females) was not mediated by effects on height, smoking behaviour or pubertal age.
SNPs at the HHIP locus also exhibited genome-wide or suggestive significant interactions with sex on two additional lung function traits in UK Biobank: FEV1/FVC and PEF (P = 8.98 x 10 -8 and P = 8.78 x 10 -12 respectively). Stratified analyses in males and females separately demonstrated that these SNPs appeared to have a stronger effect on lung function in males compared to females. There was no interaction between these SNPs and ever-smoking status on lung function in UK Biobank, suggesting that the stronger effect in males is not due to differences in smoking behaviour. We also demonstrate that an association between these SNPs and height is not modified by sex, suggesting that differential effects on height in males and females do not explain the genotype-by-sex interaction on lung function.
The genome-wide significant sex interaction locus is located upstream of the HHIP gene, a region previously reported to be associated with lung function (23,26) and HHIP gene expression (30). The HHIP gene encodes hedgehog-interacting protein, a negative regulator of hedgehog signalling. The hedgehog signalling pathway regulates numerous physiological processes such as growth, selfrenewal, cell survival, differentiation, migration, and tissue polarity and plays a vital role in the morphogenesis of lung and other organs (32). Hedgehog signalling has also been shown to participate in regulation of stem and progenitor cell populations in adult tissues, impacting tissue homeostasis and repair (33). SNP rs7697189, showing the strongest sex interaction on lung function in our study, is in strong linkage disequilibrium (R 2 = 0.93) with SNPs residing in an HHIP enhancer region (30). These enhancer-region SNPs also exhibit genome-wide significant genotype-by-sex interactions on lung function in our data. According to previous reports, the alleles associated with decreased lung function are also associated with reduced enhancer activity and reduced HHIP expression in lung tissues, providing a potential mechanism by which SNPs in this linkage disequilibrium block might modulate hedgehog signalling in the lungs (30). However, in our study, though HHIP was expressed at lower levels in males compared to females in lung tissue, the association between rs7697189 and HHIP expression was not modified by sex. This may be because there is no sex differential effect on expression, or the study might have been underpowered to detect an interaction effect (based on 472 males and 566 females). It is therefore still not clear why SNPs upstream of HHIP would be showing different effects in males and females.
Investigating the effects of HHIP at different stages of development by sex may help to shed light on this conundrum. In our study we had access to genetic and lung function data from 5645 children with an average age of 8 years. Though underpowered to detect the association between rs7697189 and FEV1 seen in UK Biobank adults, the lack of a similar trend in children suggests that HHIP variants may have differential effects at different developmental stages (though the genotype-by-sex interaction is in the same direction as in adults). We also looked for an effect of timing of puberty on the association between rs7697189 and lung function in adults, but adjustment for relative age of voice breaking in males and relative age at menarche in females made no difference to the relationship between rs7697189 and lung function.
We identified four additional genome-wide significant (interaction P<5x10 -8 ) sex-by-genotype interactions on lung function in our discovery analysis in UK Biobank, with a further 21 that met a less stringent threshold of interaction (P<1x10 -6 ). As far as we are aware, this is the first genomewide sex-by-genotype interaction study for lung function traits. We did not find a significant genotype-by-sex interaction on lung function or COPD at the CELSR1 locus (interaction P = 0.525 and P = 0.503 respectively) previously reported to have sex-specific effects on risk of COPD (10).
In conclusion, we have identified a novel genotype-by-sex interaction at SNPs at a putative enhancer region upstream of the hedgehog-interacting protein (HHIP) gene. Establishing the mechanism by which HHIP has sex differential effects on lung function will be important for our understanding of COPD and for realising the potential of precision medicine by optimising treatment in males and females.  Bold text in final column indicates that the effect in SpiroMeta was in the same direction to the effect in UK Biobank