Variants associated with HHIP expression have sex-differential effects on lung function



Abstract
Background: Lung function is highly heritable and differs between the sexes throughout life. However, little is known about sex-differential genetic effects on lung function. We aimed to conduct the first genome-wide genotype-by-sex interaction study on lung function to identify genetic effects that differ between males and females.
Methods: We tested for interactions between 7,745,864 variants and sex on spirometry-based measures of lung function in UK Biobank (N=303,612), and sought replication in 75,696 independent individuals from the SpiroMeta consortium.
Results: Five independent single-nucleotide polymorphisms (SNPs) showed genome-wide significant (P<5x10^-8) interactions with sex on lung function, and 21 showed suggestive interactions (P<1x10^-6). The strongest signal, from rs7697189 (chr4:145436894) on forced expiratory volume in 1 second (FEV1) (P=3.15x10^-15), was replicated (P=0.016) in SpiroMeta. The C allele increased FEV1 more in males (untransformed FEV1 β=0.028 [SE 0.0022] litres) than females (β=0.009 [SE 0.0014] litres), and this effect was not accounted for by differential effects on height, smoking or pubertal age. rs7697189 resides upstream of the hedgehog-interacting protein (HHIP) gene and was previously associated with lung function and HHIP lung expression. We found HHIP expression was significantly different between the sexes (P=6.90x10 ), but we could not detect sex differential effects of rs7697189 on expression.
Conclusions: We identified a novel genotype-by-sex interaction at a putative enhancer region upstream of the HHIP gene. Establishing the mechanism by which HHIP SNPs have different effects on lung function in males and females will be important for our understanding of lung health and diseases in both sexes.
Keywords: genome-wide interaction study, lung function, sex, HHIP, expression
The following authors report potential competing interests: L.V.W.: Louise V. Wain has received grant support from GSK.

Introduction
Measures of lung function, including forced expiratory volume in 1 second (FEV 1 ) and forced vital capacity (FVC), are used to determine diagnosis and severity of chronic obstructive pulmonary disease (COPD). COPD refers to a group of complex lung disorders characterised by irreversible (and usually progressive) airway obstruction, and is projected to be the third leading cause of death globally in 2020 1 . The major risk factor for COPD is smoking, but other environmental and genetic factors have been identified.
Physiological lung development and function differ throughout life between males and females 2 . It is known that sex hormones can influence these processes but the mechanisms are not well understood 3,4 . The incidence and presentation of lung diseases such as COPD also exhibit sexual dimorphism. Traditionally viewed as a disease of older males, COPD has been increasing in prevalence amongst females over the last two decades. It has been reported that females are more vulnerable to environmental risk factors for COPD and are over-represented amongst sufferers of early-onset severe COPD 5,6 . Females are also more likely to present with small airway disease whereas males are more likely to develop emphysematous phenotype. Moreover, females report more frequent and/or severe exacerbations of respiratory symptoms than males and higher levels of dyspnoea and cough 5 .
In a recent paper, 279 genetic loci were reported as associated with lung function traits, but these explain only a small proportion of the heritability 7 . One possible source of hidden heritability is the interaction between genetic factors and biological sex on lung function traits. A genome-wide genotype-by-sex interaction analysis across three studies comprising 6260 COPD cases and 5269 smoking controls found a putative sex-specific risk factor for COPD in the CELSR1 gene, a region not previously implicated in COPD or lung function 8 . However, having sufficient statistical power to reproducibly detect genotype-by-sex interactions requires much larger sample sizes. Statistical power can also be enhanced by using quantitative lung function traits as outcomes instead of COPD diagnoses, but we are not aware of any genome-wide genotype-by-sex interaction studies on lung function traits. Understanding the role of sex in lung function and COPD will be important for developing therapeutics that work for both males and females 9 .
In this study, we tested for interaction effects between 7,745,864 variants and sex on FEV1, FEV1/FVC, FVC and PEF in 303,612 individuals from the UK Biobank resource. We sought replication of our findings in 75,696 independent individuals from the SpiroMeta consortium. To our knowledge this is the first genome-wide sex-by-genotype interaction study on lung function traits, and the largest sex-by-genotype interaction study to focus on COPD-related outcomes.

Results
We tested 7,745,864 genome-wide variants with minor allele frequency (MAF) ≥ 0.01 and imputation quality scores ≥ 0.3 for genotype-by-sex interactions on lung function in 303,612 unrelated individuals of European ancestry from UK Biobank. Five independent signals showed genome-wide significant (P < 5 x 10^-8) interaction with sex on at least one of four lung function traits (FEV1, FEV1/FVC, FVC, and PEF), with a further 21 SNPs showing suggestive significance (P < 1 x 10^-6) (Table 1; Figure S1, Extended data 10 ). The top three genome-wide significant signals had been previously reported for association with lung function: rs7697189 near the gene encoding hedgehog-interacting protein (HHIP) (interaction P = 3.15 x 10^-15), rs9403386 near the gene encoding Adhesion G Protein-Coupled Receptor G6 (ADGRG6, previously known as GPR126) (interaction P = 4.56 x 10^-9), and rs162185 downstream of the gene encoding transcription factor 21 (TCF21) (interaction P = 4.87 x 10^-9) [11][12][13][14][15][16] . This may, in part, reflect greater power to detect interactions with variants with strong main effects on lung function. Only rs355079 (interaction P = 8.84 x 10^-7) showed significant effects in opposite directions in males compared to females.
We sought evidence for replication of all 26 signals in up to 75,696 individuals from 20 cohorts of the SpiroMeta consortium. One variant, rs76911399, was excluded because it was poorly imputed in SpiroMeta cohorts and had no directly genotyped or well-imputed proxies (at r^2 threshold 0.8). Of the remaining 25 signals, 19 exhibited the same direction of interaction effect as in UK Biobank. Furthermore, the effect sizes (beta coefficients) from the regression analyses of all 25 SNPs in UK Biobank and SpiroMeta showed a correlation of 0.51 (Figure S2, Extended data 10 ). The SNP with the strongest evidence for interaction with sex on lung function in SpiroMeta cohorts was rs7697189 (near HHIP) (replication interaction P = 0.016) (Table 1, Figure 1). The minor (C) allele of rs7697189 had a larger effect on lung function in males (β = 0.052 [SE 0.004], P = 2.13 x 10^-33) compared to females (β = 0.013 [SE 0.003], P = 1.16 x 10^-5) (Table 1). This SNP resides upstream of the HHIP gene and is in linkage disequilibrium with two previously reported lung function-associated sentinel SNPs, rs13141641 16,17 (r^2 = 0.91) and rs13116999 17 (r^2 = 0.56). SNP rs7697189 also showed some evidence of interaction with sex on PEF (β = -0.035 [SE 0.005], P = 8.78 x 10^-12), FEV1/FVC (β = -0.028 [SE 0.005], P = 8.98 x 10^-8), and FVC (β = -0.020 [SE 0.005], P = 8.71 x 10^-5) (Table S1, Extended data 10 ; Figure 2).
rs7697189 interacts with sex on lung function independently of height, smoking and pubertal timing
As SNPs in HHIP are also reported to be associated with height 18 and increased height is associated with increased lung function, it is possible that rs7697189 has differential effects on lung function in males and females through differential effects on height. However, the association of rs7697189 with standing height was not modified by sex in a combined analysis of UK Biobank males and females with a genotype-by-sex interaction term (interaction P = 0.806).
We also conducted a sensitivity analysis showing that the effect of the rs7697189-by-sex interaction on FEV1 was consistent with the original estimate after adjustment for sitting height (β = -0.04 [SE = 0.005], P = 1.97 x 10^-15).

Table 1 note. The SNPs are those that demonstrate a sex-interaction effect on lung function in UK Biobank (P < 1 x 10^-6) (N = 303,612). Lung function traits were pre-adjusted for age, age^2, standing height and smoking status and the residuals rank-transformed to normality. The regression models also included genotyping array and the first ten ancestry-based principal components. For each SNP, columns 4-9 provide minor allele frequency (MAF), and beta-coefficients, standard errors and the P value for their association with lung function in males and females separately. Columns 10-11 show the results of the SNP-by-sex interaction in UK Biobank, where the effect is given in females relative to males. For example, the top SNP (rs7697189) shows a less positive effect in females compared to males and its beta coefficient is therefore negative. Columns 12-13 show the results of the SNP-by-sex interaction in 20 cohorts of the SpiroMeta consortium (N = 75,696). Bold text in the final column indicates that the effect in SpiroMeta was in the same direction as the effect in UK Biobank.

Figure 1. Meta-analysis of rs7697189-by-sex interaction effects on lung function in SpiroMeta cohorts.
The forest plot shows the beta-coefficients (test effects, TE) and standard errors for the interaction between rs7697189 and sex on forced expiratory volume in 1 second (FEV1) in 20 cohorts of the SpiroMeta consortium (total N = 75,696). The overall effect size from fixed effects meta-analysis is represented by the diamond.

Amongst the 303,612 UK Biobank participants in this study, the proportion of ever-smokers was higher in males (52.8%) than females (40.3%) (Table S2). A larger effect of rs7697189 on lung function in males compared to females could arise if there was an interaction effect with smoking. However, there was no interaction between rs7697189 and ever-smoking status on FEV1 in this study (interaction P = 0.63). Pack-years data were available for 94,750 UK Biobank participants. In sensitivity analyses we found a similar rs7697189-by-sex effect size on FEV1 when adjusted for pack years (β = -0.033 [SE = 0.009], P = 3.50 x 10^-4) and no interaction between genotype and pack years on FEV1 (interaction P = 0.80).
SNP rs7697189, and correlated SNPs in the region, have been shown to be associated with expression levels of HHIP in lung tissue 19 . HHIP is a critical protein during early development and HHIP variants have been associated with lung function in infancy 20 . We tested whether HHIP SNPs also have differential effects on lung function in females compared to males in childhood using data from children with an average age of eight years in the ALSPAC and Raine studies (N = 5645). In the meta-analysis of ALSPAC and Raine ( Figure S3, Extended data 10 ), whilst we observed a point estimate for the rs7697189-by-sex interaction effect on FEV 1 that was consistent with the confidence intervals for the discovery effect observed in UK Biobank, the confidence intervals overlapped the null (which likely reflects in part the smaller numbers studied in these cohorts). Finally, as pubertal timing has been associated with adult lung function 21 , we tested for an effect of relative age at puberty on the association between rs7697189 and lung function in a sex-stratified analysis. The association between HHIP SNPs and lung function was adjusted for relative age at voice breaking in males and for age at menarche in females, but adjusted effect estimates were highly consistent with the unadjusted estimates of the SNPs on lung function (Table S3, Extended data 10 ).
Figure 2. The SNP with the strongest association in the rs7697189-proximal region is represented by a blue diamond. The FEV1 and PEF sentinels are rs7697189, the FEV1/FVC sentinel is rs1512281 (R^2 = 0.95 with rs7697189), and the FVC sentinel is rs7681384 (R^2 = 0.57 with rs7697189). Note that there is an independent suggestively significant signal from rs2353939 and surrounding SNPs for FVC, but this did not replicate in SpiroMeta cohorts. All other SNVs are colour coded according to their linkage disequilibrium (R^2) with the sentinel SNP (as shown in the key). All imputed SNVs are plotted irrespective of MAF, demonstrating that rarer variants are not exhibiting significant interactions with sex on lung function. The locations of genes in the region are shown in the lower panel of each plot. Recombination rate is represented by the blue lines. These plots were generated using LocusZoom software.

rs7697189 is associated with HHIP expression, but no interaction with sex
It is possible that rs7697189 interacts with sex on lung function through differential effects on HHIP expression. We confirmed that rs7697189 is associated with HHIP expression in lung tissue but we did not detect an interaction with sex on HHIP expression (Table S4, Extended data 10 ). However, HHIP (in all samples irrespective of genotype at rs7697189) did show differential expression between males and females, with females showing higher expression (Table S5; Extended data 10 ). This agrees with GTEx data on HHIP lung expression in males and females (Figure S4, Extended data 10 ).
rs7697189 is in linkage disequilibrium with a SNP predicted to disrupt SREBP and SRF motifs
HaploReg v4.1 22 was used to identify whether rs7697189, or SNPs in linkage disequilibrium, affected transcription factor binding motifs. This analysis predicted that rs7697189 itself changes FAC1 and FOXO motifs, and showed that it lies within a chromatin mark indicative of enhancer activity in embryonic stem cell lines differentiated to CD56+ mesoderm and CD184+ endoderm cultured cells. A SNP (rs12504628) in complete linkage disequilibrium with rs7697189 changes SREBP and SRF motifs. These transcription factors have been reported to be involved in sex hormone signalling 23,24 .

Discussion
We identified a genome-wide significant genotype-by-sex interaction signal at a locus previously reported for association with lung function upstream of the HHIP gene (rs7697189). There was evidence that SNPs at the HHIP locus demonstrated interactions with sex on two additional lung function traits in UK Biobank: FEV1/FVC and PEF (β = -0.028 [SE 0.005], P = 8.98 x 10^-8 and β = -0.035 [SE 0.005], P = 8.78 x 10^-12, respectively). Stratified analyses in males and females demonstrated that these SNPs appeared to have a stronger effect on lung function in males compared to females. There was no interaction between these SNPs and ever-smoking status on lung function in UK Biobank, suggesting that the stronger effect in males is not due to differences in smoking behaviour. We also demonstrate that the association between these SNPs and height is not modified by sex, suggesting that differential effects on height in males and females do not explain the genotype-by-sex interaction on lung function.
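The comparison of sex-stratified effects can be illustrated with a simple difference-of-estimates test. The sketch below is not the joint interaction model used in this study; it assumes the male and female estimates come from independent samples, and the function name `stratified_difference` is hypothetical. The inputs are the rounded stratified estimates for rs7697189 on rank-transformed FEV1 reported in the Results, so the output is only expected to approximate the reported interaction result.

```python
from math import erfc, sqrt

def stratified_difference(beta_f, se_f, beta_m, se_m):
    """Compare two independent effect estimates (females relative to males).

    Returns the difference in betas, its standard error, the z statistic
    and a two-sided P value from the normal distribution.
    """
    diff = beta_f - beta_m                 # negative if the effect is weaker in females
    se = sqrt(se_f ** 2 + se_m ** 2)       # variances add for independent estimates
    z = diff / se
    p = erfc(abs(z) / sqrt(2))             # two-sided tail, accurate for large |z|
    return diff, se, z, p

# Rounded stratified estimates for rs7697189 taken from the Results text
diff, se, z, p = stratified_difference(beta_f=0.013, se_f=0.003,
                                        beta_m=0.052, se_m=0.004)
print(f"difference = {diff:.3f}, SE = {se:.3f}, z = {z:.1f}, P = {p:.2e}")
```

With these rounded inputs the difference is -0.039 with a standard error of 0.005, giving z of about -7.8, in line with the direction and approximate size of the interaction effect reported for this SNP.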
In contrast to these results, a recent study found comparatively weak evidence of an interaction effect between a SNP (rs13140176) in high LD with rs7697189 (r^2 = 0.93) and sex on risk of COPD in UK Biobank 25 . This is likely due in part to reduced power to detect interaction effects on a binary trait. Indeed, in our study, the rs13140176-by-sex interaction effect on FEV1/FVC passes the conventional threshold for genome-wide significance (P < 5 x 10^-8), but when COPD was defined as FEV1/FVC < 0.7 this threshold was not met (P = 0.023). Nevertheless, rs13140176 shows a consistent direction of effect between the studies: the lung function-lowering allele increases risk of COPD to a greater extent in males than females 25 .
The genome-wide significant sex interaction locus is located upstream of the HHIP gene, a region previously reported to be associated with lung function 12,15 and HHIP gene expression 19 .
The HHIP gene encodes hedgehog-interacting protein, a negative regulator of hedgehog signalling. The hedgehog signalling pathway regulates numerous physiological processes such as growth, self-renewal, cell survival, differentiation, migration, and tissue polarity and plays a vital role in the morphogenesis of lung and other organs 26 . Hedgehog signalling has also been shown to participate in regulation of stem and progenitor cell populations in adult tissues, impacting tissue homeostasis and repair 27 . SNP rs7697189, showing the strongest sex interaction on lung function in our study, is in strong linkage disequilibrium (R 2 = 0.93) with SNPs residing in an HHIP enhancer region 19 . These enhancer-region SNPs were reported to be associated with enhancer activity and HHIP expression in lung tissues. They also exhibit genome-wide significant genotype-by-sex interactions on lung function in our data. We therefore tested the effect of rs7697189 on HHIP expression in lung tissue from 472 males and 566 females to look for sex differential effects. In contrast to the previous study 19 , we found that the lung-function lowering G allele was associated with enhanced expression of HHIP in both males and females, and that expression was lower in males than females. However, the association between rs7697189 and HHIP expression was not modified by sex. This may be because there is no sex differential effect on expression, or the study might have been underpowered to detect an interaction effect. It is therefore still not clear why SNPs upstream of HHIP would be showing different effects in males and females. Our in silico analyses predict that rs7697189 and a SNP in linkage disequilibrium (rs12504628) change transcription factor motifs that may be relevant to the effect of sex hormones on lung development, but experimental analyses will be required to test these hypotheses.
Investigating the effects of HHIP at different stages of development by sex may help to shed light on its mechanism of action. In our study we had access to genetic and lung function data from 5645 children with an average age of eight years. Though underpowered to detect the association between rs7697189 and FEV1 seen in UK Biobank adults, the lack of a similar trend in children suggests that HHIP variants may have differential effects at different developmental stages (though the genotype-by-sex interaction is in the same direction as in adults). We also looked for an effect of timing of puberty on the association between rs7697189 and lung function in adults, but adjustment for relative age of voice breaking in males and relative age at menarche in females made no difference to the relationship between rs7697189 and lung function. As UK Biobank participants were aged between 40 and 69 years at recruitment, we did not have the longitudinal data to investigate the effect of HHIP SNPs on trajectories of lung function decline throughout life 28 , but this could be an interesting area for future studies.
We identified four additional genome-wide significant (interaction P<5x10 -8 ) sex-by-genotype interactions on lung function in our discovery analysis in UK Biobank, with a further 21 that met a less stringent threshold of interaction (P<1x10 -6 ). As far as we are aware, this is the first genome-wide sex-by-genotype interaction study for lung function traits. We did not find a significant genotype-by-sex interaction on lung function or COPD at the CELSR1 locus (interaction P = 0.525 and P = 0.503, respectively) previously reported to have sex-specific effects on risk of COPD 8 .
In conclusion, we have identified a novel genotype-by-sex interaction at SNPs at a putative enhancer region upstream of the hedgehog-interacting protein (HHIP) gene. Establishing the mechanism by which HHIP has sex differential effects on lung function will be important for our understanding of the biological underpinnings of COPD in males and females. This knowledge, in turn, will be crucial to optimising treatment in males and females.

Full ethics statements for each SpiroMeta consortium cohort are included in the S1 Appendix (Extended data 10 ).

UK Biobank
The UK Biobank is described here: http://www.ukbiobank.ac.uk. Individuals were included in this study if (i) they had no missing data for sex, age, height, and smoking status, (ii) their spirometry data passed quality control, as described previously 7 , (iii) their genetically inferred sex matched their reported sex, (iv) they had genome-wide imputed genetic data, (v) they were of genetically determined European ancestry, and (vi) they were not first- or second-degree relatives of any other individual included in the study. In total, 303,612 individuals met these criteria (Table S2, Extended data 10 ).
Participants' DNA was genotyped using either the Affymetrix Axiom® UK BiLEVE array or the Affymetrix Axiom® UK Biobank array 29 . Genotypes were imputed based on the Haplotype Reference Consortium (HRC) panel, as described elsewhere 29 . Variants with minor allele frequency (MAF) < 0.01 were excluded, as were variants with imputation quality scores < 0.3.

SpiroMeta consortium
The SpiroMeta consortium meta-analysis comprised 75,696 individuals from 20 studies (see S1 Appendix for details, Extended data 10 ). Ten studies (N=17,280) were imputed using 1000 Genomes Phase 1 reference panel 30,31 , nine (N=37,919) were imputed using the Haplotype Reference Consortium (HRC) panel 29 , and one (N=2077) was imputed using the HapMap CEU Build 36 Release 22. The ALSPAC (RRID: SCR_007260) and Raine studies also provided data on children with an average age of eight years (N=4426 and N=1219, respectively). Tables S6 and S7 show definitions of all abbreviations, study characteristics, details of genotyping platforms and imputation panels and methods (Extended data 10 ). Measurements of spirometry for each study are as previously described 7,21 . Fourteen SpiroMeta studies had data on PEF (N=51,555).

Statistical analysis
Spirometry-based lung function traits FEV1, FEV1/FVC, FVC, and PEF were pre-adjusted for age, age^2, standing height (or sitting height in the sensitivity analysis) and smoking status and the residuals rank-transformed to normality using the rntransform function of the GenABEL package (RRID: SCR_001842) in R (RRID: SCR_001905). To test each imputed autosomal variant for an interaction effect, a linear regression model with genotype (additive effect), sex, genotype-by-sex interaction, genotyping array and the first ten principal components included as covariates was implemented using Plink 2.0 software (RRID: SCR_001757).
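As an illustration of the two steps described above, the following sketch rank-transforms a simulated phenotype to normality and then estimates a genotype-by-sex interaction coefficient by ordinary least squares. It is a minimal stand-in under stated assumptions, not the GenABEL or Plink implementation: the transform is a generic rank-based inverse normal transform that may differ in detail from rntransform, the covariate pre-adjustment, genotyping array and principal components are omitted, the data are simulated, and all function names are hypothetical.

```python
import random
from statistics import NormalDist

def rank_inverse_normal(values):
    """Replace each value by the normal quantile of its rank (ties broken by input order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    nd = NormalDist()
    for rank, i in enumerate(order, start=1):
        out[i] = nd.inv_cdf((rank - 0.5) / n)
    return out

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(k):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]

def interaction_beta(y, genotype, sex):
    """OLS coefficient of genotype*sex in y ~ 1 + genotype + sex + genotype*sex."""
    rows = [[1.0, g, s, g * s] for g, s in zip(genotype, sex)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(4)]
    return solve(xtx, xty)[3]

# Simulated toy data (assumed coding: sex 1 = female, genotype = allele count 0/1/2)
rng = random.Random(1)
n = 4000
sex = [rng.randint(0, 1) for _ in range(n)]
genotype = [sum(rng.random() < 0.4 for _ in range(2)) for _ in range(n)]
raw = [0.5 * g - 0.3 * g * s + rng.gauss(0, 1) for g, s in zip(genotype, sex)]

pheno = rank_inverse_normal(raw)
beta_gxs = interaction_beta(pheno, genotype, sex)
print(f"genotype-by-sex interaction beta (females relative to males): {beta_gxs:.3f}")
```

With sex coded so that females are 1, a negative interaction coefficient means a less positive allelic effect in females than in males, which is the sign convention described for Table 1.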
Step-wise conditional analyses to identify independently associated variants were undertaken using GCTA software 32,33 .
Regression analyses to test genotype-by-sex interactions on height were conducted using a model including genotype (additive effect), age, age^2, sex, a genotype-by-sex interaction term, genotyping array and the first ten principal components as covariates. Interactions between smoking status and genotype on lung function were tested using lung function traits transformed as described above (with sex included in the model instead of ever-smoking status). The linear regression model included genotype (additive effect), ever-smoking status, a genotype-by-smoking interaction term, genotyping array and the first ten principal components.
To test whether pubertal timing has differential effects on the association between SNPs and lung function in males and females, the regression model was adjusted for relative age at menarche in females and relative age at voice breaking in males. Relative age at voice breaking is categorised as earlier than average (1), around average (2) and later than average (3) in UK Biobank. Age at menarche is given as the participant's age at menarche in years. To make these variables comparable, age at menarche was categorised as early (<12 years old), average (12-14 years old) and late (>14 years old) as in a previous study 34 . As in the lung function analyses, ancestry-based principal components and genotyping array were included in all the regression models.
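The categorisation of age at menarche can be sketched as a small helper. The thresholds are those stated above; the numeric codes 1, 2 and 3 are an assumption made here so that the output lines up with the UK Biobank coding for relative age at voice breaking, and the function name `menarche_category` is hypothetical.

```python
def menarche_category(age_years):
    """Map age at menarche in years to a three-level pubertal timing category.

    1 = early (<12 years), 2 = average (12-14 years), 3 = late (>14 years).
    The numeric codes mirror the voice-breaking coding and are an assumption.
    """
    if age_years < 12:
        return 1
    if age_years <= 14:
        return 2
    return 3

# Example: ages 11, 12, 14 and 15 fall into early, average, average and late
categories = [menarche_category(age) for age in (11, 12, 14, 15)]
print(categories)
```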
For the SpiroMeta consortium, summary statistics were generated by each contributing cohort separately according to the same analysis plan as the UK Biobank data. Meta-analysis of SpiroMeta cohorts was conducted using inverse-variance weighted fixed effects meta-analysis using the metagen function of the meta package in R.
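For readers unfamiliar with the estimator, inverse-variance weighted fixed effects meta-analysis can be sketched in a few lines. This is a generic Python illustration of the standard formula, not the code of the R metagen function; the function name `fixed_effect_meta` and the cohort values in the example are hypothetical.

```python
from math import erfc, sqrt

def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed effects meta-analysis.

    Each cohort estimate is weighted by 1 / SE^2. Returns the pooled beta,
    its standard error and a two-sided P value from the normal distribution.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    p = erfc(abs(beta / se) / sqrt(2))
    return beta, se, p

# Hypothetical interaction estimates from three cohorts (not real SpiroMeta data)
pooled_beta, pooled_se, pooled_p = fixed_effect_meta(
    betas=[-0.05, -0.01, -0.03],
    ses=[0.02, 0.03, 0.025],
)
print(f"pooled beta = {pooled_beta:.4f}, SE = {pooled_se:.4f}, P = {pooled_p:.3g}")
```

Cohorts with smaller standard errors receive more weight, so the pooled estimate is pulled towards the most precisely estimated cohort effects.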

The lung eQTL study
The lung expression quantitative trait loci (eQTL) study database has been described previously [35][36][37] and in S1 Appendix (Extended data 10 ). HHIP differential gene expression analysis between females and males was performed using linear regression. Association of rs7697189 and rs7697189-by-sex interaction with gene expression was tested in 1,038 subjects with genotypes using the MatrixEQTL package in R. All analyses were done separately in Laval, UBC and Groningen, and then combined using a meta-analysis with fixed-effects model and inverse-variance weights.

Data availability
This project contains Fawcett_et_al_Extended_data_supplement.docx, which contains the following extended data:
• Supplementary materials and methods
• Figure S1. Genome-wide interaction SNP-by-sex interaction results on four measures of lung function in UK Biobank
• Figure S2. Correlation between genotype-by-sex interaction effect sizes in UK Biobank and the SpiroMeta studies
• Figure S3. Association between rs7697189 and FEV1 in children from the ALSPAC and Raine cohorts
• Figure S4. GTEx data on expression of HHIP by sex in different tissues
• Table S1. Association between rs7697189 and lung function traits in males and females, and genotype-by-sex interaction results