Genetically informed precision drug repurposing for lung function and implications for respiratory infection

ABSTRACT
Impaired lung function is associated with significant morbidity and mortality. Restrictive and obstructive lung disorders are a large contributor to decreased lung function, as well as the acute impact of infection. Measures of pulmonary function are heritable, and thus, we sought to utilise genomics to propose novel drug repurposing candidates which could improve respiratory outcomes. Lung function measures were found to be genetically correlated with metabolic and hormone traits which could be pharmacologically modulated, with a causal effect of increased fasting glucose on diminished lung function supported by latent causal variable models and Mendelian randomisation. We developed polygenic scores for lung function specifically within pathways with known drug targets to prioritise individuals who may benefit from particular drug repurposing opportunities, accompanied by transcriptome-wide association studies to identify drug-gene interactions with potential lung function increasing modes of action. These drug repurposing candidates were further considered relative to the host-viral interactome of three viruses with associated respiratory pathology (SARS-CoV2, influenza, and human adenovirus). We uncovered an enrichment amongst glycaemic pathways of human proteins which putatively interact with virally expressed SARS-CoV2 proteins, suggesting that antihyperglycaemic agents may have a positive effect both on lung function and SARS-CoV2 progression.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted June 26, 2020. doi: https://doi.org/10.1101/2020.06.25.20139816 (medRxiv preprint)

MAIN
Optimal lung (pulmonary) function is vital for the ongoing maintenance of homeostasis, and reduced pulmonary function is associated with a marked increase in the risk of mortality 1,2 . This is particularly critical given the considerable number of disorders for which diminished pulmonary function is a clinical hallmark. For instance, chronic obstructive pulmonary disease (COPD), characterised by an irreversible limitation of airflow, is one of the leading causes of death worldwide 3 . Pulmonary manifestations are also common amongst disorders not directly classified as respiratory conditions, including diabetes 4,5 , congenital heart disease 6 , and inflammatory bowel disease 7,8 . Bacterial and viral infections, such as those caused by Streptococcus pneumoniae, Mycobacterium tuberculosis, influenza viruses, and coronaviruses, can also cause severe declines in respiratory function. To better manage the spectrum of respiratory disorders, there is a pressing need for new interventions, including those that can be targeted to an individual's heterogeneous risk factors. While the development pathway for new compounds is difficult, there are likely to be opportunities for precision repurposing of existing drugs to enhance lung function and improve patient outcomes.
Spirometry measures of pulmonary function display significant heritability in both twin studies and genome-wide association studies (GWAS) [9][10][11] . Genomics may therefore reveal clinically relevant insights into the biology underlying lung function, which could be leveraged for drug repurposing. We sought to interrogate the genomic architecture of three spirometry indices, forced expiratory volume in one second (FEV1), forced vital capacity (FVC), and their ratio (FEV1/FVC), to propose drug repurposing candidates which could be used to improve lung function. Firstly, we assessed each lung function trait for evidence of genetic correlation with biochemical traits that could be pharmacologically modulated, followed by models to investigate whether there was evidence of causation. The previously developed pharmagenic enrichment score (PES) framework was then implemented to identify druggable pathways enriched with lung function associated variation, and to calculate pathway-specific polygenic scores that prioritise individuals who may benefit from a repurposed compound which interacts with the pathway 12 . A transcriptome-wide association study (TWAS) of FEV1 and FVC was also undertaken to reveal genes which could be targeted by existing drugs that may increase pulmonary function. Finally, we considered the repurposing candidates proposed by these strategies in the context of three respiratory viruses (SARS-CoV2, influenza, and human adenovirus), specifically analysing the interactions between viral and human proteins. An overview of this study is shown in Figure 1.

[Figure 1 schematic. Panel labels: Causal exposures; Heritable factors; Druggable biological pathway; Effect size of SNPs in biological pathway (m) on lung function]
Figure 1. Overview of strategies for genetically informed drug repurposing to improve lung function. The left column outlines our workflow for using causal inference to identify drug targets, while the right column shows the workflow for functionally partitioning the heritable component into drug targets. In both cases we use GWAS data for lung function (three spirometry phenotypes: forced expiratory volume in one second (FEV1), forced vital capacity (FVC), and their ratio (FEV1/FVC)) and for quantitative biochemical traits (e.g. hormones and metabolites) which can be pharmacologically modulated. Using these data, we estimated the genetic correlation between lung function and the biochemical traits using LD score regression (LDSC) (left column). We then constructed a latent causal variable (LCV) model to investigate evidence of causality for significantly correlated biochemical-lung function trait pairs, and implemented Mendelian randomisation to further support causal inference between significant pairs. Where a causal relationship between a modifiable biochemical trait and lung function is established, this suggests a novel treatment strategy. The right column shows the workflow for using the pharmagenic enrichment score (PES) framework for precision drug repositioning. Specifically, rather than a genome-wide score, polygenic scores for lung function were calculated using lung function GWAS SNPs within biological pathways that can be targeted by approved drugs.
Individuals with low genetically predicted lung function by a PES (low PES) relative to a reference population (orange shaded distribution in right panel 3) may benefit from a compound which modulates that pathway. To further support putative targets for drug repositioning, a transcriptome-wide association study (TWAS) of lung function was performed to identify druggable genes whose genetically predicted expression was correlated with a spirometry measure. Genes with positive genetic covariance between imputed expression and lung function (i.e. increased expression associated with increased lung function) could be modulated by an agonist compound, whilst genes for which decreased predicted expression is associated with improved lung function could be targeted by an antagonist compound.

Measures of lung function were genetically correlated with clinically significant metabolites and hormones
We assessed genetic correlation between three pulmonary function measurements (FEV1, FVC, and FEV1/FVC) and 172 sets of GWAS summary statistics from European-ancestry samples using bivariate linkage disequilibrium score regression (LDSC) 13,14 . A number of clinically significant traits displayed significant genetic correlation with FEV1, FVC, and/or FEV1/FVC after correcting for the number of tests performed (P < 2.9 × 10⁻⁴, Figure 2a).

Interestingly, there was evidence of genetic correlation between measures of lung function and circulating levels of both metabolites and hormones. This is notable because these molecules can be pharmacologically modulated, potentially informing novel therapeutic strategies and drug repurposing opportunities to improve lung function. Significant genetic correlations were observed with four metabolites (fasting glucose, high-density lipoprotein [HDL], triglycerides, and urate) and two hormones (fasting insulin and leptin) for at least one measure of lung function (Table 1). Specifically, FEV1 was negatively correlated with fasting insulin, leptin (both adjusted and unadjusted for BMI), urate, and fasting glucose; FVC was negatively correlated with the same traits as FEV1, and was additionally positively correlated with HDL and negatively correlated with circulating triglycerides; conversely, FEV1/FVC was negatively correlated with HDL.
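The correction threshold quoted above is consistent with a Bonferroni correction of α = 0.05 across the 172 traits tested (0.05 / 172 ≈ 2.9 × 10⁻⁴). The short sketch below only reproduces that arithmetic; the choice of α = 0.05 and of one test per trait is an assumption inferred from the reported cut-off, not something the text states.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value cut-off that controls the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Assumption: alpha = 0.05 and one test per biochemical trait (172 traits).
threshold = bonferroni_threshold(0.05, 172)
print(f"P < {threshold:.1e}")  # P < 2.9e-04
```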

Evidence of a causal relationship between fasting glucose and lung function supports antihyperglycaemic compounds as drug repurposing candidates
The genetic correlations observed between lung function measures and metabolite/hormone traits may be clinically actionable; however, a significant estimate of genetic correlation does not imply causality 15 . We therefore constructed a latent causal variable (LCV) model to estimate the mean posterior genetic causality proportion (GCP) for each metabolite or hormone trait and the lung function measure with which it is genetically correlated (Figure 2b, [...] 0.036), although these models displayed comparatively weaker evidence. A non-zero mean posterior GCP estimate was observed for urate and FVC (mean posterior GCP = 0.73); however, the relatively low heritability z score calculated by the LCV framework (z < 7) may lead to an inflated estimate. There was no significant evidence of genetic causality between any of the remaining LDSC-prioritised hormone or metabolite traits and FEV1, FVC, or FEV1/FVC.
As it was the most significant LCV model, the causal effect of fasting glucose on FEV1 and FVC was further investigated using a Mendelian randomisation (MR) approach. MR differs from an LCV model in that it uses genome-wide significant variants as genetic instrumental variables (IVs) to calculate a causal estimate of an exposure (fasting glucose) on an outcome (lung function). We selected as IVs 32 genome-wide significant variants associated with fasting glucose that were in approximate linkage equilibrium (P < 5 × 10⁻⁸, r² < 0.001), ensuring that the variants were both robustly associated with the exposure and independent of one another. [...] The weighted median method relaxes the assumption that all IVs must be valid, as described elsewhere 17 . An MR-Egger model was then constructed; this model allows a non-zero intercept, which can be used as a measure of unbalanced pleiotropy 18 . The MR-Egger causal estimate was in the same direction for FEV1 and FVC but was nonsignificant (Supplementary Table 6). It should be noted that MR-Egger has notably less power than the inverse-variance weighted (IVW) approach, particularly when fewer instruments are used 18 .
Importantly, the MR-Egger intercept was not significantly different from zero in either the FEV1 or the FVC model, indicating no evidence of unbalanced pleiotropy. This was supported by a nonsignificant global test of pleiotropy implemented in the MR-PRESSO (MR-Pleiotropy Residual Sum and Outlier) framework (Supplementary Table 6) 19 .
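To make the IVW, weighted median and MR-Egger estimators discussed above concrete, the following is a minimal NumPy sketch written from the standard published definitions of these methods, not from the authors' code. The function names are hypothetical; P-values use a normal approximation (MR-Egger implementations commonly use a t-distribution with k - 2 degrees of freedom instead), and the weighted-median standard error, usually obtained by bootstrap, is omitted.

```python
import math

import numpy as np


def _two_sided_p(z):
    # Normal approximation to a two-sided P-value.
    return math.erfc(abs(z) / math.sqrt(2.0))


def _as_arrays(*cols):
    return tuple(np.asarray(c, dtype=float) for c in cols)


def ivw(bx, by, sy):
    """IVW causal estimate from SNP-exposure effects (bx) and SNP-outcome effects (by, SE sy)."""
    bx, by, sy = _as_arrays(bx, by, sy)
    w = bx**2 / sy**2
    beta = float(np.sum(bx * by / sy**2) / np.sum(w))
    se_fixed = math.sqrt(1.0 / np.sum(w))
    # Multiplicative random effects: scale the SE by the residual standard error
    # (sqrt of Cochran's Q / (k - 1)), but never shrink it below the fixed-effect SE.
    phi = np.sum((by - beta * bx) ** 2 / sy**2) / (len(bx) - 1)
    se_mre = se_fixed * max(1.0, math.sqrt(phi))
    return beta, se_fixed, se_mre, _two_sided_p(beta / se_mre)


def weighted_median(bx, by, sy):
    """Weighted median of per-SNP Wald ratios by/bx (needs >= 2 instruments)."""
    bx, by, sy = _as_arrays(bx, by, sy)
    ratio = by / bx
    w = (bx / sy) ** 2  # inverse of the first-order ratio variance (sy / |bx|)^2
    order = np.argsort(ratio)
    ratio, w = ratio[order], w[order] / w.sum()
    s = np.cumsum(w) - 0.5 * w  # standardised cumulative weight at each ordered ratio
    # Interpolate between the two ordered ratios whose cumulative weights straddle 0.5.
    p = int(np.clip(np.searchsorted(s, 0.5) - 1, 0, len(s) - 2))
    return float(ratio[p] + (ratio[p + 1] - ratio[p]) * (0.5 - s[p]) / (s[p + 1] - s[p]))


def mr_egger(bx, by, sy):
    """MR-Egger: weighted regression of by on bx with a free intercept (needs >= 3 instruments)."""
    bx, by, sy = _as_arrays(bx, by, sy)
    # Orient every SNP so its exposure effect is positive; the outcome effect flips with it.
    sign = np.where(bx < 0, -1.0, 1.0)
    bx, by = bx * sign, by * sign
    w = 1.0 / sy**2
    X = np.column_stack([np.ones_like(bx), bx])
    xtwx = X.T @ (X * w[:, None])
    coef = np.linalg.solve(xtwx, X.T @ (w * by))
    resid = by - X @ coef
    sigma = math.sqrt(np.sum(w * resid**2) / (len(bx) - 2))
    se = np.sqrt(np.diag(np.linalg.inv(xtwx))) * max(1.0, sigma)
    intercept, slope = (float(c) for c in coef)
    return {
        "slope": slope,
        "slope_se": float(se[1]),
        "slope_p": _two_sided_p(slope / se[1]),
        "intercept": intercept,  # non-zero suggests unbalanced (directional) pleiotropy
        "intercept_se": float(se[0]),
        "intercept_p": _two_sided_p(intercept / se[0]),
    }


if __name__ == "__main__":
    # Made-up toy summary statistics for 32 instruments (not the study's data).
    rng = np.random.default_rng(1)
    bx = rng.uniform(0.02, 0.10, size=32)
    sy = np.full(32, 0.01)
    by = -0.3 * bx + rng.normal(0.0, 0.01, size=32)
    print("IVW:", ivw(bx, by, sy))
    print("Weighted median:", weighted_median(bx, by, sy))
    print("MR-Egger:", mr_egger(bx, by, sy))
```

Established MR software additionally handles allele harmonisation between the exposure and outcome GWAS and bootstrapped standard errors, neither of which this sketch attempts.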
[...] Table 7) 20 . The causal estimate was similar regardless of which IV was removed; however, there were five IVs (FEV1 model = two outlier SNPs, FVC model = four outlier SNPs [two outlier SNPs shared]) for which the estimate became marginally nonsignificant after exclusion (maximum P = 0.11, IVW with multiplicative random effects). We then used a phenome-wide association approach to show that these five SNPs were i) annotated to genes with important roles in glycaemic homeostasis, and ii) almost exclusively associated with glycaemic traits or diabetes (Supplementary Note, Supplementary Tables 8-12). We therefore concluded that these IVs were unlikely to represent horizontal pleiotropy, which would bias the causal estimate, and were instead biologically salient IVs with large effects.
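The leave-one-out sensitivity analysis described above can be sketched as follows: the IVW estimate is recomputed once per instrument with that instrument excluded, and instruments whose removal leaves the estimate nonsignificant are flagged. This is an illustrative reimplementation assuming a standard IVW estimator with a multiplicative random-effects SE and a normal-approximation P-value; the function names, toy effect sizes and SNP identifiers are hypothetical.

```python
import math

import numpy as np


def ivw_mre(bx, by, sy):
    """IVW estimate, multiplicative random-effects SE and normal-approximation P-value."""
    w = bx**2 / sy**2
    beta = float(np.sum(bx * by / sy**2) / np.sum(w))
    # Residual dispersion from Cochran's Q; the SE is inflated only when phi > 1.
    phi = np.sum((by - beta * bx) ** 2 / sy**2) / (len(bx) - 1)
    se = math.sqrt(1.0 / np.sum(w)) * max(1.0, math.sqrt(phi))
    return beta, se, math.erfc(abs(beta / se) / math.sqrt(2.0))


def leave_one_out(bx, by, sy, snp_ids):
    """Re-run IVW once per instrument with that instrument excluded (needs >= 3 instruments)."""
    bx, by, sy = (np.asarray(a, dtype=float) for a in (bx, by, sy))
    results = []
    for j, snp in enumerate(snp_ids):
        keep = np.arange(len(bx)) != j  # boolean mask dropping instrument j
        beta, se, p = ivw_mre(bx[keep], by[keep], sy[keep])
        results.append((snp, beta, se, p))
    return results


if __name__ == "__main__":
    # Hypothetical toy instruments; "snp1".."snp5" are placeholder IDs.
    bx = [0.08, 0.05, 0.03, 0.02, 0.04]
    by = [-0.030, -0.012, -0.010, 0.004, -0.015]
    sy = [0.006] * 5
    for snp, beta, se, p in leave_one_out(bx, by, sy, ["snp1", "snp2", "snp3", "snp4", "snp5"]):
        flag = "  <- nonsignificant without this SNP" if p >= 0.05 else ""
        print(f"{snp}: beta = {beta:.3f}, SE = {se:.3f}, P = {p:.3g}{flag}")
```

A flagged instrument is not automatically pleiotropic; as the text argues, it may simply be a biologically salient variant with a large effect, which is why follow-up annotation of such SNPs is informative.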
Whilst smoking status (ever vs never smoked) was a covariate in the lung function GWAS, we sought to assess whether the relationship between blood glucose and lung function could be driven by residual effects of smoking. There was a significant genetic correlation between the number of cigarettes smoked per day and fasting glucose (rg = 0.16, SE = 0.043), although this was not observed with the 'ever vs never smoked' phenotype (rg = 0.007, SE = 0.039).
However, a latent causal variable model constructed for fasting glucose and cigarettes smoked per day did not indicate evidence of genetic causality, in contrast to the glucose/lung function models (mean posterior GCP = -0.47, SE = 0.33, P = 0.33 for H0: GCP = 0). The MR IVs for glucose were further checked for association with both 'ever vs never smoked' and 'cigarettes per day', and none of the IVs showed any association with either smoking phenotype at a genome-wide (P < 5 × 10⁻⁸) or suggestive (P < 1 × 10⁻⁵) significance threshold (Supplementary Tables 13,14).
In summary, these data suggest an effect of fasting glucose on lung function beyond what is directly attributable to residual effects of smoking.
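As a quick check on the smoking-related genetic correlations quoted above, a Wald statistic z = rg / SE with a two-sided normal P-value shows why the cigarettes-per-day correlation (z ≈ 3.7) is significant while the ever vs never smoked correlation (z ≈ 0.18) is not. This sketch assumes a normal approximation, a common way of deriving a P-value from an estimate and its standard error; the P-values the authors obtained from their own software may differ slightly.

```python
import math


def wald_test(estimate, se):
    """Return the Wald z statistic and its two-sided normal-approximation P-value."""
    z = estimate / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Genetic correlations with fasting glucose quoted in the text.
for label, rg, se in [("cigarettes per day", 0.16, 0.043),
                      ("ever vs never smoked", 0.007, 0.039)]:
    z, p = wald_test(rg, se)
    print(f"{label}: z = {z:.2f}, P = {p:.2g}")
```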

Implementation of the pharmagenic enrichment score for genetically informed drug repurposing in respiratory distress
We aimed to further expand drug repurposing opportunities for lung function using the pharmagenic enrichment score (PES) approach (Online Methods) 12 . Briefly, PES implements genetically informed drug repurposing with polygenic scores (PGS) calculated using genetic variants specifically within druggable pathways (Figure 4a). In the context of this study, individuals with a depleted PES for lung function (that is, lower genetically predicted lung function based on variants mapped to pathways with known drug targets) may specifically benefit from drugs which modulate these pathways. The PES approach differs from a traditional genome-wide PGS by providing direct biological insight into the potential impact of trait-associated variation residing in drug function related gene-sets, rather than the undifferentiated sum total of trait-associated variation.

Candidate pathways for the generation of PES profiles are identified from GWAS summary statistics as druggable gene-sets which display an enrichment of common variant associations. Firstly, we performed gene-set association of FEV1 and FVC using a collection of high-quality gene-sets from the Molecular Signatures Database (MSigDB), each containing at least one gene which is modulated by an approved pharmacological agent (NSets = 1030, Online Methods). The FEV1/FVC phenotype is less directly interpretable in this context, given that it is used primarily as a diagnostic tool rather than as a quantitative measure; we therefore focused on repurposing candidates for FEV1 and FVC individually. Previously, we extended the concept of P-value thresholding (PT) for PGS to the multi-marker gene-level test statistic and implemented this in our gene-set analysis 12 . We argue that distinct biological processes in individuals may only be captured when the optimal spectrum of polygenic variation is included in the model. A variety of PT could be used; for simplicity, we selected four P-value thresholds (all SNPs, PT < 0.5, PT < 0.05, and PT < 0.005), in accordance with our previous work 12 . We annotated variants to genes by genomic proximity, extending gene boundaries to capture regulatory variation under both a conservative and a liberal definition: the conservative definition extended each gene 5 kilobases (kb) upstream and 1.5 kb downstream, whilst the liberal definition extended it 35 kb upstream and 10 kb downstream.
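The two steps described above, assigning SNPs to genes with an upstream/downstream window and then scoring individuals using only the SNPs that fall in a druggable gene-set at a chosen P-value threshold, can be sketched in Python as below. The class and function names are hypothetical, all coordinates are assumed to be on a single genome build, and the sketch omits the LD clumping (selection of independent SNPs) and gene-level enrichment testing that a full PES analysis would include.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int   # lowest genomic coordinate of the gene body
    end: int     # highest genomic coordinate of the gene body
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Snp:
    rsid: str
    chrom: str
    pos: int
    beta: float  # GWAS effect of the effect allele on the spirometry measure
    p: float     # GWAS P-value


# (upstream, downstream) extensions in base pairs, as described in the text.
CONSERVATIVE = (5_000, 1_500)
LIBERAL = (35_000, 10_000)


def gene_window(gene, upstream, downstream):
    # "Upstream" is 5' of the gene: lower coordinates on the + strand, higher on the - strand.
    if gene.strand == "+":
        return gene.start - upstream, gene.end + downstream
    return gene.start - downstream, gene.end + upstream


def snps_in_gene_set(snps, genes, gene_set, window):
    """Keep SNPs falling inside the extended boundaries of any gene in gene_set."""
    regions = [(g.chrom, *gene_window(g, *window)) for g in genes if g.name in gene_set]
    return [s for s in snps
            if any(s.chrom == c and lo <= s.pos <= hi for c, lo, hi in regions)]


def pathway_score(dosages, snps, p_threshold):
    """Pathway-restricted polygenic score: sum of beta x dosage over SNPs with P < p_threshold.

    dosages maps rsid -> effect-allele counts (0, 1 or 2), one entry per individual.
    """
    n = len(next(iter(dosages.values())))
    score = np.zeros(n)
    for s in snps:
        if s.p < p_threshold:
            score += s.beta * np.asarray(dosages[s.rsid], dtype=float)
    return score


if __name__ == "__main__":
    # Hypothetical genes, SNPs and genotypes for three individuals.
    genes = [Gene("GENE_A", "1", 10_000, 20_000, "+"), Gene("GENE_B", "1", 100_000, 110_000, "-")]
    snps = [Snp("rs1", "1", 6_000, 0.10, 1e-3), Snp("rs2", "1", 22_000, 0.20, 0.01),
            Snp("rs3", "1", 114_000, -0.05, 0.20), Snp("rs4", "1", 98_000, 0.30, 0.60)]
    dosages = {"rs1": [0, 1, 2], "rs2": [2, 2, 0], "rs3": [1, 0, 1], "rs4": [1, 1, 1]}
    in_set = snps_in_gene_set(snps, genes, {"GENE_A", "GENE_B"}, LIBERAL)
    print(pathway_score(dosages, in_set, p_threshold=0.5))
```

For the "all SNPs" threshold, `p_threshold=float("inf")` keeps every SNP in the gene-set. The strand check matters because a fixed upstream extension on the wrong side of a minus-strand gene would capture the downstream region instead.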
[...] are constructed using variants specifically within druggable pathways. Individuals with a depleted PES, that is, lower genetically predicted spirometry measures using variants in the gene-set, may benefit from a drug which modulates the pathway in question. (b) The number of FDA-approved drugs with overrepresented targets in at least one candidate PES gene-set per Anatomical Therapeutic Chemical (ATC) classification level one code.
Each ATC level one code is shaded a different colour, with its frequency on the x-axis. [...] and the y-axis is the -log10 P-value, with higher points more significant. Genes which are associated after multiple-testing correction for the number of genes in the pathway are coloured blue (strict FDR < 0.05) or red (lenient FDR < 0.1). The dotted line denotes an uncorrected nominally significant association (P < 0.05).
Gene-set association using the FEV1 and FVC GWAS was undertaken at each PT with both conservative and liberal genic boundaries. If a gene-set was significant at multiple PT, the most significantly associated PT was retained. The conservative genic boundaries yielded only one druggable gene-set enriched with FEV1-associated variants after multiple testing correction (q < 0.05): Signalling events mediated by the Hedgehog family (β = 0.973, SE = 0.2, P = 9.3 × 10⁻⁷, PT < 0.5, NGenes = 22). There were no gene-sets with known drug targets using conservative [...] Table 15). Extending the genic boundaries to capture more regulatory variation (liberal boundaries) uncovered more druggable gene-sets (Supplementary Table 16).
Specifically, seven and nine unique gene-sets survived correction for FEV1 and FVC, respectively (q < 0.05, Table 2). Two of these pathways were related to Hedgehog signalling; however, as they came from different annotation sources and contained different numbers of genes, we considered them separately. A number of biological processes were encompassed by these prioritised gene-sets, such as cancer (Pathways in cancer, Basal cell carcinoma), transforming growth factor (TGF)-beta superfamily signalling [...] i) multiple gene-set members, and ii) more genes than expected by chance, were assumed to be particularly relevant for a biological pathway. There were six such gene-sets from the PES [...]
Single drug-gene matching was undertaken for the remaining PES candidate gene-sets lacking an approved compound with statistically overrepresented targets, retaining drug-gene interactions with at least two lines of evidence from DGIdb (Supplementary Tables 19-30).
In order to test the phenotypic relevance of the FEV1 and FVC PES profiles, we utilised an independent genotyped cohort from the Hunter Community Study (HCS, N = 1804, Online Methods). Firstly, we constructed a genome-wide PGS for FEV1 and FVC at six different P-value thresholds (Supplementary Table 31). The optimum FEV1 genetic score explained approximately 6.4% of the variance in FEV1 measured in the HCS cohort, whilst the FVC PGS explained approximately 5.7% of the variance in FVC. Each of the seven PES profiles was tested for association with FEV1 and/or FVC, both with and without adjustment for the genome-wide PGS. Four of the PES considered had at least a nominally significant association with their [...] Table 3, Supplementary Table 32). The variance explained by the significant PES ranged from 0.4% to 0.7%, with the number of independent SNPs in these scores ranging from 76 to 16390. We then constructed a model which was adjusted for the genome-wide PGS at the same PT as the PES and found that [...] genome-wide PGS (90th percentile of the HCS cohort) but low genetically predicted lung function using one of the PES (10th percentile). Specifically, 12.17% and 12.05% of the HCS participants in the 90th percentile of the PGS for FVC and FEV1, respectively, had a depleted PES (10th percentile, low predicted lung function by PES). Taken together, this suggests that pathway-based polygenic scores provide distinct biological insights for some individuals with an otherwise high genetic load of lung-function-increasing alleles.
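The discordance check above (top-decile genome-wide PGS combined with a bottom-decile PES) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes percentile cut-offs are computed within the cohort using NumPy's default linear interpolation, and the function name `discordant_fraction` and the simulated scores are hypothetical.

```python
import numpy as np


def discordant_fraction(pgs, pes, high_q=0.90, low_q=0.10):
    """Fraction of top-PGS individuals whose pathway score (PES) is depleted.

    pgs, pes : per-individual genome-wide PGS and pathway-specific PES
    high_q   : PGS quantile defining "high genetic load" (90th percentile)
    low_q    : PES quantile defining a "depleted" pathway score (10th percentile)
    Returns (fraction, indices of flagged individuals).
    """
    pgs = np.asarray(pgs, dtype=float)
    pes = np.asarray(pes, dtype=float)
    high_pgs = pgs >= np.quantile(pgs, high_q)  # cohort-internal cut-off
    low_pes = pes <= np.quantile(pes, low_q)
    n_high = int(high_pgs.sum())
    flagged = np.flatnonzero(high_pgs & low_pes)
    fraction = len(flagged) / n_high if n_high else 0.0
    return fraction, flagged


# Simulated, weakly correlated scores (r around 0.1); not HCS data.
rng = np.random.default_rng(0)
pgs = rng.normal(size=1800)
pes = 0.1 * pgs + rng.normal(size=1800)
frac, idx = discordant_fraction(pgs, pes)
print(f"{frac:.1%} of top-decile PGS individuals have a bottom-decile PES")
```

Because the PES and PGS are only weakly correlated in this simulation, a non-trivial fraction of high-PGS individuals is flagged; with perfectly correlated scores the fraction would be zero.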

Transcriptome-wide association identifies putative targets for pharmacological modulation of lung function
We performed a transcriptome-wide association study (TWAS) of the three lung function measures using SNP weights from lung and blood tissue. TWAS leverages models of genetically regulated expression to test for a correlation between predicted expression and a phenotype 22. Expression imputation models derived from cis-eQTLs are generated only for genes whose expression displays significant cis-heritability, that is, a significant genetic contribution to expression variance. We aimed to identify genes for which increased or decreased expression was associated with increased lung function and which had approved compounds available that could improve lung function based on their mechanism of action [...] Figure 5b). Transcriptome-wide associated genes were only retained if they were not also associated with a smoking phenotype, to minimise residual confounding related to smoking. Specifically, we tested whether the predicted expression of genes which survived correction in the FEV1 or FVC TWAS was associated with smoking behaviour ('ever vs never smoked' and 'cigarettes per day') in a TWAS using SNP weights from lung, blood, and two brain regions implicated in nicotine addiction (dorsolateral prefrontal cortex and nucleus accumbens; Online Methods, Supplementary Tables 45-52) 23,24.
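For readers unfamiliar with TWAS, the core test can be written as a weighted burden statistic, Z_TWAS = wᵀz / sqrt(wᵀΣw), where w are cis-SNP expression weights, z are GWAS z-scores, and Σ is the reference LD matrix. This FUSION-style form is an assumption about the software used (the Online Methods give the exact pipeline), the function name is hypothetical, and the inputs below are toy values. A positive Z_TWAS indicates that higher predicted expression is associated with a higher trait value, which is the direction of effect used to match drugs.

```python
import numpy as np


def twas_z(weights, gwas_z, ld):
    """Weighted burden TWAS statistic: Z = w'z / sqrt(w' Sigma w).

    weights : expression weights for cis SNPs (from an eQTL reference panel)
    gwas_z  : GWAS z-scores for the same SNPs, aligned to the same alleles
    ld      : SNP-SNP correlation (LD) matrix from a reference panel
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    sigma = np.asarray(ld, dtype=float)
    variance = w @ sigma @ w  # null variance of the weighted sum of z-scores
    if variance <= 0:
        raise ValueError("weights have zero predicted-expression variance")
    return float(w @ z / np.sqrt(variance))


# Toy example: two SNPs in moderate LD.
w = np.array([0.4, 0.2])
z = np.array([4.1, 3.0])
ld = np.array([[1.0, 0.5], [0.5, 1.0]])
print(round(twas_z(w, z, ld), 2))  # 4.23
```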
We searched each of these significant genes in the Drug-Gene Interaction Database (DGIdb v3.0.2) to ascertain compounds which may improve lung function based on the direction of effect from the TWAS analyses. In accordance with the PES analyses, FEV1/FVC was not directly considered, and we focused on FEV1- and/or FVC-associated genes which could be pharmacologically modulated. A tiered system was utilised to select drug-gene interactions which may enhance lung function, whereby tier one comprised FDA-approved compounds and tier two comprised investigational compounds (Online Methods, Supplementary Table 53).
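The direction-of-effect logic behind the tiering can be sketched as below. The mapping of DGIdb interaction types to "increases" or "decreases" target activity, the `Interaction` record, and the gene and drug names are all hypothetical simplifications; the exact criteria are in the Online Methods and Supplementary Table 53. Under this sketch, a gene whose higher predicted expression associates with better lung function (Z_TWAS > 0) is matched to activating interactions, and the reverse for Z_TWAS < 0.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed grouping of interaction types by the direction they shift target activity.
INCREASING = {"agonist", "activator", "inducer", "positive modulator", "potentiator"}
DECREASING = {"inhibitor", "antagonist", "blocker", "negative modulator", "inverse agonist"}


@dataclass
class Interaction:
    gene: str
    drug: str
    interaction_type: str
    status: str  # e.g. "approved" or "investigational"


def assign_tier(ix: Interaction, twas_z: float) -> Optional[int]:
    """Tier 1 (approved) or 2 (investigational) if the drug moves the target in
    the lung-function-increasing direction implied by the TWAS sign, else None."""
    kind = ix.interaction_type.lower()
    if twas_z > 0:
        wanted = kind in INCREASING
    elif twas_z < 0:
        wanted = kind in DECREASING
    else:
        wanted = False
    if not wanted:
        return None
    return {"approved": 1, "investigational": 2}.get(ix.status.lower())


interactions = [
    Interaction("GENE_A", "drug_x", "inhibitor", "approved"),
    Interaction("GENE_A", "drug_y", "agonist", "approved"),
    Interaction("GENE_A", "drug_z", "antagonist", "investigational"),
]
twas = {"GENE_A": -5.1}  # hypothetical: lower expression associated with higher FEV1
for ix in interactions:
    print(ix.drug, assign_tier(ix, twas[ix.gene]))
# drug_x 1
# drug_y None
# drug_z 2
```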

Probabilistic fine-mapping of these transcriptome-wide significant regions using a multi-tissue reference panel was then performed to assess whether these genes are likely to be causal at each locus (Online Methods). A credible set with 90% probability of containing the causal gene was computed for each locus utilising the marginal posterior inclusion probability (PIP) calculated from the observed TWAS statistics. We did not proceed with fine-mapping the PPARD locus due to its proximity to the defined boundaries of the MHC region. Two FEV1-associated genes with tier one and/or tier two drug interactions, AMT and PYGB, were included in the credible set with a PIP > 0.9. Tetrahydrofolate is a co-factor for AMT (ZTWAS = 5.96, PIP = 0.893, whole blood SNP weights), which has been previously implicated as having a beneficial effect on [...] can be clarified to ensure that this signal is not driven by a residual effect of smoking.
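A 90% credible set can be built from per-gene PIPs by ranking genes and accumulating normalised PIPs until the target coverage is reached. The sketch below assumes this common construction and uses hypothetical gene names and PIPs; fine-mapping software may also include a null model in the normalisation, which is omitted here.

```python
def credible_set(pips, coverage=0.90):
    """Smallest set of top-ranked genes whose normalised PIPs reach `coverage`.

    pips : dict mapping gene name -> posterior inclusion probability
    """
    total = sum(pips.values())
    if total <= 0:
        return []
    ranked = sorted(pips.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for gene, pip in ranked:
        chosen.append(gene)
        cumulative += pip / total  # normalise so PIPs at the locus sum to one
        if cumulative >= coverage:
            break
    return chosen


print(credible_set({"GENE1": 0.7, "GENE2": 0.25, "GENE3": 0.05}))  # ['GENE1', 'GENE2']
```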

Host-viral interactomes suggest that proposed pulmonary drug repurposing candidates may be relevant to respiratory virus infection
Respiratory viruses are an important contributor to acute, and potentially fatal, declines in lung function. We sought to investigate whether our proposed drug repurposing candidates for lung function may also exhibit antiviral properties against these pathogens. The host-virus interactome was analysed for three respiratory viruses to perform computational drug repurposing: severe acute respiratory syndrome coronavirus 2 (SARS-CoV2), influenza [...]
We demonstrated, using multiple lines of evidence, a putative relationship between increased fasting blood glucose and lung function; therefore, we investigated whether any of the host-viral interactome members were enriched within biological pathways involved in glycaemic homeostasis. Interestingly, there was an overrepresentation of SARS-CoV2 'prey proteins' amongst four gene-sets related to glucose metabolism, including insulin and glucagon signalling pathways (Table 4). Fourteen SARS-CoV2 'prey proteins' were members of at least one of these gene-sets, with a greater number of interactions amongst these genes than expected by chance (P = 4.42 × 10^-12, Supplementary Table 57). We outline evidence for the potential role of these viral prey genes in glycaemic homeostasis in Supplementary Table 57. These data support emerging evidence that SARS-CoV2-infected patients with hyperglycaemia are at higher risk of morbidity and mortality 28.
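Overrepresentation of viral prey proteins in a gene-set is commonly assessed with a one-sided hypergeometric test. The sketch below assumes that test (the section does not state which test was used); the function name and all counts in the usage line are hypothetical. Exact integer arithmetic via `math.comb` avoids overflow for genome-sized backgrounds.

```python
from math import comb


def hypergeom_enrichment_p(overlap, set_size, n_prey, n_background):
    """One-sided P(X >= overlap) for prey proteins falling in a gene-set.

    overlap      : prey proteins that are members of the gene-set
    set_size     : gene-set size within the background
    n_prey       : prey proteins within the background
    n_background : total number of background genes
    """
    upper = min(set_size, n_prey)
    numer = sum(
        comb(set_size, x) * comb(n_background - set_size, n_prey - x)
        for x in range(overlap, upper + 1)
    )
    return numer / comb(n_background, n_prey)


# Hypothetical counts, for illustration only.
print(hypergeom_enrichment_p(overlap=6, set_size=150, n_prey=330, n_background=18000))
```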

Glycaemic gene-set | P-value
Glucagon-like Peptide-1 (GLP1) regulates insulin secretion | 7.02 × 10^-4
Glucagon signalling in metabolic regulation | 2.33 × 10^-4
Glucose metabolism | 2.69 × 10^-5
Regulation of insulin secretion | 2.13 × 10^-3

None of the glycaemic 'prey proteins' were direct targets of antidiabetic compounds; however, 57% of these proteins had a high-confidence protein-protein interaction with an antidiabetic target gene (Supplementary Table 58). For instance, GNB1 putatively binds the SARS-CoV2 nonstructural protein Nsp7, which forms part of the replicase/transcriptase complex, whilst GNB1 also demonstrated evidence of interacting with 15 proteins modulated by antidiabetic compounds, such as GLP1R, the primary target of GLP-1 analogues including exenatide. Pharmacological interventions which seek to control blood glucose may [...]

DISCUSSION
This study demonstrated a variety of methods by which genomic data can be utilised to propose drug repurposing candidates, ranging from approaches which exploit genome-wide variant effects to the identification of candidate clinically significant drug-gene interactions.
Lung function is a particularly relevant phenotype to study in this context, as its aetiology is influenced by a variety of complex biological factors and it is a significant contributor to global morbidity and mortality. We uncovered a number of putative pulmonary drug repositioning opportunities, with the role of glycaemic regulation in pulmonary function particularly interesting from a therapeutic perspective. Our study suggests a causal relationship between blood glucose and lung function using both a genome-wide (LCV) and an instrumental variable (MR) approach, whilst downregulation of the glycogen phosphorylase PYGB was also associated with FEV1 after probabilistic fine-mapping of TWAS loci. These data support previous literature suggesting that declines in pulmonary function are overrepresented amongst individuals with diabetes and correlate with poor glycaemic control 5,29-31; a phenomenon which has also been reported in non-diabetics 32,33. A number of pathophysiological mechanisms have been postulated to underlie this relationship, including fibrosis mediated by hyperglycaemia-accelerated epithelial-to-mesenchymal transition 34, and aberrant inflammatory responses to dysglycaemia 35,36. Respiratory sequelae after infection may also be significantly affected by dysregulation of glycaemic control. Acute hyperglycaemia is associated with a significant increase in morbidity and mortality amongst non-diabetic community-acquired pneumonia (CAP) patients, which further supports its utility as a treatment target [37][38][39][40]. Notably, even patients with mild hyperglycaemia (serum glucose 6-10.99 mmol/L) have a purported elevated risk of death at 90 days following CAP diagnosis 37, whilst the association between type 2 diabetes and poor pneumonia outcomes appears to be driven by glycaemic control 40.
Inflammation is likely to be an important component of glycaemia-influenced adverse effects; for instance, the intracellular carbohydrate O-linked β-N-acetylglucosamine has recently been linked to influenza-associated cytokine storms 41. Our findings support the relevance of glycaemia to respiratory infection by demonstrating that proteins which putatively interact with the SARS-CoV2 virus were overrepresented in glycaemic pathways. Whilst the viral prey proteins we identified as members of glycaemic pathways were not the direct targets of antihyperglycaemic agents, some interact with the targets of these compounds, although the biological salience of these interactions warrants future investigation. The presence of a viral-prey protein interaction also does not necessarily imply that the interaction is essential to the viral life cycle, and further data are needed to establish this. Furthermore, the viral prey proteins overrepresented in the glycaemic pathways were mostly genes, such as nucleoporins and cAMP-dependent protein kinases, which have pleiotropic regulatory roles spanning a number of biological systems.
These data taken together support the utility of managing blood glucose in the clinical improvement of respiratory outcomes.
Targeted drug application and repurposing is by its very nature confounded by biological heterogeneity amongst individuals. This is likely to be particularly true of complex traits, as their polygenic genetic architecture provides the substrate for each individual to display a unique profile of trait-associated variation. In the second stream of this study, we stratified the polygenic architecture of lung function into a series of druggable pathways to provide a framework for pathway-specific genetic scores, which we designate the pharmagenic enrichment score (PES). We suggest that leveraging inter-individual genetic heterogeneity in this way will improve the precision application of novel drug repurposing. A number of interesting drug repositioning candidates had overrepresented targets amongst the candidate PES gene-sets. For example, magnesium sulfate had enriched targets in the Dilated cardiomyopathy PES and has previously shown promise as a repurposing candidate to improve pulmonary function in asthma 42,43. Using an independent cohort, several of the PES profiles tested explained a small, but significant, percentage of the variance in FEV1 and/or FVC. The Class B/2 secretin family receptors score for FVC was particularly noteworthy given that it remained significant after adjustment for the genome-wide PGS. Interestingly, this gene-set features a number of proteins involved in glycaemic homeostasis, including the antidiabetic drug targets glucagon-like peptide-1 receptor (GLP1R) and the amylin receptors (RAMP1, RAMP2, and RAMP3). While all of the PES demonstrated a significant correlation with the genome-wide PGS, in the majority of cases it was small (r < 0.2), suggesting that most of these functionally relevant foci of genomic risk in lung function GWAS were relatively independent of the total PGS. Importantly, we still identified individuals with high genetically predicted lung function using a genome-wide PGS but low predicted lung function using a pathway-specific PES. This was supported by the observed correlation between the PES and related mRNA expression, which was distinct from that of a genome-wide PGS. Collectively, these data are consistent with the hypothesis that important treatment-related biology can be captured at a pathway level for individuals with, or at risk of, respiratory illness.
Taken together, our approach provides a template for genetically informed precision drug repositioning to improve lung function. Clinical implementation in its most basic form would involve common variant genotyping using a commercial SNP array, followed by imputation and lung function PES-based stratification of treatment options. This would be combined with other biochemical exposure measures, such as fasting glucose, that are causal risk factors and have approved treatments. To illustrate the clinical implementation of our strategy, we generated a schematic representation of individual heterogeneity in biochemical and genetic components of risk in lung function and related them to candidates for precision drug repositioning (Figure 6). We envisage that our approach to variant and exposure risk stratification can be applied more broadly to identify and implement precision drug repositioning in a range of complex traits.
While there are some potential sources of bias in the use of GWAS data for causal inference via both latent causal variable models and Mendelian randomisation, such as measurement error, population stratification, and horizontal pleiotropy, we are confident that the relationship between glycaemia and lung function presented in this study is robust given the multiple lines of support. Replicated, well-powered randomised controlled trials, however, are needed to fully resolve the clinical benefit of repurposing antihyperglycaemic compounds to improve lung function, including in the context of viral infection. We also acknowledge that the direction of a suitable pharmacological intervention is not inherently clear; that is, whether an agonist or an antagonist of genes within a pathway implicated by the PES approach is appropriate is an important consideration 12. Careful curation of proposed repurposing candidates will therefore be critical, particularly in the context of pulmonary traits, where a variety of currently approved compounds have adverse respiratory effects. We suggest that TWAS could be utilised to help overcome these issues by identifying druggable genes which are members of candidate PES gene-sets and for which a clinically beneficial impact on expression can be predicted. Interestingly, we also saw some evidence of cross-talk between heritable risk at genes associated with lung function and fasting glucose, with downregulation of the glycogen phosphorylase PYGB (associated with FEV1) observed through probabilistic fine-mapping of TWAS loci. In summary, we revealed candidate drug repurposing opportunities to potentially improve pulmonary function and provide the means for aligning their application in individuals that carry a high relative burden [...]

[...] represents an individual with good lung function (pink lung tissue) and genomic and environmental components consistent with healthy lung function (grey to red nodes). These have a neutral to positive influence on lung function, represented by the grey and red edges (arrows), respectively. Case 2 has high fasting glucose and a neutral (grey) loading of genetic variants (PES) associated with lung function pathways. After treatment with a drug for diabetes, such as metformin or insulin, lung function is improved (red edge) sufficiently for a therapeutic effect, represented by pink lungs. Case 3 has an enrichment of genetic variants (PES) associated with poorer lung function in the Class B/2 secretin pathway. To improve lung function, they are treated with drugs such as pramlintide and recombinant glucagon, which work by modulating target genes in the Class B/2 secretin pathway to ameliorate the enrichment of poor lung function variants in that pathway. The broken edge between fasting glucose and the Class B/2 secretin pathway represents the probable connection, or shared genes, between these nodes. Case 4 also presents with poor lung function (blue lung tissue) and an enrichment of poor lung function associated variants in the Circadian clock pathway (blue [...]

Lung function GWAS
We obtained GWAS summary statistics for FEV1, FVC, and their ratio from a meta-analysis of the UK Biobank sample with the SpiroMeta consortium cohorts, as outlined extensively elsewhere (N = 400102) 11. Phenotypes were adjusted for age, age², sex, height, smoking status (ever vs never smoked) and genotyping array before the residuals were subjected to rank inverse-normal transformation.
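The residualise-then-transform step can be sketched as follows, assuming an OLS residualisation and the rank-based inverse normal transform with the Blom offset (k = 3/8). The function names and simulated covariates are hypothetical; the GWAS phenotypes themselves were prepared by the consortium as described in reference 11.

```python
import numpy as np
from scipy.stats import norm, rankdata


def residualise(y, covariates):
    """Residuals of y after OLS on an intercept plus the covariates."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def rank_inverse_normal(x, k=3 / 8):
    """Rank-based inverse normal transform; k = 3/8 gives the Blom offset."""
    x = np.asarray(x, dtype=float)
    ranks = rankdata(x)  # ties receive their average rank
    return norm.ppf((ranks - k) / (len(x) - 2 * k + 1))


# Simulated phenotype and covariates (age, age^2, sex, height); illustrative only.
rng = np.random.default_rng(1)
n = 500
age = rng.uniform(40, 70, n)
sex = rng.integers(0, 2, n)
height = rng.normal(170, 10, n)
fev1 = 0.03 * height - 0.02 * age + 0.3 * sex + rng.gamma(2.0, 0.3, n)
covs = np.column_stack([age, age**2, sex, height])
pheno = rank_inverse_normal(residualise(fev1, covs))
```

The transform discards the skewed shape of the residuals (here from gamma noise) while preserving their order, so the analysed phenotype is approximately standard normal.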

Genetic correlation
Bivariate linkage disequilibrium score regression (LDSC) was performed between each lung function trait and a variety of GWAS, as implemented by LDhub v1.9.3 14. Lung function summary statistics were cleaned ('munged') prior to LDSC using munge_sumstats.py and merged with common HapMap3 SNPs, excluding the MHC region due to its LD complexity, as is usual practice 13. We retained estimates of genetic correlation (rg) for GWAS (N = 172) with European ancestry and a heritability z-score > 4, as calculated by LDhub. When a phenotype had multiple GWAS, the GWAS with the largest sample size was retained. The Bonferroni method was utilised for multiple testing correction: P < 2.9 × 10^-4 (α = 0.05/172).
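The GWAS selection and Bonferroni threshold described above can be expressed as a small filter. The record fields (`phenotype`, `ancestry`, `n`, `h2_z`, `p`) and the helper name are hypothetical stand-ins for LDhub output; only the rules (European ancestry, heritability z > 4, largest GWAS per phenotype, α divided by the number of retained GWAS) come from the text.

```python
def select_and_test(records, h2_z_min=4.0, alpha=0.05):
    """Filter genetic-correlation results and apply a Bonferroni threshold.

    records : iterable of dicts with keys phenotype, ancestry, n, h2_z, rg, p
    Returns (per-test threshold, retained records, significant records).
    """
    eligible = [r for r in records
                if r["ancestry"] == "European" and r["h2_z"] > h2_z_min]
    best = {}
    for r in eligible:  # keep the largest GWAS for each phenotype
        current = best.get(r["phenotype"])
        if current is None or r["n"] > current["n"]:
            best[r["phenotype"]] = r
    kept = list(best.values())
    threshold = alpha / len(kept) if kept else 0.0
    significant = [r for r in kept if r["p"] < threshold]
    return threshold, kept, significant


# With 172 retained GWAS, the per-test threshold is 0.05 / 172.
print(f"{0.05 / 172:.2e}")  # 2.91e-04
```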
A heatmap was constructed using the ComplexHeatmap package 44 .

Latent causal variable models
Latent causal variable models were constructed for each pair of a lung function measure and a hormone or metabolite trait that displayed a significant genetic correlation (see Supplementary Table 4 for GWAS references). The RunLCV.R and MomentFunctions.R scripts were leveraged to perform these analyses (https://github.com/lukejoconnor/LCV). The LCV framework assumes that a latent variable, L, mediates the genetic correlation between two traits (trait one, trait two), and uses the mixed fourth moments of the bivariate effect size distribution to estimate the mean posterior genetic causality proportion (GCP), as described in detail by O'Connor and Price 15. Specifically, the LCV model postulates that if trait two is partially causal for trait one, then directional SNP effects will be unequal; that is, variants impacting trait two will have a proportional effect on trait one, but this will not be observed in the other direction. The GCP is defined by equation one, where q1 and q2 are the normalised effects of L on trait one and trait two respectively, and ρg is the genetic correlation estimate:

$$\mathrm{GCP} = \frac{\log|q_2| - \log|q_1|}{\log|q_2| + \log|q_1|}, \qquad \rho_g = q_1 q_2 \qquad (1)$$

All traits were munged prior to LCV analyses, with only HapMap3 SNPs (MAF > 0.05) outside the MHC region retained, in accordance with the LDSC analyses. We utilised the baseline 1000 Genomes phase 3 LD scores for HapMap3 SNPs (MHC excluded). A two-sided t test was used to assess whether the estimated GCP was significantly different from zero.
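Under the LCV parameterisation of O'Connor and Price, ρg = q1·q2 and GCP = (log|q2| - log|q1|)/(log|q2| + log|q1|), so GCP = 1 when |q1| = 1 (L is the whole genetic component of trait one) and GCP = 0 when |q1| = |q2|. The sketch below only illustrates this definition and its inverse, q1 = |ρg|^((1-GCP)/2) and |q2| = |ρg|^((1+GCP)/2); it is not the LCV estimator, which works from mixed fourth moments with a jackknife. The helper names and the choice to let q2 carry the sign of ρg are assumptions.

```python
import math


def gcp_from_loadings(q1, q2):
    """GCP implied by LCV loadings q1, q2 (with |q1|, |q2| in (0, 1])."""
    l1, l2 = math.log(abs(q1)), math.log(abs(q2))
    if l1 + l2 == 0:
        raise ValueError("GCP is undefined when |q1| = |q2| = 1")
    return (l2 - l1) / (l2 + l1)


def loadings_from_gcp(rho_g, gcp):
    """Inverse map: q1 > 0, and q2 carries the sign of rho_g so that q1*q2 = rho_g."""
    a = abs(rho_g)
    q1 = a ** ((1 - gcp) / 2)
    q2 = math.copysign(a ** ((1 + gcp) / 2), rho_g)
    return q1, q2


q1, q2 = loadings_from_gcp(-0.3, 0.6)  # hypothetical rho_g and GCP
print(q1 * q2, gcp_from_loadings(q1, q2))
```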

Mendelian randomisation
We [...] Further, we implemented a weighted median model, which takes the median of the ratio estimates (as opposed to the mean in the IVW model), with greater weight given to ratio estimates with greater precision 17. An advantage of this approach is that it requires only the 'majority valid' assumption, whereby an unbiased causal estimate will still be obtained if less than 50% of the model weighting arises from invalid IVs. An MR-Egger model was then constructed; this is an adaptation of Egger regression wherein the IV-outcome associations are regressed on the IV-exposure associations with an intercept term (β0) added to represent the average pleiotropic effect (equation three) 18, where β̂Xj and β̂Yj are the associations of IV j with the exposure and the outcome, and β1 is the causal estimate:

$$\hat{\beta}_{Y_j} = \beta_0 + \beta_1 \hat{\beta}_{X_j} + \varepsilon_j \qquad (3)$$
The key assumption of the MR-Egger model is referred to as Instrument Strength Independent of Direct Effect (InSIDE), which assumes that there is no correlation between the direct effects of the IVs on the outcome and the IV-exposure associations. In other words, the InSIDE assumption is violated if pleiotropic effects act through a confounder of the exposure-outcome association 16,18,49. We also tested whether the Egger intercept differed significantly from zero, as a measure of unbalanced pleiotropy or violation of the InSIDE assumption. In addition, heterogeneity amongst the IV ratio estimates was quantified using Cochran's Q statistic, given that horizontal pleiotropy may be one explanation for significant heterogeneity. A global pleiotropy test was also implemented via the MR-PRESSO framework 19. Leave-one-out analyses were then performed to assess whether causal estimates are biased by a single IV, which may indicate the presence of outliers and the sensitivity of the estimate to said outliers. However, outliers are not necessarily evidence of horizontal pleiotropy. There were five IVs in either the FEV1 or FVC model whose removal in the leave-one-out analysis rendered the IVW estimate marginally non-significant. We performed a PheWAS for each of these SNPs using summary data collated by GWAS atlas v20191115 (https://atlas.ctglab.nl/) to assess evidence of horizontal pleiotropy, that is, acting through non-glycaemic pathways to influence lung function 50. All MR analyses were performed in R version 3.6.0 using the TwoSampleMR v0.4.25 and MRPRESSO v1.0 packages.
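The MR estimators and sensitivity checks described in the preceding two paragraphs can be sketched in NumPy as follows. This is a simplified illustration rather than the TwoSampleMR or MRPRESSO implementation: it assumes first-order weights bx²/se_Y², a fixed-effect IVW estimate, and interpolation for the weighted median in the style of Bowden et al.; all function names and the simulated IV summary statistics are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def ivw(bx, by, se_y):
    """Fixed-effect IVW estimate with first-order weights bx^2 / se_y^2."""
    w = bx**2 / se_y**2
    ratio = by / bx
    return float(np.sum(w * ratio) / np.sum(w))


def weighted_median(bx, by, se_y):
    """Weighted median of the ratio estimates, interpolated at 50% of the weight."""
    ratio = by / bx
    w = bx**2 / se_y**2
    order = np.argsort(ratio)
    ratio, w = ratio[order], w[order] / np.sum(w)
    cum = np.cumsum(w) - 0.5 * w  # position of each estimate on the weight scale
    return float(np.interp(0.5, cum, ratio))


def mr_egger(bx, by, se_y):
    """Weighted regression of by on bx with an intercept; returns (beta0, beta1)."""
    sign = np.where(bx < 0, -1.0, 1.0)  # orient IVs so exposure effects are positive
    x, y = bx * sign, by * sign
    sw = 1.0 / se_y                     # square root of inverse-variance weights
    X = np.column_stack([np.ones_like(x), x]) * sw[:, None]
    (b0, b1), *_ = np.linalg.lstsq(X, y * sw, rcond=None)
    return float(b0), float(b1)


def cochran_q(bx, by, se_y):
    """Cochran's Q of the ratio estimates around the IVW estimate."""
    w = bx**2 / se_y**2
    q = float(np.sum(w * (by / bx - ivw(bx, by, se_y)) ** 2))
    df = len(bx) - 1
    return q, df, float(chi2.sf(q, df))


def leave_one_out(bx, by, se_y):
    """IVW estimate with each IV removed in turn."""
    masks = ~np.eye(len(bx), dtype=bool)
    return np.array([ivw(bx[m], by[m], se_y[m]) for m in masks])


# Simulated summary statistics for 20 IVs (not the fasting glucose instruments).
rng = np.random.default_rng(3)
bx = rng.uniform(0.02, 0.1, 20)
se_y = np.full(20, 0.01)
by = -0.4 * bx + rng.normal(0, 0.01, 20)
print(ivw(bx, by, se_y), weighted_median(bx, by, se_y), mr_egger(bx, by, se_y))
```

Note how one low-weight outlier can shift the IVW mean while leaving the weighted median unchanged, which is the intuition behind the 'majority valid' property.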

Investigating residual confounding from smoking on the relationship between fasting glucose and lung function
We investigated whether a residual effect of smoking could confound the link between glucose and lung function. Firstly, we selected two well-powered GWAS of smoking behaviours: ever vs never smoked (N = 385013) 50 , and cigarettes smoked per day (N = 263954) 51 . Genetic correlation between these two smoking phenotypes and fasting glucose was estimated as described above, followed by the construction of a latent causal variable model. The MR IVs utilised for fasting glucose were also checked for association with each smoking GWAS.

Generation of pharmagenic enrichment score (PES) candidate gene-sets
We implemented gene-set association using the MAGMA method (MAGMA v1.06b), with some customisations to the framework to identify candidate PES gene-sets 12,52. MAGMA aggregates SNP-wise P values for trait association into a gene-based P value and, thereafter, tests whether a set of genes is more strongly associated with the phenotype than all other genes. Gene-based test statistics were calculated analogously to Brown's method, which is applicable to dependent P-values with known covariance (as common SNPs display through the phenomenon of linkage disequilibrium [LD], which can be quantified at a population level). Specifically, the SNP-wise mean gene test statistic is based on the mean of the χ² statistics of the SNPs mapped to each gene (derived from their P-values), using the 1000 Genomes reference genotypes to scale the null χ² distribution. P-value thresholding (PT) was utilised for the gene test statistic calculation; for simplicity, we selected four P-value thresholds (all SNPs, PT < 0.5, PT < 0.05, and PT < 0.005). We mapped variants to 18297 autosomal genes in the hg19 assembly, defined by NCBI and obtained from the MAGMA website; genes within the major histocompatibility complex (MHC) were removed due to the complexity of LD within this region. The 1000 Genomes phase 3 European reference panel was utilised to define LD for input into MAGMA. Genic boundaries were extended to capture regulatory variation, with both conservative and liberal upstream and downstream boundary definitions implemented. An extension of 5 kilobases (kb) upstream and 1.5 kb downstream of the gene was the conservative construct, whilst a larger extension of 35 kb upstream and 10 kb downstream was the liberal construct. Boundaries were longer upstream of the gene in both instances to capture more promoter-related variation, as is usual practice [53][54][55]. Genic P-values were transformed to Z-scores with the probit function for input into the gene-set association model.
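A Brown-style gene test can be sketched by matching the mean and variance of the summed 1-df χ² statistics under LD to a scaled χ² distribution, then probit-transforming the gene P-value for the gene-set model. This moment-matching approximation is an assumption for illustration (MAGMA's SNP-wise mean model uses its own approximation), and the helper name is hypothetical. Using the sum rather than the mean of the SNP statistics gives the same P-value, because both are rescaled by the same factor.

```python
import numpy as np
from scipy.stats import chi2, norm


def gene_test_brown(snp_p, ld_r):
    """Combine correlated SNP P-values into one gene P-value and probit Z-score.

    snp_p : P-values of the SNPs mapped to the gene (after P-value thresholding)
    ld_r  : SNP-SNP correlation matrix, e.g. from a 1000 Genomes reference panel
    """
    snp_p = np.asarray(snp_p, dtype=float)
    r = np.asarray(ld_r, dtype=float)
    k = len(snp_p)
    stat = np.sum(chi2.isf(snp_p, df=1))  # sum of 1-df chi-square statistics
    mean = float(k)                        # null expectation of the sum
    var = 2.0 * np.sum(r**2)               # null variance: Cov(z_i^2, z_j^2) = 2 r_ij^2
    scale = var / (2.0 * mean)             # scaled chi-square matching mean and variance
    dof = 2.0 * mean**2 / var
    p_gene = float(chi2.sf(stat / scale, df=dof))
    z_gene = float(norm.isf(p_gene))       # probit transform for the gene-set model
    return p_gene, z_gene


# Three hypothetical SNPs, the first two in strong LD.
ld = np.array([[1.0, 0.8, 0.1], [0.8, 1.0, 0.1], [0.1, 0.1, 1.0]])
print(gene_test_brown([1e-4, 3e-4, 0.2], ld))
```

A useful sanity check is that two SNPs in perfect LD with identical P-values return that same P-value, because the correlation makes them count as a single piece of evidence.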
Competitive gene-set association was undertaken with a linear regression model whereby genic Z-scores are the outcome and potential confounders, including gene size and genic minor allele count, are included as covariates. A one-sided test was performed for the term in the model which specifies whether each gene was within the set of interest (βGS), such that the null hypothesis is βGS = 0 and the alternative βGS > 0. When these models are constructed at different PT, this approach tests whether the gene-set is more associated than other genes, with gene test statistics calculated only from SNPs below the threshold. We defined gene-sets with known drug targets by sourcing hallmark and canonical pathway gene-sets (BioCarta, KEGG, PID, and Reactome) from the Molecular Signatures Database (MSigDB) 56, and retaining those with at least one gene with a high-confidence interaction with at least one approved pharmacological agent (TClin genes), as annotated using the Target Central Resource Database (TCRD v6.1, NGenes = 613) 57.
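The competitive regression can be sketched with ordinary least squares as below. This is a simplification under stated assumptions: MAGMA additionally accounts for correlations between nearby genes, which this sketch ignores, and the covariates (log gene size, log minor allele count), the function name, and the simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import t as t_dist


def competitive_gene_set_test(gene_z, in_set, covariates):
    """OLS of gene Z-scores on set membership plus covariates.

    Returns (beta_GS, SE, one-sided P for beta_GS > 0).
    """
    y = np.asarray(gene_z, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(in_set, dtype=float), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    t_stat = beta[1] / se
    return float(beta[1]), float(se), float(t_dist.sf(t_stat, dof))  # upper tail only


# Simulated genes: 100 of 2000 belong to the set and have shifted Z-scores.
rng = np.random.default_rng(2)
n_genes = 2000
log_size = rng.normal(10, 1, n_genes)
log_mac = 0.5 * log_size + rng.normal(0, 1, n_genes)
in_set = np.zeros(n_genes, dtype=bool)
in_set[:100] = True
gene_z = 0.1 * log_size + 0.5 * in_set + rng.normal(size=n_genes)
print(competitive_gene_set_test(gene_z, in_set, np.column_stack([log_size, log_mac])))
```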

PES candidate gene-set drug repurposing
We tested each candidate PES gene-set for overrepresentation of DrugBank compound targets using WebGestaltR v0.4.2 58. Compounds were retained for each pathway if they survived FDR correction (q < 0.05) and were FDA approved (https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm). We then searched the literature for each of these compounds to prioritise them on the basis of side-effects and prior clinical trial evidence. After excluding compounds with only topical formulations available, drugs were reviewed for lung function related adverse events (including dyspnea, abnormal breath sounds, decreased respiratory rate, orthopnea, shallow breathing, respiratory distress, respiratory depression, or any other related term), important precautions, black-box warnings, or any contraindication that might prohibit use of the drug in our study population. These data were obtained for each compound using the following databases: drugs.com, Medscape, SIDER v4.1, and the summary of product characteristics for each compound 59. We also searched for articles that discussed either an improvement or a worsening in lung function for each compound. The permitted paediatric age range and formulations for each compound were also reviewed. The full list of evaluated compounds is detailed in Supplementary Table 18, with the ranking criteria detailed in the supplementary methods.

The PES model for individuals
We defined the model to calculate PES profiles for individuals as follows (equation four).
Consider j SNPs for i individuals, wherein the SNPs are those physically mapped to genes which are members of a candidate PES gene-set (m). Let βj denote the statistical effect size of SNP j from the GWAS, which is multiplied by its dosage Gij in individual i and summed across the SNPs in the gene-set:

$$\mathrm{PES}_{im} = \sum_{j \in m} \beta_j G_{ij} \qquad (4)$$

The SNPs included were those below the P-value threshold utilised to discover the gene-set.
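Equation four amounts to a dosage-weighted sum of GWAS effect sizes over SNPs that map to the gene-set and pass the discovery P-value threshold. A minimal NumPy sketch follows, with hypothetical argument names and toy data; scoring software such as PRSice may additionally normalise scores (for example by the number of alleles), which is not done here. Passing `p_threshold=np.inf` corresponds to the "all SNPs" threshold.

```python
import numpy as np


def pharmagenic_enrichment_score(dosages, betas, pvals, snp_in_set, p_threshold):
    """PES_i = sum over gene-set SNPs j with P_j < threshold of beta_j * G_ij.

    dosages    : individuals x SNPs matrix of effect-allele dosages (0-2)
    betas      : GWAS effect sizes for the effect allele
    pvals      : GWAS P-values
    snp_in_set : boolean mask of SNPs mapped to genes in the candidate gene-set
    """
    G = np.asarray(dosages, dtype=float)
    keep = np.asarray(snp_in_set, dtype=bool) & (np.asarray(pvals, dtype=float) < p_threshold)
    return G[:, keep] @ np.asarray(betas, dtype=float)[keep]


G = np.array([[0, 1, 2], [2, 1, 0], [1, 1, 1]])
betas = np.array([0.10, -0.20, 0.30])
pvals = np.array([0.01, 0.20, 0.001])
in_set = np.array([True, True, False])
print(pharmagenic_enrichment_score(G, betas, pvals, in_set, p_threshold=0.05))
```

Here only the first SNP is both in the set and below the threshold, so each individual's score is simply 0.10 times their dosage at that SNP.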

Lung function PES in the Hunter Community Study cohort
We utilised an independent, genotyped cohort for which spirometry measures were recorded to investigate the phenotypic relevance of PES profiles for lung function. Participants were drawn from the Hunter Community Study (HCS), a population-based cohort of individuals aged between 55-85 years, predominantly of European ancestry and residing in Newcastle, New South Wales, Australia. All work was conducted in accordance with ethics committee approvals. Consenting participants completed a series of questionnaires, attended a clinic visit, and provided blood samples. Individuals were recruited by random selection from the New South Wales State electoral roll, with detailed recruitment and data collection methods for the HCS described elsewhere 61. Participants were genotyped using the Affymetrix Axiom Kaiser array and imputed to the Haplotype Reference Consortium (Supplementary Methods) 62. We retained 2089 unrelated, European-ancestry participants and common variants (MAF > 0.01) with high imputation quality (R² > 0.8). The full description of the imputation and quality control process is provided in the supplementary methods.
Spirometry data from the HCS were then processed by selecting individuals with non-missing FEV1 and FVC. We took the maximum FEV1 and FVC from four attempts and, for each measure, fitted a linear model which covaried for sex, age, age², height, height², smoking status, self-reported asthma status, and self-reported bronchitis/emphysema status. The phenotypes for association testing were the residuals from these models, transformed via inverse-rank normalisation (Blom transformation) using the RNOmni package. Using PRSice v2.2.12, we tested the association between a genome-wide PGS for FEV1 and FVC (PT < 1, 0.5, 0.05, 0.005, 5 × 10⁻⁵, 5 × 10⁻⁸) and the respective transformed spirometry index, adjusting for the first five SNP-derived principal components. The associations between FEV1 and/or FVC and each PES profile with an overrepresentation of FDA-approved drug targets were investigated using the same approach. We further adjusted each of these PES models for the genome-wide PGS at the same PT at which the PES was calculated.
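The phenotype preparation amounts to residualising each spirometry measure on its covariates and applying a rank-based inverse normal transform with Blom's offset. The sketch below is a hypothetical stand-in for the RNOmni step, assuming Blom's constant k = 3/8 and average ranks for ties; the covariate matrix would hold sex, age, age², height, height², smoking, asthma and bronchitis/emphysema columns.

```python
from statistics import NormalDist

import numpy as np


def residualise(y, covariates):
    """Residuals from an OLS fit of y on an intercept plus the covariate columns."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


def average_ranks(x):
    """1-based ranks, with tied values assigned their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and x[order[end + 1]] == x[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def blom_inverse_normal(x, k=3 / 8):
    """Rank-based inverse normal transform: Phi^-1((r - k) / (n - 2k + 1))."""
    n = len(x)
    nd = NormalDist()
    return [nd.inv_cdf((r - k) / (n - 2 * k + 1)) for r in average_ranks(list(x))]
```

A phenotype would then be prepared as `blom_inverse_normal(residualise(max_fev1, covariate_matrix))`, where both argument names are placeholders.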

The relationship between PES and mRNA expression
We obtained RNA-seq normalised read counts (PEER-normalised RPKM) for 23,723 genes which passed QC in the Geuvadis dataset (https://www.ebi.ac.uk/arrayexpress/experiments/E-GEUV-1/files/analysis_results/?ref=E-GEUV-1). The Geuvadis project performed RNA-seq on transformed lymphoblastoid cell lines (LCLs) from participants in the 1000 Genomes Project 21. We retained the 357 European individuals in this dataset for whom phase 3 sequencing data were available from the 1000 Genomes Project. The association between each PES and the normalised mRNA expression of genes in the corresponding candidate gene-set was tested using a linear model adjusted for sex, the first three SNP-derived principal components, and the genome-wide PGS at the same PT used to calculate the PES.
Multiple testing correction for the number of genes in each set was applied using the Benjamini-Hochberg method, implemented in the R p.adjust() function.
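For a single gene, the expression model is an ordinary least-squares regression of expression on the PES with covariates. The sketch below assumes the covariate matrix holds sex, three principal components and the matched genome-wide PGS (five columns); the function name is hypothetical, and it returns the t-statistic rather than a P-value to stay dependency-light.

```python
import numpy as np


def pes_expression_test(expression, pes, covariates):
    """OLS of expression ~ intercept + PES + covariates; returns (beta, se, t) for the PES term."""
    y = np.asarray(expression, dtype=float)
    X = np.column_stack([
        np.ones(len(y)),                       # intercept
        np.asarray(pes, dtype=float),          # PES (column 1)
        np.asarray(covariates, dtype=float),   # sex, PC1-PC3, genome-wide PGS
    ])
    n, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - p)           # residual variance
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return float(coef[1]), float(se), float(coef[1] / se)
```

A two-sided P-value would follow from a t distribution with n - p degrees of freedom (for example `scipy.stats.t.sf`), and the per-gene P-values within each set would then be BH-adjusted as described above.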

Transcriptome-wide association studies
A transcriptome-wide association study (TWAS) of each lung function measure was performed using the FUSION software 22. SNP weights were derived for genes with a significant contribution of cis-acting SNPs to expression variability (cis-h² P < 0.01) using lung and whole blood RNA-seq data from GTEx v7 (http://gusevlab.org/projects/fusion/). Transcriptome-wide significance was defined by Bonferroni correction for the number of genes with models of genetically regulated expression in each tissue: lung, P < 6.43 × 10⁻⁶ (α = 0.05/7776); whole blood, P < 8.32 × 10⁻⁶ (α = 0.05/6007). We excluded genes within the MHC region due to its complex LD structure. Furthermore, we performed TWAS of two smoking behaviour phenotypes to identify associations which could be driven by residual effects of smoking, and excluded lung function-associated genes which were also associated with smoking behaviour. This is inherently conservative, as genes associated with both lung function and smoking behaviour could exhibit genuine pleiotropic effects; however, as we wished to define drug targets relevant to lung function, excluding these shared genes is warranted. The smoking phenotypes were 'ever vs never smoked' and 'cigarettes smoked per day', and TWAS was performed using lung and whole blood for consistency, along with SNP weights from the dorsolateral prefrontal cortex and nucleus accumbens, as these brain regions have been implicated in nicotine addiction. Genes which passed these filters were queried in DGIdb, and gene-target pairs were defined where the drug mode of action matched the sign of the TWAS Z value, using the following criteria: (i) Tier one, an FDA-approved compound with at least two lines of evidence for interacting with the target gene; (ii) Tier two, an investigational (not FDA-approved) compound with at least two lines of evidence for interacting with the target gene.
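The TWAS filtering logic can be sketched in a few functions: a FUSION-style statistic (weights times GWAS Z-scores, scaled by the LD-implied variance), the Bonferroni thresholds quoted above, the smoking exclusion, and the sign-matching and tiering rules. This is a hypothetical illustration, not FUSION or DGIdb code. It assumes the TWAS Z is oriented so that a positive value means higher predicted expression accompanies higher lung function, so activating drugs match positive Z and inhibiting drugs match negative Z; the interaction-type vocabularies are assumptions.

```python
import numpy as np

# Assumed DGIdb-style interaction labels; the real vocabulary may differ.
ACTIVATING = {"agonist", "activator", "inducer", "positive allosteric modulator"}
INHIBITING = {"inhibitor", "antagonist", "blocker", "negative allosteric modulator"}


def twas_z(weights, gwas_z, ld):
    """FUSION-style TWAS statistic: w'z / sqrt(w' LD w)."""
    w = np.asarray(weights, dtype=float)
    z = np.asarray(gwas_z, dtype=float)
    return float(w @ z / np.sqrt(w @ np.asarray(ld, dtype=float) @ w))


def bonferroni_threshold(n_models, alpha=0.05):
    return alpha / n_models


def direction_matches(twas_zscore, interaction_type):
    """Assumed rule: positive Z pairs with activating drugs, negative Z with inhibiting drugs."""
    kind = interaction_type.lower()
    if twas_zscore > 0:
        return kind in ACTIVATING
    if twas_zscore < 0:
        return kind in INHIBITING
    return False


def assign_tier(fda_approved, n_evidence):
    """Tier 1: approved with >= 2 evidence lines; tier 2: investigational with >= 2; else None."""
    if n_evidence < 2:
        return None
    return 1 if fda_approved else 2


def druggable_pairs(twas_hits, smoking_genes, interactions, p_threshold):
    """twas_hits: {gene: (z, p)}; interactions: [(gene, drug, type, approved, n_evidence)]."""
    kept = {g: z for g, (z, p) in twas_hits.items()
            if p < p_threshold and g not in smoking_genes}
    pairs = []
    for gene, drug, kind, approved, n_ev in interactions:
        if gene in kept and direction_matches(kept[gene], kind):
            tier = assign_tier(approved, n_ev)
            if tier is not None:
                pairs.append((gene, drug, tier))
    return pairs
```

As a check on the thresholds, `bonferroni_threshold(7776)` is about 6.43 × 10⁻⁶ and `bonferroni_threshold(6007)` about 8.32 × 10⁻⁶.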

Probabilistic finemapping of druggable TWAS signals
A Bayesian method, FOCUS, was used to finemap TWAS associations which could be therapeutically useful 63. Given the observed TWAS statistics, the marginal posterior inclusion probability (PIP) of each gene was calculated and subsequently used to compute a credible set with probability $\rho = 0.9$ of containing the causal gene ($c_j = 1$). As FOCUS allows the null model to be a possible member of the credible set, we excluded any genes for which this occurred. The credible set $S_\rho$ was defined by normalising the PIPs, sorting the genes in descending order of PIP, and including genes until at least $\rho$ of the normalised posterior mass was explained (equation six):

$$S_\rho = \{g_{(1)}, \dots, g_{(K)}\}, \qquad K = \min\left\{k : \sum_{l=1}^{k} \frac{\mathrm{PIP}_{(l)}}{\sum_{j} \mathrm{PIP}_{j}} \geq \rho \right\} \tag{6}$$

where $\mathrm{PIP}_{(1)} \geq \mathrm{PIP}_{(2)} \geq \dots$ are the sorted PIPs and $g_{(l)}$ is the gene with the $l$-th largest PIP.

The Bernoulli prior probability for each causal indicator was set to the default of $1 \times 10^{-3}$, with the default prior variance of 40 for effects at causal genes.
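Equation six can be sketched as a greedy accumulation of normalised PIPs. This minimal illustration assumes the null model appears in the input under a hypothetical label `"NULL"`, and it returns an empty set when the null model falls inside the credible set, mirroring the exclusion rule above. It does not compute the PIPs themselves, which come from FOCUS.

```python
def credible_set(pips, rho=0.9, null_label="NULL"):
    """Return the rho-level credible set (genes in descending PIP order), or [] if it contains the null model."""
    total = sum(pips.values())
    ranked = sorted(pips.items(), key=lambda kv: kv[1], reverse=True)
    selected, mass = [], 0.0
    for gene, pip in ranked:
        selected.append(gene)
        mass += pip / total  # normalised posterior mass explained so far
        if mass >= rho:
            break
    return [] if null_label in selected else selected
```

For example, PIPs of 0.5, 0.25, 0.1875 and 0.0625 for genes A, B, C and the null model give the credible set [A, B, C], because the first three genes explain 0.9375 of the posterior mass.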

Host-viral interactome data
We selected three respiratory viruses for which host-viral protein interaction data had previously been published: SARS-CoV2, influenza (H1N1), and the human adenovirus (HAdV) family. The host-SARS-CoV2 interactome was defined using affinity-purification mass spectrometry (n = 332 genes; MiST score ≥ 0.7 and SAINTexpress BFDR ≤ 0.05) 25. For influenza, we selected 91 proteins which both interacted with influenza-expressed viral proteins (identified by mass spectrometry) and whose siRNA-mediated knockdown reduced viral replication in cultured cells by at least three log10 units while retaining >80% cell viability 26. Finally, the HAdV-host interactome was defined using a protein microarray platform (n = 24 genes) encompassing 20 viral proteins encoded by five HAdV species 27. We investigated approved inhibitors or antagonists of these genes using DGIdb, as described above in the PES candidate gene-set drug repurposing section.
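The interactome definitions are threshold filters over published screen results. The sketch below uses hypothetical record types and field names (the published supplementary tables have their own column layouts) to show the SARS-CoV2 MiST/BFDR filter, the influenza interaction-plus-knockdown filter, and the subsequent lookup of approved inhibitors or antagonists with at least two lines of evidence.

```python
from dataclasses import dataclass


@dataclass
class SarsCov2Interaction:
    gene: str
    mist: float   # MiST score
    bfdr: float   # SAINTexpress Bayesian FDR


@dataclass
class InfluenzaScreenHit:
    gene: str
    interacts_with_viral_protein: bool
    log10_replication_reduction: float  # drop in viral replication after siRNA knockdown (log10 units)
    cell_viability: float               # fraction of viable cells, 0-1


def sars_cov2_prey(rows):
    return {r.gene for r in rows if r.mist >= 0.7 and r.bfdr <= 0.05}


def influenza_prey(rows, min_log10_reduction=3.0, min_viability=0.8):
    return {
        r.gene for r in rows
        if r.interacts_with_viral_protein
        and r.log10_replication_reduction >= min_log10_reduction
        and r.cell_viability > min_viability
    }


def approved_inhibitors(prey, interactions):
    """interactions: iterable of (gene, drug, interaction_type, fda_approved, n_evidence)."""
    wanted = {"inhibitor", "antagonist"}
    return sorted(
        (gene, drug) for gene, drug, kind, approved, n_ev in interactions
        if gene in prey and approved and kind.lower() in wanted and n_ev >= 2
    )
```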

Overrepresentation of viral prey proteins in glycaemic pathways
The sets of genes which interact with viral proteins for each virus ('viral prey proteins') were subjected to overrepresentation analysis using the GENE2FUNC function of FUMA 64. We selected gene-sets which survived multiple testing correction (q < 0.05) and which contained at least one of the following key terms related to glycaemic biology: glucose, insulin, diabetes or glucagon. Further, we used STRING v11.0 to test whether interactions amongst the viral prey proteins overlapping a glycaemic pathway were significantly overrepresented 65. We assembled a list of antidiabetic drug targets by searching DGIdb for compounds annotated with the level two ATC code A10 (Drugs used in diabetes), retaining drug-gene interactions with two or more lines of evidence (Supplementary Table 59). Interactions between these drug target proteins and the glycaemic SARS-CoV2 prey proteins were then investigated using STRING, considering only interactions with a score > 0.75.
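The keyword selection of glycaemic gene-sets and the score filter on STRING interactions can be sketched as follows. The input tuple layouts are hypothetical. The sketch assumes interaction scores are on the 0-1 scale used in the text; STRING's downloadable link files report combined scores multiplied by 1000, so those would need dividing by 1000 first.

```python
GLYCAEMIC_TERMS = ("glucose", "insulin", "diabetes", "glucagon")


def glycaemic_gene_sets(ora_results, q_cutoff=0.05):
    """ora_results: iterable of (gene_set_name, q_value, overlapping_genes)."""
    return [
        (name, q, genes) for name, q, genes in ora_results
        if q < q_cutoff and any(term in name.lower() for term in GLYCAEMIC_TERMS)
    ]


def drug_target_links(edges, drug_targets, prey, min_score=0.75):
    """Keep edges scoring above min_score that join a drug target to a viral prey protein.

    edges: iterable of (protein_a, protein_b, score), score on a 0-1 scale.
    """
    links = []
    for a, b, score in edges:
        if score <= min_score:
            continue
        if (a in drug_targets and b in prey) or (b in drug_targets and a in prey):
            links.append((a, b, score))
    return links
```

Simple substring matching will not catch variants such as "diabetic", which reflects the fixed key-term list rather than a broader ontology search.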

Code and data availability
All data are publicly available from the references described in the manuscript. Code related to this study can be found at the following link: https://github.com/Williamreay/Lung_function_drug_repurposing_manuscript