A meta-analysis of genome-wide association studies of childhood wheezing phenotypes identifies ANXA1 as a susceptibility locus for persistent wheezing

Background: Many genes associated with asthma explain only a fraction of its heritability. Most genome-wide association studies (GWASs) used a broad definition of ‘doctor-diagnosed asthma’, thereby diluting genetic signals by not considering asthma heterogeneity. The objective of our study was to identify genetic associates of childhood wheezing phenotypes. Methods: We conducted a novel multivariate GWAS meta-analysis of wheezing phenotypes jointly derived using unbiased analysis of data collected from birth to 18 years in 9568 individuals from five UK birth cohorts. Results: Forty-four independent SNPs were associated with early-onset persistent, 25 with pre-school remitting, 33 with mid-childhood remitting, and 32 with late-onset wheeze. We identified a novel locus on chr9q21.13 (close to annexin 1 [ANXA1], p<6.7 × 10-9), associated exclusively with early-onset persistent wheeze. We identified rs75260654 as the most likely causative single nucleotide polymorphism (SNP) using Promoter Capture Hi-C loops, and then showed that the risk allele (T) confers a reduction in ANXA1 expression. Finally, in a murine model of house dust mite (HDM)-induced allergic airway disease, we demonstrated that anxa1 protein expression increased and anxa1 mRNA was significantly induced in lung tissue following HDM exposure. Using anxa1-/- deficient mice, we showed that loss of anxa1 results in heightened airway hyperreactivity and Th2 inflammation upon allergen challenge. Conclusions: Targeting this pathway in persistent disease may represent an exciting therapeutic prospect. Funding: UK Medical Research Council Programme Grant MR/S025340/1 and the Wellcome Trust Strategic Award (108818/15/Z) provided most of the funding for this study.


Introduction
Asthma is a complex disorder caused by a variety of mechanisms which result in multiple clinical phenotypes (Pavord et al., 2018). It has a strong genetic component, and twin studies estimate its heritability to be ~60-70% (Duffy et al., 1990). 'Asthma genes' have been identified through a range of approaches, from candidate gene association studies  and family-based genome-wide linkage analyses (Daniels et al., 1996) to genome-wide association studies (GWASs) (Moffatt et al., 2007;Moffatt et al., 2010;Demenais et al., 2018). The first asthma GWAS (2007) identified multiple markers on chromosome 17q21 associated with childhood onset asthma (Moffatt et al., 2007). A comprehensive review summarising the results of 42 GWASs of asthma and asthmarelated traits has been published recently (El-Husseini et al., 2020). The most widely replicated locus is 17q12-21, followed by 6p21 (HLA region), 2q12 (IL1RL1/IL18R1), 5q22 (TSLP), and 9p24 (IL33) (Kim and Ober, 2019). Overall, the evidence suggests that multiple genes are underlying the association peaks (Kim and Ober, 2019).
However, despite undeniable successes, genetic studies of asthma have produced relatively heterogeneous results, and only a small proportion of the heritability is accounted for (Ober and Yao, 2011). One part of the explanation for the paucity of precise replication are numerous gene-environment interactions . Another important consideration is asthma heterogeneity, in that asthma diagnosis comprises several conditions with distinct pathophysiology (Custovic, 2020;Haider et al., 2022), each potentially underpinned by different genetic associations (Custovic et al., 2019). However, in order to maximise sample size, most GWASs used a definition of 'doctor-diagnosed asthma' (Aaron et al., 2017). Such aggregated outcome definitions are imprecise (Looijmans-van den Akker et al., 2016) and phenotypically and mechanistically heterogeneous (Robinson et al., 2021), and this heterogeneity may dilute important genetic signals (Custovic et al., 2019).
One way of disaggregating asthma diagnosis is to use data-driven methods to derive subtypes in a hypothesis-neutral way (Howard et al., 2015). For example, we jointly modelled data on wheezing from birth to adolescence in five UK population-based birth cohorts and identified five distinct phenotypes (Oksel et al., 2019a). However, although latent modelling approaches have been instrumental in elucidating the heterogenous nature of childhood asthma diagnosis (Haider et al., 2022), there has been little research into the genetic associations of phenotypes derived using data-driven methods. This is the first study to investigate the genetic architecture of wheezing phenotypes from infancy to adolescence, to identify genes specific to each phenotype and better understand the genetic heterogeneity between the disease class profiles.

Materials and methods
Study design, setting, participants, and data sources/measurement The Study Team for Early Life Asthma Research (STELAR) consortium (Custovic et al., 2015) brings together five UK population-based birth cohorts: Avon Longitudinal Study of Parents and Children (ALSPAC) (Golding et al., 2001), Ashford (Cullinan et al., 2004) and Isle of Wight (IOW) (Arshad et al., 2018) cohorts, Manchester Asthma and Allergy Study (MAAS) (Custovic et al., 2002), and the Aberdeen Study of Eczema and Asthma to Observe the Effects of Nutrition (SEATON) (Martindale et al., 2005). All studies were approved by research ethics committees. See Appendix 1: Description of cohorts for more details. Informed consent was obtained from parents, and study subjects gave their assent/consent when applicable.
Validated questionnaires were completed on multiple occasions from infancy to adolescence (Oksel et al., 2019a). A list of variables, per cohort, is shown in Appendix 1-table 1, and the cohortspecific time points and sample sizes in Appendix 1-table 2. Data were harmonised and imported into Asthma eLab web-based knowledge management platform to facilitate joint analyses (Custovic et al., 2015).

Definition of primary outcome (wheeze phenotypes from infancy to adolescence)
In the pooled analysis among 15,941 subjects with at least two observations on current wheeze, we used latent class analysis (LCA) to derive wheeze phenotypes from birth to age 18 years (Oksel et al., 2019a). A detailed description of the analysis is presented in Oksel et al., 2019a, and in Appendix 1: Definition of variables. A five-class solution was selected as the optimal model (Oksel et al., 2019a), and the classes (wheeze phenotypes) were labeled as: (1) never/infrequent wheeze (52.4%); (2) earlyonset pre-school remitting wheeze (18.6%); (3) early-onset middle-childhood remitting wheeze (9.8%); (4) early-onset persistent wheeze (10.4%); and (5) late-onset wheeze (8.8%). These latent classes were used in the subsequent GWAS. eLife digest Three-quarters of children hospitalized for wheezing or asthma symptoms are preschool-aged. Some will continue to experience breathing difficulties through childhood and adulthood. Others will undergo a complete resolution of their symptoms by the time they reach elementary school. The varied trajectories of young children with wheezing suggest that it is not a single disease. There are likely different genetic or environmental causes.
Despite these differences, wheezing treatments for young children are 'one size fits all.' Studying the genetic underpinnings of wheezing may lead to more customized treatment options.
Granell et al. studied the genetic architecture of different patterns of wheezing from infancy to adolescence. To do so, they used machine learning technology to analyze the genomes of 9,568 individuals, who participated in five studies in the United Kingdom from birth to age 18. The experiments found a new genetic variation in the ANXA1 gene linked with persistent wheezing starting in early childhood. By comparing mice with and without this gene, Granell et al. showed that the protein encoded by ANXA1 controls inflammation in the lungs in response to allergens. Animals lacking the protein develop worse lung inflammation after exposure to dust mite allergens.
Identifying a new gene linked to a specific subtype of wheezing might help scientists develop better strategies to diagnose, treat, and prevent asthma. More studies are needed on the role of the protein encoded by ANXA1 in reducing allergen-triggered lung inflammation to determine if this protein or therapies that boost its production may offer relief for chronic lung inflammation.  showing an overview of the genome-wide association study (GWAS) results by wheeze phenotype (from outside to inside: early-onset persistent, early-onset pre-school remitting, early-onset mid-childhood remitting, and late-onset wheeze). The red line indicates the genome-wide significance threshold (p < 5 × 10 −8 ), while the blue line indicates the threshold for genetic variants that showed a suggestive significant association (p < 10 −5 ).
The online version of this article includes the following figure supplement(s) for figure 1:   Figure 2. Scatter plots illustrating the heterogeneity in the genetic profile of the wheezing phenotypes. Top plots compare phenotype-specific beta effects for persistent and early-onset mid-childhood remitting wheezing. Shared nominal beta effects only found when relaxing p<10 -4 for early-onset mid-childhood remitting wheezing. Bottom plots compare phenotype-specific beta effects for persistent and early-onset pre-school remitting. No shared beta effects (same or opposite direction) were found at p<10 -5 for any of the comparisons. Abbreviations used: pw = persistent, er = early-onset pre-school remitting, and pe = early-onset mid-childhood remitting.
The online version of this article includes the following figure supplement(s) for figure 2: To help identify functional elements located near the GWAS-associated variants (potential causal variants), we used locus zoom plots (LZPs) for the 134 independent SNPs (p<10 -5 ). Following close inspection of all plots, we short-listed 85 independent SNPs (Appendix 2-table 1) for which the LZPs potentially indicated more than one causal variant (Appendix 2-figures 1-4) and followed them up for further annotation. The results of GWAS meta-analysis for these 85 SNPs with main associations across the four wheeze phenotypes are presented in Table 1. Previously associated traits for each region/gene associated with the different wheeze phenotypes are shown in Appendix 4tables 1-4 and results are summarised in Appendix 4: Results in context of literature. Briefly, one region (6q27) among the top hits for early-onset pre-school remitting wheeze was previously associated with asthma, but in the context of obesity with a nominal association with asthma and BMI (Melén et al., 2010). Another region/gene (3q26.31/NAALADL2) identified as top hit for early-onset pre-school remitting wheeze was reported as an associate of severe asthma exacerbations, but only at nominal level (Herrera-Luis et al., 2021). No regions/genes identified as top hits for early-onset midchildhood remitting wheeze were found to have previous associations with asthma. Several genes/loci identified as top hits for late-onset wheeze were previously associated with asthma: ACOXL chr2q13 (later onset asthma and obesity; Zhu et al., 2020), PRKAA2 chr1p32.2 (lymphocyte count and asthma susceptibility; Cusanovich et al., 2012), CD200 3q13.2 (adult onset non-allergic asthma; Siroux et al., 2014), GIMAP family 7q36.1 (autoimmune diabetes, asthma, allergy;Heinonen et al., 2015), 9p22.3 (asthma in <16 years of age; Denham et al., 2008), and 16p12.1 (asthma and rhino-conjunctivitis at 10-15 years; Sottile et al., 2019).
We identified two GWAS-significant loci for early-onset persistent wheeze: 17q21, p<5.5 × 10 -9 , and a novel locus on 9q21.13 (ANXA1), p<6.7 × 10 -9 . The ANXA1 locus was the only GWAS-significant locus that had not previously been associated with asthma or atopic traits, with one previous study showing an association with FEV 1 /FVC and bronchodilator response in smokers (Lutz et al., 2015). ANXA1 is strongly expressed in bronchial mast cells and has anti-inflammatory properties (Vieira Braga et al., 2019), and may be involved in epithelial airway repair (Leoni et al., 2015;Appendix 4table 1). We therefore followed up top SNPs from this locus.

ANXA1 locus and persistent wheeze
Two SNPs (rs75260654, the lead SNP, and rs116849664 located downstream of ANXA1) were associated with early-onset persistent wheeze at genome-wide significance (GWS), with an additional SNP rs78320984 almost reaching GWS (Appendix 5-table 1). These SNPs are in linkage disequilibrium (LD) with each other (Appendix 5-figure 1), but not with any other SNPs.

Promoter Capture identifies rs75260654 as the most likely causative variant
To identify the most likely causative variant, we investigated the overlap of the SNPs with PCHi-C interactions involving the ANXA1 promoter in CD4+ cells in MAAS cohort subjects. Of the three SNPs, only rs75260654 overlapped a region interacting with the ANXA1 promoter ( Figure 3). Moreover, rs75260654 overlapped a POLR2A ChIP-seq peaks and an ATAC-seq peak and active enhancer in the type II pneumocyte-derived A549 cell line. This shows that rs75260654 is located in a region directly interacting with the ANXA1 promoter and is transcriptionally active in relevant cell types.
Allele frequencies of rs75260654 (MAF = 0.02) across wheeze phenotypes are shown in Appendix 5-table 2. Two individuals (one in MAAS and one in ALSPAC) were homozygote for the minor allele (T), and both were in the early-onset persistent wheeze class. One subject reported current wheeze and asthma through childhood, with hospitalisations for lower respiratory tract infection in the first year of life confirmed in healthcare records. The second individual reported current wheezing at 1.5, 2.5, and 8-9 years and doctor-diagnosed asthma and the use of asthma medication at 8-9 years.

Rs75260654: effect on genomic features
Variant Effect Predictor (VEP) prediction shows the SNP rs75260654 (C changed to T) to be located downstream of three protein-coding transcripts of AXNA1 and overlapping the known regulatory region ID ENSR00000882742 on Chromosome 9: 73,173,173,200. This region is active in the GI tract, M2 macrophages, neural progenitor cells, and trophoblasts, but is repressed in T lymphocytes including CD4+ CD25+, Treg, and CD8+ cells.   Rs75260654: effect on gene expression The effect of rs75260654 on the expression of nearby genes was investigated by browsing the eQTL GTEX data available in Ensembl. Compared to C, the T allele was found to reduce the expression of ANXA1 in naïve B cells (effect size = −2.36795, p=0.01) and to increase expression in lymphoblasoid cell lines (effect size = 0.848856, pe = 0.001) ( Figure 4). This SNP affects expression of the neighbouring gene ALDH1A1 (aldehyde dehydrogenase-1 family member A1) (effect size = −2.40446, p=0.0039 in macrophages infected with Salmonella). In the eQTL catalogue, rs75260654 is identified Obesity and metabolic dysfunction eQTL: identified in expression analyses of whole blood and/or lung tissues using Genotype-Tissue Expression database (https://gtexportal.org) using the European reference panel.
Bold p-values are genome-wide significant (p < 5 × 10 −8 ). * Minimum p-value across associations with the other three wheezing phenotypes, using the never/infrequent wheeze as the baseline phenotype. † List of references or sources (GeneCards, GWAS Catalog, PhenoScanner) available in Appendix 5-tables 1-4.  as an eQTL of ANXA1 in various immune cells (at nominal significance) including T cells, monocytes, fibroblasts, whole blood, Th2 memory cells, naïve B cells. rs75260654 is also an eQTL of ANXA1 in monocytes that were stimulated with R848 (agonist of TLRs 7 and 8) and a human seasonal influenza A virus (Quach et al., 2016) (at nominal significance) (Appendix 5-table 3). In the lung rs116849664 and rs78320984 (both in LD with rs75260654) were eQTLs of ANXA1 (Appendix 5-table 4) as well as LINC01474 at nominal significance levels. Additional supporting evidence regarding the significance of the T-allele on the expression of these genes was provided using eQTLGene Consortium meta-analysis of 24 cohorts and 24,331 samples (Võsa et al., 2018). This method reproduced the previous modest results showing a cis-eQTL effect of rs75260654 on both the ANXA1 (p=6.02 × 10 -23 ) and ALDH1A1 (p=1.11 × 10 -19 ) at FDR = 0. No significant trans-eQTLs were observed.

Potential biological function of ANXA1 in asthma
Protein-protein network analysis demonstrated that ANXA1 interacts directly with genes enriched for asthma (including IL4 and IL13) and inflammatory regulation (NR3C1, glucocorticoid receptor) showing its significance in dysregulation of the immune response (see Appendix 5-figure 2 and Appendix 5-table 5). Functional studies of anxa1 in a murine model Pulmonary expression of anxa1 is modulated by aeroallergen exposure We first analysed expression of anxa1 using a model of HDM-induced allergic airway disease ( Figure 5A; Gregory et al., 2009). Consistently, immunohistochemistry analysis revealed anxa1 protein expression increased following HDM challenge ( Figure 5B and C). Anxa1 mRNA was significantly induced in lung tissue following HDM exposure ( Figure 5D). This increase suggests that the pro-resolving anxa1 may play a role in regulating the pulmonary immune response to allergen.

Anxa1 suppresses allergen-induced airway hyperresponsiveness and type 2 inflammation
To confirm a functional role for anxa1 in allergic airway disease, we exposed anxa1 -/mice to intranasal HDM. Wildtype (WT) mice given HDM over 3 weeks developed significant airway hyperresponsiveness (AHR) compared to PBS control mice. Mice deficient in anxa1 had significantly worse lung function (greater airway resistance) compared to WT-treated mice ( Figure 5E). Anxa1 -/mice exhibited significantly increased airway eosinophilia and elevated numbers of Th2 lymphocytes ( Figure 5F and G). Lung tissue cytokine levels reflected the exacerbated airway Th2 inflammation, with elevation in IL-4, and significant induction of IL-5 and IL-13 ( Figure 5H and J). Thus, anxa1 deficiency results in an alteration of the pulmonary immune response, with uncontrolled eosinophilia and an exacerbation of type 2 inflammation and AHR in response to allergen. More details in Appendix 6: Functional mouse experiments.

Discussion
Herein, we present a comprehensive description of the genetic architecture of childhood wheezing disorders. Using a novel approach applied to a unique dataset from five UK birth cohorts, we identified subsets of SNPs differentially associated across four wheezing phenotypes: early-onset persistent (44 SNPs, 19 loci), early-onset pre-school remitting (25 SNPs, 10 loci), early-onset mid-childhood remitting (33 SNPs, 9 loci), and late-onset (32 SNPs, 20 loci). We found little evidence of genetic associations spanning across different phenotypes. This suggests that genetic architecture of different wheeze phenotypes comprises a limited number of variants likely underpinning mechanisms which are shared across phenotypes, but that each phenotype is also characterised by unique phenotypespecific genetic associations. Importantly, we identified a novel locus in chr9q21 nearby ANXA1 exclusively associated with early-onset persistent wheeze (p<6.7 × 10 -9 ). To identify the most likely causative variant, we investigated the overlap of the associated SNPs with PCHi-C interactions to demonstrate that SNP rs75260654 overlapped a region interacting with the ANXA1 promoter. Using eQTL data, we identified that the risk allele (T) of rs75260654 associated with early-onset persistent wheeze is also associated with ANXA1 expression. Further investigation of the biological function of ANXA1 revealed that it interacts with genes enriched for asthma (including IL4 and IL13) and inflammatory regulation (NR3C1, glucocorticoid receptor). In functional mouse experiments, anxa1 protein expression increased and anxa1 mRNA was significantly induced in lung tissue following HDM exposure, suggesting that the pro-resolving anxa1 may play a role in regulating the pulmonary immune response to allergen. Concurrently, by utilising anxa1 -/deficient mice we demonstrated that loss of anxa1 results in heightened AHR and Th2 inflammation upon allergen challenge, providing important in vivo functional data to support our GWAS finding. ANXA1 is a 37 kDa glycoprotein with potent anti-inflammatory and pro-resolving properties that are mediated by interaction with a specific G protein-coupled receptor FPR2 (Perretti et al., 2002). This axis represents an important resolution pathway in chronic inflammatory settings such as those of rheumatoid arthritis (D'Acquisto et al., 2008) and ulcerative colitis (Vong et al., 2012). ANXA1 belongs to the annexin family of Ca 2+ -dependent phospholipid-binding proteins, and through inhibition of phospholipase A2, it reduces eicosanoid production, which also contributes to its antiinflammatory activities. Modulation of M2 macrophage phenotype is also promoted by ANXA1 to attenuate tissue inflammation (McArthur et al., 2020). Corticosteroids (a mainstay of asthma treatment) increase the synthesis of ANXA1 (Rhen and Cidlowski, 2005). Plasma ANXA1 levels are significantly lower in asthmatic patients with frequent exacerbations compared to those with stable disease, suggesting a link between this mediator and disease state (Lee et al., 2018). Furthermore, children with wheeze have reduced airway levels of ANXA1 ( Eke Gungor et al., 2014).
Previous functional studies using anxa1 -/deficient mice challenged with ovalbumin showed anxa1deficient mice to have elevated AHR compared to WT mice (Ng et al., 2011). Ng et al. demonstrated that untreated anxa1-deficient mice have spontaneous AHR that predisposes them to exacerbated response to allergen (Ng et al., 2011). In the current study, we demonstrated in the murine lung the induction of Anxa1 in response to HDM exposure. In addition, genetic deletion of anxa1 potentiated the development of AHR and enhanced eosinophilia and markers of Th2 inflammation in mice treated with HDM, which is consistent with and extends previous reports. Of interest, in mice, anxa1 expression was recently found to be characteristic of a novel cell type called the Hillock cell, which may be involved in squamous barrier function and immunomodulation (Montoro et al., 2018). These data identify the ANXA1/FPR2 signalling axis as an important regulator of allergic disease, that could be manipulated for therapeutic benefit.
Our study has several limitations. By GWAS standards, our study is comparatively small and may be considered to be underpowered. The sample size may be an issue when using an aggregated definition (such as 'doctor-diagnosed asthma') but is less likely to be an issue when primary outcome is determined by deep phenotyping. This is indirectly confirmed in our analyses. Our primary outcome was derived through careful phenotyping over a period of more than two decades in five independent birth cohorts, and although comparatively smaller than some asthma GWASs, our study proved to be powered enough to detect previously identified key associations (e.g., chr17q21 locus). Precise phenotyping has the potential to identify new risk loci. For example, a comparatively small GWAS (1173 cases and 2522 controls) which used a specific subtype of early-onset childhood asthma with recurrent severe exacerbations as an outcome identified a functional variant in a novel susceptibility gene CDHR3 (SNP rs6967330) as an associate of this disease subtype, but not of doctor-diagnosed asthma (Bønnelykke et al., 2014). This important discovery was made with a considerably smaller sample size but using a more precise asthma subtype. In contrast, the largest asthma GWAS to date had an ~40-fold higher sample size (Demenais et al., 2018), but reported no significant association between CDHR3 and aggregated asthma diagnosis. Therefore, with careful phenotyping, smaller sample sizes may be adequately powered to identify larger effect sizes than those in large GWASs with broader outcome definitions (Schoettler et al., 2019).
The importance of the precise outcome definition was highlighted in our previous studies in ALSPAC which explored genetic associates of wheeze phenotypes derived by LCA (Granell et al., 2013;Spycher et al., 2012). Our current findings are consistent with our earlier report suggesting that 17q21 SNPs are associated with early-onset persistent, but not with early transient or late-onset wheeze (Granell et al., 2013). Further analysis using genetic prediction scores based on 10-200,000 SNPs ranked according to their associations with physician-diagnosed asthma found that the 46 highest ranked SNPs predicted persistent wheeze more strongly than doctor-diagnosed asthma (Spycher et al., 2012). Finally, a candidate gene study combining data from ALSPAC and PIAMA found different associations of IL33-IL1RL1 pathway polymorphisms with different phenotypes (Savenije et al., 2014).
We are cognisant that there may be a perception of the lack of replication of our GWAS findings. We would argue that direct replication is almost certainly not possible in other cohorts, as phenotypes for replication studies should be homogenous (Crawford et al., 2015). However, there is a considerable heterogeneity in LCA-derived wheeze phenotypes between studies, and although phenotypes in different studies are usually designated with the same names, they differ between studies in temporal trajectories, distributions within a population, and associated risk factors (Oksel et al., 2018). This heterogeneity is in part consequent on the number and the non-uniformity of the time points used, and is likely one of the factors responsible for the lack of consistent associations of discovered phenotypes with risk factors reported in previous studies (Oksel et al., 2019b). This will also adversely impact the ability to identify phenotype-specific genetic associates. For example, we have previously shown that less distinct wheeze phenotypes in PIAMA were identified compared to those derived in ALSPAC (Savenije et al., 2011). Thus, phenotypes that are homogeneous to those in our study almost certainly cannot readily be derived in available populations. This is exemplified in our attempted replication of ANXA1 findings in PIAMA cohort (see Appendix 5: Replication of ANXA1 top hits in PIAMA cohort and Appendix 5-table 6). In this analysis, the number of individuals assigned to persistent wheezing in PIAMA was small (Võsa et al., 2018), associates of this phenotype differed to those in STELAR cohorts, and the SNPs' imputation scores were low (<0.60), which meant the conditions for replication were not met.
Our study population is of European descent, and we cannot generalise the results to different ethnicities or environments. It is important to highlight the under-representation of ethnically diverse populations in most GWASs (Kim and Ober, 2019). To mitigate against this, large consortia have been formed, which combine the results of multiple ethnically diverse GWASs to increase the overall power to identify asthma susceptibility loci. Examples include the GABRIEL , EVE (Torgerson et al., 2011), and TAGC (Demenais et al., 2018) consortia, and the value of diverse, multiethnic participants in large-scale genomic studies has recently been shown (Wojcik et al., 2019). However, such consortia do not have the depth of longitudinal data to allow the type of analyses which we carried out to derive a multivariable primary outcome. Finally, the manual and visual inspection of LZPs for the refinement of association signals and identification of functional elements was an objective approach which might have undermined the findings. One strength of our approach is that we used data from five birth cohorts with detailed and lifelong phenotyping, which were harmonised in a common knowledge management platform (Custovic et al., 2015), allowing joint analyses. We performed three parallel GWASs that produced estimates with remarkably consistent directions of effects.
In conclusion, using unique data from five UK birth cohorts, we identified subsets of SNPs differentially associated across four wheezing phenotypes from infancy to adolescence. We found little evidence of genetic associations spanning across different phenotypes. We discovered a novel locus in chr9q21 uniquely associated with early-onset persistent wheeze (p<6.7 × 10 -9 ), identified SNP rs75260654 as the most likely causative variant, and demonstrated that the risk allele (T) confers a reduction in ANXA1 expression. In mouse experiments, ANXA1 expression increased in lung tissue following allergen exposure, suggesting that the pro-resolving ANXA1 may play a role in regulating the pulmonary immune response to allergen. Using ANXA1-deficient mice, we demonstrated that loss of ANXA1 results in heightened AHR and Th2 inflammation upon allergen challenge, providing important in vivo functional data to support our GWAS finding. Targeting these pathways to promote the clearance of chronic inflammation in persistent disease may represent an exciting therapeutic prospect. For the purpose of Open Access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.

Data availability
The informed consent obtained from all included participants does not allow the data to be made freely available through any third party maintained public repository. However, data used for this submission can be made available on request to the corresponding cohort Executive. Researchers will need to submit a research proposal to each cohort Executive Committee. Data access will have a cost, for more details re. ALSPAC contact alspac-data@ bristol. ac. uk, for any other cohort contact philip. couch@ manchester. ac. uk. The ALSPAC website provides information on how to request and access its data (http://www.bristol.ac.uk/alspac/researchers/access/). For queries regarding access of data from MAAS, IoW, SEATON or Ashford please contact Philip Couch ( philip. couch@ manchester. ac. uk). All code used to analyse the individual level data and all summary data and code used to plot the figures in our manuscript has been deposited in Dryad.
The following dataset was generated: Appendix 1

Description of cohorts
The STELAR consortium (Custovic et al., 2015) brings together five UK population-based birth cohorts as described below. Informed consent was obtained from parents, and study subjects gave their assent/consent when applicable. Data were harmonised and imported into Asthma eLab webbased knowledge management platform to facilitate joint analyses (https://www.asthmaelab.org) (Custovic et al., 2015).

ALSPAC
ALSPAC is a birth cohort study established in 1991 in Avon, UK (Boyd et al., 2013;Fraser et al., 2013). Pregnant women with expected dates of delivery 1 April 1991 to 31 December 1992 were invited to take part in the study. The initial number of pregnancies enrolled is 14,541. Of these initial pregnancies, there was a total of 14,676 foetuses, resulting in 14,062 live births and 13,988 children who were alive at 1 year of age. When the oldest children were approximately 7 years of age, an attempt was made to bolster the study with eligible cases who had failed to join originally. As a result, when considering variables collected from the age of 7 onwards (and potentially abstracted from obstetric notes) there are data available for more than the 14,541 pregnancies mentioned above. The number of new pregnancies not in the initial sample (known as Phase I enrolment) that are currently represented on the built files and reflecting enrolment status at the age of 24 is 913 (456, 262, and 195 recruited during Phases II, III, and IV, respectively), resulting in an additional 913 children being enrolled. The phases of enrolment are described in more detail in the cohort profile paper and its update. The total sample size for analyses using any data collected after the age of 7 is therefore 15,454 pregnancies, resulting in 15,589 foetuses. Of these 14,901 were alive at 1 year of age.
Ethical approval: Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. All self-completion questionnaire content is approved by the ALSPAC Ethics and Law Committee. Ethics protocols' numbers: Initial approval Bristol and Weston Health Authority: E1808 Children of the Nineties: Avon Longitudinal Study of Pregnancy and Childhood (ALSPAC) (28 November 1989). Southmead Health Authority: 49/89 Children of the Nineties -'ALSPAC' (5 April 1990). Frenchay Health Authority: 90/8 Children of the Nineties (28 June 1990).
Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time. Data dictionary: The study website contains details of available data through a fully searchable data dictionary: http://www.bristol.ac.uk/alspac/researchers/our-data/.
We are extremely grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists, and nurses.

MAAS
MAAS is an unselected birth cohort study established in 1995 in Manchester, UK (Custovic et al., 2002). It consists of a mixed urban-rural population within 50 square miles of South Manchester and Cheshire, UK, located within the maternity catchment area of Wythenshawe and Stepping Hill Hospitals. All pregnant women were screened for eligibility at antenatal visits (8-10th week of pregnancy). Of the 1499 couples who met the inclusion criteria (≤10 weeks of pregnancy, maternal age ≥18 years, and questionnaire and skin prick data test available for both parents), 288 declined to take part in the study and 27 were lost to follow-up between recruitment and the birth of a child. A total of 1184 children were born into the study between February 1996 and April 1998. They were followed prospectively for 19 years to date and attended follow-up clinics for assessments, which included lung function measurements, skin prick testing, biological samples (serum, plasma, and urine), and questionnaire data collection. The study was approved by the North West -Greater Manchester East Research Ethics Committee. Ethics protocols' numbers: ERP/94/032 Up to 5 years. Allergen avoidance, primary prevention, genetics, sRaw age 3 and 5; SOU/00/259 5 years; ERP/95/137 Exposure to pet allergens, atopy, genetics; ERP/97/023 IFWIN, genetics; 03/SM/400 8 years; 06/Q1403/142 10-12 years; 11/NW/0228 13-15 years; 14/NW/1309 18+ years.

SEATON
SEATON is an unselected birth cohort study established in 1997 in Aberdeen, UK, which was designed to explore the relationship between antenatal dietary exposures and asthma outcomes in childhood (Martindale et al., 2005). Two-thousand healthy pregnant women attending an antenatal clinic, at median 12 weeks gestation, were recruited. An interviewer administered a questionnaire to the women and atopic status was ascertained by skin prick test (SPT). The cohort included 1924 children born between April 1998 and December 1999. Participants were recruited prenatally and followed up by self-completion questionnaire to 15 years of age using postal questionnaires to record the presence of asthma and allergic diseases. Lung function measurements and SPT to common allergens were performed at 5, 10, and 15 years. The study was approved by the North of Scotland Research Ethics Committee. Ethics protocol REC reference: 13/NS/0108; protocol number: 2/048/13; amendment number: AM03.

Ashford
The Ashford study is an unselected birth cohort study established in 1991 in Ashford, UK (Atkinson et al., 1999). It included 642 children born between 1992 and 1993. Participants were recruited prenatally and followed to age 14 years. Detailed standardised questionnaires were administered at each follow-up to collect information on the natural history of asthma and other allergic diseases. Lung function measurements and SPT was carried out at 5, 8, and 14 years of age. In 2015, the study children aged 20 were sent a self-completion questionnaire, which was returned by 60% of the participants. The Asthma in Ashford study was reviewed by the Imperial College Research Ethics Committee on 11 November 2014. On 8 January 2015 the Joint Research Compliance Office granted full approval of the study on the basis described in the revised documents. ICREC reference: 14|C2288.

The IOW cohort
IOW is an unselected birth cohort study established in 1989 on the IOW, UK (Arshad et al., 2018;Kurukulaaratchy et al., 2002;Kurukulaaratchy et al., 2003). After the exclusion of adoptions, perinatal deaths, and refusal for follow-up, written informed consent was obtained from parents to enrol 1456 newborns born between 1 January 1989 and 28 February 1990. Follow-up-up assessments were conducted to 26 years of age to prospectively study the development of asthma and allergic diseases. At each follow-up, validated questionnaires were completed by the parents. Additionally, the SPT was performed on 980, 1036, and 853 participants at 4, 10, and 18 years of age to check allergic reactions to common allergens. At 10, 18, and 26 years, spirometry and methacholine challenge tests were performed to diagnose lung problems. Ethics protocols' numbers: Ethics approval for the IoW cohort was originally given by the Isle of Wight Local Research Ethics Committee (now named the National Research Ethics Service, NRES Committee South Central -Southampton B) in 1989 and at each subsequent follow-up (1, 2, and 4 years) (this is pre 'numbers'); age 10 follow-up (

Definition of variables
A list of all variables used in the current study, per cohort, is shown in Appendix 1-table 1.

Demographic, exposures and outcomes
Postal questionnaires were used in ALSPAC and SEATON, while interviewer-administered questionnaires were employed in other cohorts.
Parental history of asthma, eczema, and hay fever was defined based on the responses given to the question 'have you (and/or your partner) ever had asthma/eczema/hay fever'. Maternal and paternal smoking were defined based on the response given to the question 'do you (or does your partner) smoke', administered during pregnancy. Low birth weight was defined as birth weight less than 2500 g based on NHS birth records.
Asthma in MAAS was defined as a case if positive for two of the following criteria: doctor diagnosis of asthma in the past 12 months, current wheeze in the last 12 months, doctor prescription for asthma. Asthma in ALSPAC was defined as a mothers' report of doctor ever diagnosis of asthma. Current wheeze in MAAS was defined as a questionnaire report to the question 'have you wheezed in the last 12 months' upon attendance at a follow-up clinic. Current wheeze in ALSPAC was defined as a mothers' report to the question 'has your child had any wheezing or whistling in the last 12 months?'.
Asthma medication in ALSPAC was defined as a mothers' report to the question 'has your child taken any asthma medication in the last 12 months?'. Lower respiratory hospital admissions: Data on hospital admissions in MAAS were obtained by manually inspecting the general practice (GP) records for each individual.
Early-life risk factors were divided into four groups according to timing of exposure: maternal and child characteristics (gender, maternal smoking during pregnancy, and maternal history of asthma), perinatal (low birth weight adjusted for gestational age), environmental (pet ownership, smoke exposure after birth), and allergic sensitisation (defined based on positive SPT to cat, HDM, or grass) variables.

Primary outcome: joint wheeze phenotypes
We used LCA to identify longitudinal trajectories of wheeze (Oksel et al., 2019a) based on pooled analysis among 15,941 children with at least two observations on wheezing at five time periods that were approximately shared across all cohorts: infancy (½-1 year); early childhood (2-3 years); preschool/early school age (4-5 years); middle childhood (8-10 years); and adolescence (14-18 years). Cohort-specific definitions and other variables derived from the questionnaires are provided in Appendix 1-table 2.
To control for cohort-specific variation, cohort ID was included in the LCA model as an additional predictor by transforming the five-category variable into a set of four dummy variables and including them as covariates. The largest cohort, ALSPAC, was treated as the non-coded category to which all other cohorts were compared. The expectation maximisation algorithm was used to estimate relevant parameters, with 100,000 iterations and 500 replications.
To assess model fit, we used (1) the Bayesian information criterion (BIC), (2) the Akaike information criterion (AIC), (3) Lo-Mendell-Rubin likelihood ratio test, (4) bootstrapped likelihood ratio, and (4) quality of classification certainty (model entropy). The BIC is an index used in Bayesian statistics to choose among a set of competing models; the model with the lowest BIC is preferred. Using the lowest BIC as a selection criterion, the best fitting model was chosen as the five-class solution with a nominal covariate (BIC:31340). Analyses were carried out using Mplus 8, R (https://www.r-project. org/) and Stata 14 (StataCorp, College Station, TX, USA).

Minimising bias and missing data effects
Extracted from reference (Oksel et al., 2019a): "One of the advantages of our multicohort approach is that individual studies that might not provide conclusive evidence to make inference about the general population because of cohort specific effects and biases can contribute to revealing a more accurate picture when integrated together. The integration of five cohorts and their pooled analysis enhanced the credibility and generalizability of the phenotyping results to the U.K. population. A further advantage is to minimize the study-specific biases (including cohort specific effects, attrition effects, different recruitment strategies, and geographic factors) affecting the certainty of allocation of individuals to each latent class, while maximizing the benefits of individual cohort studies (e.g., potentially important risk factors and outcomes are captured in some, but not all cohorts)." "Another strength of pooling cohort data is that a multicohort design allowed us to analyze a large sample with complete data on wheeze from birth to adolescence, thus increasing statistical power to detect less prevalent phenotypes." However, "The optimal solution in the model using 15,941 children (allowing for missing data) remained five classes (see Table E3, Figure E1), and was very similar to that derived from a complete data set." We used results from the larger sample, that is individuals with at least two observations of wheezing, to assign individuals to their most likely wheezing phenotype and used this as our primary outcome in this study.

Included vs. excluded participants
Related and non-European individuals were excluded as well as those individuals with missing genetic data.
In ALSPAC, 11,176 individuals had data on wheezing phenotypes, of these 6833 were white unrelated and had genetic data. We found more children from mothers who smoked during pregnancy in the excluded sample compared to the included sample; no difference in gender, maternal history of asthma, current wheezing at 8 or 15 years, and small evidence for more asthma ever and current medication at 8 years in the excluded sample (Appendix 1-table 4).
In MAAS, 1150 individuals had data on wheezing phenotypes, of these 887 were white unrelated and had genetic data. We found no difference in children from mothers who smoked during pregnancy in the excluded sample compared to the included sample; no difference in gender, maternal history of asthma or current wheezing at both 8 and 16 years. There was small evidence for more asthma ever and current medication at 8 years in the excluded sample (Appendix 1-table 4).
In SEATON, 1535 individuals had data on joint wheezing phenotypes, of these 548 were white unrelated and had genetic data. We found evidence for more children from mothers who smoked during pregnancy in the excluded sample compared to the included sample; and more males in the excluded sample. There was no difference in maternal history of asthma or current wheezing, asthma ever or current medication at both 10 and 15 years in the excluded sample compared to the included sample (Appendix 1-table 4).
In Ashford 620 individuals had data on joint wheezing phenotypes, of these 348 were white unrelated and had genetic data. We found evidence for more children from mothers who smoked during pregnancy in the excluded sample compared to the included sample; no difference in gender, maternal history of asthma, or asthma ever. There was small evidence for less current wheezing at 8 years, or current medication at 8 years in the excluded sample compared to the included sample (Appendix 1-table 4).
In IOW, 1460 individuals had data on joint wheezing phenotypes, of these 952 were white unrelated and had genetic data. We found evidence for more children from mothers who smoked during pregnancy in the excluded sample compared to the included sample; no difference in gender, maternal history of asthma, asthma ever at 10 and 18 years in the excluded sample compared to the included sample. There was small evidence for more children with current wheeze and medication at 8 years in the included sample compared to the included sample (Appendix 1-table 4).
Appendix 1-table 1. Definition of variables in each of the five Study Team for Early Life Asthma Research (STELAR) birth cohorts.

Mother -asthma
Have you ever had asthma? (recruitment)

Mother smoking
Mother smoked when expecting (recruitment)

Doctor-diagnosed asthma ever
Has a doctor ever said that your child has asthma? (years 8 and 14)

Current wheezing
Two questions combined: Occurrence of wheezing and/or wheezing with whistling on the chest in the last 12 months (year ½, 2½, 4¾, 8½, and 14) Current asthma medication Asthma medication in the last 12 months (years 8½ and 14)

Current rhinitis
Child had problem with sneezing/runny nose without cold/flu in last 12 months (years 7 and 16½)

Current hay fever
Child had hay fever in last 12 months (years 10½ and 14) Cohort: MAAS

Variable Definition
Mother -asthma Has a doctor ever told you that you had asthma? (recruitment) Mother smoking Do you smoke − mother (recruitment)

Doctor-diagnosed asthma ever
Has your doctor ever told you that your child has or had asthma? (years 8 and 16)

Asthma ever
Has your child ever suffered from asthma (years 8 and 16)

Current wheezing
Has your child had wheezing or whistling in the chest in the last 6/12 months (years 1, 3, 5, 8, and 16)

Current asthma medication
Asthma medication in the last 12 months (years 8 and 16)

Current rhinitis
Has your child ever had a problem with sneezing, or a runny nose, or a blocked nose when he /she did not have a cold or the flu? (years 8 and 16) Current hay fever Does your child have hay fever now? (years 8 and 16)

Cohort: SEATON
Mother -asthma Do you suffer from asthma? (recruitment)

Mother smoking
Which of the following best describes your smoking status? (recruitment)

Doctor-diagnosed asthma ever
Has your child ever suffered from asthma? If yes, has this been confirmed by a doctor? (years 10 and 15)

Asthma ever
Has your child ever suffered from asthma? (year 10); Have you ever suffered from asthma? (year 15)

Current wheezing
Has your child had wheezing in the chest in the last 12 months (years 1, 2, 5, 10, and 15)

Current asthma medication
Has your child been prescribed medicines/inhalers for asthma in the last 12 months? (year 10); Have you been prescribed medicines/inhalers for asthma in the last 12 months? (year 15) Current hay fever Has your child suffered from hay fever last 12 months? (years 10 and 15)

Cohort: Ashford
Mother -asthma Do you have or have you ever been told you have asthma? (recruitment)

Mother smoking
Do you smoke cigarettes? (recruitment)

Doctor-diagnosed asthma ever
Has your doctor ever told you that your child has or had asthma? (years 8 and 14) Asthma ever In the past 12 months has your daughter suffered from asthma? (year 8); Has she/he ever suffered from asthma? (year 14)

Current asthma medication
Over the last 12 months has your daughter taken any of the following treatments (preventer, reliever, nebuliser, steroids) for asthma? (years 8 and 14)

Current rhinitis
In the last 12 months has your child had a problem with sneezing or a runny or blocked nose? (years 8 and 14) Current hay fever

Asthma medication ever
Child ever had asthma treatment (year 18) combined with asthma treatment questions being asked at years 1, 2, 4, 10, and 18

Current rhinitis
In the past 12 months have you had a problem with sneezing, or a runny or blocked nose when you did not have a cold or the flu? (years 10, 18, and 26) Appendix 1-table 2. The cohort-specific time points and sample size used to ascertain wheeze phenotypes. Appendix 2 Genotyping and imputation ALSPAC Participants were genotyped using the Illumina HumanHap550 quad genome-wide SNP genotyping platform (Illumina Inc, San Diego, CA, USA) by the Wellcome Trust Sanger Institute (WTSI; Cambridge, UK) and the Laboratory Corporation of America (LCA, Burlington, NC, USA), using support from 23andMe. Haplotypes were estimated using ShapeIT (v2.r644) which uses relationship information to improve phasing accuracy. The phased haplotypes were then imputed to the Haplotype Reference Consortium (HRCr1.1, 2016) panel (Loh et al., 2016) of approximately 31,000 phased whole genomes. The HRC panel was phased using ShapeIt v2, and the imputation was performed using the Michigan imputation server.

MAAS
In MAAS, we used the Illumina 610 quad genome-wide SNP genotyping platform (Illumina Inc, San Diego, CA, USA). Prior to imputation samples were excluded on the basis of gender mismatches; minimal or excessive heterozygosity, genotyping call rates of <97%. SNPS were excluded if they had call rates of <95%, minor allele frequencies of <0.5%, and HWE p<3 × 10 -8 . Prior to imputation each chromosome was pre-phased using EAGLE2 (v2.0.5) (Loh et al., 2016) as recommended by the Sanger imputation server (McCarthy et al., 2016). We then imputed with PBWT (Durbin, 2014) with the Haplotype Reference Consortium (release 1.1) of 32,470 reference genomes (McCarthy et al., 2016) using the Sanger Imputation Server.

Exclusions
Individuals were excluded on the basis of gender mismatches; minimal or excessive heterozygosity; disproportionate levels of individual missingness (>3%), insufficient sample replication (IBD <0.8), or evidence of cryptic relatedness (IBD >0.1). Following imputation, SNPs with a minor allele frequency of <1%, a call rate of <95%, evidence for violations of Hardy-Weinberg equilibrium (p<5E-7), or imputation quality measure (MaCH-Rsq or IMPUTE-info score)<0.40 were excluded. All individuals with non-European ancestry and siblings were removed.

GWAS meta-analysis
GWASs of the joint wheezing phenotypes were performed independently in ALSPAC, MAAS, and the combined IOW-SEATON-Ashford (combined as they were genotyped on the same platform, at the same time, and quality-controlled and imputed together). All genetic data were imputed to a new Haplotype Reference Consortium panel. This comprises around 31,000 sequenced individuals (mostly European), so the coverage of European haplotypes is much greater than in other panels. As a consequence, we expect to improve imputation accuracy, particularly at lower frequencies.
We used SNPTEST v2.5.2 (Marchini and Howie, 2010) with a frequentist additive multinomial regression model (-method newml, never/infrequent wheeze as the reference) to investigate the association between SNPs and wheezing phenotypes. No covariates were included in the model and only individuals of European descent were included in this analysis. A meta-analysis of the three GWASs, including 5887 controls and 943 cases for early-onset persistent, 1482 cases for early-onset remitting, 603 cases for mid-childhood onset remitting, and 652 cases for late-onset wheeze, was performed using METAL  with a total of 8,057,852 SNPs present. We used the option SCHEME STDERR in METAL to implement an effect size-based method weighted by each study-specific standard error in a fixed-effects model. We performed clumping to keep only one representative SNP per LD block and used LZPs to short-list independent SNPs for further annotation.