Understanding the genetic complexity of puberty timing across the allele frequency spectrum

Pubertal timing varies considerably and is associated with later health outcomes. We performed multi-ancestry genetic analyses on ~800,000 women, identifying 1,080 signals for age at menarche. Collectively, these explained 11% of trait variance in an independent sample. Women at the top and bottom 1% of polygenic risk exhibited ~11 and ~14-fold higher risks of delayed and precocious puberty, respectively. We identified several genes harboring rare loss-of-function variants in ~200,000 women, including variants in ZNF483, which abolished the impact of polygenic risk. Variant-to-gene mapping approaches and mouse gonadotropin-releasing hormone neuron RNA sequencing implicated 665 genes, including an uncharacterized G-protein-coupled receptor, GPR83, which amplified the signaling of MC3R, a key nutritional sensor. Shared signals with menopause timing at genes involved in DNA damage response suggest that the ovarian reserve might signal centrally to trigger puberty. We also highlight body size-dependent and independent mechanisms that potentially link reproductive timing to later life disease.

We were able to confirm four of these six genes (KDM4C, MC3R, TACR3 and ZNF483) using voice-breaking data in 178,625 men with exome sequence data in the UK Biobank (P < 0.05; Fig. 2 and Supplementary Table 5).Lack of association with AVB at MKRN3 is consistent with previous reports that rare MKRN3 mutations have a greater clinical impact in girls than boys 21,26 .None of the genes showed an association with childhood or adult adiposity (Supplementary Table 6).
In addition, we specifically examined rare variant associations with AAM or VB for ANOS1, CHD7, FGF8 and WDR11, which are clinically tested in hypogonadotropic hypogonadism ('high-evidence genes' on the Genomics England IHH panel 27 ) and show a dominant or X-linked mode of inheritance (Supplementary Table 7).Normal puberty timing (AAM: 10-15 years 1 or VB: 'about average') was reported by all carriers of PTVs in ANOS1 (n = 5 male) and CHD7 (n = 5 female and n = 1 male).PTVs in WDR11 showed no association with delayed puberty, with only 7/81 female and 5/68 male carriers reporting delayed puberty.Female carriers of PTVs in FGF8 showed some evidence of later puberty (β = 1.4 years, P = 3.6 × 10 −3 , n = 5/10 reported delayed puberty) but with no effect in males (n = 1/8 reported delayed puberty; Supplementary Fig. 5).These observations highlight the lower penetrance of rare deleterious variants in large population-based studies compared to patient cohorts [28][29][30] .

Common variation influences the risk of phenotypic extremes
Rare pathogenic variants, such as the ones mentioned above, are described to cause disorders of puberty.However, it remains unclear whether common genetic variants also contribute to abnormal puberty timing.To assess this, we generated a polygenic score (PGS) of AAM in a penalized regression framework using lassosum 31 and data from our meta-analysis of European-ancestry cohorts but excluding the UK Biobank.This PGS explained ~12% of the phenotypic variance in UK Biobank.The PGS was informative in individuals experiencing menarche as early as 8 years old and later than 20, well beyond the normal AAM range (Supplementary Fig. 6 and Supplementary Tables 8 and 9).
We next sought to understand how the risks of early (<10 years) and delayed (>15 years) AAM were influenced by the PGS.Women in the lowest PGS centile reported AAM at 11.49 (mean, s.e.= 0.03) years, compared to 14.46 (0.04) years in the top 1% (Supplementary Fig. 6 and Supplementary Table 10).Compared to women in the 50th PGS centile, those in the top 1% PGS were 10.7 times more likely to report late AAM (odds ratio (OR) = 8.20-13.96,P = 2.6 × 10 −68 ), while women in the lowest 1% were 14.2 times more likely to report early AAM (OR = 7.13-28.39,P = 5.1 × 10 −14 ).Collectively, these findings suggest that common genetic variants contribute to the risk of rare clinical disorders of extremely early (precocious) and delayed puberty.
To evaluate the predictive performance of our AAM signals, we compared them to phenotypic predictors in 3,140 female children from the Avon Longitudinal Study of Parents and Children (ALSPAC) study.The AAM signals in combination explained more variance in AAM than childhood BMI, parental BMI or mother's AAM (Supplementary Table 11).Furthermore, they had a similar ability to predict extremes of AAM (beyond 2 s.d.) than a multiphenotype predictor (Supplementary Fig. 7), and a combined genotype and phenotype model showed high predictive ability for early (area under the receiver operating characteristic curve (AUROC) = 0.75 (95% confidence interval (CI), 0.68-0.82))and late AAM (AUROC = 0.85 (95% CI, 0.81-0.92)).
We next tested whether carrying rare variants in the AAM ExWAS genes modifies the common polygenic influence on AAM.We saw that the effect of the common variant PGS on AAM was attenuated in the Association Consortium (n = 137,815), (4) 23andMe (n = 76,831) and (5)  three East Asian biobanks-the China Kadoorie Biobank, the Biobank Japan and the Korean Genome and Epidemiology Study (n = 166,890).Studies in strata one to four comprised European-ancestry individuals.All studies provided GWAS data imputed to at least 1000 Genomes reference panel density (Supplementary Table 1), yielding a total of ~12.7 million genetic variants in the final meta-analysis.We did not find evidence of test statistic inflation due to population structure (linkage disequilibrium score regression (LDSC) intercept = 1.07, s.e.= 0.03).
To maximize the discovery of AAM genomic signals, we used a combination of distance-based clumping and approximate conditional analysis (Methods) in the European-strata and ancestry-combined meta-analyses, to identify signals that are homogenous across the two ancestry groups.European-strata identified signals (n = 935) were supplemented with additional signals from the ancestry-combined analysis (n = 145), resulting in a total of 1,080 statistically independent signals for AAM at genome-wide significance (P < 5 × 10 −8 ; Fig. 1 and Supplementary Table 2).Effect sizes ranged from 3.5 months per allele for rarer alleles (minor allele frequency (MAF) = 0.9%) to ~5 days per allele for more common variants (Supplementary Fig. 1).Across the 145 additional signals, we observed a median 1.16-fold increase in χ 2 for their association with AAM in the ancestry-combined analysis compared to European-only, which is proportionate to the added number of East Asian samples (~21% of the total).Independent replication data from the Danish Blood Donor Study (DBDS; n = 35,467; Supplementary Table 3) was available for 969/1,080 signals 16 .Of these, 862 showed directionally concordant associations (89%, P binomial = 2.9 × 10 −147 ).In this independent sample, the variance explained in AAM doubled from 5.6% for 355 available previously reported signals 9 to 11% for the 969 available signals.We also sought indirect confirmation of AAM signals by association with age at voice breaking (AVB) in men from the UK Biobank study (n = 191,235) and 23andMe (n = 55,871; Supplementary Table 4) 14,17,18 .Of the 1,080 AAM signals, 909 (84%, P binomial = 2.6 × 10 −122 ) showed directionally concordant associations with AVB in UK Biobank (including 354 at P < 0.05).Similarly, 852/1,067 (79%, P binomial = 1.8 × 10 −90 ) AAM signals available in 23andMe showed directionally concordant associations with AVB (217 at P < 0.05).In the combined dataset (effective n = 205,354), 893/1,020 (83%, P binomial = 1.18 × 10 −142 ) signals showed directionally concordant effects (397 signals at P < 0.05).
Of the three new genes, KDM4C (β = −0.33 years, P = 2.5 × 10 −7 , n = 582 DMG carriers) encodes a lysine-specific histone demethylase likely involved in epigenetically regulating hypothalamic-pituitarygonadal (HPG) axis genes 22 .In addition, PDE10A (β = 0.58 years,  12).To confirm that this was not a reflection of reduced power due to the low number of carriers, we estimated the expected relationship for noncarriers in 10,000 random subsamples of 49 participants and found that the observed carrier effect was unlikely by chance (P = 0.015; Supplementary Fig. 8).

Implicating AAM genes through variant-to-gene mapping
To implicate putatively causal genes that underlie our 1,080 common variant signals for AAM, we developed the framework 'GWAS to genes' (G2G) that integrates genomic and functional data across six sources (Methods; Fig. 1 and Supplementary Tables 13-16).We identified proximal genes (within 500 kb upstream or downstream) for the 1,080 AAM signals and scored them based on the degree of evidence linking our associated variants to the function of these genes.To achieve this, we implicated genes by identifying signals that colocalized with (1) known enhancers and regulatory elements 34 , (2) nonsynonymous variants, (3) expression quantitative trait loci (eQTL) specifically in tissues enriched for AAM associations (Supplementary Fig. 9 and Supplementary Table 17) and (4) circulating protein QTL (pQTL) from whole blood (Methods).In addition, we integrated gene-level associations for aggregated nonsynonymous common variants using MAGMA 35 and gene scores from polygenic priority score (PoPs) 36 , which uses bulk human and mouse data with information on scRNA, gene pathways and protein interactions to link genes to GWAS signals.Individual genes were further upweighted if they were the nearest gene to a signal 37,38 .Using this approach, our 1,080 signals were found to be proximal to 10,323 genes, of which 665 'high-confidence AAM genes' were identified as the highest-scoring gene at a locus and with at least two lines of evidence (Supplementary Fig. 10 and Supplementary Tables 18 and 19).High-confidence AAM genes include established components of the HPG axis that are disrupted in rare monogenic disorders of puberty (CADM1, CHD4, CHD7, FEZF1, GNRH1, KISS1, SPRY4, TAC3, TACR3 and TYRO3) 39 and other recently reported candidate genes (PLEKHA5, TBX3 and ZNF462) 40,41 .Other AAM genes have recognized roles in sex hormone secretion and gametogenesis (ACVR2A, CYP19A1, HSD17B7, INHBA, INHBB, MC3R and PCSK2) 42 , are disrupted in rare monogenic disorders of multiple pituitary hormone deficiency (OTX2, SOX2, SOX3 and SST) 43 , monogenic obesity (BDNF, LEPR, MC4R, NTRK2, PCSK1 and SH2B1) 44 or syndromes characterized by hypogonadism (Noonan syndrome: BRAF and SOS1; Bardet-Biedl syndrome: BBS4; Prader-Willi/ Angelman syndrome: NDN, SNRPN and UBE3A) [45][46][47][48] .Other mechanisms implicated by high-confidence AAM genes include insulin and insulin-like growth factor (IGF) signaling (CALCR, GHR, IGF1R, INSR, NEUROD1, NSMCE2, PAPPA2 and SOCS2) 49 ; thyroid hormone signaling (THRB) 50 and the polycomb silencing complex (CBX4, CTBP2, FBRSL1, JARID2, PHC2, SCMH1 and TNRC6A) 51 .We also found strong supportive evidence for all genes identified by our ExWAS (Supplementary Fig. 11).21 and 22, Fig. 4 and Supplementary Fig. 12).This data indicate a bidirectional causal relationship between AAM and body size, with greater early weight gain leading to earlier AAM and also earlier AAM leading to higher adult BMI.This approach provides a clear distinction between AAM signals that have primary effects on puberty timing or early weight gain.

Pathway, tissue and cell-type enrichment of AAM genes
Genome-wide common variant AAM associations were enriched for genes expressed in several brain regions, and enrichment was highest in the hypothalamus.Outside the brain, we also observed enrichment for genes expressed in the adrenal gland (Supplementary Fig. 9 and Supplementary Table 17).
We next performed gene-based pathway analyses on the 665 high-confidence AAM genes using g:Profiler 53 and identified 85 enriched pathways (Supplementary Table 23), which were grouped into 30 clusters (Fig. 4, Supplementary Fig. 13 and Supplementary Table 24).These included a number of neuroendocrine, sexual development, protein and transcription regulation pathways.To explore distinct biological pathways by early weight trajectories, we stratified the 665 high-confidence AAM genes into 'early weight gain' (n = 404) or 'no early weight gain' AAM genes (n = 344; Supplementary Table 25).Early weight gain AAM genes specifically highlighted hormone regulation, feeding behavior, AKT phosphorylation targets and peptidyl-serine modification (Supplementary Fig. 14 and Supplementary Table 26).Conversely, the no early weight gain AAM genes highlighted female sex differentiation, negative regulation of transcription by RNA polymerase II, synapse organization and DNA damage response (Supplementary Fig. 15 and Supplementary Table 26).Pathways involved in response to organic compounds, proteins and hormones were enriched across both AAM gene groups (Fig. 4, Supplementary Figs.13-15 and Supplementary Table 26).
To understand how AAM-associated genes may exert effects on the HPG axis, we explored their expressional dynamics in mouse embryonic gonadotropin-releasing hormone (GnRH) neurons.GnRH neurons are central to reproductive processes and regulate the HPG axis by stimulating the pituitary secretion of follicle-stimulating hormone and luteinizing hormone 54 .During development, they migrate  through the nasal placode and into the hypothalamus, where they establish a neurosecretory network that activates pubertal onset 55,56 .RNA sequencing (RNA-seq) in GnRH neurons previously identified 2,182 genes that showed differential expression between embryonic migration stages (early, intermediate or late; Fig. 5) and were categorized into 23 spatiotemporal expression trajectories 57 .At the genome-wide level, we observed enrichment for AAM associations among genes that become upregulated in the late (Trajectory01, P adj = 3.8 × 10 −5 ) and mid to late stages of GnRH neuron development (Trajectory03, P adj = 0.032; Supplementary Table 27), that is, when GnRH neurons have completed their migration process and start to make synaptic connections.Of the 665 high-confidence AAM genes, 28 assign to Trajectory01 (P exact = 3.2 × 10 −6 ), including NEGR1 and TNRC6A and 31 assign to Trajectory03 (P exact = 7.8 × 10 −3 ), including KDM4C, PDE10A and TP53BP1.Both of these GnRH expressional trajectories remained enriched when considering only the subset of nonearly weight affecting genes (Trajectory01, P exact = 1.8 × 10 −5 ; Trajectory03, P exact = 0.01), while Trajectory01 was also enriched when considering only AAM genes that influence early weight gain (Trajectory01, P exact = 1.9 × 10 −3 ; Trajectory03, P exact = 0.09).

GPCRs and puberty timing
G-protein-coupled receptors (GPCRs) regulate several endocrine processes and diseases, including puberty timing 58 , and are therapeutic targets.Here 24 of 161 brain-expressed GPCRs 59 were implicated in AAM by at least one G2G predictor (Fig. 6 and Supplementary Table 28).These include MC3R, where we recently reported that rare loss of function (LOF) variants, which impair signaling, were associated with delayed puberty 15 , and GPR83, which encodes a Gα q11 -and Gα i -coupled GPCR widely expressed in several brain regions 60,61 and is implicated in energy metabolism 62 .In mice, Gpr83 and Mc3r are reportedly co-expressed in key hypothalamic neurons that control reproduction (kisspeptin, neurokinin B and dynorphin, KNDy neurons) and growth (growth hormone-releasing hormone, GHRH neurons) 15 .
Because dimerization between GPCRs may affect their signaling 63 , we tested for physical and functional interactions between MC3R and GPR83 in vitro.Using a bioluminescence resonance energy transfer (BRET)-based assay in HEK293 cells, we observed a physical and specific interaction between GPR83 and MC3R (Supplementary Fig. 16 and Supplementary Table 29).We then tested whether GPR83 modifies canonical MC3R signaling by measuring NDP-α-melanocyte-stimulating hormone (NDP-αMSH)-stimulated cyclic AMP production in HEK293 cells following transfection with plasmids encoding wild-type GPR83 and MC3R separately or together (1:1 ratio).GPR83 and MC3R cotransfection increased cAMP production by 43% compared to MC3R-alone (P = 0.03; Fig. 6, Supplementary Fig. 16 and Supplementary Tables 30 and 31).
Consistent with this in vitro interaction, we observed statistical genetic epistasis between the common AAM signals at MC3R rs3746619, a 5′-UTR SNP highly correlated with predicted deleterious coding variants, and GPR83 rs592068, which colocalizes with eQTLs for GPR83 in brain 64 and across tissues 65 .Among white European unrelated UK Biobank female participants, MC3R function-increasing alleles conferred increasingly earlier AAM in the presence of GPR83 expression-increasing alleles (β interaction = −0.034± 0.015 years, P interaction = 0.02; Fig. 6).These findings extend our previous observation that MC3R loss of function causes delayed puberty 15 and indicate that increased MC3R function through enhanced GPR83 expression leads to earlier puberty timing.

Joint regulation of ages at menarche and menopause
Previous GWASs have estimated a modest shared genetic etiology between AAM and age at natural menopause (ANM; genome-wide genetic correlation: r g = 0.14; P = 0.003) 66 .ANM gene candidates are mainly expressed in the ovary and implicate DNA damage sensing and repair (DDR) processes that maintain genome stability and hence preserve the ovarian primordial follicle pool 67 .
Of the 1,080 AAM signals, nine colocalized (at PP ≥ 0.5) and showed genome-wide significant (P < 5 × 10 −8 ) association with ANM, and a further 11 AAM signals colocalized and showed association with ANM at P < 4.6 × 10 −5 (= 0.05/1,080; Supplementary Table 32).We also considered whether ANM signals influence AAM.Of the 290 previously reported ANM signals 67 , 21 colocalized and showed association with  18 and 27.34), central to the establishment and maintenance of ovarian oocyte numbers 67 , and not previously implicated in puberty timing.A notable example is rs199879971 (P AAM = 1.5 × 10 −20 ; P ANM = 1.5 × 10 −34 ), which is intronic in MSH6, a DNA mismatch repair gene that is primarily expressed in peripheral reproductive tissues, such as ovary and uterus (Supplementary Table 35).Furthermore, the colocalized ANM signal at CHEK2 captures the previously described frameshift variant 1100delC 71 .This association was further supported by the exome data, with the 347 women carrying rare CHEK2 PTVs (excluding 1100delC) reporting on average 2 months later AAM (s.e.= 0.99, P = 0.04).CHEK2 encodes a cell cycle checkpoint inhibitor that has a crucial role in culling oocytes with unrepaired DNA damage 67 .

Article
A few of the shared AAM and ANM signals that map to DDR genes were assigned to the 'no early weight gain' trajectory (CHD4, MSH6, SCAI and SUMO1) and/or showed no association (P > 0.05) with adult BMI (CHEK2, MSI2, PPARG and WWOX; Supplementary Table 33).Three of the shared AAM and ANM signals that map to DDR genes were assigned to the 'moderate early weight gain' trajectory and further colocalized with adult BMI (RAD52, TP53BP1 and TRIP12; Supplementary Table 20).This suggests that some DDR genes might influence AAM via early life weight gain, although we only observed a trajectory-level DDR pathway enrichment for the 'no early weight gain' genes (Supplementary Tables 25 and 26).

Discussion
The GWAS signals identified by this expanded multi-ancestry GWAS double the variance explained in AAM compared to previous findings 9 and highlight associations with consistent effects across the two studied ancestry groups.Furthermore, the common variant PGS contributes substantially to risks of extremely early and late puberty timing.Future studies should explore the potential of this PGS to predict extreme disorders of puberty timing, in contrast to the effects of known monogenic causes.While the majority of our sample was European, the inclusion of East Asian ancestry data increased our power to identify homogeneous signals across the two ancestries.Further studies, including individuals from a broader range of ancestry groups, will be required to understand how generalizable our findings are to non-European populations.
We describe the first systematic characterization of common genetic determinants of both ends of reproductive lifespan, AAM and ANM.The 33 identified shared signals highlight the concordant effects of HPG axis genes on both AAM and ANM and also the influence of ovary-expressed genes involved in DNA damage response.28 and 30.

Article
https://doi.org/10.1038/s41588-024-01798-4DDR processes regulate ovarian oocyte numbers throughout life 67 but have not previously been implicated in puberty timing.DDR pathways were enriched among AAM genes that do not show a primary effect on early weight gain.Our findings suggest that the ovarian reserve, established during early fetal development, might signal centrally to influence the timing of puberty.
We address the considerable challenge of deriving biological insights from common variant signals 72 by developing G2G, an analytical pipeline that integrates a variety of data sources to enable gene prioritization.While comprehensive experimental validation is infeasible, its utility is supported by the prioritization of many genes with known involvement in sex hormone regulation and rare monogenic or syndromic disorders of puberty, obesity and hormone function.The validity of G2G prioritized genes is also supported by their enrichment for dynamic expression in GnRH neurons during their late stage of embryonic migration, when they begin their integration into the hypothalamic neural network controlling puberty 73 .Furthermore, we provide experimental support for one new high-scoring AAM gene, GPR83, which is co-expressed with, interacts with, and enhances the function of MC3R.Future studies should further explore the emerging role of brain-expressed GPCRs in linking central nutritional sensing to reproductive function.
Finally, we provide one of the few examples to date of epistatic interactions between common and rare genetic variants.Linked to puberty timing by both common and rare variants, the transcription factor ZNF483 has diverse binding sites across the genome.We infer that greater ZNF483 binding promotes earlier AAM, whereas rare deleterious variants in ZNF483 appear to abolish the common genetic influence on puberty timing.
Together, these insights shed light on mechanisms, including early life weight gain and adiposity, hormone secretion and response and cellular susceptibility to DNA damage, that potentially explain the widely reported relationships between earlier puberty timing and higher risks of later life mortality, metabolic disease and cancer.

GWAS meta-analysis for AAM
Association summary statistics were collated from studies on AAM (predominantly recalled in adulthood) and genome-wide SNP arrays imputed to the 1000 Genomes reference panel or more recent (Supplementary Table 1).Genetic variants and individuals were filtered based on study-specific quality control metrics.In each study, genetic variants were tested for association with AAM in additive linear regression models, including as covariates age and any study-specific variables, such as genetic principal components.Insertion and deletion polymorphisms were coded as 'I' and 'D' to allow harmonization across all studies.Association statistics for each SNP were then processed centrally using a standardized quality control pipeline 74 .Each variant was meta-analyzed using a fixed-effects inverse-variance-weighted (IVW) model using METAL 75 .This was done in two stages.First, summary statistics from studies within each stratum ((1) ReproGen consortium studies, (2) reproductive cancer consortium studies and (3) East Asian studies) were meta-analyzed and then filtered to include only variants present in more than half of the studies within each stratum.Second, strata-level results were meta-analyzed with data from UK Biobank 76 , using 'first instance' data for AAM (field 2714) and 23andMe.Initially, we performed a European-only analysis (up to n = 632,955).This combined file was filtered to include only variants present in the UK Biobank and at least one other stratum.Variants were also filtered to include MAF ≥ 0.1%.We then performed a second analysis by adding the data from the East Asian studies (up to n = 799,845) and followed the same sample filtering steps and identification of independent signals (described below).

Replication and explained variance
Independent replication of identified signals was performed in data from the DBDS 16 .The DBDS includes questionnaire-recalled AAM data on 35,472 European women (Age when menstruation started?).Mean age at recall was 38.4 years (s.d.= 12.9 years), and the mean AAM was 13.1 years (s.d.= 1.4 years).Indirect confirmation of AAM signals was sought by association with AVB in men in UK Biobank 17 (n = 191,235 European men-data field 2385), the 23andMe study 18 (n = 55,781 European men) and a meta-analysis of the two 14 (n = 205,354 European men).For signals with missing data for either AVB dataset, we identified proxies using the UK Biobank White European dataset (within 1 Mb of the reported signal and r 2 > 0.6), choosing the variant with the highest r 2 value.Given the smaller sample sizes of these cohorts, we performed a binomial sign test for global replication.The variance explained by each lead AAM signal in the DBDS was calculated using the formula 2 × f (1 − f)β2a, where f denotes the variant MAF and βa is the effect estimate in additive models.The overall variance explained was calculated as the sum of individual variants.

UK Biobank phenotype preparation
For downstream analyses in the UK Biobank, we derived an AAM variable using data from field 2714.To maximize sample size, individuals with missing or implausibly early or late 'first instance' AAM (<8 years old or >19 years old) were imputed using data from the next available instance (if plausible).We also derived two binary traits to represent abnormally early (precocious) and delayed puberty.Early puberty was defined as AAM < 10 years old (n = 1,321).Delayed puberty was defined as AAM > 15 years old (n = 10,530).For comparison, participants reporting AAM at 12 or 13 years were controls (n = 81,950).All data analysis and visualization were conducted in R (version 4.2.1),unless otherwise stated.

Rare variant associations with AAM
To identify gene-level rare variant associations with AAM, we performed an ExWAS analysis using whole-exome sequencing (WES) data on 222,283 UK Biobank women of European genetic-ancestry 77 .WES data processing and quality control were performed as described in ref. 30.Individual gene burden tests were performed by collapsing variants with MAF < 0.1% per gene according to their predicted functional consequences.We defined the following two functional categories of rare variants: (1) HC PTV annotated using VEP 78 and LOFTEE 79 and (2) DMG including HC PTVs plus missense variants with CADD score ≥25 (ref.19).We analyzed a maximum of 17,885 protein-coding genes, each with at least ten rare allele carriers in either of the two variant categories (P < 1.54 × 10 −6 , 0.05/32,434 tests).Gene burden association tests were performed using BOLT-LMM 80 .The validity of the ExWAS analysis was indicated by the absence of significant association with the synonymous variant mask (Supplementary Fig. 1) and low exome-wide inflation scores (λ-PTV = 1.047 and λ-DMG = 1.047).Where applicable, protein domains were annotated using information from UniProt 81 , and domain-level burden tests were then performed using linear models.

Rare variant associations with other traits
We assessed the associations of any ExWAS AAM-associated genes in the UK Biobank with a range of related phenotypes-ANM (based on field 3581), BMI (based on field 21001), comparative body size age 10 (based on field 1687), adult height (based on field 50), comparative height age 10 (based on field 1697) and circulating IGF-1 concentrations (based on field 30770).We considered only the top AAM-associated variant mask for each gene.We also performed a similar lookup of these genes across a broader range of phenotypes using the AstraZeneca Portal 82 .

Rare variants in IHH panel app genes
We selected high-evidence green genes with an established monoallelic/X-linked mode of inheritance from the routine clinical investigation Genomics England gene panel for IHH.At the time of the study, this included the following four genes: ANOS1, CHD7, FGF8 and WDR11.We performed a lookup of these genes in the UK Biobank WES data for AAM (n = 222,283) and VB (n = 178,625), considering only HC PTVs with MAF < 0.1%.We also extracted the phenotype of individual carriers.As in the ExWAS analysis, normal pubertal timing was defined in women as AAM between 10 and 15 years of age 1 and in men as AVB at an 'about average age' (UK Biobank data field 2385).

PGS calculation
We calculated a genome-wide PGS for AAM using lassosum 31 .To keep PGS generation independent of PGS testing, we generated the PGS using our European-ancestry GWAS data excluding UK Biobank.We randomly selected 25,000 unrelated Europeans in the UK Biobank to generate the LD reference.The resulting PGS was standardized by subtracting the mean and dividing by the s.d.
We divided the PGS into 100 centiles and calculated the mean AAM for each PGS centile, as well as PGS centile-specific ORs for precocious or delayed AAM (as defined above) compared to individuals in the 50th centile of the PGS.We also calculated the mean PGS for each completed whole year of AAM.
We next tested whether the carriage of ExWAS AAM-associated rare variants modifies the influence of the PGS on AAM.We performed linear models that included interaction terms (PGS × rare variant carrier status) in the subsample of unrelated white-European UK Biobank https://doi.org/10.1038/s41588-024-01798-4female participants with WES, PGS and AAM data (n = 187,941).To test for chance effects due to the low sample size, we randomly subsampled noncarriers to a sample size equivalent to that of carriers and compared this distribution of AAM to that observed in carriers.
A PGS comprising the 882 available lead AAM SNPs or their proxies (of the 935 independent AAM signals from the European analysis) was computed in 3,140 girls with available imputed GWAS data from the ALSPAC study.Linear or logistic regression models for continuous AAM, early AAM (−2 s.d., corresponding to <10.38 years) and delayed AAM (+2 s.d., corresponding to >14.95 years) were tested, controlling for the first 20 genetic principal components (PCs).Other models assessed the predictive performance of BMI at age 8 years and mother's AAM.Finally, a model including all predictors as covariables was calculated.The predictive performance of each model was evaluated by the R 2 metric for continuous AAM and by the AUROC for binary AAM outcomes.

G2G pipeline
Mapping GWAS signals to genes.We performed signal selection on the two sets of AAM GWAS summary statistics, from the European-only and ancestry-combined meta-analyses.For each meta-analysis separately, we first filtered out all variants with MAF < 0.1%.The remaining variants were merged with allele information from the UK Biobank to provide the genomic sequence for any missing alleles.Genome-wide significant signals (P < 5 × 10 −8 ) were selected initially based on proximity (in 1 Mb windows).Secondary signals at the same significance level (P < 5 × 10 −8 ) within these windows were then identified using approximate conditional analysis (GCTA 83 ).We generated an LD reference panel derived from 25,000 randomly selected white-European unrelated UK Biobank participants for GCTA and other downstream analyses, including analyses on the ancestry-combined data, due to the lack of approaches available for handling multi-ancestry LD reference panels.Secondary signals were defined if uncorrelated (in low LD, r 2 < 0.05) with the proximity-defined signals and if they did not show an overt change in their AAM association between baseline and conditional models (change in β < 20% or change in P value by less than four orders of magnitude).Primary and secondary AAM signals were further checked for pairwise LD within 10 Mb windows using PLINK (v1.90b6.18) 84, and only independent signals (in low LD, r 2 < 0.05) were retained, prioritizing distance-based signals in the case of linkage.Signal selection was performed first using the European-ancestry GWAS meta-analysis and then supplemented by any signals identified using the same process in the ancestry-combined meta-analysis that were uncorrelated (r 2 < 0.05) with any European-ancestry signal.
Independent GWAS AAM signals were examined for proximal genes, defined as those within 500 kb upstream or downstream of the gene's start or end sites, using the National Center for Biotechnology Information (NCBI) RefSeq gene map for GRCh37 (http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/).
Colocalization with expression or pQTL data.Tissue enrichment for GWAS associations was performed using LD score regression applied to tissue-specific expression (LDSC-SEG) 85 and tissue-specific annotations from the Genotype-Tissue Expression (GTEx; https://github.com/bulik/ldsc/wiki/Cell-type-specific-analyses).Significantly enriched tissues (P < 0.05) were then included in colocalization analyses with the tissue-specific and cross-tissue meta-analyzed GTEx eQTL data (v7 (ref.65; available via https://gtexportal.org)and using the fixed-effects summary statistics for the latter), in addition to data from the eQTLGen 86 and Brain-eMeta 64 studies.
Including genomic variants with at least suggestive association with AAM (GWAS, P < 5 × 10 −5 ), we applied summary data-based Mendelian randomization (MR) and heterogeneity in independent instruments (SMR-HEIDI, v0.68 (ref.87)) and the approximate Bayes factor method in the R package 'coloc' (v5.1.0(ref.88)).For the former, we considered gene expression to be influenced by the same GWAS AAM variant if the false discovery rate (FDR)-corrected SMR test P < 0.05 and HEIDI test P > 0.001.For the latter, genomic regions were defined as ±500 kb around each gene, and loci exhibiting an H4 PP > 0.75 were considered to show evidence of colocalization.
We also tested for colocalization between GWAS AAM variants and pQTLs from two datasets.First, we used pQTL data from the Fenland study 89 , which includes data on 4,775 protein targets captured via the SomaScan v4 assay, measured in plasma from 10,708 European individuals.In addition, pQTL summary statistics from the UKB Pharma Proteomics Project 90 were used, which includes 2,923 protein targets captured via the Olink Explore 3072 proximity extension assay in up to 34,090 individuals of European ancestry.Colocalization was tested using the same procedure as mentioned above.It is important to note that colocalization analysis cannot determine causal relationships or the direction of causality between the two phenotypes.
Mapping GWAS signals to enhancers and coding variants.For genes proximal to (within 500 kb) GWAS AAM signals, we calculated genomic windows of high LD (r 2 > 0.80) around each signal and mapped these to the locations of known enhancers for the genes, using activity-by-contact (ABC) enhancer maps 34 .This was done across the 131 available cell/tissue types, and genes were matched to enhancers only in the tissues/cells where they were actively expressed.
We also checked whether GWAS AAM signals were in LD (r 2 > 0.80) with any coding variants within the paired genes and what the predicted consequence of those coding variants was, using SIFT 91 and POLYPHEN 92 .
Gene-level GWAS associations with AAM.We performed a gene-level MAGMA analysis 35 , which collapses common GWAS variants within each gene and calculates aggregate gene-level associations with the outcome trait, as described in ref. 35.To enhance the validity of this approach, we restricted the analysis to include only coding variants.Genes with FDR-corrected MAGMA P < 0.05 were considered associated with AAM.
Finally, we used the PoPS 36 , which is a similarity-based gene prioritization method that uses cell-type-specific gene expression, biological pathways and protein-protein interactions to prioritize likely causal genes from GWAS data.At each locus, the gene with the numerically highest PoPS score was determined to be the PoPS-prioritized gene.
Calculation of G2G scores.From the abovementioned analyses, gene-level results were scored for each of the six sources as follows: 1. Closest gene: Gene proximity to a GWAS signal is a good predictor of causality 37 .The genes closest to each AAM signal (if also within 500 kb) were assigned.All genes with an intragenic signal were assigned as closest.The closest genes scored 1.5 points.2. eQTL colocalization: Genes with evidence of eQTL colocalization via both SMR-HEIDI and coloc scored 1.5 points.Genes with evidence of colocalization via only one of these received 1.0 points.A further (1.0) point was assigned to genes if the most likely shared causal variant between eQTL and GWAS AAM was independent of the proximal GWAS signal (r 2 < 0.05).3. pQTL colocalization: The same scoring as in 'eQTL colocalization' was applied to pQTL analyses.4. Coding variants: As the evidence was overlapping for coding variant gene-level MAGMA analysis and signals correlated with coding variants, these analyses were scored concomitantly.
Genes with an FDR-corrected MAGMA P < 0.05 were scored 0.5 points.Genes containing coding variants of deleterious or damaging predicted consequences in LD with GWAS AAM signals were scored 1.0 points, or only 0.5 points if the coding variants were predicted to be benign or tolerated. https://doi.org/10.1038/s41588-024-01798-4 5. ABC enhancers: Genes targeted by enhancers that overlapped with or were correlated with GWAS AAM signals were scored 1.0 points.6. PoPS: PoPs-prioritized genes at each locus were scored 1.5 points.
G2G scores for each gene-signal pair were calculated as the sum of scores from these six sources.Genes that scored >0 points and were located within 500 kb of a GWAS AAM signal were considered further.To account for confounding due to large LD blocks, G2G scores were adjusted for signal LD window size (defined as the genomic distance containing variants with pairwise r 2 > 0.50 with the lead SNP) using linear regression models.
For genes proximal to more than one GWAS AAM signal (and hence with multiple G2G scores), the signal with the most concordant sources for that gene (highest residual G2G score) was retained and a further (1.0) point was added to reflect evidence from multiple signals.This resulted in a unique summarized G2G score for each included gene.To account for confounding due to gene size, G2G scores were further adjusted for gene length using linear regression models.The resulting residuals were considered to be the final G2G scores.
To prioritize likely causal AAM genes, all G2G-scored genes (that is, highlighted as potentially causal by at least one source) were ranked and also allocated a G2G centile position.In addition, the number of concordant predictors (sources) for each gene was noted (range: 1-6 sources).Finally, to reflect uncertainty due to multiple high-scoring genes for the same signal, genes were flagged if they were proximal (within 1 Mb) to other genes with a similar G2G score (within 0.5 points or greater and highlighted by at least the same number of sources).
High-confidence AAM genes.Independent GWAS signals from the all-and the European-ancestry meta-analyses were annotated with their top G2G scoring gene using corresponding GWAS data (that is, European analysis signals were annotated with genes from the European G2G, etc.).Genes implicated by at least two concordant sources were considered to be high-confidence AAM genes.
High-confidence AAM genes were functionally annotated using STRING 93 .Links to rare monogenic disorders were annotated from the Online Mendelian Inheritance in Man (OMIM) database (via OMIM; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University; accessed November 2023, https://omim.org/).Finally, we used GTEx, a publicly available resource for tissue-specific gene expression, to lookup the tissue expression of 1,080 AAM genes highlighted by G2G 65 .

ZNF483 genome-wide binding analysis
We used fGWAS (v.0.3.6 (ref.32)), a hierarchical model for joint analysis of GWAS and genomic annotations, to test for enrichment of GWAS AAM signals among ZNF483 transcription factor binding sites.fGWAS models a maximum likelihood parameter estimate for the enrichment of a transcription factor (in this case, ZNF483).To perform this, we annotated the European-ancestry GWAS AAM summary statistics with the ZNF483 binding sites from the ENCODE ChIP-seq data derived from the human HepG2 cell line (ENCSR436PIH).
We also used SLDP (https://github.com/yakirr/sldp;ref. 33) to explore the directional effect of the ZNF483 function on AAM.We tested whether alleles that are predicted to increase the binding of ZNF483 have a combined tendency to increase or decrease AAM.SLDP requires signed LD profiles for ZNF483 binding, a signed background model and a reference panel in a SLDP-compatible format.We used a 1000 Genomes Phase 3 European reference panel containing approximately 10 million SNPs and 500 individuals.

Clustering of AAM signals by early childhood body weight
We analyzed repeated measurements of early childhood body weight from the MoBa cohort study 52,94 to investigate the relationship between early growth and puberty timing.Childhood body weight values were extracted from the study questionnaires for 12 different time points from birth to age 8 years using previously reported exclusion criteria 52 .Weight values were standardized and adjusted for sex and gestational age using the generalized additive model for location, scale and shape (GAMLSS; v5.1-7, www.gamlss.com) in R (v3.6.1) as previously reported 52 with the exception that a Box-Cox t distribution was used to standardize body weight values (instead of the log-normal distribution used for BMI) 52 .GWAS for these traits was performed using BOLT-LMM (v2.3.4) as previously reported 52 .
We performed MR analyses to assess the likely causal effects of AAM on childhood weight at each time point 95 .As instrumental variables (IVs), we used all 1,080 AAM-associated lead SNPs individually.As outcome data, we used childhood weight at the 12 time points.For SNPs with missing outcome data, we identified proxies within 1 Mb and r 2 > 0.6, choosing the variant with the highest r 2 value, using a random selection of 25,000 unrelated European-ancestry UK Biobank individuals for the LD reference.Genotypes at all variants were aligned to the AAM-increasing allele.We used IVW MR models, as this has the greatest statistical power 96 .
Next, we stratified the 1,080 AAM lead SNPs by their effects on early childhood weight using a k-means clustering approach for longitudinal data 97 .We performed five different models with k-means for k∈{2,3,4,5,6} clusters 20 times each.To find the optimal partition, we used the 'nearlyAll' option that uses several different initialization methods in alternation.As the assumption of homoscedasticity was not met, we used the Carolinski-Harabatz criterion, a nonparametric quality criterion, to derive the optimal number of clusters.
We then performed additional MR analyses, combining AAM signals within each identified cluster as IVs and, as the outcomes, childhood weight at each time point and also adult BMI (on 450,706 UK Biobank participants).We grouped together 'high early weight' and 'moderate early weight' AAM SNPs into a single IV to maximize power.

Biological pathway enrichment analysis
We performed gene-centric biological pathway enrichment analysis using g:Profiler (via the R client 'gprofiler2', v0.2.1 (ref.53)).We used a filtered set of Gene Ontology (GO) pathways (accessed on 21 February 2023), focusing on GO:BP, KEGG and REACTOME, and restricted the analysis to those pathways with 1,000 genes or fewer, reasoning that these are more biologically specific.Pathway enrichment analyses were performed using the set of 665 high-confidence AAM genes and repeated when stratified by their effects on early childhood weight (see 'Clustering of AAM signals by early childhood body weight').Pathways with Bonferroni-corrected P < 0.05 were considered to be associated with AAM.
As the pathways derived from overlapping sources, we clustered the AAM-associated pathways to aid interpretation.Clustering was based on shared AAM genes across pathways.We used a 'complete' clustering algorithm and a custom distance calculated as (one minus the proportion of the overlap between any two pathways relative to the pathway with the smaller overlap).Thus, between two pathways, a value of 0 indicates that all the shared AAM genes in the pathway with fewer genes are also enriched in the other pathway.To define clusters, we chose an arbitrary overlap value of 0.5, which indicates that pathways in the same cluster share 50% or more of their AAM genes.
Each pathway cluster was annotated by (1) the pathway with the most significant enrichment, (2) the pathway with the highest proportion of AAM genes, (3) the biological coherence of the pathways and (4) shared genes common to all included pathways.We considered that pathways were overlapping between the total AAM gene set and the two early weight subgroups if there were common pathways across either (1) the most significant pathway or (2) the pathway with the highest proportion of AAM-associated genes. https://doi.org/10.1038/s41588-024-01798-4

Expression of AAM genes in GnRH neurons
We tested for enrichment of AAM-associated genes in RNA-seq data from embryonic GnRH mouse neurons 57 .All expressed genes were sorted into different expressional trajectories based on shared dynamic expression profiles across three developmental stages (early, intermediate or late; Zouaghi, et al. 57 ).We tested for enrichment of AAM-associated genes (from our European-ancestry GWAS meta-analysis) in any identified trajectory, using MAGMA 35 with custom pathways.As a sensitivity test, we used Fisher's exact test to confirm the over-representation of AAM-associated genes within each trajectory.

Colocalization of AAM signals with BMI and menopause
To explore the shared genetic architecture between AAM, ANM and adult BMI, we performed a colocalization analysis for each of the 1,080 AAM signals.ANM GWAS summary statistics were from reported ReproGen data on ~250,000 women of European ancestry 67 .Adult BMI GWAS summary statistics were derived from 450,706 individuals in the UK Biobank.For AAM signals with missing outcome GWAS data, we identified proxies within 1 Mb and with an r 2 > 0.6 using our 25,000-participant UK Biobank LD reference.We applied both Bonferroni correction (P ≤ 0.05/1,080 = 4.6 × 10 −5 ) for association with the outcome trait, and a less stringent PP of colocalization PP > 0.5, due to the different priors for these hypothesis-driven analyses.
The same approach was applied in the opposite direction by testing ANM signals identified in the most recent ReproGen GWAS 67 for association with AAM.ANM signals were highlighted if they passed Bonferroni correction (P ≤ 0.05/290 = 1.7 × 10 −4 ) for association with AAM.As ANM signals are well-established to be enriched for DNA DDR, we built a comprehensive list of DDR genes, integrating the following five different sources: (1) an expert-curated DDR gene list (broad DDR) from the laboratory of S. Jackson (this list encompasses a range of related pathways-DNA repair genes, broader DNA damage response genes (such as damage-induced chromatin remodeling, transcription regulation or cell cycle checkpoint induction)) and general maintenance of genome stability (such as genes involved in DNA replication); (2) a second expert-curated list previously reported 67 (assembled by J. Perry, E. Hoffmann and A. Murray); (3) genes listed in the REACTOME 98 'DNA repair' pathway (R-HSA-73894); (4) genes listed in the GO 'DNA repair' pathway (GO:0006281) and (5) genes listed in the GO 99 'cellular response to DNA damage stimulus' (GO:0006974).
BRET to measure dimerization.Heterodimerization between GPR83 and MC3R was quantified using BRET1 in titration configuration.Briefly, 12,000 HEK293 cells seeded in 96-well plates were transfected with a constant dose of MC3R-RlucII plasmid (0.5 ng per well) and increasing doses of GPR83-Venus plasmids, or soluble (s) Venus as negative control.All conditions were topped up with an empty vector (pcDNA3.1 (+)) to a total of 100 ng of plasmid per well.Twenty-four hours post-transfection, cells were washed once with Tyrode's buffer and total Venus fluorescence was measured in a Spark 10M microplate reader (Tecan) using monochromators (excitation 485 ± 20 nm and emission 535 ± 20 nm).BRET was quantified 10 min after the addition of coelenterazine H (NanoLight Technology; 2.5 mM).netBRET was calculated as (absorbance at 533 ± 25 nm/absorbance at 480 ± 40 nm) − (background (absorbance at 533 ± 25 nm/absorbance at 480 ± 40 nm)), with the background corresponding to the signal in cells expressing the RlucII protomer alone under similar conditions.Data on the x axis represent the ratio between acceptor (Venus) fluorescence and donor (RlucII) luminescence.Representative data are from four independent experiments.Time-resolved cAMP assay.Measurement of ligand-induced cAMP generation in HEK293 cells transiently expressing either MC3R or both MC3R and GPR83 was performed using the GloSensor cAMP biosensor (Promega), according to the manufacturer's protocol.Briefly, 12,000 cells were seeded in white 96-well poly-d-lysine-coated plates.After 24 h, cells were transfected with both 100 ng per well of pGloSensorTM-20F cAMP plasmid (Promega, E1171) and 30 ng per well of each plasmid encoding either MC3R or MC3R and GPR83, using Lipofectamine 2000 (Gibco, 11668).All conditions were topped up with an empty vector (pcDNA3.1 (+)) to a total of 160 ng of plasmid per well.The day after transfection, cell media were replaced by 90 ml of fresh DMEM with 2% vol/vol GloSensorTM cAMP reagent (Promega, E1290) and incubated for 120 min at 37 °C.Firefly luciferase activity was measured at 37 °C and 5% CO 2 using a Spark 10M microplate reader (Tecan).After initial measurement of the baseline signal for 10 min (30 s intervals), cells were stimulated with 10 ml of 10× stock solution of the MC3R agonist NDP-aMSH (final concentration was 1 mM), and real-time chemiluminescent signals were quantified for 25 min (30 s intervals).In each experiment, a negative control using mock-transfected cells (empty pcDNA3.1(+)plasmid) was assayed.The area under the curve (AUC) was calculated for each cAMP production curve considering the total peak area above the baseline calculated as the average signal for mock pcDNA3.1(+)-transfectedcells.For data normalization, the AUC from mock-transfected cells was set as 0 and the AUC from WT MC3R was set as 100%.The results are from six independent experiments.

Fig. 3 |
Fig. 3 | Epistatic interactions between rare coding variants and common genetic susceptibility on AAM in the UK Biobank.a, Interaction effects (mean and 95% CIs) on AAM between a GWAS PGS and carriage of qualifying rare variants in the six exome-highlighted genes (n = 222,283).b,c, Predicted mean (with 95% CI) AAM in (b) noncarriers (black) and carriers (light blue) of rare variants in six genes without significant (P > 0.05) interaction effects and (c) in noncarriers (left) and carriers (right) of rare variants in ZNF483, which shows significant interaction (P = 0.025).In c, points show individual AAM values, with the color gradient indicating increasing age.d, Plot of individual rare damaging (DMG) variant associations with AAM by ZNF483 functional domains.The coding part of ZNF483 is depicted by the horizontal black line.Included DMG variants had an MAF < 0.1% and were annotated to be either HC PTVs or missense variants with CADD score ≥25.Each variant association is represented by a circle and vertical line-the line length indicates the −log 10 (P), in the direction of its effect on menarche in carriers of the rare allele, and the circle size indicates the number of carriers of each variant (that is, allele count).Relevant data are included in Supplementary Table12.

Fig. 4 |
Fig. 4 | Stratification of AAM signals and biological pathway enrichment by their influence on early childhood weight.a, Proportion of GWAS signals for AAM by early childhood weight trajectory.b, Biological pathways enriched for high-confidence AAM genes (left), plus enrichment within early childhood weight trajectories (right).Pathway enrichment was calculated using g:Profiler, and the displayed P values are Bonferroni-corrected. Row names Article https://doi.org/10.1038/s41588-024-01798-4

Fig. 5 |
Fig. 5 | Enrichment of gene drivers of GnRH migration and maturation in the AAM GWAS.a, Schematic representation of the stages of GnRH neuron migration during embryonic development.Using RNA-seq data, Zouaghi et al. 57 grouped differentially expressed genes into 23 expressional trajectories based on their comparative level of expression during the early (yellow), intermediate (amber) and late (red) stages of GnRH migration.b, Genome-wide MAGMA enrichment for AAM gene associations within each expression trajectory.Colored symbols accompanying the points indicate the expression level of

Fig. 6 |
Fig. 6 | Interactions between GPCRs on AAM.a, A total of 24 brain-expressed GPCRs were implicated in AAM by the G2G analysis of the white European GWAS data.Point colors indicate the number of concordant G2G predictors implicating each GPCR.b, Time-resolved NDP-αMSH-stimulated cAMP production in HEK293 cells expressing MC3R-alone or with both MC3R and GPR83.Data are presented as the mean (±s.e.) percentage of the maximal MC3R-alone response (from six independent experiments).c, AAM according to dosage of MC3R function-increasing C alleles at rs3746619 (x axis in each panel) and GPR83 expression-increasing T alleles at rs592068.Predicted means are represented by the lines, and the accompanying 95% CIs are denoted by the shaded areas.β interaction = −0.034± 0.015 years, P interaction = 0.02.Supporting data are included in Supplementary Tables28 and 30.

Table 12 .
Article 52tps://doi.org/10.1038/s41588-024-01798-4puberty) in the Norwegian Mother, Father and Child Cohort Study (MoBa; n = 26,681 children)52.We identified three trajectories-464 AAM signals (44%) formed a 'moderate early weight gain' trajectory and 15 (1%) formed a 'high early weight gain' trajectory; both trajectories were characterized by effects of AAM-reducing alleles on higher weight gain across early childhood.The remaining 586 (55%) AAM signals formed a 'no early weight gain' trajectory; yet, in combination, AAM-reducing alleles in this trajectory increased adult BMI (β = 0.487 kg m −2 yr −1 ; P = 1.6 × 10 −20 ; Supplementary Tables Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.MethodsUK Biobank data have approval from the North West Multicentre Research Ethics Committee as a Research Tissue Bank.The 23andMe research participants provided informed consent and volunteered to participate in the research online under a protocol approved by the external Association for the Accreditation of Human Research Protection Programs (AAHRPP) accredited Institutional Review Board (IRB), Ethical and Independent Review Services.Each of the other individual studies that contributed data has its own ethical approval from the relevant boards.