Genetic architecture of telomere length in 462,666 UK Biobank whole-genome sequences

Telomeres protect chromosome ends from damage and their length is linked with human disease and aging. We developed a joint telomere length metric, combining quantitative PCR and whole-genome sequencing measurements from 462,666 UK Biobank participants. This metric increased SNP heritability, suggesting that it better captures genetic regulation of telomere length. Exome-wide rare-variant and gene-level collapsing association studies identified 64 variants and 30 genes significantly associated with telomere length, including allelic series in ACD and RTEL1. Notably, 16% of these genes are known drivers of clonal hematopoiesis—an age-related somatic mosaicism associated with myeloid cancers and several nonmalignant diseases. Somatic variant analyses revealed gene-specific associations with telomere length, including lengthened telomeres in individuals with large SRSF2-mutant clones, compared with shortened telomeres in individuals with clonal expansions driven by other genes. Collectively, our findings demonstrate the impact of rare variants on telomere length, with larger effects observed among genes also associated with clonal hematopoiesis.

Telomeres protect chromosome ends from damage and their length is linked with human disease and aging.We developed a joint telomere length metric, combining quantitative PCR and whole-genome sequencing measurements from 462,666 UK Biobank participants.This metric increased SNP heritability, suggesting that it better captures genetic regulation of telomere length.Exome-wide rare-variant and gene-level collapsing association studies identified 64 variants and 30 genes significantly associated with telomere length, including allelic series in ACD and RTEL1.Notably, 16% of these genes are known drivers of clonal hematopoiesis-an age-related somatic mosaicism associated with myeloid cancers and several nonmalignant diseases.Somatic variant analyses revealed gene-specific associations with telomere length, including lengthened telomeres in individuals with large SRSF2-mutant clones, compared with shortened telomeres in individuals with clonal expansions driven by other genes.Collectively, our findings demonstrate the impact of rare variants on telomere length, with larger effects observed among genes also associated with clonal hematopoiesis.
Telomeres are repetitive nucleotide sequences that protect the ends of chromosomes from degradation and are thus crucial for maintaining genomic integrity.In somatically dividing cells, telomeres shorten with each replication cycle until they reach a critical length that triggers cellular senescence and ultimately cell death 1,2 .Telomere length demonstrates considerable interindividual variability modulated by heritable 3,4 , environmental and lifestyle factors such as smoking behavior and stress 5 .Rare germline mutations linked to telomere shortening have been associated with severe diseases, including premature aging syndromes, interstitial lung disease and immunodeficiencies 1,6,7 .
More subtle reductions in telomere length have been associated with common, age-related diseases, such as coronary artery disease 8 .Although telomere length is heritable, our current understanding of its genetic determinants has been largely limited to the study of common variants.A greater understanding of the genetic determinants of telomere length would provide insights into disease pathogenesis, thereby identifying potential new therapeutic targets.
High-throughput telomere length assays have been developed to understand telomere biology at the population level.One such method uses quantitative PCR (qPCR) to measure the relative abundance of Article https://doi.org/10.1038/s41588-024-01884-7smaller 95% credible SNP sets (median = 8) compared with the separate qPCR and WGS GWASs (median = 12 and 11, respectively), highlighting that PC1 can more effectively identify potentially causal variants.In total for PC1, we identified 192 significant (P < 5 × 10 −8 ) loci (Supplementary Tables 5 and 6), 70 of which were not within 1 Mb of a previously implicated locus.Associations at known loci were also stronger with PC1 compared with qPCR or TelSeq, further demonstrating the value of the combined metric (Extended Data Fig. 5).

Rare-variant-level associations with telomere length
We observed that rare variants have demonstrably larger effects on telomere length than common variants and have also been implicated in numerous telomere-related diseases.In the present study, we focused on protein-coding variants observed in WGS data from 439,351 UKB participants of NFE broad genetic ancestry to examine the effect of rare variation on PC-derived telomere length estimates.After removing individuals with known hematological malignancies at sampling (N = 3,073), we performed both variant-level (exome-wide association study (ExWAS)) and gene-level (rare-variant-aggregated collapsing) analyses 16 .We observed high concordance (r 2 = 0.99) between the effect sizes for the common variants included in the ExWAS and our separate common variant GWAS (microarray genotyping) analyses.Genomic inflation was also well controlled with a median λ GC = 1.07 (Supplementary Fig. 9).
We restricted our downstream analyses of the ExWAS to rare (MAF < 0.1%) exonic variants that were too rare to be well represented in the GWAS.Based on our previously identified significance threshold of P ≤ 1 × 10 −8 (ref.16), there were 62 significant rare-variant germline associations across 19 distinct genes (Fig. 2a and Supplementary Table 7) for PC1 after excluding variants that were also significantly associated with PC2 (Supplementary Fig. 10).Although all of the variants except 8-84862338-A-G (RALYL.p.Ala165Ala, P = 4.8 × 10 −11 , β = 2.24 (1.57-2.90))overlapped with a previously identified GWAS locus, the absolute effect sizes observed for the ExWAS analyses were generally significantly greater than that previously reported for the same loci.Of the 62 rare-variant germline signals, 16% (10 of 62) were only significantly associated with PC1 and not underlying qPCR or TelSeq measurements.
Thirty-nine germline rare variants were associated with longer telomere length and clustered in components of the CST (CTC1) and Shelterin (ACD, TERF1 and TINF2 POT1) complexes, both of which function to protect telomere ends and regulate interactions with telomerase.Of these, ten were protein-truncating variants (PTVs) in CTC1, POT1, SAMHD1, TINF2 and TERF1, all of which are genes implicated in telomere-associated diseases.It is interesting that the two PTVs in CTC1 (17-8237439-GCTTT-G p.Lys242fs: P = 1.35 × 10 −24 , β = 0.54(0.44-0.65);and 17-8229438-AG-A p.Leu1007fs: P = 4.12 × 10 −11 , β = 0.53 (0.37-0.69)) have both been implicated in compound, heterozygous, recessive, cerebroretinal microangiopathy with calcifications and cysts (CMCC, also known as Coats plus syndrome), which is associated with shorter telomere sequences compared with a reference sequence 9 .More recently introduced in silico methods, such as TelSeq, measure average telomere length from whole-genome sequencing (WGS) data 10 .The advances in genome sequencing of population-scale biobanks provides unprecedented opportunities to leverage these approaches to study the genetic architecture of telomere length and ultimately its impact on human health at a population scale.In a recent study of over 472,174 UK Biobank (UKB) participants, a microarray-based, genome-wide association study (GWAS) identified >100 independent common variant loci associated with qPCR telomere length measurements 8 .By combining these measurements with whole-exome sequencing (WES) data across 418,401 individuals, Kessler et al. identified rare-variant associations for several previously established genes 11 .Another study applied the TelSeq algorithm to estimate telomere length from the whole-genome sequences of 109,122 multiancestry individuals from the TopMed program and identified 36 associated loci, which largely overlap those identified by qPCR-based measures 12 .
In the present study, we leverage a larger sample size of WGS data from 490,397 multiancestry UKB participants to study the genetic architecture of telomere length, including contributions from both rare and common variants.Moreover, in comparing qPCR-and WGS-derived telomere length estimates in the same individuals, we observed that combining both measurements into a single statistical metric significantly improved the accuracy of telomere length estimates and empowered discovery potential.

A combined telomere length metric increases heritability
Of the 490,397 UKB participants with WGS data, we took forward for analysis 462,666 UKB samples (94%) that met our quality control (QC) thresholds (Methods) and for whom qPCR telomere length estimates were also available (Supplementary Table 1 and Extended Data Fig. 1).As an alternative method for estimating telomere length, we also used TelSeq (Supplementary Fig. 1), which estimates telomere length from the WGS data 10 .
As expected, telomere length estimated from TelSeq and qPCR were both significantly associated with age, sex and ancestry (Supplementary Table 2 and Extended Data Fig. 2).It is interesting that the qPCR-and coverage-adjusted TelSeq telomere length estimates were only moderately correlated (r 2 = 0.29; Fig. 1a) after consideration of potential sequencing confounders (Extended Data Fig. 3, Supplementary Figs.2-5, Supplementary Table 3 and Supplementary Notes 1 and 2).In a joint model, the association between each of the metrics and age remained highly significant, suggesting that each captures additional information.We derived a principal component analysis (PCA) linear combination 13 incorporating both qPCR and adjusted TelSeq (Fig. 1b and Extended Data Fig. 4).Use of the first principal component, PC1, demonstrated a significant (P < 1 × 10 −16 , linear regression, two-sided unadjusted) performance gain in predicting age compared with models employing either of the individual measures (Supplementary Fig. 6).
We first sought to determine common variants (minor allele frequency (MAF) > 0.1%) associated with telomere length, focusing on 438,351 non-Finnish European (NFE) broad genetic ancestry individuals with array-based imputed genotypes available (Supplementary Table 1 and Extended Data Fig. 1).Using REGENIE 14 , we performed a common variant GWAS of telomere length estimates derived from qPCR, WGS, PC1 or PC2 (Fig. 1c and Methods) replicating all signals from Codd et al. 8 (Supplementary Note 3).Linkage disequilibrium (LD)-score regression 15 revealed that the PC1 vector had the highest heritability (h 2 = 0.099, s.e.m. ± 0.010; Supplementary Table 4), suggesting that the combined telomere length metric explains more telomere length variance resulting from genetic variation than either qPCR or TelSeq alone.

Rare-variant gene-level collapsing analysis
We performed gene-level collapsing analyses to identify genes associated with telomere length through the aggregated presence of variants too rare and thus underpowered to be individually discovered in ExWAS analyses.We employed ten qualifying variant (QV) models 16 (Supplementary Table 9), and association statistics were well calibrated with a median λ GC = 1.12 (Supplementary Fig. 11).After filtering putative somatic signals, we identified 20 genes significantly (P ≤ 1 × 10 −8 ) associated with PC1 telomere length, 2 (10%) of which were uniquely identified in PC1 and not the individual qPCR or TelSeq statistics (Fig. 2b, Supplementary Table 10 and Extended Data Fig. 6).
Three genes significantly associated with telomere lengths in the rare PTV collapsing model have not been previously described in the context of telomere length biology.Of the two associated with longer telomere length, G3BP1 (P = 1.2 × 10 −9 , β = 0.85 (0.57-1.12)), encodes an RNA-binding protein involved in RNA metabolism regulation and stress association for a gene over all qualifying variant models (Supplementary Table 9).Associations driven by putative somatic variants are excluded.Colors represent the qualifying variant model used in collapsing analysis.Error bars represent 95% CIs.For both plots N = 436,410 independent samples.

Causal associations between the proteome and telomere length
We integrated protein quantitative trait locus (pQTL) data from the UKB Pharma Proteomics Project that examined genetic associations across approximately 3,000 plasma proteins 44 with our telomere length PC1 genetic associations.Across all PC1 GWAS significant loci, we identified 2,905 overlapping pQTLs (P < 1.7 × 10 −11 ) (Supplementary Table 11).We used coloc 45 to assess each of these and found strong evidence for a shared causal variant modulating both telomere length and plasma protein abundance at 266 (9%) of these overlaps.Of these, 10 were colocalizations in cis and 256 were in trans.For the cis signals we used pQTLs as instruments in a Mendelian randomization (MR) analysis (Methods) to assess whether plasma proteome abundance might be causally related to telomere length.We found evidence for a causal interrelationship across nine protein assays and telomere lengths after multiple testing correction (Supplementary Table 12 and Supplementary Fig. 13), including some well-established, telomere-related proteins (for example, TK1, CDA and PARP1).For TK1, SPRED2 and BCL2L15, MR-Egger analysis highlighted the potential presence of pleiotropy, which might invalidate MR assumptions.One protein, RPA2, binds single-stranded DNA to protect from instability and breakage and recently has been shown to be involved in telomere maintenance 46 .The remaining associations were previously unreported and warrant future functional studies to elucidate the mechanism by which they mediate telomere length.Of the trans colocalizing proteins, 183 of 256 (71%) were found in the 12q24.12locus containing SH2B3, which is known to be highly pleiotropic.Of the remaining trans colocalizing protein assay associations, six exhibited colocalization with more than one locus (Supplementary Fig. 14).These included FLT3LG for which trans pQTL signals colocalized with variants in ATM, TERT and SETBP1 loci.

Causal gene prioritization
To prioritize putative causal genes in GWAS loci, we generated a list of 7,334 protein-coding genes overlapping a telomere-length PC1 locus and annotated this gene set with data integrated from seven separate sources (Supplementary Methods).Assuming equal weighting across all seven prioritization categories, we computed a simple sum to prioritize genes within each PC1 GWAS locus.Of the 7,334 protein-coding genes considered, 404 had a prioritization score >0 and a single gene was prioritized in 94 of the 192 PC1 telomere-length GWAS loci (Supplementary Tables 13 and 14).We found that these prioritized genes were more enriched (P = 4.12 × 10 −15 ) for the reactome pathway 'extension of telomeres' (R-HAS-180786) compared with 50 gene sets of the same size derived from randomly sampled closest genes (P median = 5.6 × 10 −5 ) (Supplementary Fig. 15 and Supplementary Table 15).

Multiancestry rare-variant analysis
Inclusion of individuals of non-European ancestries is critical for health equity and bolstering gene discovery 48,49 .Therefore, we performed additional GWAS, ExWAS and collapsing analysis on PC1 in five additional UKB genetic ancestry groups (admixed American/Hispanic (AMR), East Asian (EAS), South Asian (SAS), Ashkenazi Jewish (ASJ) and African (AFR); Supplementary Table 1).The broad genetic ancestry GWAS revealed a single locus in the AFR cohort that was not detected in the NFE cohort analyses (rs7577687, P AFR = 4.26 × 10 −8 , β AFR = 0.11 (0.07-0.15)) and there were no non-NFE genetic, ancestry-specific rare-variant associations, probably owing to the substantially smaller sample sizes of these populations in the UKB.A fixed-effect meta-analysis was then performed to combine results across ancestral strata, which detected an additional five loci (Supplementary Table 16).For the rare-variant ExWAS and collapsing meta-analysis, no additional study-wide significant genes were identified.However, there was a consistent improvement in observed statistical power, indicating that future cross-ancestry sequencing studies are likely to identify further causal gene telomere length associations (Extended Data Fig. 7).
To investigate these associations further, and particularly to distinguish cause from effect in the context of telomere length measures ascertained from bulk blood, we performed subsequent analyses stratifying by the size of the mutant CH clone (Supplementary Table 19).Specifically, we reasoned that, in individuals with small CH clones (for example, VAF < 5%), most blood leukocytes would derive from wild-type (non-CH) cells and therefore reflect background telomere length.In comparison, in individuals with larger CH clones, average telomere length across blood cells would increasingly reflect telomere length within the mutant CH clone itself.Small clones (for example, VAF 3-5%) were associated with longer telomere length for overall CH (P = 1.05 × 10 −4 , β = 0.09 (0.05-0.14)) and DNMT3A-mutant CH (P = 8.6 × 10 −6 , β = 0.13 (0.08-0.19)), consistent with previous reports that longer telomere length promotes CH acquisition (Fig. 3b and Supplementary Table 19) 20,50 .However, intriguingly, we discovered the inverse association for some other CH drivers, where small clones were associated with shorter telomere length, suggesting that acquisition of certain CH subtypes is promoted by shorter telomeres.A notable example was PPM1D, consistent with reports of high prevalence of PPM1D-mutant CH in individuals with inherited short telomere disorders 54,55 .
Also aligning with previous reports, for CH overall and for most individual CH driver genes, we observed progressive shortening of telomere length with increasing clone size (any P = 1.3 × 10 −14 , β = −0.49(−0.61 to −0.36)), probably reflecting accelerated telomere attrition with cell division in expanding clones (Supplementary Table 19).However, a striking exception to this pattern was observed in SRSF2-mutant CH, in which large clones were unexpectedly associated with longer telomere length (P = 2.2 × 10 −6 , β = 1.36 (0.81-1.91)), suggesting that SRSF2 mutations may mediate telomere elongation in CH.

Discussion
The present study of 462,666 multiancestry individuals presents a comprehensive, technically robust, genetic interrogation of telomere length.Importantly, we discovered that qPCR-and WGS-derived estimates of telomere length capture additional genetic associations with telomere length.Combining these metrics via PCA not only enhanced downstream analyses, but also allowed us to discriminate artefactual signals (that is, associations with PC2).This has important implications for future population-based studies, because it suggests that, where possible, the most robust assessments should leverage both metrics.
Through both common and rare-variant-oriented studies, we described several telomere length loci that give insight into telomere biology.For example, we uncovered antagonistic allelic heterogeneity in ACD and RTEL1, highlighting the complex role for rare variants in telomere homeostasis and their role in disease.Moreover, the disease associations with both shorter and longer telomere length underscore the challenge of therapy development, where perturbation of balanced antagonistic effects might lead to important off-target effects.Integrative analysis of telomere length and proteomic data identified a number of putatively causal relationships, identifying known drug targets (for example, PARP1) and providing additional support for therapeutic modulation of nucleotide metabolism via TK1 and CDA 35 .We also identified a previously undescribed association between PTVs in BRIP1 and G3BP1 with shorter and longer telomere length, respectively.Although BRIP1 is a helicase known to be involved in DNA damage response and G3BP1 is involved in stress granule formation, their role in modulating telomere length is currently unclear and will require functional work in future studies.
Previous studies 11,50 have highlighted a causal, bidirectional relationship between telomere length and CH.In the present study, we uncovered driver gene-specific links between CH and telomere length, providing additional insights into the mechanisms driving clonal expansion.Longer telomeres predispose to DNMT3A-mutant CH, perhaps by extending cellular replicative potential, whereas this is not the case for some other CH driver genes, including PPM1D.It is notable that PPM1D-mutant CH is known to be particularly enriched among individuals with inherited short telomere disorders 54 and in individuals exposed to DNA-damaging chemotherapies that appear to shorten telomeres [56][57][58] .Taken together, we hypothesize that PPM1D mutations are specifically advantageous to blood stem cells in the context of critically short telomeres, perhaps by conferring resistance to the replicative senescence that would ordinarily occur in this setting.
It is also notable that mutations in particular splicing genes, such as SRSF2, have been shown to drive CH exclusively in older individuals 59 , by which time telomeres have naturally shortened with age.The discovery that telomeres in SRSF2-mutant CH do not appear to shorten as clones expand, or even to elongate, contrasts starkly with the accelerated attrition of telomeres with clonal expansion driven by other CH genes.The possibility that SRSF2 mutations confer advantage through telomere modulation offers one explanation for the expansion of these mutant clones specifically in older age; however, further functional studies are required to validate and elucidate the underlying biological mechanisms involved.In summary, our findings support a key role for telomere maintenance in the development of CH, via mechanisms specific to the mutant gene driving clonal expansion.As CH is a causal risk factor for progression to myeloid cancers and for a range of nonhematological diseases, with larger CH clones conferring higher risks 60,61 , therapeutic modulation of telomere biology might be an important focus as strategies for prevention and treatment of CH and its sequelae.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-024-01884-7.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.To view a copy of this licence, visit http://creativecommons. org/licenses/by/4.0/.© The Author(s) 2024 https://doi.org/10.1038/s41588-024-01884-7

Ethics declarations
The protocols for the UKB are overseen by the UK Biobank Ethics Advisory Committee (EAC); for more information see https:// www.ukbiobank.ac.uk/ethics and https://www.ukbiobank.ac.uk/ wp-content/uploads/2011/05/EGF20082.pdf.Informed consent was obtained for all participants.The Northwest Research Ethics Committee reviewed and approved UKB's scientific protocol and operational procedures (REC reference no.06/MRE08/65).Data for the present study were obtained and research conducted under the UKB application license nos.24898 and 68574.

WGS processing, QC and variant calling
WGS data of the UKB participants were generated by deCODE Genetics and the Wellcome Sanger Institute as part of a public-private partnership involving AstraZeneca, Amgen, GlaxoSmithKline, Johnson & Johnson, Wellcome Trust Sanger, UK Research and Innovation and the UKB.Sequencing was carried out in two centers (deCODE facility in Reykjavik, Iceland and the Wellcome Sanger Institute in Cambridge, UK).Genomic DNA underwent paired-end sequencing on Illumina NovaSeq6000 instruments with a read length of 2×151 and an average coverage of 32.5× (refs.62,63).Conversion of sequencing data in BCL format to FASTQ format and the assignments of paired-end sequence reads to samples were based on ten-base barcodes, using bcl2fastq (v.2.19.0).Initial QC was performed by deCODE and Wellcome Sanger, which included sex discordance, contamination, unresolved duplicate sequences and discordance with microarray genotyping data checks.From a total UKB cohort of 503,310 participants, 807 had withdrawn consent before WGS whereas 10,949 had no suitable sample for sequencing.The 50,0101 samples were sequenced as part of the Vanguard phase of the UKB WGS project such that, in total, 492,729 samples from 491,554 individuals were sequenced.After removing replicates, duplicates and an additional 91 individuals who withdrew consent after the sequencing had commenced, a total of 490,397 primary samples were available.
UKB genomes were processed at AstraZeneca using the provided CRAM format files.A custom-built Amazon Web Services cloud compute platform running Illumina DRAGEN Bio-IT Platform Germline Pipeline (v.3.7.8) was used to align the reads to the GRCh38 genome reference and to call small variants.Variants were annotated using SnpEff (v.4.3) 64 against Ensembl (Build 38.92) 65 .

UKB WGS cohort definition
For the remaining 490,348 WGS samples.we used KING (v.2.2.3) 67 to identify individuals with first-degree relatives, which we then randomly pruned such that there were no pairs of samples with a kinship coefficient >0.354, to leave 490,216 (99.93%)WGS samples.We used peddy (0.4.2) 68 and 1000genomes data to classify individuals into broad genetic ancestries (peddy_prob ≥ 0.9) using the gnomAD classifier 69 to subdivide European (EUR) into NFE and ASJ individual broad genetic ancestries.We performed additional QC on NFE broad genetic ancestry samples using peddy-derived PCs, removing samples that fell outside 4 s.d.from the mean over the first four PCs.Next, we removed sex-discordant samples to leave 482,839 (98.4%) samples with TelSeq telomere length estimates (Extended Data Fig. 1).Final cohort sizes stratified by ancestry are described in Supplementary Table 1.

WES
Full details of the WES and subsequent variant calling and annotation of the UKB cohort are described in Wang et al. 16 .Briefly, genomic DNA underwent paired-end 75-bp WES at Regeneron Pharmaceuticals using the IDT xGen v.1 capture kit on the NovaSeq6000 platform.Reads were aligned to GRCh38 and small indels (inserts and deletions) and SNVs called using running Illumina DRAGEN Bio-IT Platform Germline Pipeline v.3.0.7.The resultant catalog of variants was annotated using snpEFF v. 4.3 (ref.64), Ensembl v. 38.92 (ref.65), REVEL 70 and MTR 71 scores.

Estimating telomere length from WGS data
We used TelSeq 10 v.0.0.2 to estimate telomere length using WGS data in the quality-controlled cohort of 482,839 UKB individuals.We used read length (−r) 150 and k-mer size (−k) 10 to match the proportional threshold (40%) for a read to be classified as of telomeric origin, as described in Ding et al. 10 .

Quantitative PCR telomere length estimates
For qPCR telomere length measurements, we used those available through UKB (field ID 22191) derived from baseline samples for a total of 472,518 participants.We used a rank inverse, normal transformed, relative telomer to single copy gene (T:S)-adjusted ratios without further adjustment given the extensive QC already performed on these measurements 72 .In total we identified 9,852 (2%) of the qPCR samples that lacked a matching TelSeq telomere length estimate from downstream analyses.We found that qPCR measurements in this set were significantly longer when compared with samples with both TelSeq and qPCR metrics (Student's t-test P = 8.86 × 10 −42 , two-sided, unadjusted, mean TelSeq&qPCR = 5.0 × 10 −4 , mean qPCR.only= 0.14).

Correcting TelSeq telomere length for technical confounders
We examined the correlation between inverse, normal rank transformed TelSeq-and UKB-adjusted qPCR T:S (UKB showcase field ID 22191) telomere length estimates finding modest agreement (r 2 = 0.16), perhaps indicating the presence of technical confounders.To mitigate this, for TelSeq telomere length estimates that were derived from WGS, we adapted the coverage correction method described in ref. 12.Briefly, we used available Mosdepth 73 coverage files available across 482,839 WGS samples, which given the scale were calculated using a 'quantized' strategy that merges adjacent bases if they fall in the same coverage bin.Overall, four read depth bins were selected ((0-9), (10-19), (20-49)  and (50+)).To compute overall coverage, we assumed that the coverage for a given base was the median of the read depth for that bin.As described in Taub et al. 12 we split the genome (GRCh38) into 1-kb tiles and removed those that overlapped regions with poor mappability, which were blacklisted overlapped known structural variants or were nonautosomal, resulting in 178,120 1-kb bins (approximately 6% of the genome).Then, for each sample we computed the average coverage across each bin.To facilitate downstream computation given the large size of the coverage matrix (that is, 482,839 × 178,120), we investigated the performance of randomly batching samples for coverage adjustment (Supplementary Note 2).This supported a strategy of 24 randomized batches (23 batches of 20,000 and 1 batch of 22,839 participants).For each batch we used a randomized PCA approach implemented in the R package 'rsvd' v. 1.0.5 (ref.74) to estimate the first 300 PCs for each batch.To correct TelSeq telomere length estimates, we then fit a linear model (TelSeq raw ~ PC1-300), where '~' separates response and predictor variables, taking forward the resulting residuals as coverage-corrected TelSeq telomere length estimates.
To assess coverage-corrected TelSeq telomere length estimates we created Bland-Altman plots stratified by sex, ancestry and age using inverse normal transformed metrics for 462,666 participants in whom both metrics were available (Supplementary Figs.2-5).Overall, we observed little bias in telomere length estimates when comparing qPCR and TelSeq methods.We used logistic regression to assess whether outlier status (by difference) was significantly associated with any of these biological metrics.Only AFR genetic ancestry was significantly associated with outlier status (P = 1.16 × 10 −12 , odds ratio https://doi.org/10.1038/s41588-024-01884-7 (OR) = 1.80, two-sided, unadjusted); however, when we added the rare HBB-coding variant carrier status into the model this association was significantly attenuated (P = 2.65 × 10 −5 , OR = 1.23) indicating that this might be driven by the genetic effects reported for qPCR telomere length within the HBB locus.
Finally, we looked for univariate association across 19 WGS sequence metrics (Supplementary Table 2) collected on each sample and both qPCR-and coverage-corrected telomere length estimates, using scaled WGS sequence metrics and inverse normal, rank transformed qPCR and coverage-adjusted telomere length estimates to facilitate comparison.We found that, overall, 14 and 16 WGS metrics were significantly associated with coverage-adjusted TelSeq and qPCR telomere length measurements (Extended Data Fig. 3 and Supplementary Table 3), respectively.Of these, many were highly correlated; however, we noted that a combination of coverage uniformity, total WGS reads and sequencing pipeline captured these, and so were included as covariates in downstream analyses (Supplementary Note 3).We also examined how mosaic loss of X or Y might differentially affect TelSeq and qPCR telomere length estimates, but did not find evidence for systematic differences between the two metrics (Supplementary Note 4).
In total 20,173 (4%) samples with TelSeq WGS telomere length estimates lacked matching qPCR estimates.There was no significant difference between TelSeq telomere length estimates in these samples compared with those with both telomere length measurements.A comparison of TelSeq measurements for this set and the set where both metrics were available did not detect a significant difference (Student's t-test P = 0.17 two-sided unadjusted, mean TelSeq&qPCR = 5.0 × 10 −4 , mean TelSeq.only= 7.0 × 10 −3 ).

Correlation analysis
For the 462,666 samples that had telomere length estimates from both TelSeq and qPCR methods, we calculated the pairwise Pearson's correlation using the R 'cor' function.To assess the contribution and degree of collinearity between TelSeq and qPCR methods we fit the following model linear model using inverse rank, normal transformed age, TelSeq and qPCR (adjusted T:S ratio-UKB field ID 22191) Age ∼ Telomere length TelSeq + Telomere length qPCR + Sex.
We then used the R package olsrr (v.0.5.3) to compute variance inflation factors (VIFs) for each of the predictors, finding a mean VIF of 1.28 that indicated no evidence of collinearity.Overall removing telomere length TelSeq or telomere length qPCR from the model reduced R 2 by 0.01 and 0.02, respectively.

PCA telomere length score
Across all 462,666 individuals with both telomere length measurements, we used the R built-in function 'prcomp' to combine the adjusted TelSeq and adjusted T:S ratio qPCR (UKB field ID 22191) inverse normal, transformed telomere length estimates.Each PCA consisted of two orthogonal principal axes with sample scores that were considered separate telomere length measurements or 'telomere length scores', with PC1 and PC2 explaining 77% and 23% of the variance, respectively (Fig. 1b).Overall PC1 was highly correlated with the standardized mean across TelSeq and qPCR metrics, whereas PC2 was correlated with their difference (Extended Data Fig. 4).
To assess performance for single and combined telomere length metrics we randomly sampled 10,000 participants from the full dataset.We used this training set to fit a simple linear model of a given telomere length metric with age (that is, age ~ telomere length metric ).Then, using the held-out participants, we used the model to predict age and assessed prediction performance as the root mean squared error of the age predictions.To perform crossvalidation and obtain CIs for these performance estimates, we performed this procedure 100×, sampling with replacement (Supplementary Fig. 6).

Defining GWAS loci
To define loci for each phenotype we selected significant variants (P < 5 × 10 −8 ) and created regions ±1 Mb, creating a bespoke region (chr6: 25,500,000-34,000,000) for human leukocyte antigen.We then merged overlapping regions by phenotype, for each resultant region where the most significant variant was selected as the index; in the case of ties the variant closest to the middle of the region was selected.Finally we used the GenomicRanges 75 'reduce' function to combine overlapping regions regardless of phenotype to define a set of nonredundant loci.
To assess a list of previously reported loci, we compiled a list of significant (P < 5 × 10 −8 ) variants from refs.8,11,12 and the GWAS catalog 80 using the 'telomere length' term (EFO_0004505), downloaded on 11 July 2023.We then defined 2-Mb regions centered on each variant.

Single causal variant fine-mapping
For variant fine-mapping, under the single causal variant we selected autosomal variants from NFE GWAS and divided these into approximately independent LD blocks using regions defined in ref. 81.We then used the single-variant fine-mapping 82,83 approach as implemented in https://github.com/ollyburren/rCOGSto assign 95% credible sets.

SuSIE fine-mapping
We used SuSIE 84 to perform fine-mapping of all autosomal telomere length PC1 GWAS loci.Briefly, we selected a reference panel of 10,000 unrelated NFE broad genetic ancestry individuals for LD estimation.For each autosomal PC1 locus, we selected NFE telomere length summary stats and used PLINK to compute LD matrices across reference panel individuals.We then used the susie_rss function in the R package 'susieR' (v.0.12.35) to perform fine-mapping with L = 10, using the https://doi.org/10.1038/s41588-024-01884-7susie_get_cs() function to obtain 95% credible sets (Supplementary Table 13).

ExWAS
We carried out a virtual ExWAS of telomere length using WGS genotypes stratified by the broad genetic ancestry groupings: NFE (n = 439,491), SAS (n = 9,349), AFR (n = 8,162), EAS (n = 2,362), ASJ (n = 1,201) and AMR (675) ancestral groups.Briefly we selected unrelated individuals within each genetic ancestry stratum with telomere length and WGS data using the same method as described in UKB WGS cohort definition above.Finally, we removed individuals with a known hematological malignancy at sampling (N overall = 3,196, NFE = 3,073, SAS = 42, AFR = 44, EAS = 9, AMR = 0).We took forward variants that passed the variant QC as described in Wang et al. 16 which had an MAC > 5. We used a linear model of the form telomere length PC1 ~ genotype + age + sex + age2 + Peddy PC1:4 + SequenceSite to assess the association of genotype with telomere length using the R 'PEACOK' package 16 .
In the present study, genotype was coded as a genotypic (AA = 0, AB = 1, BB = 2), dominant (AA = 0, AB = 1, BB = 1) or recessive model (AA = 0, AB = 0, BB = 1), where A and B are the reference and alternative alleles.For the NFE ancestral group, we assessed 326,846, 326,846 and 62,716 variants for the dominant, genotypic and recessive models, respectively (carrier count ≥5).For the NFE analyses we reported the most significant model-variant pair such that variants P ≤ 1 × 10 −8 for PC1 and P > 1 × 10 −8 for PC2 and MAF < 0.1%.For PC1 associated variants passing QC we reran association analyses for each variant conditional on other significant rare variants within 2 Mb to check for independence.

Collapsing analysis
To assess the contribution of very rare variants we carried out a collapsing burden analysis stratified by broad genetic ancestral groups as per the ExWAS analysis, removing individuals diagnosed with a hematological malignancy at sampling, using the method described in ref. 16.Briefly, we aggregated qualifying variants based within the unit of a gene for each ancestral grouping and used these counts in a linear regression using the R 'PEACOK' (v.1.1.3)package with the same covariates as for the ExWAS.We defined ten qualifying variant tests (Supplementary Table 9) that include a synonymous model as an empirical control.We used the empirical modeling of the null distribution from Wang et al. to define a genome-wide significant threshold of P < 1 × 10 −8 .In total we assessed 18,930 genes across all 10 models.For NFE analyses we report best QV model-gene pair for which P ≤ 1 × 10 −8 for PC1 and P > 1 × 10 −8 for PC2.
To assess the leverage of individual variants on collapsing analysis genome-wide significant hits we employed a LOO analysis.For each gene, and qualifying variant model, we reperformed collapsing analysis, leaving out one variant at a time.In this approach, variants with a large influence on the overall collapsing analysis, when excluded, result in a concomitant change in statistical significance (Supplementary Fig. 12).

Multiancestry meta-analysis
We performed IVW meta-analysis for ExWAS and collapsing across NFE, SAS, AFR, EAS, ASJ and AMR broad genetic ancestry groupings for variants with a carrier count ≥5 within each grouping.In the context of rare variants, IVW can be unstable so we compared IVW meta-analysis P values with those generated from Stouffer's method, weighting each study by the square root of the sample size.We found that both approaches generated similar P values, indicating that IVW in this setting was stable even for rare variants.
For GWAS multiancestral analysis we used REGENIE with the approach described for the NFE ancestry group to generate GWAS summary statistics for SAS, AFR, EAS and AMR cohorts.We used the locus definition approach described earlier to define significant loci for each ancestral strata, considering the PC1 NFE ancestry telomere length loci defined in Supplementary Table 5.For GWAS, we used METAL 85 to perform IVW meta-analyses across all ancestry strata.We selected significant variants (P meta < 5 × 10 −8 ), removing those that were present in a single broad genetic ancestry, using these to define loci and index variants as described earlier and assessing these for overlap with NFE loci.
For cis-colocalizing signals (n = 10) where there was strong evidence for a shared causal variant between protein abundance and telomere length, we performed MR as implemented in the R package 'MendelianRandomization' (v.0.9.0) 86 .Briefly, For all variants (MAF > 1%) in a locus we performed clumping using PLINK 78 with a reference sample of 10,000 randomly sampled, unrelated NFE ancestry UKB samples (----indep-pairwise 100 kb 1 0.3), taking forward these pruned variants as instrumental pQTL variables.For MR, we used PLINK to compute correlation matrices for pruned variants at each locus.We then used the 'mr_allmethods' function to assess support for whether pQTL instruments were causally associated with telomere length across 'simple median', 'weighted median', 'IVW' and 'MR-Egger regression' methods.We took the median, across all four methods, using a multiple corrected P value (P < 0.005) as indicative of a putative causal relationship.Finally, we flagged results where the MR-Egger intercept term deviated from 0, indicating the presence of horizontal pleiotropy, which might invalidate underlying MR assumptions.

CH analysis
To detect putative CH, we used the pipeline described in ref. 52.Briefly, using the same GRCh38 genome reference-aligned reads as for WES germline variant calling, we ran somatic variant calling with GATK's Mutect2 (v.4.2.2.0).After QC we focused on a set of 15 genes (Supplementary Table 17) exhibiting age-dependent prevalence for further analyses, including only PASS variant calls with 0.03 ≤ VAF ≤ 0.4 and allelic depth ≥3 across an annotated set of variants.
For the analysis, we considered four different VAF cutoffs (3-5%, >5-10%, >10-20% and >20%; Supplementary Table 17) across NFE ancestry individuals.In total, after excluding 3,585 individuals diagnosed with either a hematological malignancy predating sample collection or a lymphocyte count >5 × 10 9 cells per liter, we took forward 435,525 individuals for analysis.For overall CH driver subtype association (as shown in Fig. 1a), we fit a linear model telomere length PC1 ~ (CH VAF>0.03+ age + sex + age)/(sex + age2 + ancestry PC1:4 + ever-smoked + pack-years), where telomere length PC1 represents the PC1 telomere length estimate and CH the carrier status for a particular CH driver subtype with VAF > 3%.We then repeated this analysis stratifying by nonoverlapping VAF cutoffs for each CH driver subtype.Finally, to get an overall association statistic between telomere length and VAF stratified by CH driver subtype, we repeated this analysis recoding each CH driver gene carrier status by VAF as an ordinal variable.

Fig. 1 |
Fig. 1 | Combining telomere length metrics improves genetic discovery.a, Correlation between inverse normal transformed qPCR and WGS TelSeq telomere length metrics.The orange dashed line indicates a linear model line of best fit.b, Biplot for PCA of qPCR and TelSeq telomere length metrics.

Fig. 2 |
Fig.2| Rare-variant analysis of telomere length.a, ExWAS analysis of PC1 telomere length in the NFE broad genetic ancestry group, showing only rare germline variants that are significant (P ≤ 1 × 10 −8 ) for PC1 and not PC2.For clarity the variant with the largest effect for a gene is labeled, variants with opposing effect size in the same gene are starred and triangles indicate HGMD pathogenic variants.P values (two-sided, unadjusted) were calculated from fitting a linear

Fig. 3 |
Fig. 3 | Associations between telomere length and CH.a, Collapsing analysis of somatic variants in select CH genes with telomere length PC1 metric.b, Collapsing analysis of somatic variants in CH genes stratified by VAF intervals (colors).Associations not reaching significance are shown with dashed error

Extended Data Fig. 1 | 7 −. 2 |. 3 |. 4 |Extended Data Fig. 5 |. 6 |
Flowchart of sample QC and analyses.Abbreviations; WGS = Whole genome sequencing, Dx = Diagnosis, Broad genetic ancestry groupings -AFR = African, AMR = Admixed American/Hispanic ASJ = Ashkenazi Jewish, EAS = East Asian, NFE = Non-Finnish European, SAS = South Asian.https://doi.org/10.1038/s41588-024-01884-Age,ancestry, and sex relationships with TelSeq & qPCR telomere length measurements.For each panel y-axes denote telomere length residuals after regressing out age, sex, or ancestry depending on the x-axis variable.In all panels N for qPCR and TelSeq + Coverage is 462,666 and 482,839 independent UKB participants respectively.For each boxplot the centre is the median, the lower and upper hinges indicate the 25th and 75th percentile and outliers are represented as individual points.(A) Boxplot of age by telomere length residuals.(B) Boxplot for broad genetic ancestry group (AFR = African, AMR = Admixed American/Hispanic ASJ = Ashkenazi Jewish, EAS = East Asian, NFE = Non-Finnish European, SAS = South Asian) by telomere length residuals.(C) Boxplot for sex by telomere length residuals.https://doi.org/10.1038/s41588-024-01884-7li g n e d _ c o v X _ c o v Y _ ra ti o c c d s _ r2 2 _ a li g n n e d _ c o v _ c Association of whole genome sequencing technical variables with qPCR and coverage adjusted TL metrics.(A) Forest plot of Bonferroni significant associations (P < 1 x 10-3) from a univariate linear regression of technical variables (two-sided) with either qPCR (coral) or inverse rank normal transformed TelSeq coverage adjusted (azure) telomere lengths (n = 462,666 independent samples).All variables have been standardised to facilitate effect size comparison on telomere length (x-axis), 95% confidence intervals are shown.A full table of all results with descriptions is available as Supplementary Table 3.Sequencing pipeline (deCODE, WSI, and WSI_vanguard (baseline)) and sex are treated as categorical variables.(B) Pearson correlation heatmap of significantly associated WGS technical variables.Variable order is derived from hierarchical (complete linkage) clustering of the full correlation matrix.Age and sex are included as biological variables with known associations with telomere length for comparison.https://doi.org/10.1038/s41588-024-01884-7Comparison of PC1 and PC2 rotations.Density plots of mean qPCR and TelSeq transformed telomere length estimates vs PC1 rotation values (A), mean qPCR and TelSeq transformed telomere length estimates vs PC2 rotation values (B), difference between qPCR and TelSeq transformed telomere length estimates vs PC1 rotation values (C), and difference between qPCR and TelSeq transformed telomere length estimates vs PC2 rotation values (D).Dotted lines indicate x = y (Top) and x = -y (Bottom) and are included for reference.https://doi.org/10.1038/s41588-024-01884-7Codd et al.Telomere Length Effect Size (SD) Telomere Length Effect Size (SD) Comparison of GWAS effect sizes with Codd et al. (y-axis) for different TL measurements effect sizes with EUR only effect sizes from Codd et al. (x-axis) and NFE from this study (P < 5 ×10 −8 ).P values are derived from linear regression and are two sided and unadjusted.Crosses indicate 95% confidence intervals for each estimated effect size; Pearson's correlation coefficients are labelled on each panel; blue dotted line shows equivalence (x = y).https://doi.org/10.1038/s41588-024-01884-7Heatmap of genome-wide significant telomere length associated genes from gene collapsing analyses.Shading indicates effect size (green = unit increased telomere length, purple = unit decreased telomere length), points indicate genome-wide significance (P ≤ 1x10 −8 ).The x-axis indicates the different qualifying variant models implemented which are described fully in Wang et al.Briefly, ptv= rare protein truncating variants, UR = ultra rare variants, URmtr = ultra rare variants in missense intolerant regions (MTR), raredmg = rare damaging (REVEL) variants, raredmgmtr = as raredmg but with additional MTR filter, flexdmg = flexible non-synonymous, flexnonsynmtr = as flexdmg but with additional MRT filter, ptvraredmg = Union of ptv and raredmg models, rec = recessive model, syn = synonymous variants (negative control).P values are derived from linear regression and are two sided and unadjusted.

Table 1 | Rare variants in ACD modulating telomere length
-sided, unadjusted) are derived from a linear model using the PC1 telomere length metric across 436,410 independent participants of broad NFE genetic ancestry.Effect and 95% CIs are on the unit scale.a Protein coordinates with respect to UniProt (Q96AP0) canonical transcript ENST00000620761.6.b Also detected through our GWAS.