Large-scale cross-cancer fine-mapping of the 5p15.33 region reveals multiple independent signals

Summary Genome-wide association studies (GWASs) have identified thousands of cancer risk loci revealing many risk regions shared across multiple cancers. Characterizing the cross-cancer shared genetic basis can increase our understanding of global mechanisms of cancer development. In this study, we collected GWAS summary statistics based on up to 375,468 cancer cases and 530,521 controls for fourteen types of cancer, including breast (overall, estrogen receptor [ER]-positive, and ER-negative), colorectal, endometrial, esophageal, glioma, head/neck, lung, melanoma, ovarian, pancreatic, prostate, and renal cancer, to characterize the shared genetic basis of cancer risk. We identified thirteen pairs of cancers with statistically significant local genetic correlations across eight distinct genomic regions. Specifically, the 5p15.33 region, harboring the TERT and CLPTM1L genes, showed statistically significant local genetic correlations for multiple cancer pairs. We conducted a cross-cancer fine-mapping of the 5p15.33 region based on eight cancers that showed genome-wide significant associations in this region (ER-negative breast, colorectal, glioma, lung, melanoma, ovarian, pancreatic, and prostate cancer). We used an iterative analysis pipeline implementing a subset-based meta-analysis approach based on cancer-specific conditional analyses and identified ten independent cross-cancer associations within this region. For each signal, we conducted cross-cancer fine-mapping to prioritize the most plausible causal variants. Our findings provide a more in-depth understanding of the shared inherited basis across human cancers and expand our knowledge of the 5p15.33 region in carcinogenesis.


Introduction
Cancer is a major global public health problem. More than 19.3 million new cancer cases and 10 million cancer deaths were estimated to occur worldwide in 2020. 1 In the United States, approximately 1.9 million individuals are projected to be newly diagnosed with cancer, and more than 600,000 affected individuals are projected to die of cancer in 2021. 2 Inherited genetic variants, along with environmental exposures, contribute substantially to the pathogenesis of cancers. Cancers tend to cluster in families, and twin studies have reported cancer-specific heritability ranging from 9% (head/neck) to 58% (melanoma). 3,4 Genome-wide association studies (GWASs) of specific types of cancer have identified genetic loci significantly associated with susceptibility to malignancies. In a recent study of 18 types of cancer in European ancestry populations, 5 the authors identified 17 genome-wide significant variants that were associated with the risk of at least two cancers with the same direction of effect. The 8q24 region has been long recognized as a pleiotropic locus, where genetic variants have been associated with the risk of breast, colorectal, endometrial, glioma, ovarian, pancreatic, and prostate cancer, among others. [6][7][8][9][10][11][12][13][14][15][16] The 5p15.33 region has been associated with more than ten types of cancer, with multiple independent risk alleles identified. [17][18][19][20][21][22][23][24] Various biological mechanisms, including inflammation, epigenetics, gene expression, and telomere structure, have been proposed to explain these identified pleiotropic associations. For example, the 5p15.33 region harbors the TERT gene, which encodes the catalytic subunit of telomerase, 25 as well as the CLPTM1L gene, which encodes the cleft lip and palate-associated transmembrane-1 like protein. 26 In addition, recent efforts have been devoted toward estimating the genetic correlation between pairs of can-cers. Using the restricted maximum likelihood (REML) approach implemented in the GCTA tool, 27 one study quantified the pairwise genetic correlation among 13 types of cancers in European ancestry populations. 28 Four pairs of cancers, including bladder-lung, testis-renal, lymphoma-osteosarcoma, and lymphoma-leukemia, demonstrated statistically significant shared heritability. We have previously applied linkage disequilibrium (LD) score regression 29,30 on cancer GWAS summary statistics and observed significant genetic correlations between multiple solid tumor pairs, including colorectal-lung, colorectalpancreatic, breast-colorectal, breast-lung, breast-ovarian, and lung-head/neck cancer. 31,32 However, these studies only quantified the pairwise genetic correlation on a genome-wide scale, ignoring variations in the local genetic correlation across the genome. As shared heritability between cancers may not be uniformly distributed across the genome, such limitation may lead to missed opportunities to discover specific regions with crucial contribution to the oncogenesis of multiple cancers. 33 In the present study, we collected European ancestryderived GWAS summary statistics from large-scale metaanalysis results for 14 types of cancer, based on a total number of 375,468 cancer cases and 530,521 controls. By partitioning the genome into 1,703 blocks based on the LD pattern in the 1000 Genomes (1000G) European ancestry populations, 34 we systematically estimated pairwise local genetic correlations between cancers. After adjusting for multiple comparisons, we identified thirteen pairs of cancers with statistically significant local genetic correlations across eight distinct genomic regions. Among these, a 1.2 Mb region at 5p15.33, harboring the TERT and CLPTM1L genes, showed significant local genetic correlations across six pairs of cancers, including breast (overall and estrogen receptor [ER]-negative), colorectal, glioma, lung, melanoma, pancreatic, and prostate cancer. We then utilized an iterative analysis pipeline implementing a subset-based meta-analysis approach (Association Analysis for SubSETs [ASSET]) 35 and a conditional analysis tool (COndition and JOint analysis tool implemented in the Genome-wide Complex Trait Analysis software, COJO-GCTA) 36 and identified ten independent cross-cancer signals within the 1.2 Mb region. For each independent signal, we conducted multi-cancer fine-mapping analysis using PAINTOR 37 to prioritize the variants with the highest posterior probability of being causal. Our study provides novel evidence of shared genetic susceptibility across cancer types and contributes crucial information toward understanding the common genetic mechanisms of carcinogenesis.

Study sample and genotype quality control
We collected the meta-analysis results from a total of 14 cancer GWASs: breast (overall, ER-positive, and ER-negative), 38 colorectal, 39 endometrial, 16 esophageal, 40 glioma, 41 head/neck, 42 lung, 43 melanoma, 44 ovarian, 45 pancreatic, 46 prostate, 47 and renal cancer. 48 Sample size for each cancer is listed in Table 1. The GWAS summary statistics for each cancer were provided by the corre-sponding collaborative consortia. Details on study characteristics and subjects contributing to each cancer-specific GWAS summary dataset have been described in the original cancer-specific publications. All the GWAS results used in this study were based on European ancestry populations. Genomic positions were based on Genome Reference Consortium GRCh37 (hg19).
Individual cancer GWASs were primarily imputed to the 1000G reference panels. 49 Breast, ovarian, pancreatic, and prostate cancer used the 1000G phase 3 v.5 reference panel; colorectal cancer used the Haplotype Reference Consortium (HRC); head/neck cancer used HRC; renal cancer used 1000G phase 1 v.3; metanalysis results for melanoma GWASs were based on studies majorly imputed with 1000G phase 1 v.3; 44 lung cancer used a mix between 1000G phase 1 and hase 3; glioma used a mix between 1000G phase 3, UK10K, and HRC; esophageal cancer used 1000G phase 1; and endometrial cancer used a mix between 1000G phase 3 v.5 and UK10K. For each dataset, we conducted comprehensive quality control to clean and harmonize the GWAS summary statistics across cancers. This included: (1) removing duplicate, structural, multi-allelic, and ambiguous variants; (2) confirming that strand and alleles at each variant were consistent across cancers; (3) creating a common unique marker ID; and (4) removing analytic artifacts (e.g., common variants with reported |log odds ratio| > 3). We also removed any variants with imputation quality score < 0.3 or minor allele frequency (MAF) < 0.01. After manual inspection of the results, we conducted additional ad hoc cleaning for individual cancer results to remove any obvious technical artifacts.

Genetic correlations due to sample overlap
We estimated the number of controls overlapping between pairs of cancers, as these would induce a correlation in the GWAS summary statistics between cancers. We identified participating studies and any publicly available datasets (e.g., Wellcome Trust Case Control Consortium) to calculate the maximum number of controls overlapping between any two cancers. We also employed the tetrachoric correlation between binary-transformed GWAS summary Z scores to determine putative sample overlap. 50,51 To avoid induced correlations due to a shared polygenic architecture, we removed all cancer-specific variants with association p < 0.1. We observed six pairs of cancers that had correlations > 0.05, and these all reflected previously known documented relationships where controls were shared between groups (Table S1). Pairs with correlations > 0.05 included breast and endometrial (0.08), ER-positive breast and endometrial (0.06), breast and ovarian (0.05), esophageal and melanoma (0.08), lung and head/neck (0.07), and lung and renal cancer (0.07).

Local genetic correlation estimation
To identify regions in the genome with local genetic correlations between pairs of cancers, we used rHESS 52 (Heritability Estimation using Summary Statistics), which first estimates the local SNP-heritability for each cancer within each region based on summary statistics 53 and then quantifies the covariance and correlation between pairs of cancers. Based on the LD pattern in 1000G European ancestry populations, 34 rHESS partitions the genome into 1,703 approximatively independent LD blocks. We took sample overlap between pairs of cancers into account as described above. Pairwise local genetic correlations were considered statistically significant if the p value < 0.05/1,703 ¼ 2.94 3 10 À5 .

Searching for independent signals shared across cancers
Based on the local genetic correlation results, we identified a 1.2 Mb region at 5p15.33 (hg19 coordinates: 82,252-2,132,442 bp) harboring significant local heritability for multiple cancer pairs (see Results). We selected eight types of cancers (ER-negative breast, colorectal, glioma, lung, melanoma, ovarian, pancreatic, prostate) that had genome-wide significant associations in the region and showed evidence of pairwise genetic correlation (p < 0.05) with at least one other cancer having genome-wide significant associations in this region, which includes the TERT and CLPTM1L genes. We first performed a conditional analysis using COJO-GCTA 36 on each individual cancer adjusting for the variant with the smallest p value, until no variant had a conditional p < 5 3 10 À8 . We then performed pairwise colocalization analyses using COLOC 54 to assess if any cancers shared causal variants, after controlling for the independent signals identified in analysis of individual cancers.
To comprehensively enumerate the independent cross-cancer signals within this locus, we then used an agnostic subset-based meta-analysis (ASSET) 35 to identify variants with the strongest cross-cancer associations in this region ( Figure 1). ASSET allows for opposite direction of effects across traits when assessing the association between variants and multiple traits, as implemented in the ''two-sided'' option in ASSET. Overlap in controls between GWASs was addressed by using the tetrachoric correlation, as described above. To determine the number of independent signals within a region, we reran all individual cancer GWASs conditioning on the top variant identified by ASSET using COJO-GCTA. The conditional analysis may be subject to the mismatch of LD between the reference panel and the population that generated the GWAS results. Consequently, we created a LD reference panel for all cancer-specific conditional analyses using European ancestry breast cancer controls (n ¼ 40,401), 38 which was the largest population with genotype data available. After generating updated cancer-specific GWAS summary statistics conditioned on the most significant variant (top variant), we reran the two-sided ASSET meta-analysis to identify any additional significant cross-cancer signals. We then added the new top variant from the ASSET analysis to the list of lead SNPs and reran all cancer GWASs conditioning on all lead variants using COJO-GCTA. We iteratively ran cancer-specific analyses conditioning on the identified top variants using COJO-GCTA and ran two-sided ASSET on the resulting cancer-specific association results. We repeated this procedure until no variant reached genomewide significance in the two-sided ASSET meta-analysis. The lead variants that resulted from the two-sided ASSET meta-analyses based on the conditional cancer-specific results were regarded as candidate variants that independently affect the risk of multiple cancers. Using this approach, we identified a total of ten independent signals within the 5p15.33 region.

Multi-trait fine-mapping
For each of the ten cross-cancer signals in the 5p15.33 region identified by our ASSET-COJO analysis, we created new cancer-specific GWAS summary statistics adjusting for the other nine top variants and estimated variant-specific posterior probabilities of causality using PAINTOR v.3.0. 37 We varied the set of cancers included in the fine-mapping analyses of each of the ten independent crosscancer signals as we hypothesized that not all cancers would share the same causal variant for each independent signal, but, rather, different combinations of cancers contributed to each of the ten independent signals. This was also supported by the ASSET analyses, where not all cancers contributed to the top signal for each of the ten conditional meta-analyses. In particular, ASSET provides the subset of traits that contribute to the smallest variant-specific meta-analysis p value. For each variant, two subsets are reported, with the first including traits with a positive association and the Regions with significant pairwise local genetic correlation were first identified by rHESS. For regions harboring disproportionally high shared heritability across cancers, joint test of ASSET twosided meta-analysis and COJO conditional analysis was then repeatedly conducted to identify independent signals, until no variant reached genome-wide significance (p < 5 3 10 À8 ) in twosided ASSET meta-analysis. For each signal, GWAS summary statistics conditional on other signals of selected cancer were used in multi-trait fine-mapping to estimate the posterior probability of being causal.
second including traits with a negative association. For each of the ten independent signals, we included a specific cancer in the PAIN-TOR fine-mapping analysis if: (1) it was one of the cancers selected by ASSET as a contributing phenotype in the corresponding twosided ASSET analysis of the lead variant, or (2) the lead variant showed genome-wide significant association for that cancer in the unadjusted cancer-specific GWAS. For each independent signal, only SNPs with data for all relevant cancers were included. We ran PAINTOR under the assumption that there was only one causal variant underlying that signal. We used the same LD reference panel for the fine-mapping analysis as we did for the conditional analysis. In our primary analyses, we performed the fine-mapping with no functional annotation implemented. Since regulation of TERT and CLPTM1L expression has been linked to open chromatin conformation in previous analyses, 55,56 we conducted a secondary analysis incorporating tissue-specific open chromatin annotations as functional prior. We obtained open chromatin narrow peaks identified from normal tissue or primary cell lines of the relevant organs of each signal, based on the ENCODE project. 57 By overlapping variants with open chromatin peaks, we generated a binary matrix for the region, which was then implemented as the functional prior in the fine-mapping analysis.

Results
Local genetic correlation revealed specific regions in the genome with shared heritability across cancers We first partitioned the genome into 1,703 regions and estimated the pairwise local genetic correlation between fourteen types of cancers. After adjusting for multiple comparisons (p value < 0.05/1,703 ¼ 2.94 3 10 À5 ), we identified thirteen pairs of cancers with statistically significant local genetic correlation across eight distinct genomic regions (  Figure 2). Interestingly, the direction of the genetic correlations varied between cancer pairs. For example, glioma showed significant but opposite local genetic correlations with ER-negative breast (r g ¼ 0.0014, p ¼ 2.40 3 10 À5 ) and colorectal cancer (r g ¼ À0.0015, p ¼ 1.24 3 10 À5 ). Similarly, pancreatic cancer had a positive local genetic correlation with melanoma (r g ¼ 0.0034, p ¼ 4.85 3 10 À6 ) but a negative genetic correlation with lung cancer (r g ¼ À0.0025, p ¼ 1.39 3 10 À57 ). Distinct patterns of regional GWAS association p values for the variants at 5p15. 33 Based on the local genetic correlation results, the 5p15.33 region may harbor key genetic variants related to multiple cancer types. Indeed, multiple susceptibility variants in this region have been reported for at least ten cancer types, including ER-negative breast, colorectal, glioma, lung, melanoma, ovarian, pancreatic, and prostate cancer. To obtain a more complete understanding of the association patterns in this region, we created cancer-specific regional association plots for 5p15.33 ( Figure 3A). We observed three different patterns of association. Pattern A, which includes breast (overall, ER-positive, and ER-negative), colorectal, glioma, ovarian, and prostate cancer, displayed one sharp genome-wide significant signal in a narrow region (~30 kb) overlapping the TERT gene (chr5: 1,253,282-1,295,178 bp). Pattern B, which includes lung, melanoma, and pancreatic cancer, has a broader genome-wide significant signal overlapping both the TERT (chr5: 1,253,282-1,295,178 bp) and CLPTM1L genes (chr5: 1,317,869-1,345,180 bp) ( Figure 3B). Pattern C, which includes endometrial, esophageal, head/neck, and renal cancer, did not have a genome-wide significant signal in this region ( Figure 3C). Interestingly, the distribution of variant-specific associations for some cancers was highly similar but in the opposite direction ( Figure 3D), suggesting that GWAS associations discovered in this region may underly tissue-specific regulations across cancers. The associationbased classification of cancers was highly consistent with our local genetic correlation results. All cancer types showing shared significant local genetic correlation in this region were in either pattern A or B, and thus we excluded the cancers belonging to pattern C for further analyses. For breast cancer, we limited our analysis to ERnegative breast cancer, as it had the strongest association at 5p15.33. Along with colorectal, glioma, lung, melanoma, ovarian, pancreatic, and prostate cancer, a total number of eight cancer types were used in the fine-mapping cross-cancer analyses.
Ten independent signals were identified based on multicancer meta-analysis results Given the important biological function of the TERT and CLPTM1L genes, previous cancer fine-mapping efforts in this region, and the appearance of multiple association peaks for some of the cancers, it is plausible to assume that multiple variants in this region affect cancer risk independently. To test this assumption, we performed a conditional analysis using COJO-GCTA for each cancer to enumerate the independent signals at the 5p15.33 region. Six of the eight cancers of interest, including ER-negative breast, colorectal, glioma, lung, pancreatic, and prostate, were identified with two or more independent variants (Table S2). A total number of thirteen variants were identified, of which four were shared by two cancer types. By using conditional analysis results of each cancer, we then assessed the probability of two cancers sharing a single causal variant using a Bayesian-based colocalization approach. 54 Glioma and melanoma were estimated to be likely sharing a causal variant (posterior probability [PP] ¼ 0.519; Table S3), even after controlling for the effect of identified signals of individual cancers. These results  suggest that multiple independent cross-cancer signals may exist in this region. However, current state-of-the-art statistical fine-mapping tools often struggle to make inference of causality under the assumption of multiple causal variants. Further, it is likely that not all cancers share all causal variants. To get an estimate of the number of independent association signals across cancers in this region, we conducted iterative meta-analyses using individual cancer-specific association results from conditional analysis as generated by COJO-GCTA (see Material and methods). We adopted the twosided analysis scheme in ASSET to allow for the detection of effects in opposite directions.
The strongest associated variant in the two-sided ASSET meta-analysis was rs10069690 (chr5: 1,279,790, p ¼ 4.05 3 10 À126 ; Figure 4), which was positively associated with ER-negative breast cancer and glioma while negatively associated with pancreatic and prostate cancer. We adjusted the cancer-specific GWAS results for rs10069690 using COJO-GCTA, and then reran the two-sided ASSET metaanalysis with the rs10069690-adjusted cancer-specific results. We observed the strongest association for rs465498 (chr5: 1,325,803, p ¼ 1.75 3 10 À59 ), which was positively associated with melanoma and pancreatic cancer and negatively associated with lung cancer. We added rs465498 to the set of variants to be conditioned on in the cancer-specific GWASs and iterated this process until no variant reached genome-wide significance (p < 5 3 10 À8 ) in twosided ASSET meta-analysis. In the end, we obtained ten conditionally independent significant SNPs (Table 3; Figure S1). The pairwise r 2 between the ten SNPs ranged between 0.001 and 0.294 as based on 1000G European ancestry data, 58 which indicated that the pairwise correlations between the identified signals were weak ( Figure 5).
For each of the ten independent signals, the number of cancer types contributing to the association as identified by ASSET ranged from two to eight. Although SNPs rs10069690 and rs7705526 were both genome-wide significant variants for ovarian cancer (p ¼ 1.74 3 10 À8 and 1.34 3 10 À9 , respectively), ASSET did not include ovarian cancer as a contributing cancer to the meta-analysis results for either of the SNPs. To ensure that we included all relevant cancers in the fine-mapping analysis of each independent signal, we extracted the original cancer GWAS results for the ten independent SNPs and manually added any cancers to the list of contributing cancers if that cancer showed a genome-wide significant association with a specific SNP but was not included on the list of traits optimizing the ASSET meta-analysis. For each of the ten SNPs, we then applied COJO-GCTA on each included cancer GWAS dataset to obtain cancer-specific results conditioned on the other nine lead SNPs and used these adjusted summary statistics in the fine-mapping analyses.
Cross-cancer fine-mapping proposes candidate causal variants shared by cancers To identify candidate causal SNPs within the ten independent signals identified in the conditional analyses, we  conducted a multi-cancer fine-mapping analysis using PAINTOR for each signal. We first performed fine-mapping analyses with no functional annotation data implemented. For the ten candidate signals, the size of credible sets comprising a cumulative 95% PP of causality ranged from one to fifteen variants (Table 4; Figure S2). All SNPs identified as lead SNPs in the conditional analysis were included in the 95% PP credible set of the corresponding fine-mapping analysis, with six of them having the highest PP in its set (rs35033501: PP ¼ 0.  To assess the impact of adding a priori information on functional importance, we downloaded tissue-specific open chromatin narrow peaks of normal tissues or primary cell lines for the relevant organs for each signal from the ENCODE project ( Figure 6; Table S4). By overlapping the functional annotations with the variants of interest, we repeated the fine-mapping analysis for all the candidate signals (Table 4; Figure S2). Seven of the ten candidate signals showed consistent 95% PP credible sets as the previous fine-mapping analyses without functional annotations. However, fine-mapping analysis of rs35033501 (ER-negative breast, lung, melanoma, pancreatic, and prostate cancer) prioritized rs71595003, residing in an open chromatin peak for breast epithelial tissue, with a PP of 0.999. In contrast, rs35033501, which had a PP of 0.875 in the analysis without annotations, had a PP < 0.001 when information about open chromatin was added. For the fine-mapping analyses of rs11414507 (ER-negative breast, prostate), the size of the 95% PP credible set shrank from four to two, which included the index SNP rs11414507 (PP ¼ 0.42) as well as rs7712562 (PP ¼ 0.58). Both rs11414507 and rs7712562 were located in open chromatin peaks in breast epithelial tissue. Similarly, after we implemented the functional annotation data, only two SNPs were included in the 95% PP credible set of the signal indexed by rs465498 (lung, melanoma, pancreatic), compared to eight SNPs in the analysis without functional information. The index SNP rs465498 had a comparable PP (0.437) as rs421629 (PP ¼ 0.563), and both SNPs were located within open chromatin peaks in lung tissue.

Discussion
In this study, we leveraged GWAS summary statistics from 14 cancer types to estimate local genetic correlations and conduct follow-up fine-mapping of shared cancer regions in the genome. By partitioning the genome into independent blocks as defined by LD, we comprehensively estimated pairwise local genetic correlations between the included cancers. We identified 13 cancer pairs with significant local genetic correlation across eight distinct Table 4. Statistical fine-mapping prioritized the potential causal SNP within 10 independent cross-cancer signals in 5p15. 33  SNPs within the credible set were ranked by the posterior probability (PP).
genomic regions. Among these, one region on chromosome 5p15.33 harboring the TERT and CLPTM1L genes had statistically significant shared heritability for seven cancer types, including ER-negative breast, colorectal, glioma, lung, melanoma, pancreatic, and prostate cancer. By utilizing an iterative analysis, we identified ten independent cross-cancer SNP signals within this locus. We then conducted fine-mapping analyses for each independent signal and generated 95% posterior probability credible sets both without and with a priori functional information.
Our pairwise local genetic correlation results were highly consistent with the conclusions of previous GWASs and cross-cancer analyses. The pleiotropic effect of variants in the 8q24 region between multiple types of cancer, including colorectal and prostate cancer, has been previously demonstrated and replicated in studies across populations of different ancestries. 12,59,60 The 5p15.33 region, containing the TERT and CLPTM1L genes, has also been associated with multiple cancers. [17][18][19][20][21][22][23] Other significant genomic regions identified in our study, including 1q32.1 (ER-negative breast and prostate), 4q24 (colorectal and prostate), 5q11.2 (overall breast and colorectal), 10q26.13 (ER-positive breast and prostate), 17q12 (endometrial and prostate), and 19p13.11 (ER-negative breast and ovarian), have also been identified as pleiotropic loci in previous analyses. 61,62 Previous efforts have been devoted to identifying pleiotropic variants, by using either a subset-based meta-analysis approach 61 or categorizing genome-wide significant loci of multiple cancers by LD patterns. 62 Our analysis complements these, as we aggregated the per-SNP effect within the loci, estimated the local heritability of each cancer, and quantified the local genetic correlation between the cancer pairs. These ''shared heritability hotspots'' identified in our analysis may contain genes with strong effect on multiple cancers or harbor multiple risk variants and biological mechanisms that can independently affect the risk of different cancers. Our results can thus be utilized to prioritize candidate regions for future discoveries of causal variants and functional follow-up.
As the 5p15.33 region harboring the TERT and CLPTM1L genes was the only region that displayed more than one statistically significant pairwise genetic correlation, we focused our continued efforts on this region. The TERT gene encodes the catalytic subunit of telomerase reverse transcriptase, 25 which is a crucial enzyme for maintaining telomere length. Mendelian randomization studies have shown that genetically determined telomere length is associated with the risk of multiple cancer types, including glioma, ovarian, lung, and melanoma, but is not associated with the risk of other cancers included here, such as breast and prostate. [63][64][65] In our study, we observed local negative genetic correlations and opposite direction of SNP effects between specific cancer types, which indicate that genetic variation in this region is likely to affect cancer risk through multiple distinct biological pathways, of which telomere length is only one implicated mechanism. Meanwhile, the CLPTM1L gene encodes the cleft lip and palateassociated transmembrane-1 like protein, which has been reported to play a role in cell apoptosis and cytokinesis and is overexpressed in lung and pancreatic cancer. [66][67][68] Given its important biological function and significant association with a broad set of cancers, we assumed that multiple variants in this region may independently influence the risk of various types of cancers. By iteratively conducting conditional meta-analyses, we identified ten independent signals (seven in the TERT gene, one in the CLPTM1L gene, and two between TERT and CLPTM1L). Our study results are comparable to a previous study published by Wang et al., 56 which conducted a subset-based meta-analysis across six types of cancers (bladder, glioma, lung, pancreatic, prostate, and testicular). Several signals identified in our study have either been proposed (rs10069690, rs2853677) or are correlated with the independent signals reported in that study (rs7705526 versus rs7726159, r 2 ¼ 0.87; rs465498 versus rs451360, r 2 ¼ 0.34). We only included cancers with genome-wide significant signals in this region into the subset-based metaanalysis and conditional analysis. Compared to the study presented by Wang et al., 56 our study further included several common cancers (ER-negative breast, colorectal, melanoma, and ovarian), while we did not have data on bladder and testicular cancer. With an increased number of cancers and larger sample sizes, we were able to refine the cross-cancer signals in this important region. In addition, independent signal rs465498 identified in our study was in strong correlation with two previously identified susceptible loci at the CLPTM1L gene, including pancreatic cancer SNP rs31490 (r 2 ¼ 0.96) 69 and lung, melanoma, and prostate cancer SNP rs401681 (r 2 ¼ 0.96). 21,70,71 Our findings imply that the association between the CLPTM1L gene and various types of cancer can be potentially attributed to one distinct signal.
When estimating the local genetic correlation across cancers, we considered subtypes for breast (ER-negative and ER-positive) and lung cancer (adenocarcinoma, small cell, and squamous cell). Despite the relatively smaller GWAS sample size (21,468 for ER-negative breast cancer compared to 122,977 for overall breast cancer), ER-negative breast cancer showed stronger associations and higher genetic correlation with other cancers in the 5p15.33 region, as compared to ER-positive and overall breast cancer. In contrast, the three subtypes of lung cancer had either no genome-wide significant hits at the 5p15.33 region (small cell) or had weaker local genetic correlation estimates (adenocarcinoma and squamous cell, data not shown) than overall lung cancer. We thus included ER-negative breast cancer and overall lung cancer in the subsequent analyses.
Multiple lead SNPs with high posterior probability have been reported to affect telomere length. SNP rs7705526 is significantly associated with telomere length in multiple populations. 72-75 SNP rs2853677 has been associated with relative telomere length in a breast cancer case-only cohort in Han Chinese, 76 as well as leukocyte telomere length in a European ancestry population. 75 SNP rs35226131 is perfectly correlated with a nonsynonymous variant (rs61748181) in TERT, which results in a proteinlevel change from alanine to threonine and negatively influences telomere length. 15,71 SNP rs10069690 has been found to significantly interact with recent use of non-steroidal anti-inflammatory drugs (NSAIDs) to alter telomere length in a colorectal cancer case-control study. 77 SNP rs465498, located in the CLPTM1L gene, has been reported to be significantly associated with telomere length among Han Chinese. 78 We could not find previous data on the role of other five lead SNPs identified by our study, and it is thus possible that other unknown mechanisms are involved.
Since previously identified cancer risk SNPs at 5p15.33 have been linked to open chromatin conformation, 55,56 we further included regions of open chromatin for related tissues from the ENCODE project as functional prior in our fine-mapping analysis. 57 The results for five signals (lead SNPs rs192723047, rs10069690, rs7705526, rs2853677, and rs35334674) remained unchanged, with each having a credible set containing one single SNP with a posterior probability of 1.00. After incorporating open chromatin peaks as a prior, the 95% posterior probability credible sets became smaller for three signals (lead SNPs rs35033501, rs11414507, and rs465498), as SNPs located in open chromatin peaks obtained a higher posterior probability of being causal. For the other two signals (lead SNPs rs35226131 and rs3888705), the size of each 95% credible set was relatively large in analyses with and without functional annotations. No SNPs in these regions had a pre-dominantly high posterior probability, nor did any of them overlap with the open chromatin peaks of any related tissue. The fine-mapping results for these two signals should thus be interpreted with extra caution.
Our study has several strengths and limitations. We used cancer GWAS summary statistics published by each collaborating consortium, which maximized our sample sizes and provided large statistical power. This is also the first study to comprehensively quantify the local genetic correlation across multiple common cancers. We innovatively adopted the joint analysis pipeline of two-sided ASSET meta-analysis and COJO-GCTA. This approach enabled us to both validate the proposed pleiotropic loci and explore novel independent signals, under the complex genetic architecture in the 5p15.33 region. It is also important to recognize some limitations. Although we chose an internal population (breast cancer controls) to generate the LD reference panel for the conditional analyses and fine-mapping, bias may still inevitably exist as the mismatch of LD between the reference and the population of other cancers. The study population was limited to European ancestry individuals only, and therefore any conclusions of our research may not be applicable to other ancestries. Including multiple ancestries would also allow for refinement of the fine-mapping signals, since LD structure varies between populations. Moreover, some of the GWASs included in the present study (e.g., breast and ovarian) shared controls. Although we accounted for this overlap in the local genetic correlation analysis and the subsetbased meta-analysis, we were not able to take these into account in the fine-mapping analysis, as PAINTOR currently does not adjust for sample overlap. However, we do not believe this will have a qualitative impact on our results. Meanwhile, although our analysis included a large number of cancer types, other cancers, including bladder and testicular, which have shown genome-wide significant signals in the 5p15.33 region, 21,79 were not included. Further, we could have missed any potentially causal variants that were not included in our analyses for various reasons (e.g., poorly imputed or rare variants). Finally, the tissuespecific open chromatin peaks used as the functional prior in our fine-mapping analysis were from adult tissue. Some of these tissues may not express much of TERT, and thus these annotations may not necessarily reflect a cellular context where TERT and the enhancers that promote TERT expression are active. Our fine-mapping analysis should thus be interpreted with some caution. Since the fine-mapping analysis was solely based on bioinformatic analysis, further functional validation using molecular biology experiments is required to fully understand the mechanisms at play in this region.
In summary, our study identified genomic regions with significant local genetic correlations across 14 types of common cancers. We further enumerated the independent pleiotropic signals in the 5p15.33 region and performed a cross-cancer fine-mapping for each signal, using up-dodate bioinformatics tools. Results from our study provide novel evidence of the shared inherited basis of human cancers and expand our understanding of the role played by the TERT-CLPTM1L region in cancer development.

Declaration of interests
B.M.W. has received research grants from Celgene and Eli Lilly and has consulting relationship with BioLineRx, Celgene, and Grail. R.A.E. has received speaker honoraria from the GU-ASCO meeting (January 2016), RMH FR meeting (November 2017, supported by Janssen), University of Chicago invited talk (May 2018), and ESMO (September 2019, supported by Bayer & Ipsen) and served as member of external expert committee at the Prostate Dx Advisory Panel (June 2020). All other authors declare no competing interests.