Whole-genome sequencing identifies variants in ANK1, LRRN1, HAS1, and other genes and regulatory regions for stroke in type 1 diabetes

Individuals with type 1 diabetes (T1D) carry a markedly increased risk of stroke, with distinct clinical and neuroimaging characteristics as compared to those without diabetes. Using whole-exome or whole-genome sequencing of 1,051 individuals with T1D, we aimed to find rare and low-frequency genomic variants associated with stroke in T1D. We analysed the genome comprehensively with single-variant analyses, gene aggregate analyses, and aggregate analyses on genomic windows, enhancers and promoters. In addition, we attempted replication in T1D using a genome-wide association study (N = 3,945) and direct genotyping (N = 3,263), and in the general population from the large-scale population-wide FinnGen project and UK Biobank summary statistics. We identified a rare missense variant on SREBF1 exome-wide significantly associated with stroke (rs114001633, p.Pro227Leu, p-value = 7.30 × 10⁻⁸), which replicated for hemorrhagic stroke in T1D. Using gene aggregate analysis, we identified exome-wide significant genes: ANK1 and LRRN1 displayed replication evidence in T1D, and LRRN1, HAS1 and UACA in the general population (UK Biobank). Furthermore, we performed sliding-window analyses and identified 14 genome-wide significant windows for stroke on 4q33-34.1, of which two replicated in T1D, and a suggestive genomic window on LINC01500, which replicated in T1D. Finally, we identified a suggestively stroke-associated TRPM2-AS promoter (p-value = 5.78 × 10⁻⁶) with borderline significant replication in T1D, which we validated with an in vitro cell-based assay. Due to the rarity of the identified genetic variants, future replication of the genomic regions presented here is required with sequencing of individuals with T1D. Nevertheless, we here report the first genome-wide analysis on stroke in individuals with diabetes.

prevalence is rising 2. Even though much of this trend is driven by an increase in obesity and insulin-resistant type 2 diabetes (T2D), the incidence of insulin-dependent T1D has increased as well 3. T1D is a lifelong condition caused by an autoimmune reaction towards the pancreas and treated with daily insulin injections. The strokes themselves may be of hemorrhagic (20%) or ischemic (80%) origin and classified into even more specific subtypes. Interestingly, the two diabetes types affect stroke risk differentially: T1D increases the risk of both ischemic and hemorrhagic stroke 4,5, while the risk imposed by T2D has been estimated to be more modest for hemorrhagic strokes 5. Importantly, T1D predisposes individuals to cerebral small-vessel disease and strokes of microvascular origin 6,7. Diabetes also causes other complications, of which diabetic kidney disease (DKD) and severe retinopathy predict cerebrovascular disease in T1D 8. Understanding stroke pathophysiology in diabetes is important for improving treatment and quality of life for individuals with T1D.
Stroke heritability has been estimated to vary between 30 and 40% in the general population 9. Stroke heritability varies greatly depending on the subtype, with the largest heritability estimates for large artery atherosclerotic stroke and lobar intracranial hemorrhage, and the lowest for small vessel disease 9. To date, 126 common genomic loci have been associated with stroke or its subtypes with genome-wide significance 10,11. Associations at many of the known common stroke loci overlap with other cardiovascular phenotypes, e.g., coronary artery disease (CAD) 9. Our previous study suggested a heritable component of stroke in individuals with T1D, as a history of maternal stroke was associated with hemorrhagic stroke in T1D 12. However, very few studies have investigated genetic risk factors for stroke in diabetes 13-15, and no genome-wide studies in individuals with diabetes yet exist. On the other hand, genetic studies on CAD in diabetes have identified a few diabetes-specific loci 16,17, although still pending external replication, and have replicated three known general population CAD risk loci in diabetes: CDKN2B-AS1, PSRC1 and LPA 15,16,18.
A substantial proportion of heritability remains unexplained for stroke 9. Rare genetic variants with minor allele frequency (MAF) of ≤ 1% may significantly contribute to stroke heritability. In fact, some rare monogenic disorders have stroke as one of their manifestations 9,10,19. In GWASs, the imputation accuracy of rare variants may be limited, and largely depends on the minor allele count (MAC) in the reference sample 20. Rare variants can be reliably studied with next-generation sequencing-based techniques such as whole-genome sequencing (WGS) and whole-exome sequencing (WES). We have previously used WES to identify protein coding variants associated with lipid and apolipoprotein traits in T1D 21. In the general population, novel stroke risk loci have been identified with WGS 22. However, UK Biobank WES analysis for cardiometabolic traits did not discover exome-wide significant stroke risk genes 23.
Historically, the Finnish population has been isolated and, thus, represents a unique genetic background with enrichment of low-frequency deleterious variants 24, which may in part enable the discovery of rare disease-associated variants. Here we studied genetics of stroke and its subtypes with WGS and WES in Finnish individuals with T1D with multiple statistical approaches by focusing on rare and low-frequency genomic variants. We aimed both to find stroke-risk loci specific to individuals with T1D, and to identify risk loci generalizable to the non-diabetic population, since discovery of rare variants is more probable in a high-risk Finnish diabetic population. Finally, we performed cell-based in vitro experiments to further validate a discovered promoter region. Altogether, here we report the first genome-wide study on stroke genetics in diabetes.

Study design
The study is part of the Finnish Diabetic Nephropathy (FinnDiane) Study, an ongoing nationwide multicenter study established to identify factors leading to diabetic complications 25. We studied WGS in 571 and WES in 480 non-related and non-overlapping individuals with T1D, entailing 112 and 74 stroke cases, respectively (Table 1, Table S1, Fig. S1 and S2). We aimed to find rare and low-frequency genetic variants associated with stroke in T1D. Therefore, we performed single variant analyses across the genome (MAC ≥ 5), using fixed-effects meta-analysis for variants available in both data sets, with a minimal adjustment setting, i.e., the calendar year of diabetes onset, sex and the first two genomic data principal components, and repeated the analyses with an additional DKD adjustment (Fig. 1). We performed gene aggregate analyses (cumulative MAC, CMAC ≥ 5) with the minimal adjustment separately with protein-altering variants (PAVs) and protein-truncating variants (PTVs), and repeated the analyses with an additional DKD adjustment. Finally, we conducted minimally adjusted intergenic aggregate analyses within genomic windows by statistically up-weighting functionally important and rare variants, and within established enhancers and promoters by weighting variants according to their rarity. Furthermore, we performed stroke subtype association analyses for the lead findings.

Single variant analyses
We searched for genetic variants associated with stroke using non-overlapping WES and WGS data, and discovered a suggestively stroke-associated locus, 4q33-34.1, with the minimally adjusted model (4:170787127, p-value = 8.83 × 10⁻⁸, MAF = 3.7%, Table 2, Fig. 2). The variant was unavailable for replication in the T1D specific GWAS and in the FinnGen general population GWAS summary statistics. However, the variant with the third lowest p-value on 4q33-34.1 was available but did not replicate for stroke in T1D or in the general population (Table 2). As DKD is a common diabetic complication that has been reported to predict incident stroke in T1D 8, we performed additional analyses adjusted for DKD, and discovered a rare missense variant on SREBF1 exome-wide significantly (p-value < 3 × 10⁻⁷) associated with stroke (rs114001633, p.Pro227Leu, p-value = 7.30 × 10⁻⁸, MAF = 0.26%) (Table 2, Fig. S3). Due to the rarity of the variant, we performed additional genotyping for replication, whereby the variant did not replicate for stroke (Table 2), but replicated for hemorrhagic stroke in T1D (p-value = 0.02, N = 3,263, Table S2). Since rs114001633 did not pass MAC threshold in the hemorrhagic stroke
Table 1. Clinical characteristics of study participants in the next-generation sequencing data sets. Weighted mean HbA1c is calculated until the stroke event or the end of follow-up. DKD: diabetic kidney disease status closest to the end of follow-up, defined as end-stage renal disease (ESRD), macro- or microalbuminuria. Mean (standard deviation, SD), *Median (interquartile range, IQR). Student's t-test, Wilcoxon signed rank test or Fisher's exact test.
3, S3 and S5). Furthermore, eight genes were suggestively associated with stroke through rare or low-frequency PAVs (Fig. 3B). In the stroke subtype analysis for the lead genes, the aggregate of rare PAVs on MAP3K12 was associated with ischemic stroke (p-value = 1.72 × 10⁻⁷, CMAC = 17), and on MTRNR2L7 with hemorrhagic stroke (p-value = 2.24 × 10⁻⁶, CMAC = 6). MAP3K12 and TARBP2 are located close to each other on the genome; thus, they may represent the same association signal through linkage disequilibrium (LD) or modifier effects onto the causal gene (Fig. S6).

Replication of gene aggregate findings
We attempted T1D specific replication within the FinnDiane GWAS data, also including five directly genotyped variants, both using the gene aggregate approach and by inspecting the exonic variants individually. Despite the uncertainty of genotype imputation and our limited statistical power for rare variants, ANK1 and LRRN1 showed weak evidence of replication in T1D: Although ANK1 did not reach significance for stroke
Table 2. Lead variants discovered with single variant association analyses (p-value < 5 × 10⁻⁷). Two variants closest to rs4435704 are reported from the 4q33-34.1 locus. In addition, if the variant was discovered already in the minimal model, we do not report DKD adjusted results. Variants are in Hardy-Weinberg equilibrium (p-value > 0.05). FinnDiane replication is performed with GWAS for rs376936219, and with genotyping for rs114001633. ++ Meta-analysis with positive minor allele effect direction, REF = Reference allele, ALT = Alternative allele, I² = Meta-analysis heterogeneity estimate, OR = Odds ratio, CI = confidence interval.
4 and S6), one of the available fifteen variants was associated with stroke (rs779805849, p-value = 0.017) (Table 3, Fig. 4), and two additional variants with hemorrhagic stroke (rs146416859 and rs61753679, p-value < 0.05) (Table S4). LRRN1 did not replicate for stroke in FinnDiane with rare PAVs (p-value = 0.50, N variant = 4) (Tables 4 and S7). However, when we extended the model to low-frequency PAVs (Tables 4 and S7), thereby improving statistical power and imputation quality, LRRN1 replicated for ischemic stroke (p-value = 0.039, N variant = 6). UACA contained two rare PTVs associated with stroke, of which one replicated through genotyping (p-value = 0.0030, Tables 3 and S5). However, the variant was ultra-rare, and the replication is thus uncertain. We were unable to replicate HAS1 in T1D due to missing data; we directly genotyped one variant but found no rare allele carriers. ARPC5 did not replicate. We further attempted replication in the general population by look-ups from two UK Biobank WES studies 23,27 (Tables 4, S8 and S9). Importantly, HAS1 replicated for stroke with rare loss-of-function variants (MAF ≤ 1%: p-value = 0.035; ref. 27) and with ultra-rare deleterious variants (MAF ≤ 0.1%: p-value = 0.012; ref. 23), while UACA replicated with ultra-rare deleterious variants (MAF ≤ 0.01%: p-value = 0.035; ref. 27). Finally, LRRN1 replicated for stroke with an ultra-rare deleterious variant model (MAF ≤ 0.001%: p-value = 0.026; ref. 27), although not for ischemic stroke. ANK1 did not replicate with the deleterious missense variant model in the general population 27.

Promoters and enhancers
As a more targeted aggregate approach to explore the non-coding genome, we studied rare and low-frequency variants on established regulatory regions using the minimal adjustment. We discovered three enhancers with suggestive stroke-associated enrichment of rare or low-frequency variants within intronic regions of TRPM3, LOC105378983, and BDNF, encoding brain-derived neurotrophic factor (Tables S12 and S13, Fig. S10). The BDNF enhancer was significant after multiple testing correction for ischemic stroke (p-value = 1.01 × 10⁻⁶, CMAC = 6). Regional aggregate replications were not possible in the T1D specific GWAS (N variant < 2), and individual variants were missing or did not replicate. PCHi-C linked the BDNF enhancer to its promoter on specific brain regions (Fig. S9).
We did not identify stroke-associated promoters after correction for multiple testing (p-value < 3 × 10⁻⁷, Fig. S11). The strongest associations were two TGOLN2 promoters (p-value = 5.60 × 10⁻⁶, CMAC = 9, MAF ≤ 1%), located on the previously mentioned TGOLN2 window, and a TRPM2-AS promoter (p-value = 5.78 × 10⁻⁶, CMAC = 33, MAF ≤ 1%; Tables S14 and S15). The aggregate of rare variants on the TRPM2-AS promoter nearly replicated for stroke in T1D (FinnDiane GWAS: p-value = 0.053). When we inspected variants individually, one out of nine available variants replicated in the general population for ischemic stroke (FinnGen GWAS: p-value = 0.038). In GTEx, rs762428 within the TRPM2-AS promoter was significantly associated with TRPM2 levels in whole blood (NES = -0.63) and lungs (NES = -0.41, p < 0.001), and nominally in other tissues such as the hypothalamus (NES = -0.42). TRPM2 encodes a calcium-permeable and non-selective cation channel expressed mainly in the brain. The gene has been linked to ischemic stroke 30, and belongs to the same protein subfamily as the above-mentioned TRPM3. TRPM2 inhibitors have been proposed as a drug target for central nervous system diseases 31; thus, our results suggest that these inhibitors could also be beneficial for stroke in T1D, although further validation of the genetic associations is needed.
We performed luciferase promoter analysis of the stroke-associated sequence within the TRPM2-AS promoter region to experimentally confirm its promoter activity (Fig. 6). As we detected TRPM2-AS expression in HELA cells but not in HUVEC or HEK-293 cells using semi-quantitative RT-PCR, the luciferase analysis was performed in HELA cells, which indicated strong promoter activity. The most strongly stroke-associated variant, rs753589764, did not significantly affect luciferase activity under normal cell culture conditions (p-value = 0.27, 22 technical repeats). However, we cannot rule out a variant effect under cellular stress, e.g., oxidative stress, or in other cell lines, and therefore, further promoter experiments should be performed in the future.

Discussion
Stroke heritability has been estimated to range between 30 and 40%, but the genomic loci identified thus far explain only a small fraction of heritability 9. One potential explanation for the missing heritability is rare variants missed by GWAS. Therefore, we performed WES and WGS in a total of 1,051 Finnish individuals with T1D to discover rare and low-frequency variants associated with stroke and its major subtypes, either specific for T1D, or generalizable to the non-diabetic population. We identified multiple significant loci with evidence of replication, including protein altering or truncating variants on ANK1, HAS1, UACA, and LRRN1, as well as a 4q33-34.1 intergenic region.
Table 4. Lead genes with replication in individuals with T1D and in the general population. Table represents gene aggregate replication with the best matching model to the discovery stage, and if successfully observed in further models, replication is presented. T1D GWAS (FinnDiane) may also include directly genotyped variants. UKBB (UK Biobank) replication entails WES analysis with several models, i.e., M1: LoF variants 27, M3/Damaging: LoF and predicted-damaging missense variants 23,27
With single variant analyses, we identified a missense variant on SREBF1 (rs114001633, p.Pro227Leu), which was exome-wide significantly associated with stroke, and further replicated for hemorrhagic stroke in T1D. As the variant was ultra-rare, and we had a relatively small number of hemorrhagic stroke cases, further replication is needed in T1D to confirm this finding. SREBF1 encodes a transcription factor involved in lipid metabolism and insulin signaling 32. Gene aggregate tests (SKAT-O) detected four genes within which PAVs (ANK1 and LRRN1) or PTVs (HAS1 and UACA) were associated with stroke with evidence of replication; LRRN1, HAS1, and UACA after adjustment for DKD. ANK1 did not replicate in T1D with the gene aggregate approach; however, one out of the fifteen available variants replicated for stroke in T1D (rs779805849, p.Val136Glu). Of note, SIFT and PolyPhen predicted many ANK1 variants as deleterious 33,34. ANK1 encodes ankyrin-1, within which variants cause hereditary spherocytosis, an inherited disease that changes the shape of red blood cells 35. Previous genome-wide association studies have linked the gene to T2D 36, while another gene from the ankyrin protein family, ANK2, is a previously identified stroke risk locus 37.
Rare PAVs on LRRN1 were associated with stroke. LRRN1 did not replicate with the corresponding model in T1D; however, with a model extended to low-frequency PAVs, LRRN1 replicated for ischemic stroke. Rare variant replication is problematic with GWAS data due to the uncertainty of the imputation, which may explain the need to increase the allele frequency threshold to observe a successful replication. Furthermore, LRRN1 was nominally associated with stroke in the general population through an aggregate of ultra-rare loss-of-function and deleterious missense variants 27. LRRN1 encodes leucine rich repeat neuronal protein 1, with a brain-enriched expression profile.
HAS1 consistently replicated for stroke with rare loss-of-function and deleterious variant aggregate models in the general population 23,27, while UACA replicated for stroke with one ultra-rare deleterious variant model 27. HAS1 encodes an enzyme that produces hyaluronan and whose expression is induced by inflammation and glycemic stress 38. Of note, an increased hyaluronan turnover has been suggested to follow ischemic stroke 39. No additional HAS1 PTV carriers were identified among the T1D replication cohort; thus, a diabetes-specific replication is pending. Nevertheless, HAS1 PTVs may be of particular importance in T1D, as dysregulation of endothelial glycocalyx hyaluronan has been suggested to contribute to diabetic complications 40. Finally, it must be noted that PTVs have not been functionally confirmed as loss-of-function, but the annotations are predictions; a PTV at the beginning of a gene is likely more severe than one at the end, and in fact, PTVs closer to the HAS1 transcription start site were more strongly associated with stroke.
To increase statistical power on regulatory regions, we performed statistical aggregate tests in genomic windows, enhancers and promoters 41,42. Of note, we extended genomic window length from the default to increase statistical power, which, however, also reduced precision as the causal region might be narrower. We found fourteen genome-wide significant stroke-associated windows with low-frequency variants on 4q33-34.1, of which two replicated for stroke in T1D. According to eQTLs and PCHi-C interactions, 4q33-34.1 variants most likely target GALNTL6, MFAP3L or AADAT. We also discovered a suggestively stroke-associated window through rare variants within LINC01500, which replicated for stroke in T1D. According to PCHi-C, the LINC01500 window targets a promoter of DACT1. Finally, an aggregate of rare variants was suggestively associated with stroke on the TRPM2-AS promoter, which nearly replicated in T1D (p-value = 0.053). Importantly, transient receptor potential melastatin 2 (TRPM2) has been previously associated with ischemic stroke 30,31. Our functional cell-based assay validated the TRPM2-AS region promoter activity. However, the most strongly stroke-associated variant, rs753589764, did not associate with TRPM2-AS promoter activity under normal cell culture conditions in HELA cells.
Limitations of the study include the limited statistical power due to moderate sample size at the discovery stage, replication of rare variants with imputed GWAS data, and non-conservative statistical estimates for the rarest variants due to case-control imbalance (≈1:6), especially for the stroke subtypes. We were able to improve the statistical power on exomes by meta-analyzing WES and WGS, and we performed the stroke-subtype specific analyses only for a limited number of suggestive findings to avoid spurious signals due to unstable statistical estimates. To further improve statistical power, we performed statistical aggregate tests on gene exons and on intergenic regions, i.e., enhancers, promoters, and genomic windows. Of note, we studied only transcribed enhancers, and thus, some enhancers could have been missed. We defined promoters with an arbitrarily selected 1,000 bp extension downstream of the TSS, which may not have always been optimal as the promoter lengths vary. Further limitations are the lack of sequencing-based replication data in individuals with T1D, and that we regarded nominal significance as replication (p-value < 0.05). However, we sought replication by combining available data sources, i.e., FinnGen (Finnish general population GWAS), UK Biobank (general population WES), and FinnDiane (GWAS and genotyping in Finnish individuals with T1D). Of note, stroke cases were younger and had a shorter diabetes duration than controls in the FinnDiane cohorts, the difference being the most extreme in the discovery cohorts, which may have hindered replication for variants with an age or diabetes duration dependent effect. Importantly, gene burden variant selection criteria within UK Biobank WES 23,27 did not perfectly match ours, especially with the low-frequency protein altering variant models, which may explain some unsuccessful gene aggregate replications. Finally, while conducting the analyses in an isolated population has certain advantages for variant discovery, it also raises the question of generalizability of the findings to other populations. In addition to the replication attempted in the UK Biobank, further research is needed to validate our findings in non-Finnish individuals with T1D.
The strengths of this study include a well characterized cohort and comprehensively performed single variant and aggregate analyses both for the coding and non-coding regions of the genome. Stroke is a challenging phenotype to address with ICD codes, and many loci associated with rare stroke phenotypes may go unnoticed even with large population-wide genetic studies. We performed analyses for well-defined stroke phenotypes verified by trained neurologists. Furthermore, as we conducted the analyses in specific high-risk individuals from an isolated population, thus with less genetic and phenotypic diversity, we had improved statistical opportunities to identify genetic risk loci.
In conclusion, we studied rare and low-frequency stroke-associated genetic variants with whole-exome or whole-genome sequencing in 1,051 individuals with T1D and report the first genome-wide study on stroke genetics in diabetes. The results highlight 4q33-34.1, SREBF1, and ANK1 for stroke in T1D; and HAS1, UACA, LRRN1, LINC01500, and the TRPM2-AS promoter as stroke risk loci that likely generalize to the non-diabetic population. The presented results require future validation with next-generation sequencing in a larger cohort of individuals with T1D.

Materials
We studied WGS in 571 and WES in 480 non-related individuals with T1D, entailing 112 and 74 stroke cases, respectively (Table 1, Table S1, Fig. S1 and S2). Patients in WGS and WES were non-overlapping. The patient selection for both data sets was originally designed for DKD, such that half of the individuals had severe DKD, and half had no DKD (i.e., normal albumin excretion rate) despite a long duration of T1D 21,43. Importantly, this resulted in stroke cases being younger and having shorter diabetes duration than controls, contrary to expectation. Individuals in the present study were diagnosed with T1D by their attending physician and had diabetes onset age < 40 and insulin initiated within one calendar year from the diabetes diagnosis. Stroke cases were identified for the participants from Finnish registries based on ICD codes until the end of 2017 (Table S16). The phenotypes were verified, and stroke cases classified into ischemic and hemorrhagic strokes by trained neurologists using medical files and brain imaging data. For individuals without neurologist-verified data available (N WGS = 27, N WES = 2), we considered only the registry data, excluded controls with intermediate stroke phenotypes (e.g., transient ischemic attack), and were unable to classify stroke cases into ischemic and hemorrhagic subtypes. Importantly, we required stroke to have occurred after T1D diagnosis, and controls to be > 35 years of age and to have > 20 years of diabetes duration. Next-generation sequencing data was processed to the GRCh38 reference genome, and variants annotated with SNPEff v.5 software 44 (Fig. S12). In variant QC, for autosomal variants, we required Hardy-Weinberg equilibrium (HWE) p-value > 10⁻¹⁰ and variant call rate > 98%; and for X chromosome variants, only variant call rate > 98%. The pipeline is described in Detailed Methods of the Supplementary Information.
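The variant QC thresholds stated above can be illustrated with a small filter. This is a minimal Python sketch and not the study's pipeline (which is described in the Supplementary Information): the function name, argument names and chromosome labels are hypothetical, and rejecting chromosomes other than the autosomes and X is an assumption, since the text does not specify how they were handled.

```python
def passes_variant_qc(chrom: str, hwe_p: float, call_rate: float) -> bool:
    """Return True if a variant passes the QC thresholds quoted in the text.

    Autosomes: HWE p-value > 1e-10 and call rate > 98%.
    X chromosome: call rate > 98% only (no HWE filter).
    Other chromosomes: rejected here (assumption, not stated in the text).
    """
    # Accept both "chr4" and "4" style labels (hypothetical convention).
    if chrom.startswith("chr"):
        chrom = chrom[3:]

    # The call-rate requirement applies to autosomes and X alike.
    if call_rate <= 0.98:
        return False

    if chrom == "X":
        return True
    if chrom.isdigit() and 1 <= int(chrom) <= 22:
        return hwe_p > 1e-10
    return False


# Hypothetical example values, for illustration only.
print(passes_variant_qc("chr4", hwe_p=0.3, call_rate=0.995))
print(passes_variant_qc("X", hwe_p=1e-12, call_rate=0.995))
```

Keeping the X chromosome branch separate mirrors the text, which applies the HWE requirement only to autosomal variants.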
Within the FinnDiane study, we have GWAS data for almost the entire cohort, i.e., 6,458 individuals with T1D or their relatives. GWAS data has been previously processed to the GRCh37 reference genome. However, we have now lifted the genotyping positions over to GRCh38, re-imputed the data to the SISu v3 reference panel, and annotated with SNPEff v.5 software 44 (Fig. S13). We attempted replication in individuals with T1D within the FinnDiane GWAS data, non-overlapping with the sequencing data (N = 3,945, Table S17 and S18, Fig. S14), and restricted to high imputation quality variants (r² > 0.80), and by directly genotyping twelve lead variants for replication (N = 3,263, Table S19, Fig. S15). Stroke cases were younger and had shorter diabetes duration than controls in the replication cohorts, comparably to the discovery cohorts, although with a less extreme difference. Of note, variant genotyping was performed with one Agena iPlex multiplexing assay at the Institute for Molecular Medicine Finland, Helsinki, Finland (Table S20), and the genotyping replication was limited to individuals within GWAS data in order to perform relatedness adjustment. Stroke phenotype and control criteria within replication in T1D were defined similarly to the WES and WGS data.

Single variant analyses
We analyzed the genome with an additive inheritance model. For variants available in WES and WGS data, we performed score test with rvtests (version 20190205) 45, followed by fixed-effect inverse variance based meta-analysis (Total MAC ≥ 5, and MAC ≥ 2 in WES and WGS) with metal (version 20110325) 46. For variants available only in one data set we utilized exact Firth regression (MAC ≥ 5) 45. Importantly, Firth logistic regression has been suggested to be the most conservative statistical test for joint rare variant analyses, especially with case-control imbalance, while the score test has been suggested to have the highest statistical power for rare variant meta-analyses 47. The additive single variant analyses were adjusted for the calendar year of diabetes onset, sex, and the first two genomic data principal components (i.e., minimal adjustment setting), and additionally for DKD, which is one of the most important risk factors of stroke in T1D 48. WGS and WES stroke controls are older and have longer T1D duration than cases (contrary to true stroke predisposition) because patient selection for next-generation sequencing was optimized for DKD by considering T1D duration. Thus, in order to avoid statistical bias, we adjusted for the calendar year of diabetes onset, a major stroke risk factor correlated with age, T1D duration, and T1D treatment quality.
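For readers unfamiliar with fixed-effect inverse variance meta-analysis, the following is a minimal Python sketch of the general technique. It is not the metal implementation used in the study; the function name and the example effect sizes are hypothetical, and the two-sided p-value assumes a normally distributed test statistic.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-study estimates.

    betas: per-study effect estimates (e.g., log odds ratios)
    ses:   matching per-study standard errors
    Returns (pooled beta, pooled SE, z-score, two-sided p-value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and equally long")

    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)

    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se

    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Hypothetical WES and WGS estimates for one variant, for illustration only.
beta, se, z, p = inverse_variance_meta(betas=[0.5, 1.5], ses=[1.0, 2.0])
print(f"beta={beta:.3f} se={se:.3f} z={z:.3f} p={p:.3f}")
```

Because the weights are inverse variances, the data set with the smaller standard error dominates the pooled estimate.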

Gene aggregate analyses
In order to improve statistical power for rare (MAF ≤ 1%) and low-frequency (MAF ≤ 5%) variants, we performed gene aggregate analyses with an optimal unified sequence kernel association test (SKAT-O) meta-analysis with MetaSKAT (version 0.81) 49, separately within two distinct classes (Table S21): protein-altering variants and protein-truncating variants, i.e., the more severe putative loss-of-function variants 50. Importantly, the protein-altering variant class entails protein-truncating variants in addition to variants that alter the amino acid sequence. Of note, SKAT-O maximizes statistical power by optimally combining sequence kernel association test and burden test 51. All variable sites (MAC ≥ 1) were accepted into gene aggregate analysis, and the aggregate tests were required to entail at least two variants (N variant ≥ 2), with a cumulative MAC (CMAC) across all included variants within the gene ≥ 5. We adjusted the analyses for the calendar year of diabetes onset, sex, and the first two genomic data principal components, and additionally for DKD. We did not report genes with all variants in perfect LD, and inspected individual variant stroke-associations within the genes using the score test fixed-effects meta-analysis 45,46. Multiple testing correction, based on the number of tested genes, resulted in significance thresholds of p-value < 4 × 10⁻⁶ for PAVs (MAF ≤ 1%: N gene = 11,954; MAF ≤ 5%: N gene = 13,069), p-value < 8 × 10⁻⁵ for PTVs with MAF ≤ 1% (N gene = 663), and p-value < 6 × 10⁻⁵ for PTVs with MAF ≤ 5% (N gene = 908). In addition, we investigated stroke-associations for 17 autosomal Mendelian stroke risk genes regardless of CMAC 19, and were able to report associations for 13 of them.
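The eligibility rule for an aggregate test (variable sites with MAC ≥ 1, at least two variants, and CMAC ≥ 5) can be written as a short check. This is an illustrative Python sketch only; the function name and the input format (a list of per-variant minor allele counts for one gene) are hypothetical and do not reflect MetaSKAT's interface.

```python
def gene_is_testable(variant_macs, min_variants=2, min_cmac=5):
    """Check whether a gene qualifies for an aggregate test.

    variant_macs: minor allele counts of the candidate variants in one gene.
    Only variable sites (MAC >= 1) are kept; the gene then needs at least
    `min_variants` variants and a cumulative MAC of at least `min_cmac`.
    """
    variable_sites = [mac for mac in variant_macs if mac >= 1]
    cmac = sum(variable_sites)
    return len(variable_sites) >= min_variants and cmac >= min_cmac


# Hypothetical genes, for illustration only.
print(gene_is_testable([3, 2]))     # two variants, CMAC = 5
print(gene_is_testable([5]))        # single variant, fails N variant >= 2
print(gene_is_testable([1, 1, 1]))  # three variants, CMAC = 3 < 5
```

The same rule (N variant ≥ 2, CMAC ≥ 5) is stated in the text for genomic windows and regulatory regions, so the check could be reused there under the same assumptions.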

Sliding-window and regulatory region aggregate analyses with whole-genome sequencing
To increase statistical power for low-frequency and rare variants on intergenic regions, we performed functionally informed sliding-window analyses, i.e., aggregate analyses within 4,000 base pair (bp) regions (N variant ≥ 2, CMAC ≥ 5), separated by 2,000 bps, with variants statistically weighted according to their rarity and functional importance using STAAR-O (STAAR R package 0.9.6) 41,52. Functional importance was defined with Combined Annotation-Dependent Depletion (CADD) data 52 using variant MAF (to up-weight rarer variants), pre-computed CADD score, and the first annotation principal component from seven annotation classes (Fig. S16, Table S22), calculated following the guidelines 41. Of note, the scores were utilized on the PHRED scale. We adjusted the analyses for the calendar year of diabetes onset, sex, and the first two genomic data principal components.
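The window layout described above can be sketched as follows. This minimal Python example assumes that "separated by 2,000 bps" means consecutive window starts are 2,000 bp apart, so that neighbouring 4,000 bp windows overlap by half; that reading, the function name, the 1-based inclusive coordinates and the clipping of the final window are all assumptions rather than details confirmed by the text or by the STAAR package.

```python
def sliding_windows(start, end, size=4000, step=2000):
    """Yield (window_start, window_end) pairs covering [start, end].

    Coordinates are 1-based and inclusive. Windows are `size` bp long and
    consecutive window starts are `step` bp apart (assumed interpretation).
    The last window is clipped at `end`.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    for window_start in range(start, end + 1, step):
        window_end = min(window_start + size - 1, end)
        yield window_start, window_end
        if window_end == end:
            break  # the region is fully covered


# Hypothetical 10 kb region, for illustration only.
for window in sliding_windows(1, 10_000):
    print(window)
```

With half-overlapping windows, every position away from the region edges falls into two windows, which is one reason neighbouring windows can carry the same association signal.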
We studied established regulatory regions, i.e., enhancers and promoters (N variant ≥ 2, CMAC ≥ 5), as defined in FANTOM5 cap analysis of gene expression (CAGE) human data reprocessed to the GRCh38 reference genome 42, with promoters defined as the transcription start site (TSS) extended by 1,000 bp, and variants weighted by their rarity on the PHRED scale. FANTOM5 atlases have been measured with multiple human primary cell lines, tissues, and cancer cell lines 53,54. The regulatory regions were analyzed with STAAR R package 0.9.6 41, by adjusting for calendar year of diabetes onset, sex, and the first two genomic data principal components. With low-frequency variants, the multiple testing corrected significance thresholds were p-value < 2.9 × 10⁻⁷ for promoters (N region = 172,134) and p-value < 2.6 × 10⁻⁶ for enhancers (N region = 19,472). For rare variants, the thresholds were p-value < 3.5 × 10⁻⁷ (N region = 141,779) and p-value < 4.3 × 10⁻⁶ (N region = 11,665), respectively. We did not report regions with all variants in perfect LD.
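The reported thresholds are numerically consistent with dividing a significance level of 0.05 by the number of tested regions. The Python sketch below reproduces that arithmetic under the assumption of a Bonferroni correction at alpha = 0.05; the text does not name the correction method or the alpha level, and the function and variable names are hypothetical.

```python
ALPHA = 0.05  # assumed family-wise significance level (not stated in the text)


def bonferroni_threshold(n_tests, alpha=ALPHA):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Numbers of tested regions as reported in the text.
REGION_COUNTS = {
    ("promoters", "low-frequency"): 172_134,
    ("enhancers", "low-frequency"): 19_472,
    ("promoters", "rare"): 141_779,
    ("enhancers", "rare"): 11_665,
}

thresholds = {key: bonferroni_threshold(n) for key, n in REGION_COUNTS.items()}

for (region, maf_class), threshold in thresholds.items():
    print(f"{region:9s} {maf_class:13s} p < {threshold:.1e}")
```

Rounded to two significant digits, the four values match the thresholds quoted in the text (2.9 × 10⁻⁷, 2.6 × 10⁻⁶, 3.5 × 10⁻⁷ and 4.3 × 10⁻⁶).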

Replication
Within the FinnDiane GWAS data, we attempted replication of high imputation quality genetic variants (r² > 0.80) with score test (rvtests 20190205; ref. 45) and had good statistical power (> 80%) to detect a nominal association with an odds ratio (OR) ≥ 2.5 for additive low-frequency variants (MAF = 1%) (Fig. S17) 55. However, for rare variants with MAF = 0.1% and OR < 9, we had only limited power to detect an association even with nominal significance (p-value < 0.05). Thus, we considered nominal significance as the replication threshold (p-value < 0.05). We attempted direct genotyping for replication for twelve variants, but minor allele carriers were observed only for seven of them (Table S20). We performed single variant analyses for the genotyped variants similarly with score test, except for one LRRN1 variant with linear regression and no relatedness adjustment (stats R package 4.2.1) due to lack of alternative allele carriers among individuals with the required relatedness information. Most variants within the aggregate discoveries were rare or ultra-rare (MAF ≈ 0.1%), making replication with imputed genomic data problematic. Nevertheless, we attempted replication within the FinnDiane GWAS data (r² > 0.80) by also including the directly genotyped variants (SKAT-O, STAAR-O). We performed SKAT-O using GMMAT R package 1.3.2 by imputing missing genotype dosages to mean 56, while intergenic aggregate analyses were performed similarly with STAAR R package 41. Replication analyses were adjusted comparably to the discovery stage analyses, except that relatedness in replication was accounted for with relatedness matrices instead of genomic principal components (Balding-Nichols approximation kinship matrix in single variant analysis and GEMMA relatedness matrix in aggregate analyses) 45,57. We attempted replication in the general population for genetic variants from the large-scale population-wide FinnGen project release 6 GWAS data with phenotypes best matching our definitions (https://www.finngen.fi/en) (Table S23), and for the gene aggregate discoveries from UK Biobank summary statistics 23,27. Of note, no proxies in LD were found for the lead single variant

Figure 6. TRPM2-AS regional plot and experimental data. (A) Regional plot of TRPM2 and TRPM2-AS extended region (Gviz R package 1.38.3; ref. 61); the discovered promoter is highlighted (red). (B) Semi-quantitative RT-PCR detecting TRPM2-AS transcript in HELA cells, but not in HUVEC and HEK-293 cells (hypoxanthine phosphoribosyltransferase 1 (HPRT1) and 18S ribosomal RNA as positive control). (C) Relative expression of TRPM2-AS and TRPM2 transcripts in HELA cells (HPRT1 used as reference transcript to normalize quantitative RT-PCR). (D) Firefly/Renilla luciferase assay of promoter activity. Empty vector (mean = 1.0, 12 technical repeats) as transfection control for baseline luciferase activity was compared to TRPM2-AS control promoter, i.e., major allele in all identified TRPM2-AS variants (mean = 56.5, 11 technical repeats, p-value = 0.00022), which was further compared to a TRPM2-AS promoter with rs753589764 minor allele (mean = 72.6, 11 technical repeats, p-value = 0.27). All were cloned upstream of the firefly reporter gene to evaluate potential transcriptional promoter activity. Statistical significance was assessed with Student's t-test and error bars represent standard error.