Genome-wide association meta-analysis of spontaneous coronary artery dissection identifies risk variants and genes related to artery integrity and tissue-mediated coagulation

Spontaneous coronary artery dissection (SCAD) is an understudied cause of myocardial infarction primarily affecting women. It is not known to what extent SCAD is genetically distinct from other cardiovascular diseases, including atherosclerotic coronary artery disease (CAD). Here we present a genome-wide association meta-analysis (1,917 cases and 9,292 controls) identifying 16 risk loci for SCAD. Integrative functional annotations prioritized genes that are likely to be regulated in vascular smooth muscle cells and artery fibroblasts and implicated in extracellular matrix biology. One locus containing the tissue factor gene F3, which is involved in blood coagulation cascade initiation, appears to be specific for SCAD risk. Several associated variants have diametrically opposite associations with CAD, suggesting that shared biological processes contribute to both diseases, but through different mechanisms. We also infer a causal role for high blood pressure in SCAD. Our findings provide novel pathophysiological insights involving arterial integrity and tissue-mediated coagulation in SCAD and set the stage for future specific therapeutics and preventions.


Cardiovascular disease is the leading cause of death in women, but sex-specific aspects of the risk of heart disease and acute myocardial infarction (AMI) remain understudied 1. Spontaneous coronary artery dissection (SCAD) and atherosclerotic coronary artery disease (CAD) are both causes of acute coronary syndromes leading to AMI [2][3][4][5][6]. However, in contrast with CAD, SCAD affects a younger, predominantly female population 7 and arises from the development of a hematoma, leading to dissection of the coronary tunica media with the eventual formation of a false lumen, rather than atherosclerotic plaque erosion or rupture 8. SCAD has been clinically associated with migraine 9 and extra-coronary arteriopathies, including fibromuscular dysplasia (FMD) [10][11][12][13]. However, co-existent coronary atherosclerosis is uncommon 8,14. While the genetic basis of CAD is increasingly well established 15, the pathophysiology of SCAD remains poorly understood 4. The search for highly penetrant mutations in candidate pathways or by sequencing has garnered a low yield, often pointing to genes involved in other clinically undiagnosed inherited syndromes manifesting as SCAD 16. Previous investigations of the impact of common genetic variation on the risk of SCAD have described five confirmed risk loci [17][18][19][20].
In this Article, we performed a meta-analysis of genome-wide association studies (GWASs) comprising 1,917 SCAD cases and 9,292 controls of European ancestry. We identified 16 risk loci, including 11 new association signals, demonstrating a substantial polygenic heritability for this disease. Importantly, we show that several common genetic risk loci for SCAD are shared with CAD but have a directionally opposite effect and a different genetic contribution of established cardiovascular risk factors. These findings implicate arterial integrity related to extracellular matrix biology, vascular tone and tissue coagulation in the pathophysiology of SCAD.

Article https://doi.org/10.1038/s41588-023-01410-1

GWAS meta-analysis and single-nucleotide polymorphism heritability
We conducted a GWAS meta-analysis of eight independent case-control studies (Supplementary Figs. 1 and 2 and Supplementary Table 1). Sixteen loci demonstrated genome-wide-significant signals of association with SCAD, among which 11 were newly described for this disease (Table 1, Fig. 1a, Supplementary Table 2 and Supplementary Fig. 3). One locus on chromosome 4 (AFAP1) was recently reported for SCAD in the context of pregnancy 19 and has now been confirmed as being generally involved in SCAD (Table 1). The estimated odds ratios of associated loci ranged from 1.25 (95% confidence interval (CI) = 1.16-1.35) in ZNF827 on chromosome 4 to 2.04 (95% CI = 1.77-2.35) on chromosome 21 near KCNE2 (Table 1). We report evidence for substantial polygenicity for SCAD with an estimated single-nucleotide polymorphism (SNP)-based heritability above 0.70 (h²_SNP = 0.71 ± 0.11 on the liability scale using linkage disequilibrium score regression 21 and h²_SNP = 0.70 ± 0.12 using SumHer 22; Supplementary Table 3). The ECM1/ADAMTSL4 locus on chromosome 1 accounted for the largest proportion of heritability for SCAD in our dataset (h² = 0.028), followed by the COL4A1/COL4A2 locus, which contained two independent GWAS signals (h² = 0.022; Supplementary Table 4 and Supplementary Fig. 4). Overall, we estimate that the 16 loci explain ∼24% of the total SNP-based heritability of SCAD (Supplementary Table 4).

Functional annotation of variants in SCAD loci
We found SCAD-associated variants to be significantly enriched in enhancer marks specific to gene expression in arterial tissues from ENCODE 23 (for example, the aorta, tibial artery, thoracic aorta and coronary artery), as well as several tissues rich in smooth muscle cells (for example, the colon, small intestine and uterus) (Supplementary Fig. 5). Based on recently published analyses of single-cell open chromatin in 30 adult tissues 24, we determined that vascular smooth muscle cells (VSMCs) and fibroblasts were the top enriched cell types for SCAD-associated loci among clusters represented in aorta and tibial artery datasets (Fig. 2a and Supplementary Fig. 6). Consistently, all but one SCAD locus included at least one variant that overlapped with enhancer marks or open chromatin peaks in coronary artery tissue, VSMCs or fibroblasts (Supplementary Fig. 7 and Supplementary Table 5). Among the top associated variants for SCAD, 14 were expression quantitative trait loci (eQTLs) for nearby genes in the aorta, coronary or tibial artery, whole blood or cultured fibroblasts (Fig. 1b and Supplementary Table 5).

Tissue coagulation as a novel mechanism in SCAD
We applied a multi-source strategy to identify candidate genes located in SCAD risk loci. We prioritized: (1) genes that were targets of eQTLs colocalizing with a GWAS signal (Supplementary Fig. 8a and Supplementary Table 6) or transcriptome-wide association study (TWAS) hits in at least one tissue relevant to arterial dissection (aorta, coronary or tibial artery, fibroblasts or whole blood from the Genotype-Tissue Expression (GTEx) database) (Supplementary Fig. 8b and Supplementary Table 7); (2) genes with a biological function linked to the cardiovascular system in humans or mice; (3) genes involved in significant long-range chromatin conformation interactions from Hi-C data with SCAD-associated variants in the aorta 25; and (4) those genes closest to or overlapping with the top associated variants. We identified one specific and strong candidate gene in 14 loci (Fig. 1b). For instance, the tissue factor gene F3 stood out as the most likely target gene near rs1146473 (odds ratio = 1.32; P = 5.8 × 10⁻⁹), a locus on chromosome 1 that we describe as novel for SCAD and for any cardiovascular disease or trait so far. F3 is the closest coding gene to the association signal and was a TWAS hit in artery tissue (Supplementary Table 7). In addition, the rs1146473 risk allele for SCAD confidently (posterior probability = 94%) colocalized with an eQTL signal of F3 in the aorta, suggesting that the genetic risk may result from decreased F3 expression in arteries (Fig. 2b and Supplementary Table 6). Tissue factor, also known as coagulation factor III, forms a complex with factor VIIa, which is the primary initiator of blood coagulation. Hence, reduced factor III expression is potentially a key biological mechanism contributing to hematoma formation in the coronary arteries of SCAD survivors. Consideration of genes encoding druggable targets, as derived by Finan et al. 26, indicated that tissue factor is a clinical phase drug candidate (tier 1 druggable target), with target reference numbers CHEMBL4081 (factor III) and CHEMBL2095194 (factor III/factor VII complex) (Supplementary Table 8).
To globally assess the biological mechanisms involving prioritized genes, we applied a network query based on Bayesian gene regulatory networks constructed from expression and genetics data from arterial tissues and fibroblasts 27-29. We found extracellular matrix organization to be the biological function at which most prioritized genes and their respective immediate subnetworks clustered (Supplementary Fig. 9). Among the genes we prioritized in novel loci, a number encode proteins involved in extracellular matrix formation, including integrin alpha 1 (ITGA1), basement membrane constituent collagen type IV alpha 1 chain (COL4A1) and alpha 2 chain (COL4A2), serine protease HtrA serine peptidase 1 (HTRA1), metallopeptidase thrombospondin type 1 domain containing 4 (THSD4, encoding a partner of fibrillin 1, whose gene is located in a previously reported SCAD locus (FBN1)) and TIMP metallopeptidase inhibitor 3 (TIMP3). Interestingly, integrin alpha 1, HTRA1 and collagen type IV subunits were labeled as potentially druggable targets based on their similarity to approved drug targets and members of key druggable gene families (tier 3; Supplementary Table 8). Of note, the F3 subnetwork also clustered in extracellular matrix organization and connected with HTRA1 and TIMP3 subnetworks through Bayesian network edges from the aorta and coronary artery (Supplementary Fig. 9).

Shared genetics between SCAD and arterial diseases
With the exception of the F3 locus, SCAD risk loci located within 1 megabase of the lead SCAD variants were at least suggestively (P < 10⁻⁵) associated with other forms of cardiovascular and neurovascular disease. Using trait colocalization analyses, we found that the same variants were likely to be causal both for SCAD and the other diseases or traits at 15 loci (Fig. 3a and Supplementary Table 9). However, the directions of the effects were not systematically consistent across the loci for all of the diseases. Globally, SCAD loci showed evidence for high posterior probability for the same risk alleles to also probably be causal for FMD and cervical artery dissection (Fig. 3a and Supplementary Table 9). Linkage disequilibrium score regression-based genetic correlations indicated that SCAD correlates positively with FMD (r_g = 0.38 ± 0.18; P = 0.03) and cervical artery dissection (r_g = 0.61 ± 0.20; P = 2.4 × 10⁻³; Fig. 3b and Supplementary Table 10), which is consistent with the clinical observation of frequent coexistence of these arteriopathies in patients with SCAD. For instance, FMD is reported in ∼40-60% of patients with SCAD 11,30. Stratified analyses in the four largest case-control studies where FMD arteriopathies were screened indicated globally similar associations with SCAD (Supplementary Fig. 10 and Supplementary Table 11). Finally, genetic correlations indicated that SCAD positively correlates with several neurovascular diseases where predominantly arterial structure and/or function are altered, including stroke (r_g = 0.17 ± 0.06; P = 4.5 × 10⁻³), migraine (r_g = 0.18 ± 0.06; P = 1.3 × 10⁻³), intracranial aneurysm (r_g = 0.22 ± 0.06; P = 2.0 × 10⁻⁴) and subarachnoid hemorrhage (r_g = 0.27 ± 0.07; P = 6.4 × 10⁻⁵) (Fig. 3b and Supplementary Table 10).
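The genetic correlations above are quoted as an estimate plus or minus a standard error. As a rough reading aid, the sketch below recomputes a two-sided P value from those rounded numbers under a standard normal approximation. This is an assumption for illustration only: the published P values come from the unrounded linkage disequilibrium score regression output, so small differences are expected.

```python
import math

def two_sided_p(estimate: float, se: float) -> float:
    """Two-sided P value for estimate / se under a standard normal null."""
    z = estimate / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|))
    return math.erfc(abs(z) / math.sqrt(2.0))

# Rounded values quoted in the text: (trait, r_g, standard error)
quoted = [
    ("FMD", 0.38, 0.18),
    ("cervical artery dissection", 0.61, 0.20),
    ("stroke", 0.17, 0.06),
    ("migraine", 0.18, 0.06),
]

for trait, rg, se in quoted:
    print(f"{trait}: z = {rg / se:.2f}, approximate P = {two_sided_p(rg, se):.1e}")
```

Because the inputs are rounded to two decimals, the recomputed values should only be expected to match the reported ones to about one significant figure.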

Opposite genetic link between SCAD and CAD
While patients with CAD are predominantly men (∼75%) who often have pre-existing cardiometabolic comorbidities (mainly dyslipidemia, hypertension and type 2 diabetes), patients with SCAD are on average younger, present with fewer cardiovascular risk factors and are overwhelmingly women (>90%) 2,4. Using genetic association colocalization and genetic correlation, we genetically compared SCAD with CAD. We found that, among SCAD loci, several were known to associate with CAD. Disease association colocalization analyses showed that for six loci SCAD and CAD are likely to share the same causal variants with high posterior probabilities (posterior probability of the shared causal variant hypothesis (H4) = 84-100%), but all with opposite risk alleles (Fig. 3a).

Figure legend (partial): Higher opacity is used to identify significant associations (adjusted P < 0.05). Bottom, composition of artery tissues relative to 105 single-cell subclusters, as determined by snATAC-seq in 30 adult tissues 24. Only subclusters representing >1% of cells from either the aorta or tibial artery were represented. b, Representation of the SCAD TWAS z score for each prioritized gene in GWAS loci. The point shape indicates the tissue used in the TWAS association. The point color distinguishes genes located at different loci. The absence of a symbol indicates that the gene did not show significant heritability based on the eQTL data in the corresponding tissue. TWAS P values were calculated by two-tailed z test against a null distribution calculated by permutation for each gene or tissue 44. Higher opacity is used to identify significant associations (Bonferroni adjusted P < 0.05), corresponding to a z score of >4.8 or <−4.8 (dashed gray lines).

Cardiovascular risk factors and risk of SCAD and CAD
We found that SCAD shared several causal variants with SBP and DBP, involving both the same and opposite directional effects (Fig. 3a and Supplementary Table 9). We found one shared locus with hemoglobin levels and a significant genetic correlation with SCAD (r_g = 0.12 ± 0.03; P = 2.7 × 10⁻⁵; Fig. 3b). However, SCAD loci were not shared with body mass index (BMI), lipid traits (including low-density lipoprotein (LDL) cholesterol and high-density lipoprotein (HDL)), type 2 diabetes or smoking, and these traits did not correlate with SCAD at the genomic level (Supplementary Tables 9 and 10). Interestingly, we found significant positive genetic correlations both with SBP (r_g = 0.12 ± 0.03; P = 1.0 × 10⁻⁴) and DBP (r_g = 0.17 ± 0.03; P = 2.6 × 10⁻⁷), indicating a shared genetic basis with SCAD (Fig. 3b and Supplementary Table 10).
To assess the extent to which blood pressure and main cardiovascular risk factors may contribute to the risk of SCAD, we leveraged existing GWAS datasets to identify instrumental variables and conducted comparative Mendelian randomization associations with SCAD or CAD. We found robust significant associations estimated by inverse variance-weighted (IVW), MR-Egger and weighted median methods between genetically predicted blood pressure traits and increased risk of SCAD (β_IVW/SBP = 0.05 ± 0.01 (P = 7.6 × 10⁻⁶); β_IVW/DBP = 0.10 ± 0.02 (P = 1.9 × 10⁻⁸)) and CAD (β_IVW/SBP = 0.04 ± 0.002 (P = 8.6 × 10⁻⁴⁹); β_IVW/DBP = 0.06 ± 0.004 (P = 1.6 × 10⁻⁴⁴)) (Fig. 4 and Supplementary Table 13). Similar associations were estimated when we analyzed only women with SCAD, women with CAD or men with CAD, although analyses only in men with SCAD were limited by the extremely small numbers of male cases (Supplementary Table 14). Genetically determined BMI, lipid traits, type 2 diabetes and smoking status did not influence the risk for SCAD. However, we were able to confirm that these cardiometabolic traits are strong genetic risk factors for CAD (Fig. 4 and Supplementary Table 13). Our findings indicate that genetically elevated blood pressure is the only shared genetic risk factor between SCAD and CAD, albeit involving potentially different genetic loci.
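To make the inverse variance-weighted Mendelian randomization estimate concrete, the following minimal sketch combines per-variant exposure and outcome effects into a single causal estimate using the standard fixed-effect IVW formula. It is an illustration under stated assumptions, not the analysis pipeline used in the study: the function name and the toy effect sizes are hypothetical, and MR-Egger and weighted median estimators are not shown.

```python
import math
from typing import Sequence, Tuple

def ivw_estimate(
    beta_exposure: Sequence[float],
    beta_outcome: Sequence[float],
    se_outcome: Sequence[float],
) -> Tuple[float, float]:
    """Fixed-effect IVW estimate of the exposure-to-outcome effect.

    Each instrument contributes the ratio beta_outcome / beta_exposure,
    weighted by beta_exposure^2 / se_outcome^2.
    """
    weights = [bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)]
    numerator = sum(
        bx * by / (se * se)
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    beta = numerator / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se

# Hypothetical instruments (not from the paper): the outcome effect is exactly
# half the exposure effect, so the IVW estimate should be 0.5.
bx = [0.10, 0.20, 0.30]
by = [0.05, 0.10, 0.15]
se_y = [0.01, 0.01, 0.02]
beta_ivw, se_ivw = ivw_estimate(bx, by, se_y)
print(f"beta_IVW = {beta_ivw:.3f} ± {se_ivw:.3f}")
```

In this formulation, variants with larger exposure effects and more precisely estimated outcome effects dominate the pooled estimate, which is why sensitivity estimators such as MR-Egger and the weighted median are usually reported alongside it.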

Discussion
In this Article, we provide the largest study to date aimed at understanding the genetic basis of SCAD, an understudied cause of AMI that primarily affects women. We report novel associations and demonstrate high polygenic heritability for SCAD. We leverage integrative functional annotations to prioritize genes that are likely to be regulated in VSMCs and the fibroblasts of arteries. Insights from the biological functions of genes highlight the central role of extracellular matrix integrity and reveal impaired tissue coagulation as a novel potential mechanism for SCAD. Globally, we demonstrate the polygenic basis of SCAD to be shared with an important set of cardiovascular diseases. However, a striking directionally opposite genetic impact is found with atherosclerotic CAD, involving multiple risk loci and leading to a genome-wide negative genetic correlation. We provide evidence supporting genetically predicted higher blood pressure as an important risk factor for SCAD, but not other well-established cardiovascular factors. Our results set the stage for future investigation of novel biological pathways relevant to both SCAD and CAD and potential therapeutic and preventive strategies specifically targeting SCAD.
As an understudied condition that was previously thought to be uncommon, SCAD was initially suspected to involve rare and highly penetrant mutations. However, recent sequencing studies have suggested that only a small proportion (~3.5%) of SCAD cases are due to rare variants 16,32 . This is in keeping with increasing clinical recognition suggesting that this condition is not rare and occurs globally in populations of both European and non-European ancestry, with similar disease characteristics and probably similar prevalence 2,4,33,34 . Despite a modest sample size, we identified 16 risk loci accounting for about one-quarter of the polygenic heritability, which we estimate to be as high as ∼71%, therefore indicating that SCAD is predominantly a complex polygenic disease. However, we acknowledge that larger GWAS settings, including ancestrally diverse populations, will enhance the statistical power needed to provide validation through replication of the reported risk loci and estimated polygenic heritability.
This study supports the presence of genetic overlap between the risk of SCAD and other vascular diseases involving generally younger individuals and more women, such as cervical arterial dissection, migraine, subarachnoid hemorrhage and FMD. These conditions are reported to occur at increased frequency in patients with SCAD 10-13, supporting shared causal biological mechanisms. Among the genes we prioritize in novel SCAD loci, we highlight the ATPase plasma membrane Ca²⁺ transporting 1 gene (ATP2B1) that we recently reported to associate with FMD 35, a well-established locus for blood pressure risk 36 via its role in intracellular calcium homeostasis in VSMCs and blood pressure regulation 37. Most importantly, we provide evidence for a causal genetic effect of both SBP and DBP in SCAD risk. These findings provide an important genetic basis to support observational data suggesting that control of blood pressure may be an important factor in reducing the risk of recurrence after SCAD 38. However, our findings also suggest that controlling other causal risk factors for CAD, such as LDL cholesterol with statins, may confer less benefit in SCAD than in CAD.
Knowledge of the molecular mechanisms leading to SCAD has been limited. Insights from sequencing studies of rare genetic variants have shown that most are associated with genes known from hereditary connective tissue disorders such as vascular Ehlers-Danlos, Loeys-Dietz and Marfan syndromes, as well as adult polycystic kidney disease 16,32. A striking finding from our study is the identification of the tissue factor gene F3, a critical component of tissue-mediated blood coagulation, as a strong candidate gene in a risk locus for SCAD. We found that genetically determined lower expression of F3 in arterial tissue was associated with a higher risk for SCAD, involving variants located in putative functional regulatory elements in the coronary artery, VSMCs and fibroblasts. Tissue factor is synthesized at the subendothelial level of VSMCs and by fibroblasts in the adventitia surrounding the arteries 39. In SCAD, once an intramural hemorrhage has initiated, propagation and pressurization of the false lumen may depend, in part, on coagulation and stabilization of the hematoma. Tissue factor is also a druggable target, albeit a potentially challenging one given its known multiple physiological and pathophysiological roles ranging from hemostasis to cancer metastasis. Tissue factor is widely studied in the context of prothrombotic conditions, including atherosclerosis, although notably the genetic variants we describe here do not associate with atherosclerotic disease. This feature is an exception to the highly pleiotropic nature of the variants we describe in the remaining SCAD loci, suggesting impaired tissue-initiated coagulation as a putative specific mechanism in SCAD.
We identify regulation of the extracellular matrix of arteries as the predominant polygenic biological mechanism for SCAD. Integrative prioritization analyses revealed 13 potential causal genes with established key roles in maintaining arterial wall integrity and function. Among these, we highlight the serine protease HTRA1 and metallopeptidase inhibitor TIMP3, which are involved in matrix disassembly. TIMP3 clusters in the main network for extracellular matrix organization that includes ADAMTSL4, LRP1 and COL4A1, with connections with subnetworks of F3. This clustering is consistent with the biological function of TIMP3 as an inhibitor of matrix metalloproteinases with domains interacting with ADAMTS proteins and LRP1, involving proteins encoded by genes prioritized in SCAD loci 40. Interestingly, we found a novel association signal with SCAD in the metallopeptidase thrombospondin type 1 domain containing 4 gene (THSD4) that promotes fibrillin 1 elastic fiber assembly, and confirm the previously reported associations near ADAMTSL4 and FBN1 (refs. 18,20). We showed that genetically decreased expressions of these genes in arteries were correlated with higher SCAD risk alleles in arteries or fibroblasts. This finding suggests that a genetic predisposition to a weaker extracellular matrix may increase the vulnerability of traversing intramural microvessels to disruption, increasing the risk of initiation and propagation of a false lumen within the coronary vessel wall, leading to SCAD.
Many of the risk loci for SCAD that we report here, as well as their prioritized genes, are already known from atherosclerotic disease GWASs. However, here we provide compelling and intriguing evidence for the opposite directionality of a substantial fraction of genetic bases for SCAD versus CAD, suggesting that some key biological mechanisms involved in the two diseases are also likely to be opposite, which is consistent with the clinical observation of a lower-than-expected burden of atherosclerotic disease in patients with SCAD. For example, the association signals in the COL4A1/COL4A2 locus are in an opposite direction to their contribution to CAD 41. This locus encodes α1 and α2 chains of type IV collagen, with transcripts generated through a common promoter. Type IV collagen is the main component of the basement membrane of arterial cells and plays a key role in the structural integrity and biological functions of VSMCs in the tunica muscularis. Decreased collagen IV expression increases the risk of CAD 15,42. Proposed potential mechanisms for this include a disinhibition of VSMC-intimal migration during atherogenesis or an increase in the vulnerability of atherosclerotic plaque to rupture 42. In contrast with CAD, our data indicate that genetically mediated increased collagen IV expression also increases the risk of SCAD. Better understanding of how these directionally opposite changes modify the risk of CAD and SCAD has considerable potential to enhance our understanding of the molecular genetic mechanisms that confer risk in both diseases.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-023-01410-1.

Patients and control populations
Our meta-analysis included participants of European ancestry from eight studies: DISCO-3C, SCAD-UK I, SCAD-UK II, Mayo Clinic, DEFINE-SCAD, CanSCAD/MGI, VCCRI I and VCCRI II (Supplementary Fig. 1). Patients with SCAD presented with similar clinical characteristics (Supplementary Table 1), as well as homogeneous diagnosis, exclusion and inclusion criteria. All of the studies were approved by national and/or institutional ethical review boards. Further study-specific clinical details are provided in the Supplementary Note.

Genome-wide association meta-analysis
Details of the pre-imputation quality control steps for each study are listed in Supplementary Table 15. Briefly, genotyping was performed using commercially available arrays or genome sequencing (SCAD-UK II and VCCRI II).
To increase the number of tested SNPs and the overlap of variants available for analysis between different arrays, the genotypes of all European ancestry cohorts except SCAD-UK II and VCCRI II were imputed to the Haplotype Reference Consortium version 1.1 reference panel 45 on the Michigan Imputation Server 46. A GWAS was conducted in each study under an additive genetic model using PLINK version 2.0 (ref. 47). For chromosome X, males and females were both on a 0-2 scale under the chromosome X inactivation assumption model. Models were adjusted for population structure using residues from the first five principal components and sex, except in the women-only analyses. Before meta-analysis, we removed SNPs with low minor allele frequencies (<0.01), low imputation quality (r² < 0.8) and deviations from Hardy-Weinberg equilibrium (P < 10⁻⁵). A total of 6,691,677 variants met these criteria and were kept in the final results. Results from individual GWASs were combined using an inverse variance-weighted fixed-effects meta-analysis in METAL software 48, with correction for genomic control. Heterogeneity was assessed using the I² metric from the complete study-level meta-analysis. Between-study heterogeneity was tested using Cochran's Q statistic and considered significant at P ≤ 10⁻³. The genome-wide significance threshold was set at the level of P = 5.0 × 10⁻⁸. LocusZoom (http://locuszoom.org/) was used to provide regional visualization of the results.
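The pooling and heterogeneity statistics described above can be sketched for a single variant as follows. This is a minimal illustration of the textbook inverse variance-weighted fixed-effects estimator, Cochran's Q and I², not a reimplementation of METAL: genomic control correction is omitted, the function name is hypothetical and the per-study estimates are invented for the example.

```python
import math
from typing import Dict, Sequence

def fixed_effects_meta(betas: Sequence[float], ses: Sequence[float]) -> Dict[str, float]:
    """Inverse variance-weighted fixed-effects pooling of per-study log odds ratios."""
    weights = [1.0 / (se * se) for se in ses]  # w_i = 1 / SE_i^2
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of total variation attributed to heterogeneity, floored at 0
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "beta": beta,
        "se": se,
        "z": beta / se,
        "odds_ratio": math.exp(beta),
        "Q": q,
        "df": df,
        "I2": i2,
    }

# Hypothetical per-study log odds ratios and standard errors (not from the paper)
result = fixed_effects_meta([0.20, 0.40], [0.10, 0.10])
print(result)
```

Under the null of no heterogeneity, Q follows a chi-squared distribution with `df` degrees of freedom; the corresponding P value is what would be compared with the 10⁻³ threshold mentioned above.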

Functional annotation
Identification of potential functional variants. To generate a list of potential functional variants, we first identified the 95% credible set of variants using the ppfunc function of the corrcoverage R package (version 1.2.1). The posterior probability of causality was evaluated from marginal z scores for all variants within 500 kilobases (kb) of the lead SNP at each locus. In the COL4A1/COL4A2 locus, where we found two association signals, these were separated by placing an equidistant border from each lead SNP for the inclusion of SNPs in the analysis. Variants with a cumulated posterior probability of up to 95% were kept for further analyses. To consider potentially poorly imputed variants in one of the individual case-control studies, we also included variants in high linkage disequilibrium (r² > 0.7) with the lead SNP at each locus, based on information from European populations (1000 Genomes reference panel) queried using the ldproxy function of the LDlinkR package (version 1.1.2) 49.
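The credible-set step can be illustrated with a short sketch. It assumes a single causal variant per locus and Wakefield's approximate Bayes factors computed from marginal z scores, which is the general approach behind tools of this kind; the exact behaviour of the R function used in the study is not reproduced here. The function names, the prior standard deviation of 0.2 and the toy inputs are assumptions made for illustration.

```python
import math
from typing import List, Sequence

def posterior_probs(
    z_scores: Sequence[float],
    variances: Sequence[float],
    prior_sd: float = 0.2,  # assumed prior SD of the effect size
) -> List[float]:
    """Posterior probability of causality per variant (one causal variant assumed)."""
    log_bf = []
    for z, v in zip(z_scores, variances):
        r = prior_sd ** 2 / (prior_sd ** 2 + v)
        # Wakefield's approximate Bayes factor in favour of association, log scale
        log_bf.append(0.5 * (math.log(1.0 - r) + r * z * z))
    top = max(log_bf)  # subtract the maximum for numerical stability
    weights = [math.exp(x - top) for x in log_bf]
    total = sum(weights)
    return [w / total for w in weights]

def credible_set(pp: Sequence[float], threshold: float = 0.95) -> List[int]:
    """Indices of the smallest set of variants whose cumulative probability reaches threshold."""
    order = sorted(range(len(pp)), key=lambda i: pp[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += pp[i]
        if cumulative >= threshold:
            break
    return chosen

# Hypothetical locus with four variants (not from the paper)
pp = posterior_probs([6.0, 5.5, 2.0, 0.5], [0.01, 0.01, 0.01, 0.01])
print([round(p, 3) for p in pp], credible_set(pp))
```

Ranking by posterior probability and cutting at 95% keeps the set small when one variant dominates and lets it grow when several variants in linkage disequilibrium are nearly indistinguishable.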

Enrichment of SCAD variants in regulatory regions.
To calculate the enrichment of SCAD-associated SNPs among functionally annotated genomic regions, we retrieved available H3K27ac chromatin immunoprecipitation followed by sequencing (ChIP-seq) datasets (narrowPeak beds) in any tissue from ENCODE (https://www.encodeproject.org/ (ref. 50)) and single-nucleus assay for transposase-accessible chromatin with sequencing (snATAC-seq) peak files (bed format) from the Human Enhancer Atlas (http://catlas.org/humanenhancer (ref. 24)). A complete list of datasets is available in Supplementary Table 16. For H3K27ac marks, bed files corresponding to the same tissue were concatenated and sorted before combining overlapping peaks using the bedtools (version 2.29.0) merge command. Variant enrichment was calculated using the GREGOR package (version 1.4.0) 43 . All potential functional variants (95% credible set and linkage disequilibrium proxies as described above) were used as inputs and the parameters were adjusted so as not to pick additional linkage disequilibrium proxies (LDWINDOWSIZE = 1). P values were adjusted for multiple testing by the application of Bonferroni correction.
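The peak-combining step can be pictured with the small sketch below, which concatenates BED-like intervals, sorts them and merges any that overlap or touch. It mirrors the general idea of an interval merge on half-open coordinates and is not a substitute for the tool used in the study; the function name and coordinates are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open

def merge_intervals(intervals: Sequence[Interval]) -> List[Interval]:
    """Sort intervals and merge those that overlap or are directly adjacent."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged

# Hypothetical peaks pooled from two samples of the same tissue
sample_a = [("chr1", 100, 200), ("chr1", 500, 600)]
sample_b = [("chr1", 150, 300), ("chr2", 100, 200)]
print(merge_intervals(sample_a + sample_b))
```

Merging before testing enrichment avoids counting the same regulatory region several times when it was called in more than one sample of a tissue.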

Identification of variants with potential regulatory function.
We used H3K27ac peaks in coronary arteries (as described above), open chromatin regions in healthy coronary arteries (obtained as previously described 35,51 ) and open chromatin regions from merged snATAC-seq clusters, which were mapped fragments from snATAC-seq in 25 adult tissues that we retrieved from the Gene Expression Omnibus (GSE184462) 24 in bed format. Mapped fragments from all clusters representing >1% of cells in at least one arterial tissue (T lymphocyte 1, CD8 + , endothelial general 2, endothelial general 1, macrophage general, fibroblast general, vascular smooth muscle 2 or vascular smooth muscle 1) were extracted and grouped by annotated cell type as T lymphocytes, macrophages, fibroblasts, endothelial cells and VSMCs, respectively. Genome coverage was calculated using the bedtools (version 2.29.0) coverage function. We detected peaks from bedGraph output using the MACS2 bdgpeakcall function (Galaxy Version 2.1.1.20160309.0) on the Galaxy webserver 52,53 . All peak files were extended 100 base pairs upstream and downstream using the bedtools (version 2.29.0) slop function. We detected overlaps of SCAD potential functional variants with relevant genomic regions using the findOverlap function from the rtracklayer package (version 1.52.1) 54 . We used the Integrated Genome Browser (version 9.1.8) to visualize read density profiles and peak positions in the context of the human genome 55 .
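The last two steps, extending peaks by 100 base pairs on each side and intersecting them with candidate variants, can be sketched as follows. This is an illustrative simplification: peaks are assumed to use 0-based half-open BED coordinates and variants 1-based positions, clipping at chromosome ends is only applied at position 0, and all names and coordinates are hypothetical rather than taken from the study.

```python
from typing import List, Sequence, Tuple

Peak = Tuple[str, int, int]     # (chromosome, start, end), 0-based half-open
Variant = Tuple[str, str, int]  # (identifier, chromosome, 1-based position)

def extend_peaks(peaks: Sequence[Peak], flank: int = 100) -> List[Peak]:
    """Extend each peak by `flank` bp upstream and downstream, never below 0."""
    return [(chrom, max(0, start - flank), end + flank) for chrom, start, end in peaks]

def variants_in_peaks(variants: Sequence[Variant], peaks: Sequence[Peak]) -> List[str]:
    """Return identifiers of variants that fall inside at least one peak."""
    hits = []
    for name, chrom, pos in variants:
        # A 1-based position p lies in a half-open interval [start, end) when start < p <= end
        if any(c == chrom and start < pos <= end for c, start, end in peaks):
            hits.append(name)
    return hits

# Hypothetical peak and variants
peaks = extend_peaks([("chr1", 1000, 1200)])
variants = [("varA", "chr1", 950), ("varB", "chr1", 1301), ("varC", "chr2", 1000)]
print(peaks, variants_in_peaks(variants, peaks))
```

A linear scan over peaks is enough for a few hundred candidate variants; genome-scale inputs would normally use an interval tree or a sorted sweep instead.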
Gene prioritization. Genes located within 500 kb of lead variants were annotated to prioritize the most likely causal genes. To find the closest gene(s) from lead SNPs and genes overlapping with variants in the credible set of causal SNPs, gene coordinates were retrieved from Gencode release 38 and aligned to hg19 genomic coordinates (gencode.v38lift37.annotation.gff3.gz). Significant eQTL associations and all SNP-gene eQTL associations in the version 8 release of the GTEx database were retrieved from the GTEx website (www.gtexportal.org/ home/datasets). Colocalization of association with SCAD and eQTLs was evaluated using the R coloc package (version 5.1.0) with default values as priors. We considered that there was evidence for colocalization if H4 coefficients were >75% or if eQTL association was significant for SCAD lead SNPs and H4 was over 25%. TWASs were performed using the FUSION R/Python package 44 . Gene expression models were pre-computed from GTEx data (version 8 release) and were provided by the authors. Only genes with a heritability P < 0.01 were used in the analysis. Both tools used linkage disequilibrium information from the European panel of phase 3 of the 1000 Genomes Project. Bonferroni multiple testing correction was applied using the p.adjust function in R (version 4.1.0). Significant capture Hi-C hits in aorta tissue were provided as supplementary data by Jung et al. 25 . Genes associated with mouse cardiovascular phenotypes (code MP:0005385) were retrieved from the Mouse Genome Informatics database (www.informatics. jax.org) 56 . We also queried the DisGeNET database, using the disge-net2r package (version 0.99.2), for genes with reported evidence in human cardiovascular disease (code C14) with a score of >0.2, including "ALL" databases 57 . In the absence of a missense variant, colocalization and TWAS criteria were given a tenfold weight compared with other criteria. 
At each locus, we prioritized genes fulfilling the largest number of criteria. In cases where several candidates were retained, we prioritized genes that were most likely to have a function in arterial disease (for example, expression in arterial tissues or exclusion of pseudo-genes).
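The colocalization rule and the weighting of criteria described above can be written as a short Python sketch. It is illustrative rather than the code used in the study. The criterion names, the scoring function and the reading of "absence of a missense variant" as a single flag per locus are assumptions made for the example.

```python
from typing import Dict

# Criteria that receive a tenfold weight when no missense variant is present
HEAVY_CRITERIA = {"colocalization", "twas"}


def colocalization_evidence(h4: float, eqtl_significant_for_lead: bool) -> bool:
    """Apply the stated rule: H4 > 75%, or a significant eQTL for the
    SCAD lead SNP together with H4 > 25% (H4 given as a fraction)."""
    return h4 > 0.75 or (eqtl_significant_for_lead and h4 > 0.25)


def score_gene(criteria: Dict[str, bool], has_missense: bool) -> int:
    """Sum the criteria a gene fulfils, weighting colocalization and TWAS
    tenfold when no missense variant is present (assumed per-locus flag)."""
    score = 0
    for name, fulfilled in criteria.items():
        if not fulfilled:
            continue
        score += 10 if (name in HEAVY_CRITERIA and not has_missense) else 1
    return score


# Hypothetical candidate genes at one locus
candidates = {
    "GENE_A": {"colocalization": True, "twas": False, "closest_gene": True},
    "GENE_B": {"colocalization": False, "twas": False, "closest_gene": False,
               "capture_hic": True, "mouse_phenotype": True},
}
scores = {gene: score_gene(c, has_missense=False) for gene, c in candidates.items()}
top_gene = max(scores, key=scores.get)
```

Ties, which the text resolves by considering likely function in arterial disease, are not handled in this sketch.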
Article https://doi.org/10.1038/s41588-023-01410-1
Druggability of prioritized genes. The druggability of the gene products identified through the GWAS was assessed by reference to the set of genes encoding druggable targets derived by Finan et al. 26 using ChEMBL version 17. Targets in this set are subclassified into: (1) the efficacy targets of approved agents and clinical phase drug candidates (tier 1); (2) genes encoding targets with known bioactive drug-like small molecule binding partners and those with substantial sequence similarity to approved drug targets (tier 2); and (3) genes encoding secreted or extracellular proteins, proteins with more distant similarity to approved drug targets and members of key druggable gene families not already included in tiers 1 or 2 (tier 3). Further lookups of approved and clinical phase targets were performed against ChEMBL 58 version 30 and the British National Formulary (accessed 9 April 2021). Note that identified drug targets can either be: (1) a single protein providing a 1:1 link with the causal gene nominated in a GWAS and post-GWAS analysis; (2) a protein complex where the causal gene can encode a member of the complex; or (3) a protein family with the causal gene being a member of the family.
Bayesian network query of SCAD candidate genes. Gene expression data from the aorta artery, coronary artery, tibial artery and cultured fibroblasts were curated from version 8 of the GTEx database (ref. 28). Gene expression data from the mouse aorta was curated from the Hybrid Mouse Diversity Panel (HMDP) 27 . Tissue-specific gene regulatory Bayesian networks were constructed from the GTEx and HMDP gene expression data using RIMBANET 29 . The Bayesian network from each dataset included only network edges that passed a probability of >30% across 1,000 generated Bayesian networks starting from different random genes. Bayesian networks were combined for the top GWAS hits query, and mouse gene symbols were converted to their human orthologs. Bayesian networks were queried for the identified top GWAS hits to identify their first-degree network connections and to determine connections between their surrounding subnetwork nodes. The directions of edges were informed by prior knowledge, such as eQTLs and previously known regulatory relationships between genes. Subnetworks were annotated by top biological pathways representative of the subnetwork genes using Enrichr with a false discovery rate of <0.05.
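The edge-retention rule (edges present in more than 30% of the generated networks) and the first-degree query can be sketched as follows. This does not reproduce RIMBANET. The representation of a network as a list of directed edges, the function names and the gene labels are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # directed edge (regulator, target)


def consensus_edges(networks: List[Iterable[Edge]], min_fraction: float = 0.30) -> Set[Edge]:
    """Keep edges observed in more than `min_fraction` of the networks."""
    counts = Counter()
    for edges in networks:
        counts.update(set(edges))  # count each edge at most once per network
    total = len(networks)
    return {edge for edge, seen in counts.items() if seen / total > min_fraction}


def first_degree_neighbors(edges: Set[Edge], gene: str) -> Set[str]:
    """Return genes directly connected to `gene`, in either direction."""
    neighbors = set()
    for source, target in edges:
        if source == gene:
            neighbors.add(target)
        elif target == gene:
            neighbors.add(source)
    return neighbors


# Hypothetical example with 10 networks instead of 1,000
networks = (
    [[("GENE_A", "GENE_B"), ("GENE_B", "GENE_C")]] * 3
    + [[("GENE_A", "GENE_B")]]
    + [[("GENE_C", "GENE_D")]]
    + [[]] * 5
)
kept = consensus_edges(networks)
neighbors_of_b = first_degree_neighbors(kept, "GENE_B")
```

In this toy example the edge seen in exactly 3 of 10 networks is dropped, because the threshold is strict (>30%).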

Colocalization with other traits and diseases
Summary statistics were retrieved from individual studies, as indicated in Supplementary Table 17. At each locus, we selected variants found in both SCAD and the other studies with a high quality of imputation (r² > 0.9) and located within 500 kb from the SCAD lead SNP. COL4A1 and COL4A2 loci were separated by placing an equidistant border from SCAD lead SNPs for the inclusion of SNPs in the analysis. Signal colocalization was evaluated using the R coloc package (version 5.1.0) with default values as priors. We reported H4 coefficients indicating the probability of two signals sharing a common causal variant at each locus.
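The variant selection that precedes colocalization can be illustrated with a minimal Python sketch. The data layout, the function names, the positions and the assumption that imputation quality is taken from the SCAD summary statistics are all hypothetical. The colocalization test itself (coloc) is not reproduced.

```python
from typing import Dict, List, Optional, Set, Tuple


def midpoint_border(lead_pos_a: int, lead_pos_b: int) -> float:
    """Border equidistant from two neighboring lead SNPs."""
    return (lead_pos_a + lead_pos_b) / 2


def select_locus_variants(
    scad: Dict[str, Tuple[int, float]],   # variant id -> (position, imputation r2)
    other_study_ids: Set[str],
    lead_pos: int,
    window: int = 500_000,
    min_quality: float = 0.9,
    border: Optional[float] = None,
) -> List[str]:
    """Select variants shared by both studies, well imputed and near the lead SNP."""
    selected = []
    for variant_id, (pos, quality) in scad.items():
        if variant_id not in other_study_ids:
            continue
        if quality <= min_quality or abs(pos - lead_pos) > window:
            continue
        # When a border is given, drop variants lying on the other side of it
        if border is not None and (pos - border) * (lead_pos - border) < 0:
            continue
        selected.append(variant_id)
    return sorted(selected)


# Hypothetical toy data for two neighboring loci
scad = {
    "v1": (1_000_000, 0.95),
    "v2": (1_400_000, 0.85),   # imputation quality too low
    "v3": (1_600_000, 0.99),   # more than 500 kb from the first lead SNP
    "v4": (1_200_000, 0.97),
}
other = {"v1", "v2", "v3", "v4"}
border = midpoint_border(1_000_000, 1_300_000)
locus_a = select_locus_variants(scad, other, lead_pos=1_000_000, border=border)
locus_b = select_locus_variants(scad, other, lead_pos=1_300_000, border=border)
```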

Heritability estimates and genetic correlation
We used linkage disequilibrium score regression 21 implemented in the ldsc package (version 1.0.1; https://github.com/bulik/ldsc/) and SumHer 22 implemented in the LDAK software (www.ldak.org) to quantify the heritability explained by common variants or SNP-based heritability (h²SNP) for SCAD and the degree of genetic correlation between SCAD and other diseases and traits. We also used SumHer to estimate the SNP-based heritability attributable to loci associated with SCAD at genome-wide statistical significance. Loci were defined as the 1 megabase region around lead SNPs in the GWAS meta-analysis. SNPs belonging to each locus were used as annotations to calculate the partitioned heritability. Two analyses were performed: one that considered separated loci and a second that aggregated all SNPs as one annotation. Summary statistics were acquired from the respective consortia and are detailed in Supplementary Table 17. For each trait, we refined the summary statistics to the subset of HapMap 3 SNPs to reduce the potential bias due to poor imputation quality. Correlation analyses were restricted to European ancestry meta-analyses summary statistics. We used the European linkage disequilibrium score files calculated from the 1000 Genomes reference panel and provided by the developers. P < 1.9 × 10⁻³, corresponding to adjustment for 26 independent phenotypes, was considered significant. We conditioned SCAD association on cardiometabolic trait genetic association using the mtCOJO tool from the GCTA pipeline 31 . The resulting summary statistics were then used to compute genetic correlations between SCAD, conditioned on cardiometabolic traits and traits of interest.
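The construction of locus annotations for partitioned heritability and the significance threshold can be sketched in Python. This is a schematic illustration, not the SumHer input format. The sketch assumes that the 1 megabase region is centered on the lead SNP (500 kb on each side), and the locus names and coordinates are hypothetical.

```python
from typing import Dict, Set, Tuple

Position = Tuple[str, int]  # (chromosome, base-pair position)


def locus_annotations(
    snps: Dict[str, Position],
    lead_snps: Dict[str, Position],
    half_window: int = 500_000,  # assumption: 1 Mb region centered on the lead SNP
) -> Tuple[Dict[str, Set[str]], Set[str]]:
    """Return one SNP set per locus and the union of all loci."""
    per_locus: Dict[str, Set[str]] = {name: set() for name in lead_snps}
    for snp_id, (chrom, pos) in snps.items():
        for name, (lead_chrom, lead_pos) in lead_snps.items():
            if chrom == lead_chrom and abs(pos - lead_pos) <= half_window:
                per_locus[name].add(snp_id)
    aggregated: Set[str] = set()
    for members in per_locus.values():
        aggregated |= members
    return per_locus, aggregated


# Bonferroni threshold for 26 independent phenotypes
threshold = 0.05 / 26

# Hypothetical toy data
snps = {"s1": ("1", 1_000_100), "s2": ("1", 1_600_000), "s3": ("2", 5_000_000)}
leads = {"locus_1": ("1", 1_000_000), "locus_2": ("2", 5_200_000)}
per_locus, aggregated = locus_annotations(snps, leads)
```

The two returned objects correspond to the two analyses described: one annotation per locus, and a single annotation aggregating all locus SNPs.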

Mendelian randomization analyses
We applied a stringent selection process for instrumental variables to ensure the validity of our Mendelian randomization results. To select valid instrumental variables that respect the three key assumptions ((1) strong association with the exposure; (2) independence from potential confounders between the exposure and outcome; and (3) influence on the outcome only through the exposure), we used linkage disequilibrium clumping with a P value threshold of <5 × 10⁻⁸ and a linkage disequilibrium r² < 0.001 within a 10,000 kb window based on the European population in the 1000 Genomes Project. We excluded candidate instrumental variables that were absent in the summary statistics data from a GWAS of our outcome (SCAD/CAD). To minimize the risk of horizontal pleiotropy, we removed candidate instrumental variables that were associated with the outcome or in high to moderate linkage disequilibrium (r² > 0.6 within a 10,000 kb window).
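The clumping step can be illustrated with a greedy Python sketch. It is a simplified stand-in for the clumping software used in practice. The `r2` lookup, the variant records and the identifiers are hypothetical, and the additional filter on association with the outcome is not shown.

```python
from typing import Callable, Dict, List, Set


def clump(
    variants: List[Dict],
    r2: Callable[[str, str], float],
    p_threshold: float = 5e-8,
    r2_threshold: float = 0.001,
    window_kb: int = 10_000,
) -> List[Dict]:
    """Greedy clumping: visit significant variants from the smallest P value
    upward and keep one only if it is not in LD with an already kept variant."""
    window_bp = window_kb * 1000
    candidates = sorted(
        (v for v in variants if v["p"] < p_threshold), key=lambda v: v["p"]
    )
    kept: List[Dict] = []
    for variant in candidates:
        in_ld = any(
            k["chrom"] == variant["chrom"]
            and abs(k["pos"] - variant["pos"]) <= window_bp
            and r2(k["id"], variant["id"]) >= r2_threshold
            for k in kept
        )
        if not in_ld:
            kept.append(variant)
    return kept


def present_in_outcome(instruments: List[Dict], outcome_ids: Set[str]) -> List[Dict]:
    """Drop instruments that are absent from the outcome summary statistics."""
    return [v for v in instruments if v["id"] in outcome_ids]


# Hypothetical toy data: LD is supplied as a small lookup table
ld_table = {frozenset(("snp1", "snp2")): 0.8}
lookup = lambda a, b: ld_table.get(frozenset((a, b)), 0.0)
exposure = [
    {"id": "snp1", "chrom": "1", "pos": 1_000_000, "p": 1e-12},
    {"id": "snp2", "chrom": "1", "pos": 1_050_000, "p": 1e-9},
    {"id": "snp3", "chrom": "1", "pos": 30_000_000, "p": 1e-10},
    {"id": "snp4", "chrom": "2", "pos": 1_000_000, "p": 1e-6},
]
instruments = clump(exposure, lookup)
usable = present_in_outcome(instruments, {"snp1", "snp2"})
```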
We used the multiplicative random-effects IVW method 59 implemented in the TwoSampleMR R package to estimate the associations of genetically predicted cardiovascular risk factors, including blood pressure (SBP and DBP), lipids (HDL, LDL and triglycerides), BMI, smoking liability and type 2 diabetes, with each of the outcomes of interest (SCAD or CAD). Estimates were scaled to a doubling in genetically predicted smoking risk, or to a one-unit increase in the genetically predicted trait for the continuous traits. We performed sensitivity analyses using the weighted median and MR-Egger methods to assess the consistency of estimates under alternative assumptions about genetic pleiotropy, as recommended 59 . We also performed Cochran's Q test to assess the heterogeneity between estimates obtained using different variants. As nine risk factors were assessed, a Bonferroni-corrected significance level of 0.05/9 = 5.6 × 10⁻³ was used as the threshold for statistical significance in this analysis. P values between 5.6 × 10⁻³ and 0.05 were considered suggestively significant.
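For readers unfamiliar with the estimator, a minimal Python sketch of an inverse-variance weighted estimate with a multiplicative random-effects standard error and Cochran's Q is given here. It treats the estimate as a weighted regression through the origin and is not the TwoSampleMR code. The function name and the input values are hypothetical.

```python
import math
from typing import List, Tuple


def ivw_multiplicative_random_effects(
    beta_exposure: List[float],
    beta_outcome: List[float],
    se_outcome: List[float],
) -> Tuple[float, float, float]:
    """Return (estimate, standard error, Cochran's Q).

    The estimate is the slope of a regression of outcome effects on exposure
    effects through the origin, weighted by 1 / se_outcome**2. The standard
    error is the fixed-effect one scaled by the residual standard error,
    sqrt(Q / (J - 1)), with J the number of variants.
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    estimate = numerator / denominator
    se_fixed = math.sqrt(1.0 / denominator)
    q = sum(
        w * (by - estimate * bx) ** 2
        for w, bx, by in zip(weights, beta_exposure, beta_outcome)
    )
    scale = math.sqrt(q / (len(beta_exposure) - 1))
    return estimate, se_fixed * scale, q


# Hypothetical per-variant effects on the exposure and on the outcome
bx = [0.10, 0.20, 0.30]
by = [0.06, 0.09, 0.16]
se = [0.01, 0.01, 0.01]
estimate, std_error, cochran_q = ivw_multiplicative_random_effects(bx, by, se)

# Bonferroni-corrected threshold stated in the text
bonferroni_threshold = 0.05 / 9
```

A large Q relative to its degrees of freedom (number of variants minus one) indicates heterogeneity between variant-specific estimates, which is what the multiplicative scaling of the standard error accounts for.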

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
Gene reference names and coordinates were retrieved from the GENCODE project through the European Bioinformatics Institute FTP server. gencode.v38.annotation.gff3 and gencode.v38lift37.annotation.gff3 files were used. eQTL data were retrieved from version 8 of the GTEx database (https://gtexportal.org/home/datasets). H3K27ac ChIP-seq datasets (narrowPeak beds) in any tissue were retrieved from ENCODE (https://www.encodeproject.org/). Single-nucleus ATAC-seq peak files (bed format) were retrieved from the Human Enhancer Atlas (http://catlas.org/humanenhancer). Open chromatin regions in healthy coronary arteries were generated from raw reads retrieved from the Sequence Read Archive (SRR2378591, SRR2378592 and SRR2378593). Raw snATAC-seq data in 25 adult tissues were retrieved from the Gene Expression Omnibus (GSE184462). Gene expression models for TWASs were retrieved from the Gusev laboratory website (http://gusevlab.org/projects/fusion/) based on GTEx data (v8 release). Gene expression data from aorta arteries, coronary arteries, tibial arteries and cultured fibroblasts were curated from version 8 of the GTEx database (www.gtexportal.org/home/datasets). Gene expression data from mouse aortas were curated from the HMDP. Genes associated with mouse cardiovascular phenotypes (code MP:0005385) were retrieved from Mouse Genome Informatics (www.informatics.jax.org). GWAS summary statistics were retrieved from http://www.cardiogramplusc4d.org/data-downloads/, http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/, https://www.megastroke.org/, http://www.nealelab.is/uk-biobank or https://diagram-consortium.org/downloads.html or retrieved from authors, as detailed in Supplementary Table 17. The set of genes encoding druggable targets was derived using ChEMBL version 17 and further analyzed using ChEMBL version 30 and the British National Formulary (accessed 9 April 2021).
Summary statistics for SCAD association from the meta-analysis are available in the GWAS Catalog (GCP000522). Full lists of the datasets used in this study, along with the corresponding accession numbers, are available in Supplementary Tables 16 and 17.
supported by grant PT17/0019, of the PE I + D + i 2013-2016, funded by Instituto de Salud Carlos III and the European Regional Development Fund. Fondation Alzheimer supported genotyping of the 3C study (Paris, France) to P.A. We thank AstraZeneca's Centre for Genomics Research (Discovery Sciences, BioPharmaceuticals R&D) for funding the sequencing of participants in cohort SCAD-UK I and providing bioinformatics support. We acknowledge the leadership of the ESC-ACVC SCAD Study Group. The DISCO investigators thank the French Society of Cardiology and French Coronary Atheroma and Interventional Cardiology Group for support, as well as clinical research associates of the Clermont-Ferrand University Hospital: E. Chazot, C. Bellanger, L. Cubizolles, A. Thalamy and O. Lamallem. The SCAD-UK study investigators acknowledge J. Middleton, J. Plume, D. Alexander, D. Lawday and A. Marshall for support with SCAD research, as well as the Research Analytics and Informatics team at AstraZeneca's Centre for Genomics for processing and analyzing sequencing data. The VCCRI study investigators thank C. M. Y. Wong, K. Mishra and R. Johnson for contributions to data collection and sample processing, as well as the Medical Genome Reference Bank, including the 45 and Up and ASPREE study patients who were controls for this study. The CanSCAD/MGI study investigators acknowledge the University of Michigan Precision Health Initiative and Medical School Central Biorepository for providing biospecimen storage, management, processing and distribution services, as well as the Center for Statistical Genetics in the Department of Biostatistics at the School of Public Health for genotype data management in support of this research. The MEGASTROKE project received funding from sources specified at http://www.megastroke.org/acknowledgements.html.
The GTEx Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, as well as by the National Cancer Institute, National Human Genome Research Institute, National Heart, Lung, and Blood Institute, National Institute on Drug Abuse, National Institute of Mental Health and National Institute of Neurological Disorders and Stroke. We acknowledge the FMD Society of America and Vancouver SCAD Conference organizers for enabling study enrollments at patient meetings.
Corresponding author(s): Nabila Bouatia-Naji
Last updated by author(s): Apr 17, 2023

Reporting Summary
Nature Portfolio wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Portfolio policies, see our Editorial Policies and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided. Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested
A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted

Software and code
Policy information about availability of computer code
Data collection: No software was used for data collection.