ORIGINAL ARTICLE Subtype Specificity of Genetic Loci Associated With Stroke in 16 664 Cases and 32 792 Controls

BACKGROUND: Genome-wide association studies have identified multiple loci associated with stroke. However, the specific stroke subtypes affected, and whether loci influence both ischemic and hemorrhagic stroke, remain unknown. For loci associated with stroke, we aimed to infer the combination of stroke subtypes likely to be affected, and in doing so assess the extent to which such loci have homogeneous effects across stroke subtypes.
METHODS: We performed Bayesian multinomial regression in 16 664 stroke cases and 32 792 controls of European ancestry to determine the most likely combination of stroke subtypes affected for loci with published genome-wide stroke associations, using model selection. Cases were subtyped under 2 commonly used stroke classification systems, TOAST (Trial of Org 10172 in Acute Stroke Treatment) and causative classification of stroke. All individuals had genotypes imputed to the Haplotype Reference Consortium 1.1 Panel.
RESULTS: Sixteen loci were considered for analysis. Seven loci influenced both hemorrhagic and ischemic stroke, 3 of which influenced ischemic and hemorrhagic subtypes under both TOAST and causative classification of stroke. Under causative classification of stroke, 4 loci influenced both small vessel stroke and intracerebral hemorrhage. An EDNRA locus demonstrated opposing effects on ischemic and hemorrhagic stroke. No loci were predicted to influence all stroke subtypes in the same direction, and only one locus (12q24) was predicted to influence all ischemic stroke subtypes.
CONCLUSIONS: Heterogeneity in the influence of stroke-associated loci on stroke subtypes is pervasive, reflecting differing causal pathways. However, overlap exists between hemorrhagic and ischemic stroke, which may reflect shared pathobiology predisposing to small vessel arteriopathy.

The burden of stroke on global healthcare and society is substantial; it is consistently one of the leading causes of death and disability worldwide,1 and a major cause of cognitive impairment and dementia. However, there exist significant gaps in our understanding of the pathological processes that underlie the disease. In recent years, GWAS (genome-wide association studies) have made considerable advances in identifying genetic components underlying complex traits, in many cases identifying novel disease pathways and treatments.2 Characterizing the genetic component to stroke has been challenging, in part due to clinical heterogeneity, with at least 3 distinct major pathological processes (cardioembolism, large artery atherosclerosis, and small vessel disease) underlying the majority of ischemic strokes, and 2 processes underlying the majority of intracerebral hemorrhagic stroke (small vessel disease and cerebral amyloid angiopathy).3,4 However, recent GWAS have made considerable advances; 32 independent genome-wide significant loci were identified in the MEGASTROKE project.5 The majority of these loci were identified as being associated with inclusive all stroke or ischemic stroke categories, rather than specific stroke subtypes. This is in part due to study design, with much larger samples for these broader categories and only a fraction of stroke cases having detailed phenotyping. Indeed, this finding is in contrast to earlier studies that identified loci such as HDAC9 and PITX2 as being associated with specific subtypes.6,7
To interpret genetic risk associations in the context of biological mechanisms, a pertinent question is whether the newly identified stroke-associated loci truly confer risk across all stroke subtypes, or whether isolated subtypes or combinations of subtypes are affected. At least one of the novel variants (on chromosome 1q22) shows association with both ischemic and hemorrhagic stroke, which might point to some shared mechanisms underlying these clinically distinct entities, which have thus far been separated in genetic studies.
Conventional approaches to GWAS, which employ within-study analysis and subsequent meta-analysis across groups, do not enable detailed model comparison across different subgroups. In this analysis, we used multinomial logistic regression on well-characterized subjects with individual-level data to investigate the association of all GWAS loci identified to date with all stroke subtypes (cardioembolic [CES], large artery stroke [LAS], small vessel stroke [SVS], and intracerebral hemorrhage [ICH]), determining the most likely combination of stroke subtypes affected at each locus. We performed our analysis using 2 established subtyping approaches, the TOAST (Trial of Org 10172 in Acute Stroke Treatment)8 and causative classification of stroke (CCS) systems,9 to provide a comprehensive account of these loci across available classification systems. Our overall aim was to evaluate genetic loci identified in previous studies using stroke data sets with well-defined phenotyping to determine if subtype specificity or cross-subtype associations could be identified.
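To make the modeling idea concrete, the following is a minimal sketch of a multinomial logistic model with controls as the reference category. It is illustrative only and is not the Trinculo implementation used in the study; the function names, the dictionary layout, and any intercepts or log odds ratios passed in are assumptions introduced here.

```python
from math import exp, log


def subtype_probabilities(dosage, intercepts, log_ors):
    """Outcome probabilities for one individual under a multinomial logistic model.

    dosage     -- allele dosage at the variant (0 to 2)
    intercepts -- dict: subtype -> baseline log odds versus controls (hypothetical values)
    log_ors    -- dict: subtype -> per-allele log odds ratio; a subtype missing
                  from this dict is treated as unaffected (log OR = 0)
    """
    weights = {
        k: exp(intercepts[k] + log_ors.get(k, 0.0) * dosage) for k in intercepts
    }
    total = 1.0 + sum(weights.values())  # the reference category has weight 1
    probs = {k: w / total for k, w in weights.items()}
    probs["control"] = 1.0 / total
    return probs


def log_likelihood(samples, intercepts, log_ors):
    """Sum of log probabilities over (dosage, label) pairs.

    label is either "control" or one of the subtype keys in intercepts.
    """
    return sum(
        log(subtype_probabilities(d, intercepts, log_ors)[y]) for d, y in samples
    )
```

In this sketch, leaving a subtype out of `log_ors` fixes its effect at zero, so each combination of affected subtypes corresponds to a different constrained version of the same model. That nesting is what makes it possible to compare combinations at a locus.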

METHODS
To minimize the possibility of unintentionally sharing information that can be used to reidentify private information, a subset of the data generated for this study are available from the database of Genotypes and Phenotypes (dbGaP) and can be accessed at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000615.v1.p1. Trinculo v0.96 is available from https://sourceforge.net/projects/trinculo/files/. MEGASTROKE data is available from http://megastroke.org. All contributing studies were approved by institutional review committees. All research participants contributing clinical and genetic samples for analysis in this study provided written informed consent. Full methods are provided in the Data Supplement.

RESULTS
After QC, there were up to 16 664 cases and 32 792 controls remaining for analysis (Table 1). In the merged data set, a binomial genome-wide analysis of all cases against controls had a genomic inflation factor λ=1.09, while the linkage disequilibrium score regression (LDSCORE) intercept value was 1.04,10 suggesting that the majority of inflation was due to polygenicity and that any bias introduced by merging the data sets was minimal.
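For readers unfamiliar with the genomic inflation factor, a minimal sketch of the standard calculation follows: the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The function names are introduced here for illustration, and we assume, without confirmation from the text, that this conventional definition is the one behind the reported λ.

```python
from statistics import NormalDist, median

# Median of a 1-df chi-square distribution (about 0.4549), obtained from the
# 75th percentile of the standard normal: P(Z^2 <= m) = 0.5 when Phi(sqrt(m)) = 0.75.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(z_scores):
    """Return lambda: median observed chi-square over its null expectation."""
    chi2 = [z * z for z in z_scores]
    return median(chi2) / CHI2_1DF_MEDIAN


def p_to_z(p_value):
    """Convert a two-sided p-value to an absolute z-score."""
    return abs(NormalDist().inv_cdf(p_value / 2.0))
```

A value near 1 indicates test statistics that behave as expected under the null. Values above 1 can reflect either confounding or true polygenic signal, which is why the LD score regression intercept is reported alongside λ.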
Sixteen loci contained SNPs with log (Bayes factor) of at least 4 in analyses under either stroke classification system (TOAST or CCS). We took these 16 loci forward for further model selection. Plots for all loci under each classification system are provided in Figures I through XVI in the Data Supplement. For each of the 16 loci, we identified the most likely combination of associated phenotypes (Figure 1, Table 2) based on model selection. A comparison of odds ratios (ORs) for analyzed loci from MEGASTROKE and the most recent ICH publication with those from our analysis showed high consistency (r²=0.95, Figure XVII in the Data Supplement) despite slightly differing samples. Linkage disequilibrium (LD) values between our lead and previously published SNPs for the 16 loci in this analysis are provided in Table I in the Data Supplement.
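The selection of a best-fitting subtype combination can be illustrated with a simplified stand-in. The sketch below is not the Trinculo procedure used in the study: it swaps in Wakefield's approximate Bayes factor computed from per-subtype log odds ratios and standard errors, and it assumes the subtype estimates are independent, which shared controls would violate in practice. The prior variance of 0.04, the use of natural logarithms, and all function names are assumptions made for illustration.

```python
from itertools import combinations
from math import log

SUBTYPES = ("CES", "LAS", "SVS", "ICH")


def log_abf(beta, se, prior_var=0.04):
    """Natural-log approximate Bayes factor for beta != 0 versus beta == 0.

    Uses a normal prior on beta with variance prior_var (a placeholder value).
    """
    v = se * se
    z2 = (beta / se) ** 2
    shrink = prior_var / (v + prior_var)
    return 0.5 * log(1.0 - shrink) + 0.5 * z2 * shrink


def best_subtype_model(estimates, prior_var=0.04):
    """Score every subset of subtypes and return (best subset, log BF vs null).

    estimates -- dict: subtype -> (log odds ratio, standard error)
    Under the independence assumption, a subset's log BF is the sum of the
    log BFs of its members; the empty subset (no subtype affected) scores 0.
    """
    scores = {}
    for k in range(len(SUBTYPES) + 1):
        for subset in combinations(SUBTYPES, k):
            scores[subset] = sum(
                log_abf(*estimates[s], prior_var) for s in subset
            )
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Because of the independence shortcut, the best subset here is simply the set of subtypes with a positive log Bayes factor. The exhaustive enumeration is kept only to mirror the idea of comparing all candidate combinations, which a joint multinomial model does without that shortcut.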
For 7 loci, the combination of phenotypes most likely to be influenced by the lead genetic variant at the locus included both ischemic and hemorrhagic stroke subtypes. These are shown in Figure 2. At 4 of these loci (EDNRA, 1q22, MMP12, and SH3PXD2A), the ischemic subtype included SVS, highlighting shared mechanisms underlying ICH and SVS, likely through predisposition to cerebral small vessel disease. At the EDNRA locus, the direction of association for ICH was opposite to that for LAS and SVS, pointing to contrasting influence on ischemic and hemorrhagic stroke risk. We explored whether ICH-associated loci were specific to deep or lobar ICH. As in previous reports,11,12 associations at 1q22 and COL4A2 appear to be specific to deep ICH, with no effect in lobar ICH. For other regions, the evidence for specificity was more equivocal (Table II in the Data Supplement).
For 4 loci (HDAC9, PITX2, ZFHX3, and ANK2), only one phenotype was affected by the lead variant (Figure 1, Figures X, XIII, XVI, and V in the Data Supplement) in the most likely configuration across all classification systems. Several other loci (9p21, 12q24, 16q24, and FOXF2) were associated with only one phenotype under particular classification systems but did not show consistency across TOAST and CCS (Figures II, III, IV, and IX in the Data Supplement). For TSPAN2, which was previously identified as being associated with LAS,13 the best-fit model also included CES under CCS, albeit with a much weaker effect than LAS (rs17479660; CES, OR=1.08; LAS, OR=1.19 under CCS). Echoing previous results, the locus showed much stronger significance under CCS classifications than under TOAST (Figure XV in the Data Supplement).
For COL4A2, the strongest association found under TOAST was for rs9515201. The most likely model contained ICH (OR=1.14) and SVS (OR=1.13), consistent with findings from previous analyses.12 However, under CCS an alternate SNP, rs1927349, was the most strongly associated. No association with SVS was observed, and a weak association with CES was observed instead. Reasons for this discrepancy between CCS and TOAST are not immediately clear, but nonoverlapping samples between the 2 classification systems are a likely factor.
The mean (SD) number of stroke subtypes affected at each locus was 1.88 (0.89) under TOAST and 1.69 (0.87) under CCS. Under CCS, the most common combination of affected subtypes was SVS and ICH (4 loci).

DISCUSSION
We performed a large-scale genetic analysis, characterizing the effects of established stroke risk loci on ischemic and hemorrhagic stroke subtypes in up to 16 664 cases and 32 792 controls. Our main findings are 2-fold. First, for the vast majority of loci studied, multiple but never all stroke subtypes were affected at the locus. Only one locus (12q24) was predicted to influence all ischemic stroke subtypes. This indicates that although these loci were identified in analyses of inclusive stroke phenotypes, in the main, their effects are specific to particular combinations of stroke subtypes. The mean number of subtypes affected was 1.88 for TOAST and 1.69 for CCS classification systems. Notable exceptions were the PITX2 and ZFHX3 loci, which were associated with cardioembolic stroke, most likely through atrial fibrillation (for which they are well-established loci14), and HDAC9, which is associated with large vessel stroke. Under TOAST, the FOXF2 locus was associated solely with SVS. However, under CCS, LAS was also implicated. For CCS, the 9p21 locus was predicted to influence only LAS. However, under TOAST, SVS was also implicated. Our analyses suggest that ANK2 confers risk of stroke predominantly through its influence on ICH. We were unable to identify any loci for which the most likely model included all stroke phenotypes in the same direction, and only one (12q24) for which the most likely model included all ischemic stroke subtypes.
Second, we find evidence that several loci influence both hemorrhagic and ischemic stroke. This was evident for 7 loci in total (1q22, COL4A2, EDNRA, LINC01492, MMP12, SH3PXD2A, and CDK6). Under CCS, 4 loci (SH3PXD2A, MMP12, EDNRA, and 1q22) influenced both SVS and ICH, highlighting shared mechanisms underlying small vessel disease. Previous GWAS analyses have tended to separate ischemic and hemorrhagic stroke on the basis of presumed differing causes. Our results suggest that including hemorrhagic alongside ischemic stroke in multiphenotype analyses will provide further insights.
For one locus, EDNRA (endothelin receptor type A), the association with ICH was in the opposite direction to the ischemic stroke subtypes, suggesting opposing risk mechanisms.[17][18] The locus has also been associated with migraine in candidate gene studies,19 but this has not been validated in GWA studies and is likely a false positive.20 EDNRA encodes the type A receptor (ETA) for ET-1 (endothelin-1), a potent vasoconstrictor with proinflammatory effects. ETA-specific antagonists increase nitric oxide-mediated endothelium-dependent relaxation, reduce ET-1 levels, and inhibit atherosclerosis in mice,21 suggesting that higher levels of ETA are proatherogenic, consistent with the observation that higher ETA levels are observed in atherosclerotic plaques.22 Based on this, one might expect the EDNRA risk variant (C allele of rs17612742 in this study) to lead to increased risk of ischemic stroke through elevated ETA levels. Indeed, in GWA studies of intracranial aneurysm, the susceptibility variant (in LD with the T allele of rs17612742 in our study) was shown to result in higher transcription factor binding affinity, likely resulting in repression of the transcriptional activity of EDNRA.17 This suggests that carriers of the C allele have lower levels of EDNRA, with consequently higher ET-1 levels and greater susceptibility to atherosclerosis. The reason why lower levels of ETA in carriers of the T allele might promote intracranial aneurysm and intracerebral hemorrhage is not immediately obvious, but several mechanisms are possible. Levels of ET-1 have been linked to vascular remodeling, an important process underlying ICH and IA23,24; subtle changes in this process induced by altered availability of ETA are one such mechanism. Deep ICH and ischemic SVS arise from the same arteriopathy in the deep perforating arteries of the brain. The EDNRA variant in this study points to a mechanism that influences whether the resulting pathology is ischemic or hemorrhagic and as such warrants further detailed investigation.
Some loci were notably more significant when phenotyped using CCS (SH3PXD2A, MMP12, TSPAN2, FOXF2, and EDNRA), which might point to CCS having greater accuracy and, therefore, utility in stroke GWA studies. However, the opposite was true for others (16q24 and HDAC9). We note that some differences may be due to the fact that not all individuals were subtyped under both CCS and TOAST; the TOAST cohort was at least 20% larger. A detailed discussion of the relative merits of TOAST and CCS is beyond the scope of this article, but our results highlight the importance of collecting the individual phenotypic qualities that make up the etiologic subtypes in genetic studies of stroke so that associated loci can be more systematically examined.
Our study has several strengths. The data set was a large stroke population, including intracerebral hemorrhage and ischemic stroke cases, the majority of which were subtyped under both TOAST and CCS. We had full access to genotype-level data, giving us full control over all analyses. The implementation of a multinomial regression approach enabled us to systematically assess which stroke subtypes were likely to be affected at each locus, which would not be formally possible under standard binomial regression approaches that analyze each stroke subtype separately. Ultimately, mechanistic studies will be required to determine the influence of associated genetic variants, but analyses such as this have utility in directing the focus and model systems suitable for such follow-up studies.
Similarly, there are limitations. We present results for the most likely combination of stroke phenotypes affected at each locus: the best-fitting model. We had limited statistical power to determine with statistical certainty that this was the correct model; significantly larger samples would be required to achieve this. One consequence of this is that there remains the potential that some associations are due to random variation rather than true biological differences. It would, therefore, be prudent to treat some of the findings here as preliminary until confirmed in larger samples. Due to the challenges of performing these analyses across different ancestry populations, and as we only had a small number of non-European ancestry ICH cases available, which could lead to overfitting, we performed analyses in European populations only. The results cannot, therefore, be generalized to all populations. Repeating these analyses once sufficient data from other ancestral groups are available should be highly prioritized to ensure advancements in the field are made for all ancestral groups. In all analyses we assume there is a single causal variant at the locus, which may not be true in all cases. Our analyses are based on use of a default prior, which has been used in many genetic studies. An alternative is to derive an empirical prior from associated genetic loci. As more loci are identified as being associated with stroke, this will become a more realistic possibility and should be explored in future analyses.

CONCLUSIONS
Our findings suggest that although large-scale genome-wide studies of broad all stroke or all ischemic stroke phenotypes are able to identify multiple associations, it should not be assumed that such associations confer risk equally across stroke subtypes. Heterogeneity in the influence of genetic variants on different stroke subtypes is the norm, not the exception. The multinomial regression approach used here provided insights into the etiological stroke subtypes most prominently influenced by genetic variants at these loci, a prerequisite for deciding on the most appropriate model systems for further mechanistic studies. Stroke is a complex, heterogeneous disorder: our findings highlight the ongoing need for large, well-phenotyped case collections and tailored analytic strategies to decipher the underlying genetic mechanisms.

Figure 1. Stroke subtypes in best fitting model at each locus, for Causative Classification System (CCS) and TOAST (Trial of Org 10172 in Acute Stroke Treatment Classification System) classification systems, with size weighted by association odds ratio. Results are presented for the 16 loci showing log (Bayes factor) >4 in causative classification system of stroke (CCS) or TOAST analyses. Classification/locus combinations in gray indicate that the locus did not reach log (Bayes factor) >4 in that analysis. CES indicates cardioembolic stroke; LAS, large artery stroke; SVS, small vessel stroke; and ICH, intracerebral hemorrhage.

Figure 2. Local plots showing associations with regions conferring risk of ischemic and hemorrhagic stroke and odds ratios for all stroke subtypes. (A) 1q22 region; (B) Endothelin receptor type A (EDNRA) region; (C) Cyclin dependent kinase 6 (CDK6) region; (D) Long Intergenic Non-Protein Coding RNA 1492 (LINC01492) region; (E) SH3 And PX Domains 2A (SH3PXD2A) region; (F) Matrix metalloproteinase 12 (MMP12) region; and (G) Collagen type IV alpha 2 chain (COL4A2) region. Results are presented for the classification system in which the locus showed strongest significance. Stroke subtypes in bold indicate those included in the best fitting model and, therefore, predicted to be influenced by the lead genetic variant, based on Bayesian model selection. CCS indicates causative classification of stroke system; CES, cardioembolic stroke; LAS, large artery atherosclerotic stroke; SVS, small vessel stroke; ICH, intracerebral hemorrhage; TOAST, Trial of Org 10172 in Acute Stroke Treatment Classification System; and UND, stroke of undetermined cause.

Table 1. Sample Sizes (column headings: TOAST, CCS; N, Age [mean (SD)])
CCS indicates causative classification of stroke system (causative system); CES, cardioembolic stroke; ICH, intracerebral hemorrhage; LAS, large artery atherosclerotic stroke; SVS, small artery occlusion stroke; TOAST, Trial of Org 10172 Acute Stroke Treatment Classification System; and UND, stroke of undetermined cause.