Identification of 15 novel risk loci for coronary artery disease and genetic risk of recurrent events, atrial fibrillation and heart failure

Coronary artery disease (CAD) is the major cause of morbidity and mortality in the world. Identification of novel genetic determinants may provide new opportunities for developing innovative strategies to predict, prevent and treat CAD. Therefore, we meta-analyzed independent genetic variants passing P <× 10−5 in CARDIoGRAMplusC4D with novel data made available by UK Biobank. Of the 161 genetic variants studied, 71 reached genome wide significance (p < 5 × 10−8) including 15 novel loci. These novel loci include multiple genes that are involved in angiogenesis (TGFB1, ITGB5, CDH13 and RHOA) and 2 independent variants in the TGFB1 locus. We also identified SGEF as a candidate gene in one of the novel CAD loci. SGEF was previously suggested as a therapeutic target based on mouse studies. The genetic risk score of CAD predicted recurrent CAD events and cardiovascular mortality. We also identified significant genetic correlations between CAD and other cardiovascular conditions, including heart failure and atrial fibrillation. In conclusion, we substantially increased the number of loci convincingly associated with CAD and provide additional biological and clinical insights.


Supplementary Note
Summary of the candidate genes in the novel associated loci
Locus 1q21.3, rs11810571 (TDRKH): The function of TDRKH is unknown in the literature, whereas the functions of the isoforms encoded by RORC, RORγ and RORγt are complex and widely studied 1.
Locus 3p21.31, rs7623687 (RHOA, AMT, TCTA, CDHR4 and KLHDC8B): RHOA (Ras homolog gene family, member A) is a small GTPase protein from the Rho family. Not all effects of RHOA are known; it is primarily associated with cytoskeleton regulation, mostly actin stress fiber formation and actomyosin contractility. The RhoA/Rho-associated coiled-coil-forming kinase (ROCK) pathway participates in acute myocardial infarction, and inhibition of this pathway with atorvastatin improves the post-infarct microenvironment 2. The AMT gene provides instructions for making an enzyme called aminomethyltransferase, one of four subunits that make up the glycine cleavage enzyme complex, which is active in mitochondria. Mutations in this gene are responsible for ~15% of glycine encephalopathy cases 3. Only a few studies have examined the role of TCTA (T-cell leukemia translocation-altered); it has been reported to play a role in human tumorigenesis and osteoclastogenesis 4 and to inhibit proliferation of fibroblast-like synoviocytes 5. CDHR4 encodes cadherin-related family member 4; cadherins are calcium-dependent cell adhesion proteins. They preferentially interact with themselves in a homophilic manner in connecting cells, and may contribute to the sorting of heterogeneous cell types. KLHDC8B encodes a protein that forms a distinct beta-propeller structure of kelch domains (allowing for protein-protein interactions). Mutations in KLHDC8B have been associated with Hodgkin lymphoma.
Locus 3q21.2, rs142695226 (ITGB5 and UMPS): The product of ITGB5, integrin β5, is widely studied for its role in cell adhesion and integrin-mediated signaling. It plays a role in angiogenesis: overexpression promotes new blood vessel formation in vivo by enhancing the binding capacity of circulating angiogenic cells to endothelial cells, among other molecular effects 6. UMPS encodes uridine 5'-monophosphate synthase, which catalyzes the reaction of orotic acid and ribose-5-phosphate to uridine monophosphate (UMP), an energy-carrying molecule.
Locus 3q25.2, rs433903 (ARHGEF26 (SGEF) and DHX36): ARHGEF26 (Rho Guanine Nucleotide Exchange Factor 26) encodes a member of the Rho-guanine nucleotide exchange factor (Rho-GEF) family. These proteins regulate Rho GTPases by catalyzing the exchange of GDP for GTP. ARHGEF26, also named SGEF, has been reported to play a crucial role in atherosclerosis and has been suggested as a potential therapeutic target 7. DHX36 is a member of the DEAH-box family of RNA-dependent NTPases. It may be involved in regulation of telomere length, may function in sex development and spermatogenesis, and may play a role in ossification [genecards].
Locus 4q21.21, rs10857147 (PRDM8 and FGF5): PRDM8 encodes a protein that belongs to a conserved family of histone methyltransferases that act predominantly as negative regulators of transcription [genecards]. FGF5 is a member of the fibroblast growth factor family, which plays an important role in cell proliferation and differentiation; the major role of FGF5 is in the regulation of hair length 8.
Locus 4q27, rs11723436 (MAD2L1 and PDE5A): MAD2L1 is a component of the mitotic spindle assembly checkpoint; it prevents the onset of anaphase until all chromosomes are aligned at metaphase. PDE5A (Phosphodiesterase 5A) is a protein-coding gene. It is involved in the regulation of intracellular concentrations of cyclic nucleotides and is important for smooth muscle relaxation in the cardiovascular system. PDE5 expression is increased in patients with advanced cardiomyopathy 9.
Locus 4q31.22, rs35879803 (ZNF827): ZNF827 is a largely uncharacterized zinc finger protein. It has been reported to recruit the NuRD (Nucleosome Remodeling Deacetylase) complex, which has chromatin remodeling and histone deacetylase activities. The NuRD-ZNF827 complex promotes telomere-telomere recombination; it integrates and controls multiple mechanistic elements of 'alternative lengthening of telomeres' (ALT) activity 10.
Locus 6p22.3, rs35541991 (HDGFL1): HDGFL1 encodes hepatoma-derived growth factor-like 1. Variants near HDGFL1 have been associated at genome-wide significance with total iron binding capacity 11, but its function remains to be determined.
Locus 11p15.2, rs1351525 (ARNTL): ARNTL (Aryl Hydrocarbon Receptor Nuclear Translocator Like), a transcriptional activator, and its product BMAL1 form core components of the circadian clock and are mainly known for their interactions with CLOCK genes.
Locus 12q13.13, rs11170820 (HOXC4): Not much is known about HOXC4. It is one of several HOXC genes located in a cluster on chromosome 12; three genes, HOXC5, HOXC4 and HOXC6, share a 5' non-coding exon. The homeobox genes encode a highly conserved family of transcription factors that play an important role in morphogenesis in all multicellular organisms [genecards]. HOXC4 has been studied in relation to the differentiation of hematopoietic stem cells 12 and adipose tissue 13.
Locus 12q24.31, rs2244608 (HNF1A, OASL): HNF1A is a frequent cause of monogenic diabetes (MODY-HNF1A) and is highly expressed in the liver, pancreas and proximal tubule of the kidney. It has been shown to be highly associated with lipid levels 14 and has been suggested to be involved in CRP, GGT, and other atherosclerotic and metabolic risk factors 15. It plays a major role in the expression of various hepatic, renal, and pancreatic genes/proteins, including megalin (low density lipoprotein-related protein 2), cubilin 16 and PCSK9 17. Altogether, HNF1A is a widely studied pleiotropic gene with many functions. OASL encodes oligoadenylate synthetase enzymes, which are cytoplasmic dsRNA sensors belonging to the antiviral innate immune system.
Locus 14q24.3, rs3832966 (TMED10, NEK9, ZC2HC1C, RPS6KL1, EIF2B2 and ACYP1): Little is known about the function of TMED10. It is thought to be a type I membrane protein that is localized to the plasma membrane and Golgi cisternae and is involved in vesicular protein trafficking. NEK9 was recently reported to be a cause of a lethal skeletal dysplasia. Loss of function results in fibroblast defects, including reduced proliferation capability and delayed cell cycle progression through the G1/S boundary and S-phase, and could also be involved in ciliopathy 18. ACYP1 is a member of the acylphosphatase family. The encoded protein is a small cytosolic enzyme that catalyzes the hydrolysis of the carboxyl-phosphate bond of acylphosphates. Two isoenzymes have been isolated and described based on their tissue localization: erythrocyte (common) type acylphosphatase, encoded by this gene, and muscle type acylphosphatase [genecards]. Nothing is known about the function of ZC2HC1C (Zinc Finger C2HC-Type Containing 1C) and RPS6KL1 (Ribosomal Protein S6 Kinase Like 1).
Locus 16q23.1, rs33928862 (BCAR1): Breast cancer anti-estrogen resistance protein 1 is encoded in humans by the BCAR1 gene and is involved in various cellular events, in basic signaling of developmental and physiological processes, and in the regulation of homeostasis of various tissues; the functions and roles of BCAR1 have been reviewed previously 19. A variant in LD (r² = 0.65), rs4888378, has been associated with carotid intima-media thickness and coronary artery disease risk 20.
Locus 16q23.3, rs7500448 (CDH13): CDH13 is a widely studied member of the cadherin family. It is an adhesion glycoprotein known as T-cadherin and is recognized as an LDL receptor, although, unlike other LDL receptors, it activates Erk 1/2 tyrosine kinase and the nuclear translocation of NF-kappaB 21,22. Genetic variants near this gene have previously been associated at genome-wide significance with blood pressure 23 and adiponectin levels 24 (P = 6.8 × 10−165), among others. The locus was also identified in one of the first genome-wide association studies of coronary artery disease 25, although not at genome-wide significance. None of the reported SNPs were in LD (r² > 0.001) with the current finding, rs7500448. We identified rs7500448 to be highly associated (P = 8 × 10−13) with pulse pressure in UK Biobank.
Locus 19q13.2, rs138120077 and rs8108632 (B9D2, TGFB1, HNRNPUL1 and CCDC97): Not much is known about the function of B9D2; the encoded protein localizes to basal bodies and cilia, and mutations cause Meckel syndrome 26. TGFB1, transforming growth factor beta 1, is one of the most widely studied genes. It encodes a multifunctional peptide that regulates proliferation, differentiation, adhesion and migration, among other functions, and has been studied for its role in angiogenesis, cardiovascular syndromes and vascular biology 27-29. rs2241718 near TGFB1 has been prioritized as a functional regulatory variant 30 but is in low LD with the 2 signals identified in our study. The heterogeneous nuclear ribonucleoprotein U-like 1 (HNRNPUL1) gene encodes a heterogeneous ribonuclear protein believed to be involved in mRNA processing and transport 31,32; candidate studies found significant associations between variants and CAD in high-risk people with familial hypercholesterolemia 33. Nothing is known about the function of CCDC97 (Coiled-Coil Domain Containing 97), but it has recently been studied as a candidate for regulatory mechanisms of CAD, together with TGFB1 30. That study showed that while the 3′-untranslated region variant at CCDC97/TGFB1, rs2241718, was predicted to affect binding, this variant might not alter endogenous CCDC97 levels, but rather serve as an enhancer for neighboring TGFB1 in human coronary artery smooth muscle cells.

Definitions used for UK Biobank analyses
Prevalent and incident coronary artery disease (CAD), hypercholesterolemia, hypertension, diabetes, myocardial infarction (MI), heart failure, atrial fibrillation/flutter and cerebral infarction were derived from self-report (touchscreen questionnaire and verbal interview) and/or the diagnosis was captured using the Hospital Episode [...] For defining the control group, we excluded participants who reported that their mother, father or sibling suffered from 'heart disease' (Field ID 20107, 20110 and 20111). Information on smoking status was collected using the touchscreen questionnaire at the baseline visit. Medication usage was collected at the baseline visit during a verbal interview by a trained nurse on prescription medications (Field ID 20003). Beta-blocker and calcium channel-blocker therapy were defined with the corresponding medication codes (please see below for the exact codes that were used). Body mass index was obtained from the BMI value constructed from height and weight measured during the initial Assessment Centre visit (Field ID 21001) and from body composition estimation by impedance measurement (Field ID 23104). Blood pressure was measured using manual (Field ID 93, 94) and automated (Field ID 4079, 4080) readings. Pulse pressure was calculated by subtracting the diastolic from the systolic blood pressure value. Mean arterial pressure (MAP) was calculated as MAP = [(2 × diastolic blood pressure) + systolic blood pressure] / 3. When multiple measurements during the first visit were available, the mean of all measurements was used in the analysis. In the UK Biobank cohort, pulse wave velocity (PWV) for arterial stiffness index (ASI) assessment was measured using the PulseTrace PCA2 (CareFusion, San Diego, USA) (Field ID 21021). The PulseTrace PCA2 uses finger photoplethysmography to obtain the pulse waveform during a 10-15 second measurement using an infrared sensor clipped to the end of the index finger 34. When multiple measurements were available, the mean of all measurements was used in the analysis.
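The blood pressure derivations described above (averaging repeated readings, pulse pressure and MAP) can be illustrated with a minimal Python sketch. The function names and the handling of missing readings are assumptions made for illustration only, as is the pooling of manual and automated readings into a single list; the text does not specify how the two reading types were combined.

```python
from statistics import mean


def mean_or_none(readings):
    """Average the available readings, ignoring missing ones (None)."""
    present = [r for r in readings if r is not None]
    return mean(present) if present else None


def derive_pressures(systolic_readings, diastolic_readings):
    """Return averaged SBP/DBP, pulse pressure and MAP, or None if either is missing.

    Assumption: all readings from the first visit are passed in one list per
    measure; how manual and automated readings were merged is not stated.
    """
    sbp = mean_or_none(systolic_readings)
    dbp = mean_or_none(diastolic_readings)
    if sbp is None or dbp is None:
        return None
    return {
        "sbp": sbp,
        "dbp": dbp,
        # Pulse pressure: systolic minus diastolic blood pressure.
        "pulse_pressure": sbp - dbp,
        # MAP = [(2 x diastolic) + systolic] / 3, as defined in the text.
        "map": (2 * dbp + sbp) / 3,
    }


# Hypothetical participant with two readings of each measure (mmHg).
example = derive_pressures([120, 124], [80, 76])
```

For the hypothetical readings above, the averaged systolic and diastolic values are 122 and 78 mmHg, giving a pulse pressure of 44 mmHg and a MAP of 278/3 mmHg.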
Beta-blocker medication codes: 1140866724, 1140866738, 1140860192, [...]

Supplementary Figure 1 | Regional plots of the 15 novel genome-wide associated loci with CAD. LD (R²) was based on the Europeans of 1000 Genomes Phase 1 v3. P-values were based on the CARDIoGRAMplusC4D GWAS data to provide an accurate overview of the P-value distribution among variants at each locus. [Panel: rs8108632]