Genetic Risk Analysis of Coronary Artery Disease in a Population-based Study in Portugal, Using a Genetic Risk Score of 31 Variants

Background Genetic risk score can quantify individual’s predisposition to coronary artery disease; however, its usefulness as an independent risk predictor remains inconclusive. Objective To evaluate the incremental predictive value of a genetic risk score to traditional risk factors associated with coronary disease. Methods Thirty-three genetic variants previously associated with coronary disease were analyzed in a case-control population with 2,888 individuals. A multiplicative genetic risk score was calculated and then divided into quartiles, with the 1st quartile as the reference class. Coronary risk was determined by logistic regression analysis. Then, a second logistic regression was performed with traditional risk factors and the last quartile of the genetic risk score. Based on this model, two ROC curves were constructed with and without the genetic score and compared by the Delong test. Statistical significance was considered when p values were less than 0.05. Results The last quartile of the multiplicative genetic risk score revealed a significant increase in coronary artery disease risk (OR = 2.588; 95% CI: 2.090-3.204; p < 0.0001). The ROC curve based on traditional risk factors estimated an AUC of 0.72, which increased to 0.74 when the genetic risk score was added, revealing a better fit of the model (p < 0.0001). Conclusions In conclusion, a multilocus genetic risk score was associated with an increased risk for coronary disease in our population. The usual model of traditional risk factors can be improved by incorporating genetic data.


Introduction
Coronary artery disease (CAD) has become a major health problem worldwide, with increasing prevalence and high morbidity and mortality. Traditional risk factors (TRFs) are insufficient to identify asymptomatic high-risk individuals. Epidemiology and family studies have long documented that approximately 50% of the susceptibility for heart disease is genetic. 1 Knowledge of genetic predisposition to cardiac disease is crucial for its comprehensive prevention and treatment.
Although much of the genetic basis of coronary disease remains to be discovered, some progress has been made using both candidate gene and genome-wide association studies (GWAS). 2 In fact, a number of genetic variants have been previously identified at several genomic regions associated with CAD. 2 Until now, the risk attributable to any individual variant has been modest. However, discovering and combining multiple loci with modest effects into a global genetic risk score (GRS) could improve the identification of high-risk populations and improve individual risk assessment. Therefore, the purpose of this work was to generate a multilocus GRS based on common variants previously shown to be associated with CAD, and evaluate whether it is independent of TRFs and improves the predictive ability of a model based only on TRFs.

Study Population
Study population was enrolled from GENEMACOR (GENEs in Madeira Island Population with CORonary artery disease), a population-based ongoing case-control registry of CAD with 2,888 participants, 1,566 cases (mean age 53.3 ± 8.0 years, 79.1% male) and 1322 controls (mean age 52.7 ± 7.8 years, 76.4% male). Cases were selected from patients discharged after being admitted for myocardial infarction/unstable angina diagnosed according to the previously described criteria, 3 or with CAD confirmed by coronary angiography with ≥ 1 coronary lesions of ≥ 70% stenosis in ≥ 1 major coronary artery or its primary branches. Absent or non-flow limiting atheroma was excluded from the analysis. The control group consisted of healthy volunteers, without symptoms or history of CAD, selected from the same population. All controls underwent clinical assessment of conventional cardiovascular risk factors, an electrocardiogram (ECG), and, in doubtful cases, an exercise stress test, a stress echocardiography or computerized tomography for calcium scoring. Cases and controls were matched for gender and age. Inclusion criteria comprised an age limit of 65 years and being a permanent resident to avoid genetic admixture. Principal Component Analysis (PCA) 4 was used for analysis of population stratification for possible genetic admixture and detection of significant genetic outliers (< 5%). 4 The study was approved by the Hospital ethics committee according to the Declaration of Helsinki and all patients provided written informed consent.

Data collection
Data was collected from all subjects in a standardized file comprising demographic, clinical characteristics and TRFs traditional risk factors (gender, age, level of exercise, smoking status, arterial hypertension, dyslipidemia, diabetes, family history of CAD, body mass index (BMI), heart rate and pulse wave velocity (PWV).
"Smokers" referred to current smokers or subjects with less than 5 years of smoking cessation. 5 Essential hypertension was considered when patients, at the entry into this study, were already diagnosed and/or had been on antihypertensive medication for more than 3 months or newly diagnosed hypertensives with systolic blood pressure (SBP)/diastolic blood pressure (DBP) ≥ 140/90 mmHg measured on at least 3 occasions. 6 Dyslipidemia was defined for control population as low-density lipoprotein (LDL) > 140 mg/dL, high-density lipoprotein (HDL) < 45 mg/dL for women and < 40 mg/dL for men, Triglycerides > 150 mg/dL and apolipoprotein (Apo) B > 100 mg/dL. For patients (at high risk) dyslipidemia was considered when LDL > 100, HDL < 45 mg/dL for women and < 40 mg/dL for men, triglycerides > 150 mg/dL, Apo B > 100 mg/dL and non-HDL (total cholesterol-HDL) > 130 mg/dL. 7 Subjects were classified as having diabetes if they were taking oral anti-diabetic medication or insulin or if their fasting plasma glucose was higher than 7.0 mmol/L or 126 mg/dL. 8 Subjects were considered to have a family history of premature cardiovascular disease (CVD) if the father or brother had been diagnosed with CVD under the age of 55 or mother or sister under the age of 65.
The definition of other TRFs was based on the standard criteria, as previously reported. 9,10 Biochemical analysis Blood samples were extracted after 12 hours' fasting. Biochemical analyses were performed at the Central Laboratory of the Hospital, according to standard techniques. In order to determine total cholesterol, HDL, LDL, triglycerides and glucose, blood samples were placed in dry tubes, centrifuged half an hour later at 3,500 g and subsequently quantified by an enzymatic technique using an "AU 5400" (Beckman Coulter) autoanalyzer. Biochemical markers such as lipoprotein-a -Lp(a), (Apo B), and high-sensitivity C-reactive protein (hs-CRP) were quantified by immunoturbidimetry also using an "AU 5400" (Beckman Coulter) automatic system.

Single Nucleotide Polymorphisms (SNP) selection
Two parallel approaches were employed to identify SNPs for the GRS. In the first approach, we searched the National Human Genome Research Institute database, which included SNPs identified by means of GWAS and catalogued based on phenotype and/or trait. We searched for the keywords: "coronary artery disease", "coronary disease", "myocardial infarction" and "early myocardial infarction." The second approach included SNPs that were identified through candidate gene approaches, included in a published GRS for CAD.
Including criteria included genes described in previous studies with an Odds Ratio (OR) for CAD ≥ 1.1 and a minor allele frequency (MAF) > 5%. Genes with low Hardy-Weinberg equilibrium (p < 0.002) (after Bonferroni correction) were excluded.

Genetic analyses
Genetic analyses were performed at the Human Genetics Lab of the University of Madeira. Genomic DNA was extracted from 80 µl of peripheral blood using a standard phenol-chloroform method. A TaqMan allelic discrimination assay for genotyping was performed using labelled probes and primers pre-established by the supplier (TaqMan SNP Genotyping Assays, Applied Biosystems).
All reactions were done on an Applied Biosystems 7300 Real Time PCR System and genotypes were determined using the 7300 System SDS Software (Applied Biosystems, Foster City, USA) without any prior knowledge of individual's clinical data. Quality of genotyping techniques was controlled by the inclusion of one non-template control (NTC) in each plate of 96 wells. All SNPs TaqMan assays had blind duplicates accounting for 20% of all samples. Some SNP genotypes were randomly confirmed by conventional direct DNA sequencing, as 10-15% of all samples were re-amplified for sequencing. Call rates for SNPs in the GRS were 98%-100% and a minimum 95% call rate was set for quality control.

Computation of the GENETIC RISK SCORE
We have tested several models to construct the GRS using both non-weighted and weighted scores, taking into consideration each pattern of inheritance for each gene locus. An additive score (AGRS) was generated, i.e., for each one of the 31 variants a score of 0, 1, and 2 was defined as there were 0, 1 or 2 risk alleles, by calculating the accumulated sum of the risk alleles in these variants. Each individual could be assigned a GRS of 0-62. Additionally, a multiplicative GRS (MGRS) was calculated by multiplying the relative risk for each genotype.
Validation of the risk score calculation was performed in a random sample of 597 patients (20%).

Statistical analysis
Categorical variables were expressed by frequencies and percentages and compared by the Chi-squared test or Fisher's exact test. Continuous variables were expressed as mean ± standard deviation (SD) or median (1st quartile -3rd quartile) and compared by Student's t-test (unpaired) or Mann-Whitney, as appropriate. The Kolmogorov-Smirnov test and the Levene´s test were used to test the assumption of normality and the homogeneity of the variables. All analyses were considered significant when p values were less than 0.05.
Binary logistic regression was used to determine the combined and separate effects of the variables on the risk for angiographic CAD. GRS was modeled using as a continuous variable and as quartiles, using the first quartile as the reference category. Multivariate analyses were used to adjust for 7 covariates also reported to be associated with CAD. We plotted receiver operating characteristic (ROC) curves and calculated the area under the curve (AUC) for logistic regression models including TRFs without and with GRS (quartiles). Pairwise comparison of ROC curves was performed using the Delong test. 11 The model calibration was tested with Hosmer-Lemeshow goodness-of-fit test. A P-value less than 0.05 was considered statistically significant. Collinearity between the variables was measured by assessment of tolerance and variance inflation factor (VIF).
Associations of SNPs with CAD were considered significant at p < 0.05 and in aggregate with GRS models at p < 0.0015 applying Bonferroni correction.
For MAF of 30%, the study had 70% power to detect an OR for CAD of 1.3 and > 90% for OR ≥ 1.35, for 2-sided alpha of < 0.05 for 2,000 cases and 1,000 controls. Power calculations used G power Statistical Power Analyses.
The potential of GRS to improve individual risk stratification then was measured using the net reclassification improvement (NRI) method, 12 defined as the percentage of subjects in each subgroup changing categories when the new model of GRS (in quartiles) was added. The integrated discrimination improvement (IDI), defined as the incremental improvement prognostic value of GRS, was compared between cases and controls. NRI was computed by categorical and non-categorical (continuous) variables using the PredictABEL package available in R software (version 3.

Results
Baseline characteristics of the population Table 1 shows the baseline characteristics of our population. As expected, cases and controls showed no significant differences concerning gender and age, since this was a selection criterion. Higher frequency of dyslipidemia, diabetes, hypertension, physical inactivity, smoking habit, alcohol consumption, and family history of premature cardiovascular disease was found in CAD patients when compared to the controls (p < 0.0001). Also, PWV, BMI and waist-to-height ratio were higher in cases than in controls, with statistical significance (p < 0.05) ( Table 1). The other biochemical variables analyzed such as hemoglobin, leucocytes, fibrinogen, homocysteine and hs-CRP > 3 showed significantly higher levels in the coronary patients group when compared to the controls (p < 0.05) ( Table 1).

Computation and analysis of Genetic Risk Score
Deviation from Hardy-Weinberg equilibrium for the 33 genotypes at individual loci were assessed using the Chi-squared test and p < 0.002 with Bonferroni correction for all SNPs included. LPA gene variant was excluded for further analyses due to its low Hardy-Weinberg p-value (p < 0.002). Linkage disequilibrium for the mutually adjusted SNPs within the genes was studied. CDKN2B gene was excluded because of the strong linkage disequilibrium with another selected SNP, rs1333049, which resides in the 9p21 region. The remaining 31 SNPs were included for further analysis (Supplementary Table 1).
In this study, the MGRS had the highest AUC value for assessing the risk for CAD disease with a specificity of 62.3% and sensitivity of 54% (data not shown) and therefore this model was computed in the subsequent analyzes (Supplementary Table 2).
The MGRS of 31 SNPs was significantly higher in CAD cases than in controls (0.67 ± 0.73 vs 0.48 ± 0.53; p < 0.0001), even by quartile and gender discrimination ( Table 2).
A normal distribution of risk alleles in the total sample set including cases and controls is shown in Figure 1. While CAD patients exhibited lower GRS values, risk alleles were more prevalent in this group than in controls. In CAD patients, a mean of 27 risk alleles was seen in 52% of the individuals, and a mean of 26 risk alleles was found in 53% of controls ( Figure 1).
When analyzed in deciles, GRS showed that the increase in the number of risk alleles was significantly associated with CAD A logistic regression analysis was performed with GRS quartiles, using the first as the reference category. Results showed an increase in CAD risk with statistical significance across the 2 nd , 3 rd and 4 th quartiles with respective ORs and CIs of 1 The reduced contribution of dyslipidemia on CAD risk may be due to standard use of statins in CAD patients. Extended adjustment for cofounding variables (gender, age, heart rate, PWV, low exercise level, BMI and family history of CAD) revealed modest increases in the OR for TRFs and the 2 nd and 3 rd quartiles of GRS.
We used VIF to test for multi-collinearity among the variables included in our GRS adjusted logistic regression model. Tolerance and VIF were respectively > 0.1 and < 10 attesting for no significant collinearity between variables included in the adjustment model.  Two ROC curves were plotted based on the TRFs without and with the GRS (Figure 3). The first ROC curve estimated an AUC of 0.72, which increased to 0.74 when the GRS was added, revealing a better fit of the model (p < 0.0001) (Figure 3).
The NRI and its p value were used to make conclusions about improvements in prediction performance gained by adding a set of biomarkers to an existing risk prediction model. The addition of GRS quartiles to TRF improved the risk classification of the models (Table 4). This new marker provided a continuous NRI of 31% (95% CI: 23.8-38.3%; p < 0.0001) with 14.6% reclassification of CAD patients and 16.4% of healthy control population (Table 4).   NRI was also computed using categorical variables and applied to this case-control study and was defined as the percentage of subjects changing categories in each subgroup when adding the new marker (CAD quartile score). Movement towards a better category (higher in patients than in controls) was calculated to address a potential impact for clinical use. NRI showed higher improvement capacity in reclassifying 19.5% of patients from the 50-75% category to the highest risk (75-100%) category. Likewise, 14.1% of healthy controls were moved down into a lower risk category, from 25-50% risk category to < 25% one (Table 5).

Discussion
Several years ago, polymorphisms involved in specific biological pathways, relevant to coronary atherosclerosis, were genotyped to determine their association with CAD. This candidate gene approach revealed about 30 high-confidence SNPs loci with significant effects on atherosclerosis. 13 However, following traditional candidate gene approach has generated many conflicting results or with weak associations; replication studies are necessary for consistent validation of these results.   After the development of high capacity arrays in 2008, 15 GWAS examined millions of polymorphisms simultaneously in several ethnical subpopulations with a case-control design. The standardized minimum significance level set at 1x10 -5 added reliability to cardiovascular genetics and put it into perspective. 16 In 2007, Samani et al. 17 first identified chromosomal loci that were strongly associated with CAD in the Wellcome Trust Case Control Consortium (WTCCC) study (which involved 1,926 case subjects with CAD and 2938 controls) and looked for replication in the German MI (Myocardial Infarction) Family Study. 17 In the following years, a surprisingly large number of gene variants were consistently reported to be associated with CAD. The 9p21 variant was the most frequently gene variant reported across populations. The huge consortium of Wellcome Trust and three other European research groups joined for the CARDIOGRAM project that confirmed, in a very large sample (> 22,000 cases) of individuals of European ancestry, a 29% increase in risk for MI per copy of the rs1333049 9p21 variant (p = 2×10 -² 0 ). 18 Our research group replicated this 9p21 variant analysis in the Portuguese population and found a CC genotype prevalence of 35.7% in CAD patients, with an adjusted OR of 1.34, p = 0.010. The adjusted OR for TRF of CC genotype was 1.7 (p = 0.018) and CG genotype of OR = 1.5, p = 0.048. The authors concluded that although the mechanism underlying the risk is still unknown, the robustness of this risk allele in risk stratification for CAD has been consistent, even in very different populations. The presence of the CC or CG genotype may thus prove to be useful for predicting the risk of developing CAD in the Portuguese population. 19 The most recent meta-analysis of GWAS for (CAD) identified 46 genome-wide loci with significant association and 104 genome-wide loci potentially associated with increased risk. 20,21 In our study, we found a gradual and continual increase in CAD risk with increasing number of CAD risk alleles carried. Individuals in the bottom decile are naturally protected and subjects in top decile of the GRS had a CAD risk of 2.472 (1.755 -3.482). Even though the score distribution overlaps between cases and controls, the GRS is significantly associated with CAD risk and can be used to identify subjects at highest risk in terms of lifestyle or therapeutic interventions.

Figure 3 -ROC curves based on the baseline model (traditional risk factors, TRFs) and after adding the genetic risk score (GRS) (quartiles) in predicting the risk for coronary artery disease. The two curves are based on logistic regression models incorporating conventional risk factors (diabetes, dyslipidemia, smoking and hypertension) with and without the GRS. AUC indicates area under curve. The Delong test compares the difference between the two AUCs (p < 0.0001).
Our results are similar to others reports in Caucasians populations where GRS with 13, 29 or 109 SNPs 22-24 were independent and marginally increased the predictive power of TRF conferred either by AUC increases, C-index changes or more modern discriminative statistical methods like reclassification measures or improved discrimination.
We report a higher OR for the 4 th quartile of GRS (2.59) compared to 1.66 reported by Ripatti et al. in the highest quintile. 22 When comparing the relative weight of the GRS in the multivariate logistic analysis we found slightly lower OR than smoking, hypertension, and dyslipidemia. In Ripatti´s 22 cohort, a weighted GRS was also an independent predictor even after adjusting for age, sex and TRFs in a Northern European population-based trial. The relative risk of the GRS based on 13-SNP was also lower than that of dyslipidemia and comparable to the effects of hypertension. 22 An increased power to TRFs definition has been given in this study. For instance, we have used a broad dyslipidemia term including Apo B levels as indicated by 2016 lipid guidelines. 7 Moreover, we have not considered ex-smokers until 5 years of cessation to account for the risk for CV disease events decrease be comparable to a nonsmoker. 5 Thanassoulis et al. 24 demonstrated that adding to a 13 SNP-based GRS, 89 SNPs associated with modifiable risk factors did not increase the power of the GRS reporting a HR of 1.01 (95% CI 0.99 -1.03; p = 0.48). This revealed that the weak association of polymorphisms with CAD risk factors in GRS analysis could be masked by the relative stronger effect of other polymorphisms. Considering the lack of a significant association of lipid profiles with CAD risk, Jansen et al. reported in 2015 that several SNPs associated with type 2 diabetes mellitus were related with CAD risk. 25 Recently, Webb et al. identified 6 new loci associated with CAD at genome-wide significance. The study confirmed a pleiotropy between lipid traits, blood pressure phenotypes, body mass index, diabetes, and smoking behavior. 26 Our GRS is an assembly of risk factors and non-risk factors-related SNP, reinforcing the genotype-phenotype interactions.

Limitations of this study
The main clinical utility of the GRS in our population is a modest improvement in risk stratification. GRS seems to be a better indicator of patients at a higher than average risk for DAC as compared with TRF stratification. The number and type of SNPs included is limited in our study and a larger number of GWAS hit SNPs should be included in further studies. Nevertheless, the increasing capability of analyzing multiple SNPs in GRS so far have not been translated into increasing ability of risk prediction.
Finally, this study did not include a gene-gene (G-G) and gene-environment (G-E) analysis. It is expected that, as better statistical significance arises from those interplays, the G-G and G-E incorporation in GRS plus TRF will increase our ability to accurately and individually predict risk.

Conclusions
We conclude that a multilocus GRS based on multiple variants of genetic risk was associated to an increased cardiovascular risk in a Portuguese population. We found that a GRS calculated with the 31 studied SNPs was significantly associated to CAD and that 25% of individuals who carry the greatest risk alleles have, approximately,