Shared polymorphisms and modifiable behavior factors for myocardial infarction and high cholesterol in a retrospective population study

Abstract
Genetic and environmental (behavioral, clinical, and demographic) factors are associated with increased risks of both myocardial infarction (MI) and high cholesterol (HC). HC is known to be a major risk factor that may cause MI. However, whether there are common single nucleotide polymorphisms (SNPs) associated with both MI and HC is not firmly established, and whether there are modulating and modifying effects (interactions of genetic and known environmental factors) on either HC or MI, and whether these joint effects improve the prediction of MI, is understudied. The purpose of this study is to identify novel shared SNPs and modifiable environmental factors for MI and HC. We assess whether SNPs from a metabolic pathway related to MI may relate to HC; whether there are moderating effects among SNPs, lifestyle (smoking and drinking), HC, and MI after controlling for other factors [gender, body mass index (BMI), and hypertension (HTN)]; and we evaluate the prediction power of the joint and modulating genetic and environmental factors influencing MI and HC. This is a retrospective study of residents of Erie and Niagara counties in New York with or without a history of MI. The data set includes environmental variables (demographic, clinical, lifestyle). Thirty-one tagSNPs from a metabolic pathway related to MI are genotyped. Generalized linear models (GLMs) with imputation-based analysis are used to examine the common effects of tagSNPs and environmental exposures, and their interactions, on having a history of HC or MI. MI, BMI, and HTN are significant risk factors for HC. HC shows the strongest effect on risk of MI, in addition to HTN, gender, and smoking status, while drinking status shows a protective effect on MI. rs16944 (gene IL-1β) and rs17222772 (gene ALOX) increase the risk of HC, while rs17231896 (gene CETP) has a protective effect on HC, both with and without the clinical, behavioral, and demographic factors; the effect sizes differ between these models, which may indicate the existence of moderating or modifiable effects. Further analysis including gene-gene and gene-environment interactions shows interactions between rs17231896 (CETP) and rs17222772 (ALOX), and between rs17231896 (CETP) and gender. rs17237890 (CETP) and rs2070744 (NOS3) are found to be significantly associated with risk of MI when adjusted for both SNPs and environmental factors. After multiple testing adjustments, these effects diminished as expected. In addition, an interaction between drinking and smoking status is significant. Overall, the prediction power in successfully classifying MI status increases to 80% when all significant tagSNPs, environmental factors, and their interactions are included, compared with environmental factors only (72%). Having a history of either HC or MI has a significant effect on the other in both directions, in addition to HTN and gender. Genes/SNPs identified in this analysis as associated with HC may potentially be linked to MI; this could be further examined and validated through haplotype-pair analysis with appropriate population stratification corrections, and through function/pathway regulation analysis, to address the limitations of the current analysis.


Introduction
Risk prediction and assessment are central to the prevention of common complex diseases, including coronary heart disease (CHD) and its sequelae, such as acute myocardial infarction (MI). Refining prediction strategies remains important for targeting treatment recommendations. [1][2][3][4] MI is a leading cause of death throughout the world. Approximately 450,000 people in the United States die from coronary disease per year. [5] The risk of MI increases with age, while the actual incidence depends on predisposing risk factors for atherosclerosis. [6] Reported potential risk factors for MI and atherosclerotic coronary artery disease include hyperlipidemia, diabetes mellitus, hypertension (HTN), tobacco use, male gender, age, family history of atherosclerotic arterial disease, and genetics. [7][8][9] Furthermore, family history of heart attacks, lack of physical activity, alcohol consumption, obesity, and stress are also considered potential associated factors. [10][11][12][13] It has been noted that a history of high cholesterol (HC) is associated with a history of MI. [14][15][16] The genetics of HC [17][18][19][20][21] and of MI [22,23] have each been examined. A number of studies have reported that several of the known MI/CHD loci are also associated with lipid traits and vice versa. [24] However, whether there are common genotypes or polymorphisms associated with both MI and HC is not confirmed or firmly established, [25][26][27][28] although such shared variants would be important and ideal drug or intervention targets for precision medicine. [29][30][31] Furthermore, genetic variants may play an important but under-recognized role in modulating the effect of environmental exposures on the risk of MI and HC. [32][33][34][35] Currently, whether the joint and modulating effects (including the interactions of genetic and environmental factors) improve the prediction of MI is understudied.
In this article, we examine whether there are common genetic variants, within a selected set of genetic variants, that contribute to both HC and MI in a retrospective study of a western population. Furthermore, we evaluate the prediction power of these combined effects together with the identified significant environmental factors, in order to test the central hypothesis that a combination of common single nucleotide polymorphisms (SNPs) and environmental factors contributes to the risk of HC as well as MI.

Data source, design, study population, and variables
The study sample was randomly selected from the general population of Erie and Niagara counties in New York State, aged 35 to 79 years, and matched by age within a 5-year difference. [36] The study was approved by the ethical committees (institutional review board of the University at Buffalo), and all study participants signed informed consent. [37] For the current analyses, we used only the partial data of 1837 Caucasian participants who had been genotyped, of whom 818 had a history of MI (209 women and 609 men) and 1019 had no history of MI (608 men and 411 women). The data set includes 29 environmental variables collected through participants' clinic visits, which cover health behavior or lifestyle (smoking status, alcohol consumption), demographic (gender), and clinical variables [body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), HTN, family history of heart disease, triglycerides (TG), high-density lipoprotein (HDL), low-density lipoprotein (LDL), and total cholesterol]. Diabetic patients were excluded from participating in the study.

Main outcome measures
MI cases are defined as people who had had an MI (once or more) before or during the data collection period, and controls are defined as people who had never had an MI by the time of data collection. HC, defined as having a history of HC (once or more), is used as the primary lipid outcome measure because the other lipid measures in these data have substantial missing values.

Data analysis
Imputation-based data analysis for missing data and data preprocessing.
Missing data analysis was conducted to examine the missing pattern and mechanism; for example, BMI was missing completely at random (P = .178 for Little's MCAR test). Multiple imputation was performed for near-significant variables (P < .10) with less than 10% missing values. These include BMI (2% missing), smoking status (0.3%), HTN (0.1%), medication for HTN (6.7%), HC (1.1%), family history of heart disease (8.5%), drinking status in the past 1 to 2 years (0.4%), rs3917356 (IL-1b1) (2.2%), rs2069825 (IL6) (2.9%), ADDU-000614 (ADDU) (1.1%), rs17237890 (CETP) (1.7%), rs2070744 (NOS) (2.3%), and rs2077647 (ESR1) (1.9%). Three cases with outlying amounts of ethanol (z = 23.73, 14.33, 11.60) were deleted, as the amount of ethanol differed significantly by MI status. "Drinking status in past 1-2yrs" was recoded to make the sample size in each category less imbalanced: "Lifetime abstainers," "Irregular abstainers," and "Non-current drinker" were categorized as "non-current drinker," while "current non-weekly drinker" and "current weekly drinker" were categorized as "current drinker."

Association analysis and predictions.
The unadjusted univariate association between each factor and MI was calculated, and correlations between explanatory variables that may indicate collinearity were assessed pre hoc using Pearson correlation for pairs of continuous variables, the chi-square test for pairs of categorical variables, and the t test or analysis of variance (ANOVA) for pairs of one continuous and one categorical variable. Generalized linear models (GLMs) were fitted with dominant genetic effect coding. Besides the genetic factors, demographic, clinical, and lifestyle factors were included as covariates and confounding factors in the multivariable modeling processes, and adjusted odds ratios (ORs) and 95% confidence intervals (95% CIs) were calculated. Model significance was examined using the Omnibus test and the Hosmer and Lemeshow test.
Goodness of fit of the model was assessed using deviance/df, AIC, and BIC; the model with smaller deviance/df, AIC, and BIC values fits the data better. Nagelkerke pseudo R² values were reported for model performance. The presence of outliers (absolute standardized residual values ≥3) that may affect the stability of the model was assessed using plots of leverage values, Cook's distance, and visual inspection, and such outliers were removed. All 2-way interactions between significant explanatory variables were tested and retained in the final model if they significantly improved the model with P < .05 based on the χ² test of model-fit improvement. To assess the prediction power of the model, random sampling was used to select 70% of the total sample as training data to build the model, with the other 30% of the sample used as testing data; AIC and classification/prediction accuracy are reported (see Fig. 1).
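The dominant genetic coding and the 70%/30% random split described above can be sketched as follows. This is a minimal illustration, not the software used in the study: the function names, the genotype strings, the choice of "G" as the risk allele, and the random seed are all hypothetical.

```python
import random

def dominant_code(genotype, risk_allele):
    """Dominant model: 1 if the genotype carries at least one risk allele, else 0."""
    return 1 if risk_allele in genotype else 0

def split_70_30(records, seed=0):
    """Randomly assign 70% of the records to training and the remaining 30% to testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

def accuracy(y_true, y_pred):
    """Proportion of correctly classified observations."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical genotypes for one SNP (illustrative only, not study data).
genotypes = ["AA", "AG", "GG", "AG", "AA", "GG", "AA", "AG", "GG", "AA"]
coded = [dominant_code(g, "G") for g in genotypes]
train, test = split_70_30(coded, seed=42)
```

In practice the model would be fitted on the training portion only, and `accuracy` would then be computed from the predicted and observed MI status in the testing portion.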

MI case-control.
In the MI case group, 64.9% had a history of HC, while in the non-MI group only 31.3% had HC, which indicates that HC is associated with MI. These figures are much higher (owing to the difference in age classes) than those reported from the National Health and Nutrition Examination Survey for 2011 to 2012, which estimated that 12.9% of U.S. adults aged 20 years and over (11.1% of men and 14.4% of women; 17.1% of non-Hispanic whites) had HC. [31][32][33][34][35][41] There were significant differences between MI cases and controls in gender, smoking (lifetime total pack years), drinking status, amount of ethanol, BMI, SBP, DBP, TG, total cholesterol, HDL, LDL, high blood cholesterol, whether HTN and high blood cholesterol were treated with medication, and family history of heart disease (all P < .05). Within the MI case group, two-thirds of participants had HC (65.9%), while one-third of participants in the control group had HC (31.6%), a significant difference (χ² = 210.895, P < .001). These descriptive statistics confirm that HC is associated with MI in this Caucasian population. Environmental characteristics for the total sample, the MI cases, and the controls are summarized in Table 1. Table 2 provides a summary of genetic characteristics in MI cases and controls, as well as the full list of pre-selected and genotyped genes/SNPs. The unadjusted univariate associations between tagSNPs and MI, calculated using χ² tests, are included in Table 2.
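For readers less familiar with the statistics reported here, the Pearson χ² test and the unadjusted odds ratio for a 2 × 2 table (HC history by MI status) can be sketched as below. The counts are hypothetical and chosen only for illustration; they are not the study data, and the function names are assumptions. The confidence interval is the usual Wald interval on the log-odds scale, which may differ from the exact procedure used by the authors' software.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald 95% confidence interval computed on the log scale."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: rows = MI cases / controls, columns = HC yes / HC no.
cases_hc, cases_no_hc = 60, 40
controls_hc, controls_no_hc = 30, 70

chi2 = chi_square_2x2(cases_hc, cases_no_hc, controls_hc, controls_no_hc)
or_hc, or_low, or_high = odds_ratio_ci(cases_hc, cases_no_hc, controls_hc, controls_no_hc)
print(f"chi-square = {chi2:.3f}, OR = {or_hc:.2f} (95% CI: {or_low:.2f}-{or_high:.2f})")
```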

Univariate and multivariable analysis for MI
Risk of MI was associated with the following SNPs (P < .05): rs3917356 (IL-1b gene), rs17231513 (CETP_1), rs17237890 (CETP_3), and rs2070744 (NOS_2), while rs2069825 (IL6), ADDU-000614 (ADDU), and rs9340799 (ESR1) differed near-significantly between the MI case and control groups (P < .10). None of the remaining SNPs was significantly or near-significantly associated with risk of MI, and they were excluded from the multivariable analysis. Finally, 14 genetic and environmental variables (P < .10), including BMI, smoking status, HBP, medication for HTN, HC, family history of heart disease, drinking status in the past 1 to 2 years, rs3917356 (IL-1b), rs2069825 (IL6), ADDU-000614 (ADDU), rs17237890 (CETP), rs2070744 (NOS), and rs2077647 (ESR1), were included in the follow-up multivariable analysis after multiple imputation.

When adjusted for the other SNPs and the environmental factors, rs17237890 (CETP) and rs2070744 (NOS3) were found to be significantly associated with risk of MI; these are also the most significant SNPs in the unadjusted association analysis. The effects of the two other SNPs that were significant in the unadjusted analysis diminish owing to the inclusion of conditional or modifiable effects.

With both main effects and interactions of gene and environmental factors.
After entering all 2-way interaction effects, the interaction between drinking and smoking status was the only significant interaction in the model, and it improved the total percentage of variance explained by 0.4%. With the interaction effect added, 41% of the total variance in risk of MI was explained (Nagelkerke R² = 0.410) (see Table 3, Model 2).

Table 2
Summary of genetic characteristics in MI cases and control subjects (N = 1837), with the unadjusted univariate association between SNPs and MI calculated using Chi-squared tests. Columns: Genes/SNPs; Control n (%); Case n (%); Total n (%); Missing n (%).

Table 3
Significant clinical, behavior, demographic, and genetic factors associated with the risk of MI (N = 1834).

Table 4 provides the univariate analysis and the unadjusted risk, using χ² tests, ORs, and 95% CIs, for HC for each selected individual gene/SNP. When adjusted for other genes (SNPs) (see Table 5) and additionally for the environmental factors (see Table 6), the risk of HC for rs17231896 (CETP) is 33% lower (OR = 0.67, 95% CI: 0.51-0.88). The increase in effect size (from 21% to 33%) between the models without and with the environmental factors may indicate that modifiable effects exist. Table 7 provides the adjusted ORs and 95% CIs for HC predicted by significant genetic, demographic, behavioral, and clinical factors, with both main and interaction effects included. Both BMI and HTN are significant risk factors for HC. With the inclusion of gene-gene and gene-environment interactions, the multiplicative effects between rs17231896 (CETP) and rs17222772 (ALOX), between rs17231896 (CETP) and gender, between MI and age, and between rs9340799 (ESR1) and family history of heart disease were found to be significantly associated with HC. As expected, after multiple testing adjustments (e.g., Bonferroni or false discovery rate), the SNPs that were significant for MI or HC became statistically nonsignificant.
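The two multiple testing adjustments named above can be sketched as follows. This is an illustrative implementation of the standard Bonferroni and Benjamini-Hochberg (false discovery rate) procedures, not the authors' code; the function names and the P values in the example are hypothetical. For orientation, if all 31 tagSNPs were counted as separate tests, the Bonferroni threshold would be .05/31 ≈ .0016, which explains why nominally significant SNPs (P < .05) can lose significance.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted P values: multiply each P by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg (FDR) adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices from smallest to largest P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P to the smallest, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical nominal P values for four SNPs (illustrative only).
nominal = [0.01, 0.04, 0.03, 0.20]
p_bonf = bonferroni(nominal)
p_fdr = benjamini_hochberg(nominal)
```

The Bonferroni adjustment controls the family-wise error rate and is the more conservative of the two; the FDR adjustment never yields a larger adjusted P value than Bonferroni for the same input.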

Discussion
Having a history of either HC or MI has a significant effect on the other in both directions. Common environmental factors (e.g., smoking, HTN, gender) and genetic factors (gene CETP) for both HC and MI are identified from separate analyses, with or without adjustment for other significant factors. The CETP polymorphisms (rs17231896, rs17237890, and rs17231513) have protective effects on HC, which is consistent with a meta-analysis of ARIC data using a genome-wide association study of the association between CETP and HDL, in which 3 CETP SNPs, rs708272 (OR = 0.95), rs5882 (OR = 0.94), and rs1800775 (OR = 0.95), were found to be significantly associated with HDL after adjustment for both gene and environmental interactions. [32] From a functional point of view, CETP is known to regulate the process of transporting cholesterol from the peripheral arteries to the liver, which helps reduce the risk of CHD. Other studies have also found that CETP expression is regulated by multiple functional SNPs affecting splicing and transcription, with increased or decreased CETP function, although they focus on different selected variants (rs247616, rs173539). [42][43][44] The modulating and joint effects of genes, clinical, behavioral, and other factors on HC and MI are complex. A 1995 study demonstrated that CETP regulation of the above process is influenced (positively) by alcohol, but subsequent studies have not been able to fully replicate the result. [45] Our analysis confirms that alcohol and smoking habits have

Table 4
Unadjusted odds ratios and 95% confidence intervals for HC by genes/SNPs (N = 1837).

Table 6
Adjusted odds ratios and 95% confidence intervals for HC predicted by genetic, demographic, behavior, and clinical factors with main effects (N = 1837).

With the inclusion of genes and their interactions, these joint and modulating genetic, lifestyle, and clinical factors improve the prediction power for MI and HC. Moreover, both gene-gene and gene-environment interaction effects were found to be significantly associated with MI and HC, which could be further examined through pathway/regulation and function analysis. [46] It is known that appropriate multiplicity adjustment (by either the false discovery rate or the Bonferroni procedure) is crucial to guarantee the replicability and reproducibility of findings, and it should be conducted in large-scale genome-wide association analyses to avoid potential false positives in multi-trait association analysis. [47] Despite the protective effect of CETP on HC, rs17237890 (a different SNP of CETP) showed an increased risk of MI when other significant lifestyle factors, such as smoking and alcohol consumption, were included. Future haplotype-pair analysis (which reduces the number of tests and increases the expected effect by using haplotypes instead of single genotypes) could be conducted to further validate a plausible gene (i.e., CETP), without using the full gene set (for better power), to see whether it is consistently associated with all the studied phenotypes. [40,48] In addition, within populations of European Caucasian descent, population stratification should be assessed and weighted using genomic control with the inflation factor to avoid spurious association findings. [49,50] The common alleles/genes associated with both MI and HC discovered here include rs16944 (IL-1b), which has an independent effect in the absence of other genetic or environmental factors.
However, when other factors are added, some effects either diminish to statistical nonsignificance or change in size, which may indicate an under-recognized role of environmental exposures in modulating and modifying the effects of these genes on the risk of both conditions. The diminished effects (effect sizes at or close to zero) and other potentially missing heritable components of disease etiology may be better explored either through epigenetic analysis [46] with longitudinal data or through more complex network- and pathway-based analysis, for example, using Bayesian networks to analyze the direct and indirect probabilistic causal associations and so dissect the complex relationships among the significant factors (age, gender, BMI, HTN, smoking, alcohol drinking status, CETP gene, CHD/MI).
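The genomic control adjustment proposed above for future work can be sketched as follows. This is a minimal illustration of the standard approach, in which the inflation factor is the median of the observed 1-df χ² association statistics divided by the median of the χ² distribution with 1 degree of freedom (approximately 0.455). It was not applied in the present analysis; the function names and the example statistics are hypothetical.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def inflation_factor(chi2_stats):
    """Genomic control inflation factor (lambda) from 1-df chi-square statistics."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN

def genomic_control(chi2_stats):
    """Divide each statistic by lambda; no correction is applied when lambda <= 1."""
    lam = max(1.0, inflation_factor(chi2_stats))
    return [x / lam for x in chi2_stats]

# Hypothetical chi-square statistics for five SNPs (illustrative only).
observed = [0.20, 0.50, 0.91, 3.00, 1.20]
lam = inflation_factor(observed)
corrected = genomic_control(observed)
```

A value of lambda well above 1 suggests inflation of the test statistics, for example from population stratification; with only a small set of candidate SNPs, as here, the estimate would be unstable, which is one reason the approach is usually applied to genome-wide data.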

One limitation of this study is the notable amount of missing data for lipid measures, such as total cholesterol, HDL, LDL, and TG (missingness ranged from 44.4% to 46.9%), which is why having a history of HC (once or more) is chosen as the primary lipid outcome measure in this paper. Many published works on lipid-SNP associations use those measures as the primary outcome.
Furthermore, for the variable on HC treated with medication (statin treatment or other cholesterol-lowering therapies), only 26.6% of participants reported the presence or absence of such medication, which makes it infeasible to include in the estimation model. Cases of familial dyslipidemia were not collected for this study; these related lipid variables may change the estimated effects and should be taken into account in future studies.

Table 7
Adjusted odds ratios and 95% confidence intervals for HC predicted by genetic, demographic, behavior, and clinical factors, with main and interactions effects included (N = 1837).