Comparing GWAS Results of Complex Traits Using Full Genetic Model and Additive Models for Revealing Genetic Architecture

Most genome-wide association studies (GWASs) of human complex diseases have ignored dominance, epistasis and ethnic interactions. We conducted comparative GWASs for total cholesterol using a full genetic model and additive models, which illustrate the impact of ignoring these genetic effects on analysis results and demonstrate how the genetic effects of multiple loci can differ across ethnic groups. The full model identified 15 quantitative trait loci, comprising 13 individual loci and 3 pairs of epistatic loci, whereas the multi-loci additive model identified only 14 loci (9 loci in common with the full model and 5 different loci). Moreover, 4 loci detected by the full model were not detected by the multi-loci additive model. PLINK analysis identified two loci and GCTA analysis detected only one locus with genome-wide significance. The full model identified three previously reported genes as well as several new genes. Bioinformatics analysis showed that some of the new genes are related to cholesterol-related chemicals and/or diseases. Analyses of the cholesterol data and simulation studies revealed that the full model performed better than the additive models in terms of detection power and unbiased estimation of the genetic effects underlying complex traits.

Cholesterol is an important complex lipid containing a sterol nucleus. Elevated cholesterol levels increase the risk of cardiovascular disease, including coronary heart disease, stroke and peripheral vascular disease. Cholesterol has also been linked to diabetes and high blood pressure. Several association studies using different approaches have analyzed the cholesterol trait in a number of populations [1][2][3]. For example, a single-locus model for main effects and a two-locus model for epistasis were used to analyze cholesterol in the Framingham Heart Study data [4]. Teslovich et al. reported 39 genome-wide significant loci associated with total cholesterol (TC) through a meta-analysis of several GWASs [5]. Most previous GWASs of cholesterol tested the additive genetic effect of each locus individually and used a Bonferroni-corrected threshold for multiple tests to determine genome-wide significance.
The usefulness of multi-loci approaches compared with single-locus approaches was extensively described in the QTL mapping era [6][7][8][9][10][11][12][13][14][15][16]. In GWASs, multi-loci association studies also perform better than single-locus association studies [16][17][18]. Multi-loci association studies may identify genetic variants that jointly have significant effects but individually make only small contributions [19]. Single-locus mapping approaches may fail to detect loci because they do not control for background genetic variants. Popular single-locus mixed model approaches use an additive genetic relationship matrix to control for genetic background and genetic relatedness. Yang et al. showed that constructing the genetic relationship matrix while excluding the chromosome being tested can improve analysis results [20]. However, controlling the genetic background with a genetic relationship matrix is insufficient when several causal loci have large effects [17]. Moreover, single-locus additive model approaches rarely explain more than a small proportion of the heritable variation [21]. Instead of testing the additive effects of loci individually, a full genetic model including the genetic effects of multiple loci, epistasis and environment interactions can be analyzed by the linear mixed model approaches implemented in the software QTXNetwork [22].
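As a rough illustration of the idea attributed to Yang et al. above, the Python sketch below builds a GCTA-style genetic relationship matrix from standardized allele counts and then rebuilds it with the tested chromosome left out. It assumes numpy, 0/1/2 allele-count coding and allele frequencies estimated from the sample; for simplicity it applies the off-diagonal formula to the diagonal as well, which GCTA itself does not do. The function names are hypothetical.

```python
import numpy as np


def grm(genotypes: np.ndarray) -> np.ndarray:
    """Genetic relationship matrix from an n x m matrix of allele counts (0, 1, 2).

    Each SNP is standardized by its sample allele frequency p, then
    A = Z Z^T / m (the same formula is used on the diagonal, a simplification).
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0
    polymorphic = (p > 0.0) & (p < 1.0)  # monomorphic SNPs carry no information
    g, p = g[:, polymorphic], p[polymorphic]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / z.shape[1]


def grm_excluding_chromosome(genotypes, chromosomes, tested_chr) -> np.ndarray:
    """GRM built only from SNPs that are NOT on the chromosome being tested."""
    mask = np.asarray(chromosomes) != tested_chr
    return grm(np.asarray(genotypes)[:, mask])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    genotypes = rng.integers(0, 3, size=(6, 40))
    chromosomes = np.repeat([1, 2], 20)
    # Relationship matrix to use when testing SNPs on chromosome 1.
    print(np.round(grm_excluding_chromosome(genotypes, chromosomes, tested_chr=1), 2))
```

Leaving the tested chromosome out keeps the candidate SNP (and SNPs in LD with it) from being partly absorbed into the random polygenic term, which is the motivation for this design.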
In this study, we analyzed total cholesterol in a multi-ethnic population that included European-Americans (E-A), Chinese-Americans (C-A), African-Americans (A-A), and Hispanic-Americans (H-A). Four different association approaches were used: full model and multi-loci additive model through QTXNetwork, single locus (Table 1) for detected quantitative trait SNPs (QTSs) through full model with their significant genetic and ethnic interaction effects, mostly due to epistasis effects ($h_I^2 = 8.57\%$), additive effects ($h_A^2 = 8.06\%$), and dominance by ethnic interactions ($h_{DE}^2 = 6.54\%$). The estimated heritability due to non-additive effects was larger than that due to additive effects alone. We observed missing heritability for the multi-loci additive model, which included only additive and additive-by-ethnic interaction effects (Table 1). The total estimated heritability for cholesterol under the multi-loci additive model was only 13.91%, less than half of the total heritability under the full model. Genetic and ethnic specific effects of QTSs. Two distinct steps were used to analyze the cholesterol data. First, we extracted a set of SNPs that were significant in the 1D, 2D and 3D scans of the GMDR-GPU module, and then used a linear mixed model approach to explore and build an optimal model of the genetic architecture of the trait. The first step is a filtering step that reduces computation time for the subsequent analysis. In the second step, we tested the extracted SNPs for experiment-wise statistical significance ($\alpha_{EW} = 0.05$) using the full model approach to reveal the complex genetic architecture of the trait. Fig. 1 shows the results of the cholesterol data analyses using the full model and multi-loci additive model approaches.
a 4 94, −log 10 P EW > 34). In addition to its additive effect, QTS rs629301 also had significant ethnic-specific dominance effects, negative for the E-A population ($de_1 = -3.85$) but positive for the H-A population ($de_4 = 5.40$). CELSR2 is located on chromosome 1p13.3 and encodes a member of the flamingo subfamily, part of the cadherin superfamily; APOB is a protein-coding gene located on chromosome 2p24.1. Several variants near the CELSR2-PSRC1-SORT1 and APOB-TDRD15 gene regions have been reported to be associated with cholesterol in a number of previous GWASs [4,5,25]. CELSR2, in the cholesterol gene cluster, shows a significant association with coronary artery disease, and a single nucleotide polymorphism at this locus regulates plasma cholesterol levels [25]. The QTS rs6465748, 9.6 kb 5′ of MYH16, had highly significant ethnic-specific additive effects for the African-American (A-A) and Hispanic-American (H-A) cohorts in addition to its additive main effect. This implies that the additive effect of rs6465748 in these two ethnic groups differed significantly from that in the other two groups: the additive effect was smallest for the A-A population ($a + ae_3 = -7.78$) and largest for the H-A population ($a + ae_4 = 1.22$). The heterozygote C/T of rs7694118 had significant dominance and dominance-by-ethnic interaction effects. The heterozygote T/C of rs10768634 had a highly significant dominance effect only.
aa 7 46, −log 10 P EW > 13) was from A/A of rs12246594 in gene SORCS1 × G/G of rs12595211 in gene ETFA. SORCS1 encodes a member of the vacuolar protein sorting 10 (VPS10) domain-containing receptor family and is located on chromosome 10q25.1. Variants of SORCS1 have been reported to be associated with type 2 diabetes [26,27] and Alzheimer's disease [28,29], and both diseases have been reported to be associated with cholesterol [30][31][32]. ETFA, located on chromosome 15q24.3, participates in catalyzing the initial step of mitochondrial fatty acid beta-oxidation. Negative epistatic interactions between rs2264802 in SKAP2 and rs9548318 in UFM1 had highly significant additive × additive ($aa = -4.71$, $-\log_{10} P_{EW} > 26$) and additive × dominance ($ad = -6.14$, $-\log_{10} P_{EW} > 12$) effects. SKAP2 encodes a protein that contains an amino-terminal coiled-coil domain for self-dimerization and a pleckstrin homology (PH) domain required for interactions with lipids at the membrane. Positive epistasis effects were detected between rs478442 near APOB and rs7694118 near PCDH10 ($aa = 2.38$ for A/A × C/C; $da = 3.59$ for A/C × C/C). APOB interacts with the LDL receptor and is fundamental to the regulation of plasma cholesterol in humans [33]. Genetic analysis of NMR lipoprotein fractions in humans has shown that PCDH10 is related to LDL cholesterol [34]. Bioinformatics analysis of candidate genes corresponding to QTSs. Bioinformatics analyses were performed with Biopubinfo (http://ibi.zju.edu.cn/biopubinfo/) for the candidate genes detected using the full model approach. Biopubinfo is a search engine for tracing public biological information, mainly covering data from the life and medical sciences, and includes a concept ontology database derived from several sources (UMLS, OLS, BioOntology.org).
The candidate genes corresponding to the detected QTSs were used as seeds to search for related pathways, functions, genes, chemical and drug information, protein-protein interactions, gene-disease associations, etc. Figure 2A shows the relationship of four candidate genes (CELSR2, SKAP2, ETFA and UFM1) with the chemical cyclosporine, which has a significant relationship with blood cholesterol levels and increases blood cholesterol [35]. Genes APOB and CETP were related to cholesterol; genes CETP and MMP13 to the chemicals chloranium, chloride ion and hydrochloric acid; and genes MMP13 and ETFA to formic acid and formyloxidanium (Fig. 2B). Gene-disease associations link the candidate genes to several diseases (Fig. 2) that may in turn be related to cholesterol. SORCS1 and SKAP2 are related to type 1 diabetes, type 2 diabetes mellitus, cardiovascular diseases, and edema (Fig. 2A). High absorption and low synthesis of cholesterol may be observed in patients with type 1 diabetes compared with non-diabetic control subjects [36]. High-normal to moderately elevated levels of total cholesterol and hemoglobin A1c define a high-risk group for progression to diabetic nephropathy and for clinical events related to arteriosclerotic cardiovascular disease [37]. Several experimental studies have discussed a potential relationship between cholesterol and type 2 diabetes mellitus [38][39][40][41][42]. Total cholesterol is the most important risk factor associated with clinically significant macular edema (CSME) [43]. Gene-disease associations (Fig. 2A) show that SORCS1, SKAP2, and CELSR2 are related to genetic susceptibility, and that SORCS1 and PCDH10 are related to tobacco use disorder. A protein-protein interaction was observed between CELSR2 and PCDH10. A gene ontology search shows a complex functional network among the genes; e.g. SORCS1, UFM1, and CELSR2 are related through protein binding, and SORCS1, CELSR2 and PCDH10 are related through the term "integral to membrane" (Fig. 2).
Association study with additive genetic models. Association analysis with the full model identified several significant dominance and epistasis effects at QTSs for cholesterol (Table 2). We reanalyzed the cholesterol data ignoring dominance and epistasis effects and observed substantial differences from the full model results: some new loci were identified, and some loci detected by the full model disappeared (Table S1, Fig. 1). The multi-loci additive model identified 5 new QTSs but did not detect 4 QTSs found by the full model approach. No highly significant ethnic-specific effects were observed in this case. Two QTSs (rs7694118 and rs10768634) had only highly significant dominance effects under the full model and were not detectable by the multi-loci additive model because it ignores non-additive effects. Two further QTSs (rs6465748 and rs10483461) had highly significant additive and ethnic-specific additive effects under the full model but were also not detected by the multi-loci additive model, most probably because dominance and epistasis effects were ignored. In general, phenotypic traits follow joint or multivariate distributions. If important genetic effects are omitted from the model and only the remaining effects are fitted to the phenotype, the omitted effects can reduce the power to detect loci. The multi-loci additive model also identified some QTSs that were not detected by the full model; these QTSs may have been falsely detected because dominance and epistasis effects were ignored, or the full model approach may have failed to detect them.
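The loss of power for dominance-only loci has a simple population-genetic illustration. Using the full-model codes ($x^A$ = 1, 0, −1 for QQ, Qq, qq; $x^D$ = 1 for Qq only), the regression of genotypic value on $x^A$ alone recovers the classical average effect $a + d(q - p)$, where $p$ is the frequency of allele Q. The Python sketch below assumes Hardy-Weinberg genotype frequencies and no environmental noise; it is an illustration of why an additive-only model can miss or distort such loci, not part of the QTXNetwork analysis.

```python
def additive_slope(p: float, a: float, d: float) -> float:
    """Population regression slope of genotypic value on the additive code x_A.

    p is the frequency of allele Q; genotype frequencies assume Hardy-Weinberg
    equilibrium. Genotypic values are a*x_A + d*x_D, using the full-model codes.
    """
    q = 1.0 - p
    freq = {"QQ": p * p, "Qq": 2 * p * q, "qq": q * q}
    x_a = {"QQ": 1, "Qq": 0, "qq": -1}
    x_d = {"QQ": 0, "Qq": 1, "qq": 0}
    value = {g: a * x_a[g] + d * x_d[g] for g in freq}

    mean_x = sum(freq[g] * x_a[g] for g in freq)
    mean_y = sum(freq[g] * value[g] for g in freq)
    cov = sum(freq[g] * (x_a[g] - mean_x) * (value[g] - mean_y) for g in freq)
    var = sum(freq[g] * (x_a[g] - mean_x) ** 2 for g in freq)
    return cov / var


# A purely dominant locus (a = 0) at p = 0.5 leaves no additive signal at all,
print(additive_slope(p=0.5, a=0.0, d=3.0))  # 0.0
# while at p = 0.3 part of d leaks into the additive slope: a + d*(q - p) = 1.8.
print(additive_slope(p=0.3, a=1.0, d=2.0))
```

In the first case an additive-only model has nothing to detect; in the second, the additive estimate is biased by the ignored dominance effect.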
Association study through PLINK and GCTA. We also analyzed the cholesterol data using single-locus approaches, adjusting for population stratification and genetic relatedness through principal components (PCs) and a genetic relationship matrix (GRM). For PLINK analysis, sex and the top 10 PCs were included as covariates to account for potential variation due to population stratification and sex differences within the Multi-Ethnic Study of Atherosclerosis (MESA) population. For GCTA analysis, a GRM was additionally used to control for genetic relatedness. We analyzed the cholesterol data separately for two examinations (Exam-1 and Exam-3) of the MESA population. Genomic inflation factors ($\lambda_{GC}$), calculated from the median of the chi-square statistics corresponding to the genome-wide P-values, were 1.02 and 1.03 for Exam-1 and Exam-3, respectively, after adjusting for population stratification with 10 PCs. Moreover, Q-Q plots of the GWAS P-values from the PLINK and GCTA analyses showed a good fit to the uniform distribution (Figs S1B~S4B). Therefore, the 10 PCs efficiently controlled population stratification.
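The genomic inflation factor mentioned above can be computed from P-values alone: convert each two-sided P-value to a 1-df chi-square statistic and divide the median by the median of the chi-square(1) distribution (about 0.455). The standard-library Python sketch below assumes two-sided P-values from 1-df tests; function names are hypothetical.

```python
from statistics import NormalDist, median


def chi2_1df(p_value: float) -> float:
    """1-df chi-square statistic implied by a two-sided P-value (the squared z-score).

    P-values must lie in (0, 1]. Below roughly 1e-16, 1 - p/2 rounds to 1.0 and
    inv_cdf raises an error; a production version would invert the survival function.
    """
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    return z * z


def genomic_inflation_factor(p_values) -> float:
    """lambda_GC = median observed chi-square / median of chi-square(1 df)."""
    observed_median = median(chi2_1df(p) for p in p_values)
    return observed_median / chi2_1df(0.5)


if __name__ == "__main__":
    n = 999
    null_p = [i / (n + 1) for i in range(1, n + 1)]  # evenly spread P-values
    print(genomic_inflation_factor(null_p))          # 1.0 for a perfect null
    inflated_p = [p ** 1.2 for p in null_p]          # systematically too small
    print(genomic_inflation_factor(inflated_p))      # greater than 1
```

Values close to 1, such as the 1.02 and 1.03 reported for Exam-1 and Exam-3, indicate little residual inflation of the test statistics.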
For GCTA analysis, we used the MLMe approach [20], which excludes the chromosome being tested when constructing the GRM; the 10 PCs and sex were also used as covariates. We used an LD-based Bonferroni-corrected threshold to determine genome-wide significance for the PLINK and GCTA analyses. Using the "--indep-pairwise 50 5 0.75" option in PLINK v1.07, we removed SNPs in high LD ($r^2 > 0.75$) with other SNPs in the extracted set and calculated the approximate number of independent tests. After LD-based pruning, 458,716 SNPs remained; the Bonferroni-corrected threshold for genome-wide significance at the α = 0.05 level was therefore $-\log_{10} P = 6.96$.
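The pruning-then-Bonferroni procedure above can be sketched in Python as follows. The window (50 SNPs), step (5 SNPs) and $r^2$ cutoff (0.75) mirror the PLINK option quoted in the text, but the greedy rule for which SNP of a correlated pair to drop is a simplifying assumption, not PLINK's exact implementation; the threshold function reproduces the reported $-\log_{10} P = 6.96$ for 458,716 tests. Assumes numpy; names are hypothetical.

```python
import math

import numpy as np


def r_squared(x: np.ndarray, y: np.ndarray) -> float:
    """Squared Pearson correlation of two genotype vectors (0 if either is monomorphic)."""
    if x.std() == 0 or y.std() == 0:
        return 0.0
    return float(np.corrcoef(x, y)[0, 1] ** 2)


def prune_pairwise(genotypes, window=50, step=5, r2_max=0.75):
    """Greedy sliding-window LD pruning; returns indices of the SNPs kept.

    genotypes: n x m array of allele counts with SNPs in physical order.
    Within each window of `window` SNPs (shifted by `step` SNPs), the later SNP
    of any pair with r^2 > r2_max is dropped (a simplified rule).
    """
    g = np.asarray(genotypes, dtype=float)
    m = g.shape[1]
    keep = np.ones(m, dtype=bool)
    for start in range(0, m, step):
        in_window = [j for j in range(start, min(start + window, m)) if keep[j]]
        for pos, i in enumerate(in_window):
            if not keep[i]:
                continue
            for j in in_window[pos + 1:]:
                if keep[j] and r_squared(g[:, i], g[:, j]) > r2_max:
                    keep[j] = False
    return np.flatnonzero(keep)


def bonferroni_threshold(n_independent_tests: int, alpha: float = 0.05) -> float:
    """Genome-wide significance threshold on the -log10(P) scale."""
    return -math.log10(alpha / n_independent_tests)


if __name__ == "__main__":
    print(round(bonferroni_threshold(458_716), 2))  # 6.96, as reported in the text
```

Counting the SNPs that survive pruning gives an approximate number of independent tests, which is less conservative than dividing α by all 714,211 SNPs.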
We inspected additive and dominance effects using PLINK; however, none of the dominance effects were genome-wide significant. We therefore recalculated the effects of all SNPs for additive effects only and detected two loci for Exam-1 and one locus for Exam-3 with genome-wide significant additive effects. The number of non-missing individual observations was larger in Exam-1 (N = 5277) than in Exam-3 (N = 4572), so SNP detection power was better in Exam-1. The QTS rs629301 ($-\log_{10} P = 12.10$) on chromosome 1 and the QTS rs478442 ($-\log_{10} P = 6.97$) on chromosome 2 were genome-wide significant for Exam-1 (Table S2, Fig. S1A), and only rs629301 ($-\log_{10} P = 8.37$) was genome-wide significant for Exam-3 (Table S3, Fig. S2A). GCTA detected only one locus, rs629301, with genome-wide significance ($-\log_{10} P = 11.56$ for Exam-1 and 8.20 for Exam-3) in both exams (Tables S2 and S3, Figs S3A and S4A). Notably, rs629301 and rs478442 were also detected by the full model and multi-loci additive model approaches. Moreover, most of the candidate QTSs detected by the full model approach had significant, although not genome-wide significant, effects in the PLINK and GCTA analyses for both exams (Tables S2 and S3). The significance of the candidate QTSs in both exams may indicate that their associations with cholesterol are robust.
Table 2. Identified QTSs for cholesterol using the full model approach. Chr_SNP_Alleles: chromosome_SNP_major/minor alleles; Gene: the nearby or host genes corresponding to QTSs; Effect: genetic effects of QTSs, $a$ = additive effect, $d$ = dominance effect, $aa$ = additive × additive epistasis effect, $ad$ = additive × dominance epistasis effect, $da$ = dominance × additive epistasis effect, $ae_3$ = A-A specific additive effect, $ae_4$ = H-A specific additive effect, $de_1$ = E-A specific dominance effect, $de_3$ = A-A specific dominance effect, $de_4$ = H-A specific dominance effect; Estimate: the estimated genetic effects; $-\log_{10} P_{EW}$: minus log10 of the experiment-wise P-value; $h^2$ (%): heritability in percent.
Scientific RepoRts | 7:38600 | DOI: 10.1038/srep38600
Simulation studies. Under simulated scenario-I, the calculated powers of detecting loci using the full model were high or moderate for most loci (Table S4), as were the powers of detecting the effects corresponding to the loci (Table 3). The full model approach detected two loci (rs629301 and rs478442) with full power, nine loci with high power (> 70%), and five loci with moderate power (> 40%) (Table S4). Detection powers were low for effects of loci that were not highly significant in the real data analysis using the full model approach. For example, the estimated effects of QTS rs7624679 were not highly significant in the real data analysis, and in the simulation study the detection powers of these effects were also low (9% for the additive effect and only 2% for the dominance-by-ethnic interaction effect). Similarly, QTS rs629301 had a highly significant additive effect ($-\log_{10} P_{EW} > 34$) but a weakly significant additive-by-ethnic interaction effect ($-\log_{10} P_{EW} > 2.70$); its detection powers were 100% for the additive effect but only 15% for the additive-by-ethnic interaction effect. Under simulated scenario-I, the full model approach obtained unbiased parameter estimates with a small FDR (3.7%).
We also analyzed the simulated traits with the multi-loci additive model approach, ignoring dominance and epistasis effects. In this case the calculated FDR was 15.5%, indicating that the marked increase in FDR could be due to omitting dominance and epistasis effects from the reduced model. However, most of the false loci detected by the full model and the multi-loci additive model were in very high LD ($r^2 > 0.85$) with true loci and may represent them: most false loci lay very near true loci and had similar effect sizes. If these false loci are treated as representatives of true loci, the false discovery rates were close to zero for both the full model and the multi-loci additive model. Even so, the multi-loci additive model identified more nearby false loci than the full model did. Moreover, the multi-loci additive model estimated genetic parameters with upward or downward bias for some loci (Table 4). In total, 13 of the 17 true loci were detected (4 loci were never detected), and none of the ethnic-specific effects were significantly detected. None of the falsely identified loci were in high LD with the four loci that the multi-loci additive model failed to detect in 100 simulations.
It has been suggested that an additive model approach can yield biased or deficient results if dominance and epistasis effects influence complex traits. By comparing the results of the full model and the multi-loci additive model on real and simulated data, we can identify possible explanations for the differences between the two approaches in the cholesterol data analysis. In the simulations, the multi-loci additive model failed to detect QTSs rs6465748 and rs10483461 in all 100 simulations, suggesting that the estimated genetic effects of these QTSs did not reach experiment-wise significance under the additive model even though they had true additive effects. In the real data analyses, these QTSs were likewise not detected by the additive model but were detected by the full model (Table 3). Ignoring important genetic factors can therefore reduce the power to detect QTSs. QTSs with only dominance effects were not detected by the multi-loci additive model in the simulated data analyses. These results show that the reduced model can miss some true QTSs by ignoring dominance and epistasis effects, and can produce biased estimates and an increased FDR.
For PLINK and GCTA analyses, the calculated powers under the simulated scenario were high or moderate for the two SNPs (rs629301 and rs478442) identified in the real data analysis using single-locus approaches (Tables S2~S4). Powers for the other SNPs were quite low, and most of these SNPs were not detectable using single-locus approaches (Fig. 3). The simulation results therefore supported the real data analyses using the different genetic model approaches and demonstrated the usefulness of the full model approach for analyzing complex traits.
Comparisons between the single-locus approaches showed that the detection powers of PLINK analysis were higher than those of GCTA analysis (Table S4), suggesting that over-controlling can also decrease the power to detect true QTSs. A comparison of locus detection powers across approaches showed that the full model approach is more powerful for dissecting the genetic architecture of complex traits (Fig. 3).
Under scenario-II, we assumed that causal markers control the simulated traits additively, through additive and additive-by-ethnic interaction effects. In this case, both the full model and the multi-loci additive model provided unbiased estimates of the genetic parameters (Table S5). False discovery rates were also similar for the two approaches: 4.85% for the full model approach and 3.96% for the multi-loci additive model approach. Under this scenario, the additive model identified all ethnic-specific genetic effects, which supports our hypothesis (Table S5), because the additive model approach had no extra noise beyond random error in estimating the parameters. Detection powers were similar for the full model and multi-loci additive model approaches. Under scenario-II, the single-locus approaches of PLINK and GCTA also had reduced detection power for most of the true loci (Table S6).

Discussion
Genetic effects of multiple loci may differ among human ethnic groups owing to different lifestyles and environmental exposures; analyzing gene-by-ethnicity interactions may therefore improve our knowledge of complex traits and help personalize treatment of disease [44,45]. Our study provides insight into the genetic mechanisms underlying total cholesterol in the MESA population and demonstrates how the genetic effects of multiple loci may differ across ethnic groups (gene-by-ethnicity interactions). Association mapping with the full model and the multi-loci additive genetic model, which include gene-by-ethnicity interactions as random effects, can estimate ethnic-specific genetic effects. In this study, the full model approach estimated around 14% heritability due to ethnic-specific effects, illustrating the importance of analyzing the ethnic-specific effects of genetic variants. Different genetic effects (e.g. additive, dominance, and epistasis) are also expected to influence complex traits [46][47][48]; however, most association studies ignore dominance and epistasis effects [49,50]. The full model approach estimated around 11.24% heritability for dominance and epistasis effects and 10.21% heritability for ethnic-specific dominance and epistasis effects. The multi-loci additive model approach ignored dominance and epistasis effects, assuming that only the additive effects of multiple loci control the trait. However, ignoring dominance and epistasis effects can bias analysis results and cause missing heritability. In this study, the total estimated heritability of cholesterol was 33.64% using the full genetic model approach but only 13.91% using the multi-loci additive model approach. The heritability estimated by GCTA for Exam-1 data was 21.92%, yet only one SNP was detectable with genome-wide significance, suggesting that other causal loci may control cholesterol in the MESA population.
For the full model and multi-loci additive model approaches, we used repeated measures from two examinations (Exam-1 and Exam-3) of the same subjects as replications to improve the power to detect small effects of multiple causal loci. Comparing the full model and the multi-loci additive model provides insight into the usefulness of the full model and the consequences of ignoring dominance and epistasis effects. In simulation studies under two different scenarios, the full model approach obtained unbiased estimates of genetic parameters with a small FDR (3.70~4.85%) using repeated measures. Under scenario-II, the full model and the multi-loci additive model obtained unbiased estimates of the genetic parameters with similar FDRs, indicating that the full model approach is robust even when only the additive genetic effects of multiple loci control the phenotype. Under scenario-I, the multi-loci additive model provided biased estimates of several genetic parameters and failed to detect several true loci and the ethnic-specific genetic effects of loci in 100 simulations, illustrating the deficiencies of the additive model approach for complex trait analysis.
The detection power of QTSs was very low for the single-locus approaches (PLINK and GCTA) under both simulated scenarios. This result may explain why the single-locus approaches gave results different from those of the full model approach in our real data analyses. Separate analyses of the Exam-1 and Exam-3 data sets through PLINK and GCTA detected only two different loci associated with cholesterol (Table S2 and Table S3). In the simulation study, the single-locus approaches detected QTS rs629301 with high power and QTS rs478442 with moderate power, but detection powers for the other loci were negligible. The average detection power of the multi-loci additive model approach was also lower than that of the full genetic model approach, because the multi-loci additive model approach failed to detect some cholesterol QTSs by ignoring dominance and epistasis effects. The detection power of the full model approach may therefore be dramatically increased not only by using repeated measures but also by including dominance and epistasis effects in the model. Association analysis using the full model approach identified variants of three known genes, CELSR2, APOB and CETP, that were reported in previous association studies of cholesterol [4,25]. In addition, the full model approach identified several new genes associated with cholesterol. Bioinformatics analysis revealed complex networks of functions, protein-protein interactions or pathway interactions among the detected genes (Fig. 2). The disease-phenotype association search showed that some newly detected genes are associated with several cardiovascular diseases (Fig. 2). The known functions of some newly detected genes suggest that they may be associated with cholesterol.
For example, previous experimental analyses showed that UFM1 increases macrophage cholesterol efflux, possibly through increased expression of the ATP-binding cassette transporters A1 (ABCA1) and G1 (ABCG1) [51]; ABCA1 was reported to be associated with total cholesterol in a previous genome-wide association study (GWAS) [5]. A previous GWAS reported that a variant 400 kb upstream of UFM1 is associated with plasma lipoprotein levels [52]. In our study, highly significant epistasis effects between variants of SKAP2 and UFM1 were detected for additive-by-additive and dominance-related interactions, and a bioinformatics ontology search showed that these genes are related through the cytoplasm. ETFA (electron transfer flavoprotein A) is implicated in multiple acyl-coenzyme A dehydrogenation deficiency (MADD) and is related to lipid storage myopathies (LSMs) [53]. The zebrafish dark xavier mutant (dxa^vu463) in the ETFA gene has swollen and hyperplastic neural progenitor cells, hepatocytes, and kidney tubule cells, as well as elevated triacylglycerol, cerebroside sulfate and cholesterol levels [54]. The genes detected through the full model approach may therefore be biologically plausible candidates for the cholesterol trait.
Methods
Data. Multi-Ethnic Study of Atherosclerosis (MESA) data used in this study were downloaded from dbGaP (database of Genotypes and Phenotypes, http://www.ncbi.nlm.nih.gov/gap). MESA is a prospective population-based study focusing on the characterization of subclinical cardiovascular disease (CVD) and the risk factors that predict its progression [55]. The study includes 6,500 men and women from four ethnic groups, in nearly equal numbers, aged 45~84 years and free of clinical CVD at baseline, who were initially recruited in 2000 from six US communities: Baltimore, MD; Chicago, IL; Forsyth County, NC; Los Angeles County, CA; Northern Manhattan, NY; and St. Paul, MN. The recruited participants are 38% European-American (E-A), 28% African-American (A-A), 22% Hispanic-American (H-A), and 12% Asian-American, predominantly of Chinese descent (C-A). Further details of the sampling design and study procedures have been described by Bild et al. [55]. Using PLINK v1.07, we excluded SNPs with MAF < 0.05 or call rate < 90%. We did not test for Hardy-Weinberg equilibrium because of the heterogeneous structure of the MESA population (Fig. S5A). After applying these QC filtering criteria, a total of 714,211 SNPs were included in the analysis. LD-based SNP pruning was done only to calculate the approximate number of independent tests for setting the genome-wide significance threshold of the single-locus mapping approaches. Genotype clusters of the ethnic groups were visualized using a 3D scatter plot based on the first 3 eigenvectors (Fig. S5A).
Phenotype data from two examinations (Exam-1: July 2000-July 2002; Exam-3: January 2004-July 2005) were used in this study [55]. We observed heterogeneity among the distributions of cholesterol across ethnic groups (Fig. S5B), and significant sex differences within each ethnic group. Therefore, sex was used as a block and ethnic effects were treated as random factors to control confounding due to sex and ethnicity in our analyses. Cholesterol data for the MESA population were analyzed using QTXNetwork and checked for outliers using the model residuals. For the full model and multi-loci additive model approaches, we discarded phenotypic outliers identified by standardized residual analysis and reanalyzed the data.
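A minimal sketch of standardized-residual outlier screening is shown below in Python. Two assumptions are not from the text: residuals are taken as deviations from the mean of each group (for example sex × ethnicity) as a simple stand-in for residuals of the fitted mixed model, and the cutoff of 3 standard deviations is a common convention rather than the value used in the study. The function name is hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def flag_outliers(values, groups, cutoff=3.0):
    """Return True for observations whose |standardized residual| exceeds `cutoff`.

    Residuals are deviations from the mean of each group (e.g. sex x ethnicity),
    a simple stand-in for residuals from the fitted mixed model.
    """
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    group_mean = {group: fmean(vals) for group, vals in by_group.items()}

    residuals = [value - group_mean[group] for value, group in zip(values, groups)]
    sd = pstdev(residuals)
    if sd == 0:
        return [False] * len(residuals)  # no spread, so nothing can be an outlier
    return [abs(r) / sd > cutoff for r in residuals]


if __name__ == "__main__":
    total_cholesterol = [200.0] * 19 + [400.0]
    flags = flag_outliers(total_cholesterol, ["female_E-A"] * 20)
    print(flags[-1], sum(flags))  # True 1
```

Flagged observations would be removed and the model refitted, matching the discard-and-reanalyze step described above.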
Statistical Analyses. Two distinct approaches were used for genome-wide association analyses of the cholesterol trait: the generalized multifactor dimensionality reduction (GMDR) method, to scan 714,211 SNPs in 1D for main effects and in 2D and 3D for epistatic interactions using the module GMDR-GPU [56] where $\mu$ is the population mean; $s_k$ is the fixed sex effect of the $k$-th individual (0 for female, 1 for male); $a_i$ is the additive effect of the $i$-th locus with coefficient $x^{A}_{ik}$ (1 for QQ, 0 for Qq, −1 for qq); $d_i$ is the dominance effect of the $i$-th locus with coefficient $x^{D}_{ik}$ (1 for Qq, 0 for QQ and qq); $aa_{ij}$, $ad_{ij}$, $da_{ij}$ and $dd_{ij}$ are the digenic epistasis effects with coefficients $x^{AA}_{ijk}$ (1 for QQ × QQ and qq × qq, −1 for QQ × qq and qq × QQ, and 0 for others), $x^{AD}_{ijk}$ (1 for QQ × Qq, −1 for qq × Qq, and 0 for others), $x^{DA}_{ijk}$ (1 for Qq × QQ, −1 for Qq × qq, and 0 for others) and $x^{DD}_{ijk}$ (1 for Qq × Qq, and 0 for others); $e_h$ is the effect of the $h$-th ethnic population (1 for E-A, 2 for C-A, 3 for A-A, 4 for H-A); $ae_{ih}$ is the additive × race interaction effect of the $i$-th locus in the $h$-th ethnic population with coefficient $u^{AE}_{ihk}$; $de_{ih}$ is the dominance × race interaction effect of the $i$-th locus in the $h$-th ethnic population with coefficient $u^{DE}_{ihk}$; $aae_{ijh}$, $ade_{ijh}$, $dae_{ijh}$ and $dde_{ijh}$ are the digenic epistasis × race interaction effects in the $h$-th ethnic population with coefficients $u^{AAE}_{ijhk}$, $u^{ADE}_{ijhk}$, $u^{DAE}_{ijhk}$ and $u^{DDE}_{ijhk}$; and $\varepsilon_{hk}$ is the residual effect of the $k$-th individual in the $h$-th ethnic population. In this model, the random effects are assumed to follow normal distributions with zero mean and variances $\sigma_v^2$.
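The genotype coefficients listed above can be generated mechanically: each digenic epistasis code is the product of the single-locus additive and dominance codes, which reproduces every value in the list. The Python sketch below shows this; the ethnic-interaction function is an assumption (the text does not define how the $u$ coefficients are coded) and is labeled hypothetical, as are all names.

```python
# Full-model genotype codes for a single locus, as listed in the text.
X_A = {"QQ": 1, "Qq": 0, "qq": -1}  # additive coefficient x^A
X_D = {"QQ": 0, "Qq": 1, "qq": 0}   # dominance coefficient x^D

# Ethnic labels h as numbered in the text.
ETHNIC_INDEX = {"E-A": 1, "C-A": 2, "A-A": 3, "H-A": 4}


def epistasis_codes(g_i: str, g_j: str) -> dict:
    """Digenic epistasis coefficients for genotype g_i at locus i and g_j at locus j.

    Each coefficient is the product of the single-locus codes, which reproduces
    the x^AA, x^AD, x^DA and x^DD values listed in the text.
    """
    return {
        "AA": X_A[g_i] * X_A[g_j],
        "AD": X_A[g_i] * X_D[g_j],
        "DA": X_D[g_i] * X_A[g_j],
        "DD": X_D[g_i] * X_D[g_j],
    }


def ethnic_code(main_code: int, ethnicity: str, h: int) -> int:
    """HYPOTHETICAL coding of an interaction coefficient such as u^AE_ihk:
    the main-effect code if the individual belongs to ethnic group h, otherwise 0."""
    return main_code if ETHNIC_INDEX[ethnicity] == h else 0


if __name__ == "__main__":
    for g_i in ("QQ", "Qq", "qq"):
        for g_j in ("QQ", "Qq", "qq"):
            print(g_i, "x", g_j, epistasis_codes(g_i, g_j))
    # Additive x ethnicity code for a QQ individual in the H-A group (h = 4).
    print(ethnic_code(X_A["QQ"], "H-A", h=4))
```

Building the codes as products keeps the epistasis design columns consistent with the single-locus columns by construction.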
The linear mixed model and its distribution can be expressed in matrix notation, where $y$ is an $n \times 1$ column vector of phenotypic values and $n$ is the number of observations; $\mu$ is the population mean; $b_u$ is the $u$-th vector of fixed effects; $X_u$ is the known incidence matrix for the $u$-th fixed effects; $e_v$ is the $v$-th vector of random effects with distribution $e_v \sim MVN(0, \sigma_v^2 I)$. The linear mixed model with Henderson method III [57] was used to construct the F-statistic test for association analysis. A permutation test with a total of 2,000 permutations was conducted to calculate the critical F-value controlling the experiment-wise type I error ($\alpha_{EW} < 0.05$). The QTS effects were estimated using the Markov chain Monte Carlo (MCMC) algorithm with 20,000 Gibbs sampling iterations [6,22,58,59]. The experiment-wise P-value ($P_{EW}$) for genetic effects was then calculated by controlling the experiment-wise type I error ($P_{EW} < 0.05$). Further details of the procedures are given in S1 text.
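The permutation step above can be illustrated with a max-statistic permutation threshold: shuffling the phenotype breaks every genotype-phenotype link at once, and the 95th percentile of the largest F across SNPs over permutations serves as an experiment-wise critical value. The Python sketch below uses ordinary least-squares single-SNP F-tests as a stand-in for the Henderson method III mixed-model F-tests in QTXNetwork; only the 2,000 permutations and α = 0.05 come from the text. Assumes numpy; names are hypothetical.

```python
import numpy as np


def single_snp_f(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """One-df F statistic for regressing y on each column of x separately (OLS)."""
    n = y.shape[0]
    yc = y - y.mean()
    xc = x - x.mean(axis=0)
    ss_model = (xc.T @ yc) ** 2 / (xc ** 2).sum(axis=0)
    ss_resid = (yc ** 2).sum() - ss_model
    return ss_model / (ss_resid / (n - 2))


def permutation_critical_f(y, x, n_perm=2000, alpha=0.05, seed=0) -> float:
    """Critical F controlling the experiment-wise error rate at `alpha`.

    Takes the (1 - alpha) quantile of the maximum F across all SNPs,
    computed over permutations of the phenotype vector.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    max_f = np.array([single_snp_f(rng.permutation(y), x).max() for _ in range(n_perm)])
    return float(np.quantile(max_f, 1.0 - alpha))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    genotypes = rng.integers(0, 3, size=(300, 50)).astype(float)
    phenotype = rng.normal(size=300)
    critical = permutation_critical_f(phenotype, genotypes)
    observed = single_snp_f(phenotype, genotypes)
    print(critical, np.flatnonzero(observed > critical))  # usually no SNPs under the null
```

Because the threshold is taken from the maximum over all SNPs, it controls the chance of any false positive in the whole scan rather than per test.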
Simulation Design. Association analyses of the cholesterol data using the full model and additive models produced different results. We conducted Monte Carlo simulations to explore possible explanations for these differences. Simulated phenotypic data were generated under two distinct scenarios: (I) simulated traits controlled by additive, non-additive and ethnic-specific genetic effects; (II) simulated traits controlled by additive and additive-by-ethnic interaction effects. We generated repeated measures for individuals and analyzed them as replications through the full model and multi-loci additive model approaches. Under scenario-I, the estimated genetic effects of significant loci obtained from the full model approach were used as the true loci and parameters to generate phenotypic data, and the estimated error variance was used to generate random errors. Table 2 presents only the genetic effects of QTSs that were highly significant experiment-wise ($P_{EW} \le 1 \times 10^{-5}$) under the full model approach; however, this approach also detected some effects that were significant experiment-wise at $P_{EW} < 0.05$. In the simulation study we used all detected effects of loci, because small effects of several variants may also influence complex traits. Under scenario-II, we simulated data in the same way as in scenario-I but using the genetic architecture of cholesterol estimated by the multi-loci additive model approach. Therefore, in scenario-II, loci had only additive and additive-by-ethnic interaction effects.
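A minimal sketch of the scenario-II data generation is given below in Python: the genetic value of each individual is the sum of additive effects plus the additive-by-ethnicity effects of that individual's group, and independent normal errors are added for each repeated measurement. The array layout, argument names and effect values are hypothetical; in the study, the effects and error variance came from the fitted multi-loci additive model. Assumes numpy.

```python
import numpy as np


def simulate_scenario_ii(x_a, group, a, ae, sigma_e, n_reps=2, mu=0.0, seed=0):
    """Simulate repeated measures of a trait with additive and additive-by-ethnicity effects.

    x_a:     n x L matrix of additive codes (1, 0, -1) at L causal loci
    group:   length-n integer ethnic labels 0..H-1
    a:       length-L additive effects
    ae:      H x L ethnic-specific additive effects
    sigma_e: residual standard deviation
    Returns an n x n_reps array; each column is one simulated measurement occasion.
    """
    rng = np.random.default_rng(seed)
    x_a = np.asarray(x_a, dtype=float)
    ae = np.asarray(ae, dtype=float)
    group = np.asarray(group)
    genetic = x_a @ np.asarray(a, dtype=float) + np.sum(x_a * ae[group], axis=1)
    noise = rng.normal(0.0, sigma_e, size=(genetic.shape[0], n_reps))
    return mu + genetic[:, None] + noise


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    codes = rng.integers(-1, 2, size=(10, 3))   # 10 individuals, 3 causal loci
    ethnic = rng.integers(0, 4, size=10)        # four ethnic groups
    effects = [2.0, -1.0, 0.5]                  # hypothetical additive effects
    ethnic_effects = np.zeros((4, 3))
    ethnic_effects[3, 0] = 1.5                  # hypothetical group-specific effect
    y = simulate_scenario_ii(codes, ethnic, effects, ethnic_effects, sigma_e=1.0)
    print(y.shape)  # (10, 2): two repeated measures per individual
```

Scenario-I would extend the genetic value with dominance, epistasis and their ethnic interactions in the same way.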
For association analyses through the full model and multi-loci additive model approaches, the false discovery rate (FDR) was estimated as the ratio of falsely identified loci to the total number of detected loci for each simulated trait; the detection power of a locus was its detection rate, and the detection power of an effect was its detection rate, in 100 simulations. Powers of the single-locus additive model approaches were estimated in a similar way. We estimated the empirical confidence interval of each effect from the output of the 100 simulations and described an estimate as unbiased if the true parameter fell within the empirical confidence interval, and as biased otherwise. The empirical confidence intervals are not tabulated; instead, biased estimates are marked with a "+" or "−" sign to the right of the estimated value, where "+" indicates overestimation and "−" indicates underestimation of the parameter (Table 3 and Table 4).
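The evaluation criteria above can be sketched in a few lines of Python. Two details are assumptions rather than statements from the text: the per-replicate FDRs are averaged over replicates (with a replicate that detects nothing contributing 0), and the empirical confidence interval is a 95% percentile interval of the replicate estimates. Function names are hypothetical.

```python
def fdr_and_power(detected_per_replicate, true_loci):
    """Mean per-replicate FDR and per-locus detection power over simulated replicates.

    detected_per_replicate: one collection of detected locus IDs per simulated trait.
    A replicate with no detections contributes an FDR of 0.
    """
    truth = set(true_loci)
    hits = dict.fromkeys(truth, 0)
    fdrs = []
    for detected in detected_per_replicate:
        detected = set(detected)
        fdrs.append(len(detected - truth) / len(detected) if detected else 0.0)
        for locus in detected & truth:
            hits[locus] += 1
    n = len(detected_per_replicate)
    return sum(fdrs) / n, {locus: k / n for locus, k in hits.items()}


def bias_flag(estimates, true_value, level=0.95):
    """'+' (overestimation) or '-' (underestimation) if true_value lies outside the
    empirical percentile interval of the replicate estimates, otherwise ''."""
    s = sorted(estimates)
    lower = s[round((1 - level) / 2 * (len(s) - 1))]
    upper = s[round((1 + level) / 2 * (len(s) - 1))]
    if true_value < lower:
        return "+"  # estimates sit above the true value
    if true_value > upper:
        return "-"  # estimates sit below the true value
    return ""


if __name__ == "__main__":
    fdr, power = fdr_and_power([["rs1", "rs2", "rsX"], ["rs1"], ["rs2"], []], ["rs1", "rs2"])
    print(round(fdr, 4), power)
    print(bias_flag([float(i) for i in range(100)], true_value=-5.0))  # +
```

To treat detected loci in high LD with a true locus as representatives of it, as discussed in the simulation results, the detected IDs could be mapped to their true locus before calling `fdr_and_power`.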