Exhaustive Genome-Wide Search for SNP-SNP Interactions Across 10 Human Diseases

The identification of statistical SNP-SNP interactions may help explain the genetic etiology of many human diseases, but exhaustive genome-wide searches for these interactions have been difficult, due to a lack of power in most datasets. We aimed to use data from the Resource for Genetic Epidemiology Research on Adult Health and Aging (GERA) study to search for SNP-SNP interactions associated with 10 common diseases. FastEpistasis and BOOST were used to evaluate all pairwise interactions among approximately N = 300,000 single nucleotide polymorphisms (SNPs) with minor allele frequency (MAF) ≥ 0.15, for the dichotomous outcomes of allergic rhinitis, asthma, cardiac disease, depression, dermatophytosis, type 2 diabetes, dyslipidemia, hemorrhoids, hypertensive disease, and osteoarthritis. A total of N = 45,171 subjects were included after quality control steps were applied. These data were divided into discovery and replication subsets; the discovery subset had > 80% power, under selected models, to detect genome-wide significant interactions (P < 10−12). Interactions were also evaluated for enrichment in particular SNP features, including functionality, prior disease relevancy, and marginal effects. No interaction in any disease was significant in both the discovery and replication subsets. Enrichment analysis suggested that, for some outcomes, interactions involving SNPs with marginal effects were more likely to be nominally replicated, compared to interactions without marginal effects. If SNP-SNP interactions play a role in the etiology of the studied conditions, they likely have weak effect sizes, involve lower-frequency variants, and/or involve complex models of interaction that are not captured well by the methods that were utilized.

Gene-gene interaction, also known as epistasis, is the phenomenon where the phenotypic effect of variation at one genetic locus depends on variation at other loci (Cordell 2002). Epistasis is thought to be biologically prevalent and to contribute to the missing heritability problem for complex human disease (Lunzer et al. 2010;Huang et al. 2012;Zuk et al. 2012;Hemani et al. 2013). As a result, a significant amount of effort has been invested into attempting to identify epistatic effects in human traits Wei et al. 2014;Murk et al. 2015). The study of epistasis is challenged by its high-dimensional nature, which on a genome-wide scale can result in a very large search space and significant difficulties in attaining sufficient statistical power to detect effects. To overcome this challenge, most studies of epistasis have restricted their search space to relatively small numbers of single nucleotide polymorphisms (SNPs), often focusing on those located in previously identified trait-relevant candidate genes (Murk et al. 2015). However, this type of approach will exclude the vast majority of the genetic search space and thus potentially miss a large number of interacting genes.
We aimed to conduct well-powered, genome-wide searches for epistasis at the pairwise SNP level (i.e., to identify SNP-SNP interactions), using data from the GERA study (Hoffmann et al. 2011a,b). This study involved a large number of subjects (N = 78,486 initially available) who were genotyped using genome-wide SNP arrays, and who had health outcome data available for a number of electronic medical records-derived medical conditions. In addition, we sought to identify SNP characteristics that were enriched among replicated interactions, which could inform strategies to reduce the search space in smaller datasets.
To perform the exhaustive searches for interaction, two different analytical approaches were used: FastEpistasis and BOOST. FastEpistasis is a computationally efficient but imprecise method to screen for interactions, and is intended to be used in conjunction with follow-up analysis by slower but more precise methods, such as logistic regression (Purcell et al. 2007;Ueki and Cordell 2012). BOOST is another computationally efficient method that has been found to have greater power than FastEpistasis for some models of interaction, particularly models that do not have main effects (Wan et al. 2010). The use of these two different methods could therefore allow greater flexibility in detecting multiple different forms of interaction.

Study dataset
Authorized access to data from the GERA study was obtained (dbGaP Study Accession: phs000674.v1.p1), and use of these data was approved by the Yale Human Investigation Committee. This study is described in detail elsewhere (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/ study.cgi?study_id=phs000674.v1.p1) (Hoffmann et al. 2011a,b;Kvale et al. 2015). Briefly, the study was comprised of 78,486 subjects who were members of the Kaiser Permanente Medical Care Plan, Northern California Region, and who participated in the Kaiser Permanente Research Program on Genes, Environment, and Health (RPGEH) survey study. All subjects were 18 or more years of age at the start of the RPGEH survey in 2007, were members of the aforementioned medical care plan for at least 2 yr prior, and provided broad consent for the use of their data in research. Genotype data were derived from custom-designed Affymetrix Axiom genome-wide SNP microarrays (Hoffmann et al. 2011a,b). Four different chips were designed, and subjects were assigned to a particular chip depending on his or her race/ethnicity categorization (http://www.ncbi.nlm.nih.gov/projects/ gap/cgi-bin/GetPdf.cgi?id=phd004309). In the present study, we included all subjects who were genotyped on the EUR ("Non-Hispanic White") chip (counts before quality control: N = 62,318 subjects and N = 670,176 SNPs available).
Health outcomes were binary-coded (yes or no) variables representing various medical conditions, as derived from Kaiser Permanente Electronic Medical Records. These conditions were defined based on a subject having at least two medical diagnoses within an ICD-9-based disease category. These conditions and the ICD-9 codes that were used to define them are listed here: http://www.ncbi.nlm.nih.gov/projects/gap/ cgi-bin/GetPdf.cgi?id=phd004308. Individual-level diagnostic codes were not available (i.e., only the derived binary variables were present in the dataset). We excluded from potential analysis the conditions "Cancer" and "Psychiatric," as it was judged that these conditions were too broadly defined for the objectives of the present study. We also excluded any conditions for which we had less than 50% power to detect an interaction (based on Quanto-derived estimations to detect an interaction OR $ 1.50, at an a of 10 212 ; see power estimation methods below). After these exclusions, the conditions "asthma," "allergic rhinitis," "cardiac disease," "depression," "dermatophytosis," "diabetes, type 2," "dyslipidemia," "hypertensive disease," "hemorrhoids," and "osteoarthritis" (N = 10 conditions) were included as outcomes for analysis.
Additional variables included from the dataset included birth year category, sex, and genetic structure principal components (10 components total). The principal components were previously calculated and provided by the original GERA study investigators; details of this are reported here: http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/ GetPdf.cgi?id=phd004309. Birth year category was defined as one of 14 possible time periods (category 1: birth year # 1923; categories 2-14: 5-year categories, starting with 2 being the period 1924-1928). We defined "minimum age" as the difference in years between the time of survey (2007) and the last year of the birth year category. For example, the minimum age for a subject with a birth year category of 2 was defined to be 2007 2 1928 = 79 years.

Sample quality control (QC)
Subjects were excluded if they (i) had an ambiguous SNP-estimated sex; (ii) had a subject call rate , 95%; (iii) were selected for exclusion due to relatedness with other subjects; (iv) had a reported race/ethnicity other than "white" or did not have principal components available for the EUR chip; and (v) were a PCA outlier, defined as having a value . 6 standard deviations on any one of the 10 principal components. Ambiguous SNP-estimated sex was defined as having an X-chromosome homozygosity estimate (F) between 0.2-0.8. There were 17 subjects for whom the unambiguous SNP-estimated sex differed from the reported sex; in such cases, the SNP-estimated sex was considered to be the true sex. To assess subject relatedness, pi-hat (proportion identical by descent) was estimated using the dataset after being pruned for linkage disequilibrium (LD) (N = 358,590 SNPs). Pi-hat was calculated for every possible pair of subjects; for pairs with pi-hat $ 0.20, one subject of the pair was randomly excluded. The numbers of subjects excluded at each QC step are listed in Supplemental Material, Table S1.
Case/control definitions Cases and controls were defined after sample quality control procedures were completed. For each of the 10 outcomes included for analysis, cases were subjects who were coded as having the respective condition, while controls were subjects without it. Controls were excluded if their birth year category exceeded the 99.5 percentile birth year category of cases (except in the case of asthma, allergic rhinitis, and dermatophytosis). For example, for osteoarthritis cases, the 99.5 percentile birth year category was 10 (minimum age of this category: 39 years); therefore, all controls with a birth year category of 11 or higher (i.e., those younger than 39 years of age) were excluded. This was performed in order to exclude controls that were not of a sufficient age to be of comparable risk for the condition. No exclusions based on age were made for asthma, allergic rhinitis, and dermatophytosis, since these conditions are common in children. The number of controls excluded based on age are shown in Table S2. In the final analyses, across the different conditions, the numbers of cases ranged from 5549-24,047, and the numbers of controls ranged from 20,018-39,154.

SNP quality control
Ten copies of the dataset, one for each included outcome condition, were created, and SNP QC procedures were performed within each conditionspecific dataset. SNPs were excluded if they (i) could not be mapped to the hg19 reference genome; (ii) had a SNP call rate , 98% overall or in cases or controls separately; (iii) had a minor allele frequency (MAF) , 0.15 (this was specified because there was little power to detect an interaction involving SNPs that were less common than this); (iv) were nonautosomal; or (v) had a test for deviation from Hardy-Weinberg equilibrium with P-value , 10 25 , among controls. The numbers of SNPs excluded at each QC step are listed in Table S3.

Discovery and replication datasets
After sample and SNP QC, each outcome-specific dataset was randomly divided into a discovery and a replication dataset. These divisions were made such that the targeted size of each replication dataset was 1000 cases and 3000 controls. These numbers were chosen because they provided . 80% power for nominal replication (based on Quantoderived estimations to detect an interaction OR $ 1.50, at an a of 0.05; see power estimation methods below). The final numbers of subjects in each discovery and replication dataset are shown in Table S2.

Genomic inflation
At the level of marginal (individual-SNP) effects, test statistic inflation due to population stratification or other sources was assessed via calculations of the genomic inflation factor l. Values of l greater than 1.05 were considered evidence of test statistic inflation. Marginal test statistics were derived using genome-wide single-SNP logistic regression analyses performed within all 10 sets of discovery and replication datasets. The cardiac disease (l = 1.14), dermatophytosis (l = 1.23), type 2 diabetes (l = 1.08), dyslipidemia (l = 1.11), and hypertensive disease (l = 1.30) data showed evidence of inflation (Table S4). For cardiac disease and dermatophytosis, this inflation was largely removed after adjusting for the first two principal components. For the other conditions, adjustment for the first two components either had no effect (type 2 diabetes) or reduced but did not remove the inflation [dyslipidemia (l = 1.07) and hypertensive disease (l=1.10)]. Adjustment for the full 10 principal components yielded no further improvement in inflation (data not shown). All principal components were those that were provided with the GERA dataset. Since subjects were genotyped using one of two different kits, we also sought to determine whether the type of kit had any effect on test statistic inflation. In our final analyses (after all exclusions), 1.5% of all subjects were genotyped on the less commonly used kit; when these subjects were excluded, the observed values of l decreased by 0.02 or less, which was determined to be a negligible amount (data not shown). As a result, no exclusions based on kit type were made in the final analyses.
Interaction test statistic inflation was visually evaluated using quantile-quantile plots. To do this, a random selection of approximately 10,000 SNPs was obtained from each condition-specific dataset, and all possible pairs of SNPs (approximately 5 · 10 7 pairs) were tested for interaction using the FastEpistasis analytical approach (described below). Observed vs. expected-under-the-null 2log P-values were plotted, and no evidence of inflation was observed ( Figure S1; shown only for the discovery datasets).

Analytical approaches
All SNPs were diallelic. The analytical referent allele was the major allele (by allele frequency in the discovery dataset), while the nonreferent allele (which may also be known as the "alternative" or "coded" allele) was the minor allele. To assess marginal effects, logistic regression models were used, with SNPs coded additively (based on the number of copies of the nonreferent allele). Unadjusted and adjusted (for birth year category, age, and the first two principal components) analyses were performed for all condition-specific datasets. For ranking of SNPs by marginal effect P-value, the adjusted analyses in the discovery datasets were used.
To assess interactions on a genome-wide scale, all pairwise interactions among all postQC SNPs were assessed in each condition-specific discovery dataset using two different approaches: FastEpistasis and BOOST (as implemented in the version of Plink listed below). To avoid spurious evidence of interaction due to linkage disequilibrium between SNPs, an interaction was excluded if its two SNPs were located within 1 Mb of each other. From these genome-wide analyses, interactions with an interaction P-value , 10 27 were selected for follow-up. For interactions selecting from FastEpistasis analyses, the follow-up consisted of reanalysis of the selected interactions using logistic regression modeling, in both the discovery and replication datasets. Unadjusted and adjusted (for birth year category, age, and the first two principal components) logistic regression analyses were performed. In these models, SNPs were coded additively, and the nonreferent allele was the minor allele in the discovery dataset. Models included main effects for each SNP and an interaction term for the SNPs; statistical significance was based on an explicit test of the interaction term. For interactions selected from BOOST analyses, the follow-up consisted of analysis of the selected interactions in the replication dataset, also using BOOST. For final ranking of interactions by P-value, the adjusted analyses of the logistic regression modeling, or the BOOST analyses, from the discovery datasets were used.

Statistical significance and replication
Nominal significance was defined as P , 0.05. Bonferroni correction was used to determine strict statistical significance. For marginal effects, genome-wide significance was defined as P-value , 10 27 (i.e., 0.05 corrected for tests of the approximately 300,000 SNPs included in each condition-specific marginal analysis). Marginal effect significance was declared based on P-values from the analyses that were adjusted for birth year category, sex, and the first two principal components. For interactions, genome-wide significance was defined as interaction P-value , 10 212 (i.e., 0.05 corrected for tests of the approximately 4.5 · 10 10 interactions evaluated in each condition-specific analysis). Interaction significance was declared based on P-values from the logistic regression (adjusted) or BOOST analyses. For the enrichment analyses, significance was defined as P-value , 0.0004 (i.e., 0.05 corrected for 120 enrichment tests).
For marginal effect analyses and interactions assessed with logistic regression (i.e., interactions that were followed-up after the FastEpistasis analyses), an effect was considered replicated if it had both a P-value , 0.05 and a consistent direction of effect (with that of the discovery dataset, based on a marginal effect odds ratio or interaction odds ratio, respectively) in the replication dataset. For interactions assessed using BOOST, an interaction was considered replicated only if it had a P-value , 0.05 in the replication dataset (directions of effects were not generated in these analyses and thus were not be compared).
Analytical software Quality control procedures, marginal effects testing, and genomewide interaction tests (FastEpistasis and BOOST) were performed using Plink 1.90 b (https://www.cog-genomics.org/plink2) (Chang et al. 2015). To assess the accuracy of epistasis tests made using Plink 1.90, we compared its test results to those of similar analyses made using Plink 1.07 for a sample of interactions, and found concordant results ( Figure S2). Logistic regression tests for epistasis were performed using CASSI Genome-Wide Interaction Analysis Software v2.50 (https://www. staff.ncl.ac.uk/richard.howey/cassi/).

Data and reagent availability
File S1 provides a description of all interactions selected for follow-up (https://figshare.com/s/01e151ea20ecb5cdb8db; DOI: 10.6084/m9. figshare.3113551). File S2 provides additional methodological detail. Figure S1 depicts quantile-quantile plots for interactions in the discovery datasets. Figure S2 provides a comparison of epistasis analysis results derived from Plink 1.07 and Plink 1.90. Table S1 contains subject quality control data. Table S2 contains disease-specific subject counts. Table S3 contains SNP quality control data. Table S4 contains genomic inflation data for marginal effects. Table S5 contains database search terms for identifying disease-related genes. Table  S6 contains SNP annotation counts. Table S7 contains power analyses for the discovery datasets. Table S8 contains power analyses for the replication datasets. Table S9 contains the penetrance table  for epiSIM power simulations. Table S10 contains estimates of phenotypic variance explained by additive genetic variance of the included SNPs. Table S11, Table S12, Table S13, Table S14,  Table S15, Table S16, Table S17, Table S18, Table S19, and Table  S20 describe the top 10 marginal effects for each studied condition. Table S21, Table S22, Table S23, Table S24, Table S25, Table S26,  Table S27, Table S28, Table S29, and Table S30 describe the enrichment analyses for each condition. Table S31 describes the Bio-Grid interactions.

Dataset description
The 10 conditions included as outcomes for analysis, and case/control characteristics for each condition, are shown in Table 1. A majority of the subjects were women, and the median age of the subjects was at or above 59 years of age. All subjects were classified as being of white race/ethnicity. Assuming a logistic regression model, most of the discovery data offered greater than 80% power to detect an interaction with genome-wide significance (assessing all possible pairwise interactions among approximately N = 300,000 SNPs, or N = 4.5 · 10 10 interactions), assuming an interactions odds ratio of $ 1.50 for SNPs with MAF of $ 0.15 (Table S7). The exceptions to this are the data for depression, dermatophytosis, and type 2 diabetes, which only had greater than 50% power. All replication datasets had greater than 80% power to detect the same kind of interaction at a nominal significance of P , 0.05 (Table S8). Estimates of the proportion of phenotypic variance explained by additive genetic variance of the included SNPs ranged from 3.6% (dermatophytosis) to 24.6% (diabetes, type 2) (Table S10).

Genome-wide search for interactions
The discovery datasets of all 10 conditions were subjected to exhaustive genome-wide searches for pairwise interactions, using both the FastEpistasis (with follow-up via logistic regression) and BOOST analytical approaches. Only one interaction met the criteria for being declared genome-wide significant (interaction P , 10 212 from the adjusted logistic regression analysis or from the BOOST analysis). This was the interaction between rs4456135 and rs12162346 for the outcome of dermatophytosis (P = 4.29 · 10 213 , logistic regression); however, this was nonsignificant and in opposite direction in the replication dataset (Table 3). For each condition, the most significant interaction, and the most significant nominally replicated interaction, are listed in Table 3 and Table 4 for FastEpistasis and BOOST, respectively. The rank number (by interaction significance) of the most significant nominally replicated interactions ranged from 1 (hemorrhoids; BOOST analysis) to 100 (cardiac disease; FastEpistasis analysis). All followed-up interactions (i.e., those with n FastEpistasis or BOOST P , 10 27 in the discovery datasets) are described in File S1.

Enrichment analysis
We next attempted to determine if there was an enrichment of nominally replicated interactions with particular SNP annotations (based on six possible categories; see the Supplemental Materials), among all followed-up interactions. These results are shown in Table S21, Table  S22, Table S23, Table S24, Table S25, Table S26, Table S27, Table  S28, Table S29, and Table S30. The only annotation category that showed evidence for enrichment was that of "marginal" (SNPs with a marginal effect P , 0.05 in the discovery dataset of the respective condition). Both cardiac disease and hypertensive disease showed strictly significant (P , 0.0004) evidence for this type of enrichment, while dyslipidemia also showed nominally significant evidence (P , 0.05) for the same. For example, in cardiac disease (analyzed by logistic regression; Table S23), among interactions where both SNPs had the "marginal" annotation, 20.0% were nominally replicated; while among interactions where neither of the SNPs had this annotation, only 2.5% were nominally replicated. However, no annotation strategy could have been used to successfully narrow the search space in order to find a replicated, significant interaction. This can be seen in Table S21, Table S22, Table S23, Table  S24, Table S25, Table S26, Table S27, Table S28, Table S29, and Table  S30, comparing entries in the columns "Min Int P" with prefix "R" (the smallest P-value among all replicated interactions matching the stated annotation type) and "Thresh" (a hypothetical significance threshold, defined as 0.05 corrected for the number of interactions that would have resulted had the search been limited to interactions matching the stated annotation type). For example, in cardiac disease, there were N = 16,284 SNPs with the "marginal" annotation (Table S6). An exhaustive pairwise analysis of just these SNPs would result in 1.3 · 10 8 evaluated interactions and a significance threshold of P , 3.77 · 10 210 . The most significant replicated interaction among all of these had an actual P-value of 1.09 · 10 27 (Table S23), and thus would not have been considered significant. Therefore, although we can detect a possible enrichment among interactions with marginal effects, it may not be possible to use this filtering approach alone to identify which of the interactions are actually be genuine; effect sizes may be too small to detect.

BioGRID interactions
Finally, we sought to determine whether any interaction that met the criteria for follow-up involved genes (or their products) that have been previously reported to physically or genetically interact, as curated in the BioGRID database. Across all analyzed conditions, there were three BioGRID interactions among the followed-up interactions for the FastEpistasis analyses, and two for the BOOST analyses (Table S31). However, none of these interactions were nominally replicated.

DISCUSSION
To our knowledge, this study presents one of the largest (by subject count) searches for SNP-SNP interactions yet conducted for human disease. Despite this, and the use of two different analytical methods, we failed to detect a significant, replicable interaction after exhaustively searching through 45 billion possible interactions in each of 10 different complex diseases. One possible explanation for this is that no interactions exist for these conditions involving SNPs with a MAF $ 0.15 operating under the interaction models specified in the power analyses. If SNP-SNP interactions do contribute to the conditions that were analyzed, then they are likely of weak effect and/or involve lower-frequency SNPs, which will n Table 2  Most significant marginal effects, by condition, ranked by significance in each discovery adjusted analysis. RSID, reference SNP cluster ID; A1, nonreferent allele; A0, referent allele; OR, odds ratio; C.I., confidence interval; P, P-value; Genome-wide sig.?, whether or not the P-value from the discovery adjusted analysis was less than 10 27 ; Rep?, whether or not the marginal effect was nominally replicated; Anno., annotation assigned to the respective SNP, coded as follows; R, regulatory; D, disease-gene; G, any-gene; EX, exonic. M (marginal) not shown, since all listed SNPs have that annotation. a Although not formally assigned a gene annotation due to its distance, this SNP is located 6.9 kb from the HLA-DQB1 gene.
n Interactions were first analyzed with FastEpistasis and then subjected to a follow-up analysis with logistic regression. For each condition, the most significant of all interactions is listed, followed by the most significant interaction that was nominally replicated. Interactions were ranked by significance in the adjusted logistic regression analysis (discovery). Numbers in the leftmost column indicate the overall ranks of the interactions for the respective condition. Blanks in the "Anno" or "Gene" columns indicate no annotation or gene assigned to the respective SNP. SNP, single nucleotide polymorphism; Rep.?, whether or not the interaction was nominally replicated; Anno1/Anno2, annotation assigned to SNP1 or SNP2, respectively; Gene1/Gene2, gene assigned to SNP1 and SNP2, respectively; RSID, reference SNP cluster ID; Chr, chromosome number; A1, nonreferent allele; A0, referent allele. FE, P, P-value from the FastEpistasis analysis; Unadj., P, P-value from the unadjusted logistic regression analysis; Adj., OR, interaction odds ratio and 95% confidence interval, from the adjusted logistic regression analysis; Adj., P, P-value from the adjusted logistic regression analysis; All "adjusted" analyses were adjusted for the first two principal components, birth year category, and sex; G, anygene; D, disease-gene; M, marginal; R, regulatory.
be very challenging to study. Statistical power drops off rapidly as the interaction odds ratio goes from 1.50-1.25; even at a modest significance level of P , 10 27 , we did not have power to detect an interaction with an odds ratio of 1.25 for any condition (Table S7). In addition, SNP annotation by functionality (exonic function, regulatory function, and disease-relevant eQTL) and gene assignment (disease-relevant gene and any gene) failed to result in any appreciable enrichment in replicated interactions, suggesting that these methods alone may not be useful in trimming the epistasis search space. Only SNPs with marginal significance showed evidence for enrichment, for some outcomes. Interestingly, this is consistent with a recent suggestion that "epistatic effects that seem to be statistically robust have large marginal effects" (Wei et al. 2014). Furthermore, among all interactions that met follow-up criteria, few involved genes were reported to interact in the BioGRID database, and none of these few were nominally replicated. This suggests that robust statistical SNP-SNP interactions do not correspond to gene pairs whose products are known to interact physically (since the majority of interactions listed in the BioGRID data were identified based on protein-protein interactions).
Apart from the nonexistence of strong (large effect size) interactions, there are a number of possible alternative explanations for our observed lack of detection. The first is that some of the included conditions may have been too broadly defined to capture a strong association signal, since they were defined based on having two or more of several possible ICD-9 codes within a diagnostic class, and it was not possible to ascertain narrower phenotypes. This might particularly have been a problem for the outcome of "cardiac disease," which was constructed out of a large of number of disparate codes (http://www.ncbi.nlm.nih.gov/projects/ gap/cgi-bin/GetPdf.cgi?id=phd004308). Moreover, it may be possible that associations will differ depending on age of onset, which we did not have information on. The etiology of some of the conditions (e.g., dermatophytosis and hemorrhoids, which had few entries in the Genetic Association Database and the DisGenNET database; Table S5) may simply have a weaker genetic basis than others, which could also explain a lack of association. For half of the conditions (allergic rhinitis, asthma, type 2 diabetes, dyslipidemia, and hypertensive disease), strong marginal effects involving well-known disease-relevant genes were found, which supports the validity of the analytical approach for these conditions. However, no strictly significant marginal effects were found for the other half (cardiac disease, depression, dermatophytosis, hemorrhoids, and osteoarthritis), which could be attributable to the aforementioned problems. This is also suggested by the observation that the additive genetic variance captured by the included SNPs was low for some conditions (Table S10). In addition, the datasets for depression, dermatophytosis, and type 2 diabetes had lower power in general than the other conditions (Table S7), which could also explain a lack of n interactions for these conditions in particular. Furthermore, we excluded interactions involving SNPs located within 1 Mb of one another, to prevent the analysis of SNPs in linkage disequilibrium; however, this may have also excluded genuine interactions involving close-proximity SNPs. Additionally, power was estimated assuming a limited number of interaction models and no noise in the data; as such, it is likely that the analytical approaches (FastEpistasis, BOOST, and logistic regression) had low power to detect interaction models that are not captured well by these methods. Finally, epistasis may involve higher orders of interaction that may not be apparent in the second-order (pairwise) interactions that were examined in this study.
Limitations of the enrichment analysis include the fact that the exonic and regulatory annotations only considered directly typed SNPs, and that the eQTL and disease gene annotations assumed that typed SNPs were within a short distance of an eQTL (6 1 kb) or a disease gene (6 5 kb). Although these choices were made in order to limit the number of annotated SNPs, they could potentially miss SNPs that were in linkage disequilibrium with these features but located a farther distance away. In addition, the disease gene annotations relied on literature curation databases that may have different levels of completeness or validity for different conditions. It is also possible that other annotation methods not considered here (such as the use of Biofilter (Bush et al. 2009)) may have greater success than we found.
Given the development of statistical methodologies that enable the computationally efficient evaluation of vast numbers of interactions, and the increasing availability of very large genetic association databases, it is becoming increasingly feasible to search for SNP-SNP interactions on a genome-wide scale. However, based on the experiences found in the present study, these interactions are likely to have small effect sizes, involve low-frequency variants, and/or involve complex models of interaction, which may require alternative methods of detection.