A kernel machine method for detecting effects of interaction between multidimensional variable sets: An imaging genetics application☆
Introduction
Genetic components play a significant role in most brain-related illnesses. The discovery of genetic effects can elucidate the biological pathways and processes underlying neurological disorders, and ultimately yield prevention and treatment strategies. In the field of imaging genetics, this goal is approached by using quantitative measurements derived from brain images as intermediate phenotypes or endophenotypes (Biffi et al., 2010, Ge et al., 2014, Gottesman and Shields, 1972, Gottesman and Gould, 2003, Meyer-Lindenberg and Weinberger, 2006, Sabuncu et al., 2012). These measurements serve as biomarkers of disease, and are believed to be closer to the disease process and to have a simpler genetic architecture than clinical diagnoses.
However, heritability analyses and genome-wide association studies (GWAS) (Visscher et al., 2012) of complex genetic phenotypes ranging from human height (Yang et al., 2010), body mass index, von Willebrand factor (Yang et al., 2011), and schizophrenia (Lee et al., 2012b), to various volume-, surface- or connection-based brain measurements computed from structural, functional or diffusion images (Thompson et al., 2013), indicate that phenotypic variation cannot be solely explained by genetics. The interactions between genetic and non-genetic variables such as disease risk factors, environmental exposures and epigenetic markers may play an important role in the variation of complex phenotypes (Sullivan et al., 2012), and the influence of genetic variants on the likelihood, development, and progression of a brain illness may be indirect and interactive. The presence of interactions implies that genetics can modulate the effects of various risk factors on the disease, producing variation across subjects even when they are exposed to the same environment. Equivalently, the effect of the genotype on outcomes can depend on one or more risk factors or environmental exposures. For example, Caspi et al. (2002) reported that the effect of maltreatment of children from birth to adulthood on the development of antisocial behavior is moderated by a functional polymorphism in the MAOA gene. Similarly, the genotype of 5-HTTLPR, a locus in the promoter region of the serotonin transporter gene, was found to moderate the influence of stressful life events on depression (Caspi et al., 2003). Therefore, identifying potential genetic interactions with non-genetic variables can be critical in understanding the true relationship between genotype and phenotype.
Thanks to recent advances in genotyping technology, it is now possible to investigate genetic interaction effects involving specific genetic risk factors, candidate genes, or even the entire genome, in unrelated individuals. Current statistical methods to test for interactions largely utilize multiple linear regression models with quantitative phenotypes, or logistic regression models with binary outcomes, in both the genetics community (Aschard et al., 2011, Kraft et al., 2007, Paré et al., 2010), and the imaging community (e.g., psychophysiological interactions analysis (Friston et al., 1997)). In these analyses, both main effects are typically univariate variables, and the interaction is modeled by their product. Although a number of recent papers have tried to improve the power of the classical univariate interaction test (Hsu et al., 2012, Mukherjee and Chatterjee, 2008, Murcray et al., 2011), univariate tests suffer from two main drawbacks when detecting interactions between genetic variants and non-genetic variables. First, converging evidence has shown that many complex brain disorders are polygenic and influenced by up to thousands of genetic variants with small effects (Purcell et al., 2009, Sullivan et al., 2012). Analyzing each locus individually may not yield reliable results with a small to moderate sample size, which is typical in imaging genetic studies. Second, it is now common to collect a large number of disease risk factors, environmental variables, or epigenetic markers in a single study. The number of pairwise products between genetic variants and non-genetic variables may be dauntingly large, which dramatically increases the burden of computation and multiple testing corrections. More critically, Lin et al. (2013) showed that if the main effects of a set of genetic variants are associated with the phenotype, testing each single genetic variant for interactions can be biased.
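To make the classical univariate interaction test concrete, the following is a minimal sketch in Python, not code from this study. It fits the linear regression y ~ 1 + g + e + g·e by ordinary least squares and returns the t statistic of the product term. The simulated SNP (coded here as a 0/1/2 allele count), the risk factor, the effect sizes, and the function name `interaction_t_stat` are all assumptions chosen for illustration.

```python
import numpy as np

def interaction_t_stat(y, g, e):
    """t statistic of the product term in y ~ 1 + g + e + g*e (OLS)."""
    X = np.column_stack([np.ones_like(y), g, e, g * e])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]              # residual degrees of freedom
    sigma2 = resid @ resid / dof           # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)  # covariance of the OLS estimates
    return beta[3] / np.sqrt(cov[3, 3])

# Hypothetical simulated data (assumed values, not from the paper).
rng = np.random.default_rng(0)
n = 500
g = rng.binomial(2, 0.3, size=n).astype(float)  # one SNP, 0/1/2 allele count
e = rng.normal(size=n)                          # one non-genetic variable
y = 0.2 * g + 0.3 * e + 0.5 * g * e + rng.normal(size=n)

t_int = interaction_t_stat(y, g, e)
```

With L variants and many non-genetic variables, this test would have to be repeated for every pair, which is the computational and multiple-testing burden described above.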
In this paper, inspired by Li and Cui (2012), we present a semiparametric kernel machine based method to detect interactions between multidimensional variable sets. Kernel machine based methods have been previously used in association studies between single nucleotide polymorphism (SNP) sets and complex diseases or imaging phenotypes (Kwee et al., 2008, Liu et al., 2007, Wu et al., 2010, Wu et al., 2011), and have been applied to voxel-wise genome-wide association studies to obtain boosted statistical power (Ge et al., 2012, Stein et al., 2010). Here, to jointly model the genetic and non-genetic variables, and their interactions, we extend the original kernel machine based method, and include three appropriately selected kernels in the model: one for genetic variants, one for non-genetic variables, and a third one, which is the Hadamard product of the genetic and non-genetic kernels, for the interaction effect. The genetic kernel provides a biologically informed way to capture epistasis in a set of SNPs and model their joint effect on the phenotype. SNP sets can be formed from SNPs located in or near a gene, within a gene pathway or a haplotype structure, from risk SNPs identified by previous studies, or from other a priori biological information (Wu et al., 2010). Examining the collective contribution of SNPs further opens possibilities to investigate cumulative effects of rare variants (Wu et al., 2011), and often provides improved reproducibility, biologically informed insights, and increased power relative to univariate methods. The non-genetic kernel allows for modeling the joint effect of multiple variables. By using a connection to linear mixed effects models, the interaction effect can be tested by a variance component score test (Lin, 1997, Liu et al., 2007). The proposed method thus offers a flexible framework to account for epistatic effects and multiple non-genetic factors, and to test for the overall interaction effect between sets of multidimensional variables.
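The three-kernel construction can be sketched as follows. This is an illustration only: the text above does not specify which kernels are used, so the linear genetic kernel, the Gaussian non-genetic kernel, its bandwidth, and the simulated data are all assumptions. The sketch builds an N × N kernel matrix for each variable set and forms the interaction kernel as their element-wise (Hadamard) product. By the Schur product theorem, the Hadamard product of two positive semidefinite matrices is again positive semidefinite, so the interaction kernel is itself a valid kernel; the last line checks this numerically.

```python
import numpy as np

def linear_kernel(Z):
    """Linear kernel: inner products between subjects (rows of Z)."""
    return Z @ Z.T

def gaussian_kernel(Z, bandwidth):
    """Gaussian kernel exp(-||z_i - z_j||^2 / bandwidth) between rows of Z."""
    sq = np.sum(Z**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-np.maximum(d2, 0.0) / bandwidth)  # clip tiny negative distances

# Hypothetical data: 50 subjects, 10 SNPs, 4 non-genetic variables (assumed sizes).
rng = np.random.default_rng(1)
n, n_snps, n_factors = 50, 10, 4
G = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)  # 0/1/2 allele counts
E = rng.normal(size=(n, n_factors))

K_G = linear_kernel(G)                                # genetic kernel (assumed choice)
K_E = gaussian_kernel(E, bandwidth=float(n_factors))  # non-genetic kernel (assumed choice)
K_GE = K_G * K_E                                      # Hadamard product: interaction kernel

# Smallest eigenvalue; non-negative up to rounding error for a valid kernel.
min_eig = np.linalg.eigvalsh(K_GE).min()
```

Note that `K_G * K_E` is the element-wise product, not the matrix product `K_G @ K_E`; only the former is guaranteed to be positive semidefinite.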
As a demonstration, we applied the proposed method to detect the interaction effects between candidate late-onset Alzheimer's disease (AD) risk genes and cardiovascular disease (CVD) risk factors including age, gender, body mass index (BMI), hypertension, current smoking status and diabetes, on hippocampal volume derived from structural brain magnetic resonance imaging (MRI) scans. Hippocampal volume is associated with AD risk and future AD progression (Sperling et al., 2011).
AD, the most common form of dementia, is characterized by memory loss, cognitive decline, and other symptoms. The cause and progression of AD are not well understood. As a disease that often co-occurs with AD in the elderly population, vascular pathology is among the potential factors to increase the risk of AD. In particular, increasing evidence shows that many CVD risk factors including hypertension, smoking and diabetes are associated with cognitive decline and neurodegeneration, and may increase the risk and accelerate the progression of AD (Helzner et al., 2009, Kivipelto et al., 2001, Lo et al., 2012, Luchsinger et al., 2005, Purnell et al., 2009). For example, the neurovascular hypothesis of AD suggests that neurovascular dysfunction reduces the clearance of amyloid beta (Aβ) peptide across the blood–brain barrier, which could initiate a series of pathological processes and ultimately lead to neuronal injury and loss (Zlokovic, 2005). Moreover, recent studies have identified that the interaction within multiple CVD risk factors, and the interaction between CVD risk factors and the apolipoprotein E (APOE) polymorphism, the largest genetic determinant of late-onset AD susceptibility, may significantly influence the risk and progression of AD (Borenstein et al., 2005, Irie et al., 2008, Purnell et al., 2009, Qiu et al., 2003). We therefore hypothesized that genetic components play a role in the development and progression of AD in the presence of CVD risk factors and events. Testing for the interactions between AD risk genes and CVD risk factors on hippocampal volume may shed light on the underlying mechanisms of AD-related neurodegeneration, and suggest potential therapeutic treatment as many CVD risk factors are largely modifiable.
The remainder of the paper is organized as follows. In the Materials and methods section, we present the kernel machine based method and the statistical test for interaction detection between multidimensional variable sets. Simulation studies are then introduced to evaluate the proposed method. In the Results section, simulation results, as well as our findings on the real data are shown, and compared to alternative interaction detection methods. The advantages and weaknesses of the method, and the implication of the findings, are summarized in the Discussion section. Some theoretical aspects of the kernel method and supplementary analyses are provided in the Appendix.
Section snippets
The model
We assume that there are N unrelated subjects under investigation. y_i, i = 1, …, N, is a quantitative phenotype for the i-th subject, such as an image derived disease marker. We are interested in detecting the interaction between a collection of genetic variants and a set of non-genetic variables such as disease risk factors, environmental exposures, or epigenetic markers. In particular, let G_i = [G_{i,1}, …, G_{i,L}]⊺ denote the L SNP markers, where G_{i,s}, s = 1, …, L, is the genotype coded to be the number
Simulation results
Table 2 shows the simulation results for the overall and interaction score tests. Here we used a nominal p-value threshold of 0.05. In more than 99% of the situations, the ReML algorithm converged within 50 iterations, the maximum number of iterations we set in this simulation study (convergence was declared when the difference between successive log ReML likelihoods was smaller than 10^−4), and in most cases it converged very quickly within 10 iterations and a few seconds with a MATLAB
Discussion
In this paper, we have proposed a kernel machine based method to test for interactions between multidimensional variable sets. Compared to traditional collapsing and PCA-based methods, the proposed method provides a more flexible and biologically plausible way to model epistasis between genetic variants, accommodates multiple factors that potentially moderate genetic effects, and can test for complex interaction effects between multidimensional variable sets. Although multivariate methods
Acknowledgments
Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, and the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Alzheimer's Association; Alzheimer's Drug Discovery Foundation; BioClinica, Inc.; Biogen
References (81)
- Spatiotemporal linear mixed effects modeling for the mass-univariate analysis of longitudinal neuroimage data. NeuroImage (2013)
- Developmental and vascular risk factors for Alzheimer's disease. Neurobiol. Aging (2005)
- CR1 genotype is associated with entorhinal cortex volume in young healthy adults. Neurobiol. Aging (2011)
- Cortical surface-based analysis: I. Segmentation and surface reconstruction. NeuroImage (1999)
- Freesurfer. NeuroImage (2012)
- Cortical surface-based analysis: II: inflation, flattening, and a surface-based coordinate system. NeuroImage (1999)
- Whole brain segmentation: automated labeling of neuroanatomical structures in the human brain. Neuron (2002)
- Psychophysiological and modulatory interactions in neuroimaging. NeuroImage (1997)
- Increasing power for voxel-wise genome-wide association studies: the random field theory, least square kernel machines and fast permutation procedures. NeuroImage (2012)
- The genetic architecture of Alzheimer's disease: beyond APP, PSENs and APOE. Neurobiol. Aging (2012)
- Some results on Tchebycheffian spline functions. J. Math. Anal. Appl.
- A powerful and flexible multilocus association test for quantitative traits. Am. J. Hum. Genet.
- Optimal unified approach for rare-variant association testing with application to small-sample case–control whole-exome sequencing studies. Am. J. Hum. Genet.
- Cardiovascular disease contributes to Alzheimer's disease: evidence from large-scale genome-wide association studies. Neurobiol. Aging
- PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet.
- Event time analysis of longitudinal neuroimage data. NeuroImage
- Toward defining the preclinical stages of Alzheimer's disease: recommendations from the national institute on Aging-Alzheimer's Association workgroups on diagnostic guidelines for Alzheimer's disease. Alzheimers Dement.
- Voxelwise genome-wide association study (vGWAS). NeuroImage
- Effect of complement CR1 on brain amyloid burden during aging and its modification by APOE genotype. Biol. Psychiatry
- Genetics of the connectome. NeuroImage
- Five years of GWAS discovery. Am. J. Hum. Genet.
- The Alzheimer's Disease Neuroimaging Initiative: a review of papers published since its inception. Alzheimers Dement.
- Powerful SNP-set analysis for case–control genome-wide association studies. Am. J. Hum. Genet.
- Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet.
- Neurovascular mechanisms of Alzheimer's neurodegeneration. Trends Neurosci.
- An integrated map of genetic variation from 1,092 human genomes. Nature
- Theory of reproducing kernels. Trans. Am. Math. Soc.
- Genome-wide meta-analysis of joint tests for genetic and gene–environment interaction effects. Hum. Hered.
- Statistical analysis of longitudinal neuroimage data with linear mixed effects models. NeuroImage
- Genetic variation and neuroimaging measures in Alzheimer disease. Arch. Neurol.
- Genetic variation at CR1 increases risk of cerebral amyloid angiopathy. Neurology
- Role of genotype in the cycle of violence in maltreated children. Science
- Influence of life stress on depression: moderation by a polymorphism in the 5-HTT gene. Science
- CR1 is associated with amyloid plaque burden and age-related cognitive decline. Ann. Neurol.
- General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation
- Automatically parcellating the human cerebral cortex. Cereb. Cortex
- Genome-wide association with MRI atrophy measures as a quantitative trait locus for Alzheimer's disease. Mol. Psychiatry
- Imaging genetics — towards discovery neuroscience. Quant. Biol.
- The endophenotype concept in psychiatry: etymology and strategic intentions. Am. J. Psychiatr.
- Schizophrenia genetics: a twin study vantage point
- ☆
Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.
- 1
JWS and MRS contributed equally.