Elsevier

NeuroImage

Volume 109, 1 April 2015, Pages 505-514
A kernel machine method for detecting effects of interaction between multidimensional variable sets: An imaging genetics application

https://doi.org/10.1016/j.neuroimage.2015.01.029

Highlights

  • Novel kernel machine based method for detecting interaction effects.

  • Method can model epistatic effects.

  • Method can accommodate multiple environmental variables.

  • We show novel gene–cardiovascular risk interaction relevant to Alzheimer's.

Abstract

Measurements derived from neuroimaging data can serve as markers of disease and/or healthy development, are largely heritable, and have been increasingly utilized as (intermediate) phenotypes in genetic association studies. To date, imaging genetic studies have mostly focused on discovering isolated genetic effects, typically ignoring potential interactions with non-genetic variables such as disease risk factors, environmental exposures, and epigenetic markers. However, identifying significant interaction effects is critical for revealing the true relationship between genetic and phenotypic variables, and shedding light on disease mechanisms. In this paper, we present a general kernel machine based method for detecting effects of the interaction between multidimensional variable sets. This method can model the joint and epistatic effect of a collection of single nucleotide polymorphisms (SNPs), accommodate multiple factors that potentially moderate genetic influences, and test for nonlinear interactions between sets of variables in a flexible framework. As a demonstration of application, we applied the method to the data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) to detect the effects of the interactions between candidate Alzheimer's disease (AD) risk genes and a collection of cardiovascular disease (CVD) risk factors, on hippocampal volume measurements derived from structural brain magnetic resonance imaging (MRI) scans. Our method identified that two genes, CR1 and EPHA1, demonstrate significant interactions with CVD risk factors on hippocampal volume, suggesting that CR1 and EPHA1 may play a role in influencing AD-related neurodegeneration in the presence of CVD risks.

Introduction

Genetic components play a significant role in most brain-related illnesses. The discovery of genetic effects can elucidate the biological pathways and processes underlying neurological disorders, and ultimately yield prevention and treatment strategies. In the field of imaging genetics, this goal is approached by using quantitative brain image derived measurements as intermediate or endophenotypes (Biffi et al., 2010, Ge et al., 2014, Gottesman and Shields, 1972, Gottesman and Gould, 2003, Meyer-Lindenberg and Weinberger, 2006, Sabuncu et al., 2012), which are biomarkers of disease, and are believed to be closer to the disease process and have a simpler genetic architecture than clinical diagnoses.

However, heritability analyses and genome-wide association studies (GWAS) (Visscher et al., 2012) of complex genetic phenotypes ranging from human height (Yang et al., 2010), body mass index, von Willebrand factor (Yang et al., 2011), and schizophrenia (Lee et al., 2012b), to various volume-, surface- or connection-based brain measurements computed from structural, functional or diffusion images (Thompson et al., 2013), indicate that phenotypic variation cannot be solely explained by genetics. The interactions between genetic and non-genetic variables such as disease risk factors, environmental exposures and epigenetic markers may play an important role in the variation of complex phenotypes (Sullivan et al., 2012), and the influence of genetic variants on the likelihood, development, and progression of a brain illness may be indirect and interactive. The presence of interactions implies that genetics can modulate the effects of various risk factors on the disease, producing variations across subjects even exposed to the same environment. Alternatively, the effect of the genotype on outcomes can depend on one or more risk factors or environmental exposures. For example, Caspi et al. (2002) reported that the effect of maltreatment of children from birth to adulthood on the development of antisocial behavior is moderated by a functional polymorphism in the MAOA gene. The genotype of a locus known as 5-HTTLPR located in the promoter region of the serotonin transporter gene was found to moderate the influence of stressful life events on depression (Caspi et al., 2003). Therefore, identifying potential genetic interactions with non-genetic variables can be critical in understanding the true relationship between genotype and phenotype.

Thanks to recent advances in genotyping technology, it is now possible to investigate genetic interaction effects involving specific genetic risk factors, candidate genes, or even the entire genome, in unrelated individuals. Current statistical methods to test for interactions largely utilize multiple linear regression models with quantitative phenotypes, or logistic regression models with binary outcomes, in both the genetics community (Aschard et al., 2011, Kraft et al., 2007, Paré et al., 2010), and the imaging community (e.g., psychophysiological interactions analysis (Friston et al., 1997)). In these analyses, both main effects are typically univariate variables, and the interaction is modeled by their product. Although a number of recent papers have tried to improve the power of the classical univariate interaction test (Hsu et al., 2012, Mukherjee and Chatterjee, 2008, Murcray et al., 2011), they suffer from two main drawbacks when detecting interactions between genetic variants and non-genetic variables. First, converging evidence has shown that many complex brain disorders are polygenic and influenced by up to thousands of genetic variants with small effects (Purcell et al., 2009, Sullivan et al., 2012). Analyzing each individual locus may not identify any reliable results with a small to moderate sample size, which is typical in imaging genetic studies. And second, it is now not uncommon to collect a large number of disease risk factors, environmental variables, or epigenetic markers in a single study. The product of all possible pairs of genetic variants and non-genetic variables may be dauntingly large, which dramatically increases the burden of computation and multiple testing corrections. More critically, Lin et al. (2013) showed that if the main effects of a set of genetic variants are associated with the phenotype, testing each single genetic variant for interactions can be biased.
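For reference, the classical univariate interaction test described above is usually written as a regression with a product term. The notation here is assumed for illustration and is not taken from the paper: g_i is a single genetic variant, e_i a single non-genetic variable, and P the number of non-genetic variables.

```latex
y_i = \beta_0 + \beta_1 g_i + \beta_2 e_i + \beta_3\, g_i e_i + \varepsilon_i,
\qquad H_0:\ \beta_3 = 0 .
```

With L genetic variants and P non-genetic variables, testing every pair in this way requires L × P separate tests, which is the source of the multiple testing burden noted above.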

In this paper, inspired by Li and Cui (2012), we present a semiparametric kernel machine based method to detect interactions between multidimensional variable sets. Kernel machine based methods have been previously used in association studies between single nucleotide polymorphism (SNP) sets and complex diseases or imaging phenotypes (Kwee et al., 2008, Liu et al., 2007, Wu et al., 2010, Wu et al., 2011), and have been applied to voxel-wise genome-wide association studies to obtain boosted statistical power (Ge et al., 2012, Stein et al., 2010). Here, to jointly model the genetic and non-genetic variables, and their interactions, we extend the original kernel machine based method, and include three appropriately selected kernels in the model: one for genetic variants, one for non-genetic variables, and a third one, which is the Hadamard product of the genetic and non-genetic kernels, for the interaction effect. The genetic kernel provides a biologically informed way to capture epistasis in a set of SNPs and model their joint effect on the phenotype. SNP sets can be formed from SNPs located in or near a gene, within a gene pathway or a haplotype structure, or from risk SNPs identified by previous studies or other a priori biological information (Wu et al., 2010). Examining the collective contribution of SNPs further opens possibilities to investigate cumulative effects of rare variants (Wu et al., 2011), and often provides improved reproducibility, biologically informed insights, and increased power relative to univariate methods. The non-genetic kernel allows for modeling the joint effect of multiple variables. By using a connection to linear mixed effects models, the interaction effect can be tested by a variance component score test (Lin, 1997, Liu et al., 2007). The proposed method thus offers a flexible framework to account for epistatic effects, accommodate multiple non-genetic factors, and test for the overall interaction effect between sets of multidimensional variables.
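The three-kernel construction can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the identity-by-state (IBS) kernel for genotypes, the Gaussian kernel for the non-genetic variables, the bandwidth value, the toy data, and all function names are choices made here for the example.

```python
import math

def ibs_kernel(G):
    """Assumed genetic kernel: identity-by-state similarity.
    G[i][s] is the allele count (0, 1 or 2) of subject i at SNP s."""
    n, n_snps = len(G), len(G[0])
    return [[sum(2 - abs(G[i][s] - G[j][s]) for s in range(n_snps)) / (2 * n_snps)
             for j in range(n)] for i in range(n)]

def gaussian_kernel(E, bandwidth):
    """Assumed non-genetic kernel: exp(-||E_i - E_j||^2 / bandwidth)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    n = len(E)
    return [[math.exp(-sqdist(E[i], E[j]) / bandwidth) for j in range(n)]
            for i in range(n)]

def hadamard(A, B):
    """Element-wise (Hadamard) product of two equally sized matrices."""
    return [[a * b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

# Toy data (hypothetical): 3 subjects, 3 SNPs, 2 non-genetic variables.
G = [[0, 1, 2], [0, 1, 2], [2, 1, 0]]
E = [[0.0, 1.0], [0.0, 1.0], [3.0, 5.0]]

K_G = ibs_kernel(G)                       # genetic kernel
K_E = gaussian_kernel(E, bandwidth=25.0)  # non-genetic kernel
K_GE = hadamard(K_G, K_E)                 # interaction kernel
```

The interaction kernel is large only for pairs of subjects that are similar in both the genetic and the non-genetic variables. By the Schur product theorem, the Hadamard product of two positive semidefinite matrices is itself positive semidefinite, so the product of two valid kernel matrices is again a valid kernel matrix.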

As a demonstration of application, we applied the proposed method to detect the interaction effects between candidate late-onset Alzheimer's disease (AD) risk genes and cardiovascular disease (CVD) risk factors including age, gender, body mass index (BMI), hypertension, current smoking status and diabetes, on hippocampal volume derived from structural brain magnetic resonance imaging (MRI) scans, which is associated with AD risk and future AD progression (Sperling et al., 2011).

AD, the most common form of dementia, is characterized by memory loss, cognitive decline, and other symptoms. The cause and progression of AD are not well understood. Vascular pathology, which often co-occurs with AD in the elderly population, is among the factors that may increase the risk of AD. In particular, increasing evidence shows that many CVD risk factors including hypertension, smoking and diabetes are associated with cognitive decline and neurodegeneration, and may increase the risk and accelerate the progression of AD (Helzner et al., 2009, Kivipelto et al., 2001, Lo et al., 2012, Luchsinger et al., 2005, Purnell et al., 2009). For example, the neurovascular hypothesis of AD suggests that neurovascular dysfunction reduces the clearance of amyloid beta (Aβ) peptide across the blood–brain barrier, which could initiate a series of pathological processes and ultimately lead to neuronal injury and loss (Zlokovic, 2005). Moreover, recent studies have found that interactions among multiple CVD risk factors, and the interaction between CVD risk factors and the apolipoprotein E (APOE) polymorphism, the largest genetic determinant of late-onset AD susceptibility, may significantly influence the risk and progression of AD (Borenstein et al., 2005, Irie et al., 2008, Purnell et al., 2009, Qiu et al., 2003). We therefore hypothesized that genetic components play a role in the development and progression of AD in the presence of CVD risk factors and events. Testing for the interactions between AD risk genes and CVD risk factors on hippocampal volume may shed light on the underlying mechanisms of AD-related neurodegeneration, and suggest potential therapeutic strategies, as many CVD risk factors are largely modifiable.

The remainder of the paper is organized as follows. In the Materials and methods section, we present the kernel machine based method and the statistical test for interaction detection between multidimensional variable sets. Simulation studies are then introduced to evaluate the proposed method. In the Results section, simulation results, as well as our findings on the real data are shown, and compared to alternative interaction detection methods. The advantages and weaknesses of the method, and the implication of the findings, are summarized in the Discussion section. Some theoretical aspects of the kernel method and supplementary analyses are provided in the Appendix.

Section snippets

The model

We assume that there are N unrelated subjects under investigation. y_i, i = 1, …, N, is a quantitative phenotype for the i-th subject, such as an image derived disease marker. We are interested in detecting the interaction between a collection of genetic variants and a set of non-genetic variables such as disease risk factors, environmental exposures, or epigenetic markers. In particular, let G_i = [G_{i,1}, …, G_{i,L}] denote the L SNP markers, where G_{i,s}, s = 1, …, L, is the genotype coded to be the number

Simulation results

Table 2 shows the simulation results for the overall and interaction score tests. Here we used a nominal p-value threshold of 0.05. In more than 99% of the situations, the ReML algorithm converged within 50 iterations, the maximum number of iterations we set in this simulation study (convergence was declared when the difference between successive log ReML likelihoods was smaller than 10⁻⁴), and in most cases it converged very quickly within 10 iterations and a few seconds with a MATLAB
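The stopping rule described above (stop when successive log-likelihoods differ by less than a tolerance, or when the iteration cap is reached) can be sketched generically. This is not the ReML algorithm itself: the damped update, the toy Gaussian log-likelihood, the residual values, and all names are hypothetical stand-ins, and the tolerance is read as 1e-4.

```python
import math

def run_until_converged(update, log_likelihood, theta, tol=1e-4, max_iter=50):
    """Iterate `update` until successive log-likelihoods differ by less than
    `tol`, or until `max_iter` iterations have been performed."""
    previous = log_likelihood(theta)
    for iteration in range(1, max_iter + 1):
        theta = update(theta)
        current = log_likelihood(theta)
        if abs(current - previous) < tol:
            return theta, iteration, True
        previous = current
    return theta, max_iter, False

# Toy stand-in problem: estimate the variance of zero-mean Gaussian residuals.
residuals = [1.0, -2.0, 0.5, 1.5]
mean_sq = sum(r * r for r in residuals) / len(residuals)

def log_likelihood(sigma2):
    # Gaussian log-likelihood in sigma2, additive constant omitted.
    n = len(residuals)
    return -0.5 * (n * math.log(sigma2) + n * mean_sq / sigma2)

def damped_update(sigma2):
    # Move halfway toward the maximum likelihood estimate (illustrative only).
    return sigma2 + 0.5 * (mean_sq - sigma2)

sigma2_hat, n_iter, converged = run_until_converged(
    damped_update, log_likelihood, theta=1.0)
```

Returning the convergence flag alongside the estimate makes it straightforward to count how often a run hits the iteration cap, which is the kind of summary reported for the simulations.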

Discussion

In this paper, we have proposed a kernel machine based method to test for interactions between multidimensional variable sets. Compared to traditional collapsing and PCA-based methods, the proposed method provides a more flexible and biologically plausible way to model epistasis between genetic variants, accommodates multiple factors that potentially moderate genetic effects, and can test for complex interaction effects between multidimensional variable sets. Although multivariate methods

Acknowledgments

Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, and the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Alzheimer's Association; Alzheimer's Drug Discovery Foundation; BioClinica, Inc.; Biogen

References (81)

  • G. Kimeldorf et al.

    Some results on Tchebycheffian spline functions

    J. Math. Anal. Appl.

    (1971)
  • L.C. Kwee et al.

    A powerful and flexible multilocus association test for quantitative traits

    Am. J. Hum. Genet.

    (2008)
  • S. Lee et al.

    Optimal unified approach for rare-variant association testing with application to small-sample case–control whole-exome sequencing studies

    Am. J. Hum. Genet.

    (2012)
  • G. Liu et al.

    Cardiovascular disease contributes to Alzheimer's disease: evidence from large-scale genome-wide association studies

    Neurobiol. Aging

    (2014)
  • S. Purcell et al.

    PLINK: a tool set for whole-genome association and population-based linkage analyses

    Am. J. Hum. Genet.

    (2007)
  • M.R. Sabuncu et al.

    Event time analysis of longitudinal neuroimage data

    NeuroImage

    (2014)
  • R.A. Sperling et al.

    Toward defining the preclinical stages of Alzheimer's disease: recommendations from the national institute on Aging-Alzheimer's Association workgroups on diagnostic guidelines for Alzheimer's disease

    Alzheimers Dement.

    (2011)
  • J.L. Stein et al.

    Voxelwise genome-wide association study (vGWAS)

    NeuroImage

    (2010)
  • M. Thambisetty et al.

    Effect of complement CR1 on brain amyloid burden during aging and its modification by APOE genotype

    Biol. Psychiatry

    (2013)
  • P.M. Thompson et al.

    Genetics of the connectome

    NeuroImage

    (2013)
  • P.M. Visscher et al.

    Five years of GWAS discovery

    Am. J. Hum. Genet.

    (2012)
  • M.W. Weiner et al.

    The Alzheimer's Disease Neuroimaging Initiative: a review of papers published since its inception

    Alzheimers Dement.

    (2013)
  • M.C. Wu et al.

    Powerful SNP-set analysis for case–control genome-wide association studies

    Am. J. Hum. Genet.

    (2010)
  • M.C. Wu et al.

    Rare-variant association testing for sequencing data with the sequence kernel association test

    Am. J. Hum. Genet.

    (2011)
  • B.V. Zlokovic

    Neurovascular mechanisms of Alzheimer's neurodegeneration

    Trends Neurosci.

    (2005)
  • 1000 Genomes Project Consortium

    An integrated map of genetic variation from 1,092 human genomes

    Nature

    (2012)
  • N. Aronszajn

    Theory of reproducing kernels

    Trans. Am. Math. Soc.

    (1950)
  • H. Aschard et al.

    Genome-wide meta-analysis of joint tests for genetic and gene–environment interaction effects

    Hum. Hered.

    (2011)
  • L. Bernal-Rusiel et al.

    Statistical analysis of longitudinal neuroimage data with linear mixed effects models

    NeuroImage

    (2013)
  • A. Biffi et al.

    Genetic variation and neuroimaging measures in Alzheimer disease

    Arch. Neurol.

    (2010)
  • A. Biffi et al.

    Genetic variation at CR1 increases risk of cerebral amyloid angiopathy

    Neurology

    (2012)
  • A. Caspi et al.

    Role of genotype in the cycle of violence in maltreated children

    Science

    (2002)
  • A. Caspi et al.

    Influence of life stress on depression: moderation by a polymorphism in the 5-HTT gene

    Science

    (2003)
  • L.B. Chibnik et al.

    CR1 is associated with amyloid plaque burden and age-related cognitive decline

    Ann. Neurol.

    (2011)
  • R.B. D'Agostino et al.

    General cardiovascular risk profile for use in primary care: the Framingham Heart Study

    Circulation

    (2008)
  • B. Fischl et al.

    Automatically parcellating the human cerebral cortex

    Cereb. Cortex

    (2004)
  • S.J. Furney et al.

    Genome-wide association with MRI atrophy measures as a quantitative trait locus for Alzheimer's disease

    Mol. Psychiatry

    (2010)
  • T. Ge et al.

    Imaging genetics — towards discovery neuroscience

    Quant. Biol.

    (2014)
  • I.I. Gottesman et al.

    The endophenotype concept in psychiatry: etymology and strategic intentions

    Am. J. Psychiatr.

    (2003)
  • I.I. Gottesman et al.

    Schizophrenia genetics: a twin study vantage point

    (1972)

Cited by (31)

    • Multivariate Analysis and Modelling of multiple Brain endOphenotypes: Let's MAMBO!

      2021, Computational and Structural Biotechnology Journal
      Citation Excerpt :

      Other studies have also started to analyse proteomics and neuroimaging-based features as potential biomarkers of the basis for computing essential cell functions to identify the best proteomic model for the diagnosis, monitoring, and prediction of complex neurological disorders [101,102]. Research focused on multivariate modeling of gene-environment interactions has recently emerged, revealing significant interaction effects between candidate genetic variants and multiple environmental factors [103–106]. These methods may represent the starting point of designs focused on the integration of multivariate imaging gene-environment interactions open up new sources of analysis by means of which to gain an understanding of the conditional mechanisms through which genes, environment, and brain features interact to predict brain diseases and neurological conditions [107,108].

    • A kernel machine method for detecting higher order interactions in multimodal datasets: Application to schizophrenia

      2018, Journal of Neuroscience Methods
      Citation Excerpt :

      Recently, positive definite kernel based methods have become an effective tool in imaging genetics. For example, they have been used for identifying genes associated with diseases (Li and Cui, 2012; Ge et al., 2015; Alam et al., 2016a,b). Kernel methods offer useful ways to learn how a large collection of genetic variants are associated with complex phenotypes, to help explore the relationship between genetic markers and a disease state (Camps-Valls et al., 2007; Yu et al., 2011; Alam, 2014; Alam and Fukumizu, 2015; Schölkopf et al., 1998; Kung, 2014).

    • Strategies for integrated analysis in imaging genetics studies

      2018, Neuroscience and Biobehavioral Reviews
      Citation Excerpt :

      sCMs have also been applied in IG to achieve better discrimination of disease status based on multiple imaging and multiple genetic features. These methods include machine learning techniques, which can be also used to define gene-sets or pathways that best predict the imaging phenotype (Cao et al., 2013), Kernel machine-based methods (Ge et al., 2015, 2012; Zhang et al., 2014) and Bayesian methods (Batmanghelich et al., 2013; Stingo et al., 2013; Zhe et al., 2014). In general, sCMs apply a sparse representation coefficient during classification, which contains very important discriminating information.

    • Introduction

      2018, Imaging Genetics
    • Recent publications from the Alzheimer's Disease Neuroimaging Initiative: Reviewing progress toward improved AD clinical trials

      2017, Alzheimer's and Dementia
      Citation Excerpt :

      Lack of these risk alleles was estimated to decrease AD incidence by 8%. CR1 and EPHA1 interacted with cardiovascular disease risk factors to reduce hippocampal volume [189]. Cardiovascular risk dominated the genetic risk of these loci in terms of interaction effect such that at low genetic risk, high cardiovascular risk factors had a more detrimental effect (Fig. 10).

    Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.

    1 JWS and MRS contributed equally.
