Published by De Gruyter, December 19, 2016

Generalized partial linear varying multi-index coefficient model for gene-environment interactions

Xu Liu, Bin Gao and Yuehua Cui

Abstract

Epidemiological studies have suggested a joint effect of simultaneous exposures to multiple environments on disease risk. However, how environmental mixtures as a whole jointly modify genetic effects on disease risk is still largely unknown. Given the importance of gene-environment (G×E) interactions in many complex diseases, rigorously assessing the interaction effect between genes and environmental mixtures as a whole could shed novel light on the etiology of complex diseases. For this purpose, we propose a generalized partial linear varying multi-index coefficient model (GPLVMICM) to capture the genetic effect on disease risk modulated by multiple environments as a whole. GPLVMICM is semiparametric in nature and allows different index loading parameters in different index functions. We estimate the parametric components by a profile procedure, and the nonparametric index functions by a B-spline backfitted kernel method. Under some regularity conditions, the proposed parametric and nonparametric estimators are shown to be consistent and asymptotically normal. We propose a generalized likelihood ratio (GLR) test to rigorously assess the linearity of the interaction effect between multiple environments and a gene, and apply a parametric likelihood test to detect a linear G×E interaction effect. The finite sample performance of the proposed method is examined through simulation studies and is further illustrated through a real data analysis.

Acknowledgement

The authors wish to thank three anonymous referees for their constructive comments that greatly improved the manuscript. This work was supported in part by grants from the National Science Foundation (DMS-1209112 and IOS-1237969) and from the National Natural Science Foundation of China (31371336), and by the Program for Innovative Research Team of Shanghai University of Finance and Economics.

Appendix: Proofs

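Notations: For any vector $\boldsymbol{\xi}=(\xi_1,\ldots,\xi_s)^T$, denote $\|\boldsymbol{\xi}\|_\infty=\max_{1\le l\le s}|\xi_l|$. For any nonzero matrix $A_{s\times s}$, denote its $L_r$ norm as $\|A\|_r=\max_{\boldsymbol{\xi}\in\mathbb{R}^s,\,\boldsymbol{\xi}\neq 0}\|A\boldsymbol{\xi}\|_r\|\boldsymbol{\xi}\|_r^{-1}$. For any matrix $A=(A_{ij})_{i,j=1}^{s,t}$, denote $\|A\|_\infty=\max_{1\le i\le s}\sum_{j=1}^{t}|A_{ij}|$. Let $C^{(p)}[a,b]=\{\psi:\psi^{(p)}\in C[a,b]\}$ be the space of the $p$th-order smooth functions. Denote the space of Lipschitz continuous functions for any fixed constant $c_0$ as $\mathrm{Lip}([a,b],c_0)=\{\psi:|\psi(x_1)-\psi(x_2)|\le c_0|x_1-x_2|,\ \forall x_1,x_2\in[a,b]\}$. The following assumptions are required to show the consistency and asymptotic normality of our estimators.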

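  1. For each $l=0,1$, the density function $f_{U(\boldsymbol{\beta}_l)}(\cdot)$ of the random variable $U(\boldsymbol{\beta}_l)=\boldsymbol{\beta}_l^TX$ is bounded away from 0 on $\Omega_l$, and there exists a constant $0 < c_0 < \infty$ such that $f_{U(\boldsymbol{\beta}_l)}(\cdot)\in\mathrm{Lip}([a,b],c_0)$ for $\boldsymbol{\beta}_l$ in a neighborhood of $\boldsymbol{\beta}_{l0}$, where $\Omega_l=\{\boldsymbol{\beta}_l^TX,\ X\in\mathcal{X}\}$ and $\mathcal{X}$ is a compact support of $X$.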

  2. The nonparametric function $m_l\in C^{(r)}[a,b]$, $l=0,1$.

  3. The variance $\mathrm{var}(Y\mid V=v) < c_1$ for some $0 < c_1 < \infty$.

  4. $\partial^3 g(u)/\partial u^3$ and $\partial^2 V(u)/\partial u^2$ are continuous.

  5. There exist constants $0 < c_z\le C_z < \infty$ such that $c_z\le Q(x)=E(\tilde{Z}\tilde{Z}^T\mid X=x)\le C_z$ for all $x\in\mathcal{X}$.

  6. The kernel function $K(\cdot)$ is a symmetric density function with compact support $[-1,1]$ and $K\in\mathrm{Lip}([a,b],c_K)$ for some constant $c_K$.

  7. The functions $u^3K(u)$ and $u^3K'(u)$ are bounded and $\int u^4K(u)\,du < \infty$.
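As a concrete check of Assumptions 6 and 7, the sketch below verifies numerically that the Epanechnikov kernel $K(u)=0.75(1-u^2)$ on $[-1,1]$ is a symmetric density, is Lipschitz continuous, and has a finite fourth moment. The choice of kernel, the grid sizes and the helper names `epanechnikov` and `midpoint_integral` are assumptions made for illustration; the text above does not say which kernel is used.

```python
def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def midpoint_integral(f, a, b, n=20000):
    """Approximate the integral of f over [a, b] with the composite midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Assumption 6: K is a symmetric density on [-1, 1].
total_mass = midpoint_integral(epanechnikov, -1.0, 1.0)
grid = [-1.0 + 2.0 * i / 2000 for i in range(2001)]
is_symmetric = all(epanechnikov(u) == epanechnikov(-u) for u in grid)

# Assumption 6: Lipschitz continuity; the largest grid slope estimates c_K.
lipschitz_estimate = max(
    abs(epanechnikov(grid[i + 1]) - epanechnikov(grid[i])) / (grid[i + 1] - grid[i])
    for i in range(2000)
)

# Assumption 7: the fourth moment of K is finite (3/35 for this kernel).
fourth_moment = midpoint_integral(lambda u: u ** 4 * epanechnikov(u), -1.0, 1.0)

print(total_mass, is_symmetric, lipschitz_estimate, fourth_moment)
```

For this kernel $|K'(u)|=1.5|u|\le 1.5$ on the support, so any $c_K\ge 1.5$ works, and $u^3K(u)$ and $u^3K'(u)$ are bounded because the support is compact.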

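Let $Y_{z,i}=Y_i-Z_i^T\boldsymbol{\alpha}_{00}-Z_i^T\boldsymbol{\alpha}_{10}G_i$, $Y_z=(Y_{z,1},\ldots,Y_{z,n})^T$, $e=(\varepsilon_1,\ldots,\varepsilon_n)^T$, $\mathbb{X}=(X_1,\ldots,X_n)^T$, $\mathbb{Z}=(Z_1,\ldots,Z_n)^T$, $G=(G_1,\ldots,G_n)^T$, $\mathbb{G}=(\mathbf{1}_n,G)$ and $\hat{\Lambda}(\boldsymbol{\theta})=(\hat{\boldsymbol{\alpha}}^T,\hat{\lambda}(\boldsymbol{\beta})^T)^T$. Let $\tilde{Z}=(\tilde{Z}_1,\ldots,\tilde{Z}_n)^T$ with $\tilde{Z}_i=(Z_i^T,Z_i^TG_i)^T$, defined in Section 2.1. Let $\boldsymbol{\Theta}$ be the parameter space of $\boldsymbol{\theta}$. Let $q_j(x)=(\partial^j/\partial x^j)Q\{g^{-1}(x),y\}$, $j=1,2,3$. Then $q_1(x)=\{y-g^{-1}(x)\}\rho_1(x)$ and $q_2(x)=\{y-g^{-1}(x)\}\rho_1'(x)-\rho_2(x)$, where $\rho_j(x)=\{dg^{-1}(x)/dx\}^j/V\{g^{-1}(x)\}$. Write $q_k(\tilde{\eta}_i)$ for $q_k\{\tilde{\eta}(V_i;\boldsymbol{\theta}_0,\lambda(\boldsymbol{\beta}_0))\}$, $k=1,2$, $i=1,\ldots,n$. Let $q_k=(q_k(\tilde{\eta}_1),\ldots,q_k(\tilde{\eta}_n))^T$ and let $W_{q_2}$ be a diagonal matrix with diagonal elements $q_2(\tilde{\eta}_i)$, $i=1,\ldots,n$. Define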

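$$
\begin{aligned}
U(\boldsymbol{\beta})&=E\left[q_2(\tilde{\eta}_i)D_i(\boldsymbol{\beta})D_i(\boldsymbol{\beta})^T\right], &
\hat{U}(\boldsymbol{\beta})&=\frac{1}{n}D(\boldsymbol{\beta})^TW_{q_2}D(\boldsymbol{\beta}),\\
U(\tilde{Z},\boldsymbol{\beta})&=E\left[q_2(\tilde{\eta}_i)D_i(\tilde{Z},\boldsymbol{\beta})D_i(\tilde{Z},\boldsymbol{\beta})^T\right], &
\hat{U}(\tilde{Z},\boldsymbol{\beta})&=\frac{1}{n}D(\tilde{Z},\boldsymbol{\beta})^TW_{q_2}D(\tilde{Z},\boldsymbol{\beta}),
\end{aligned}
\tag{A.1}
$$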

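where $D_i(\boldsymbol{\beta})=(D_{i,sl}(\boldsymbol{\beta}_l)\mathbb{G}_{il},\ 1\le s\le J_n,\ l=0,1)^T$ and $D(\boldsymbol{\beta})=(D_1(\boldsymbol{\beta}),\ldots,D_n(\boldsymbol{\beta}))^T$, which is an $n\times 2J_n$ matrix.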
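The following is a minimal sketch of how the $n\times 2J_n$ matrix $D(\boldsymbol{\beta})$ and the weighted Gram matrix $\hat{U}(\boldsymbol{\beta})=n^{-1}D(\boldsymbol{\beta})^TW_{q_2}D(\boldsymbol{\beta})$ could be assembled. Several things here are assumptions and are not taken from the text: $D_{i,sl}(\boldsymbol{\beta}_l)$ is taken to be the $s$th cubic B-spline basis function evaluated at $\boldsymbol{\beta}_l^TX_i$ (any centering used in the main text is ignored), the response is Bernoulli with the canonical logit link so that $\rho_1\equiv 1$ and $q_2(\eta)=-\mu(1-\mu)$, and the loadings, knots, data and linear predictor are simulated placeholders.

```python
import math
import random


def bspline_basis(u, knots, degree):
    """All B-spline basis values at u (Cox-de Boor recursion, clamped knots)."""
    u = min(max(u, knots[0]), knots[-1])
    spans = len(knots) - 1
    # Degree 0: indicator of the knot span containing u.
    basis = [1.0 if knots[j] <= u < knots[j + 1] else 0.0 for j in range(spans)]
    if u == knots[-1]:
        # Put the right end point into the last non-empty span.
        last = max(j for j in range(spans) if knots[j] < knots[j + 1])
        basis[last] = 1.0
    for d in range(1, degree + 1):
        raised = []
        for j in range(spans - d):
            left = knots[j + d] - knots[j]
            right = knots[j + d + 1] - knots[j + 1]
            value = 0.0
            if left > 0:
                value += (u - knots[j]) / left * basis[j]
            if right > 0:
                value += (knots[j + d + 1] - u) / right * basis[j + 1]
            raised.append(value)
        basis = raised
    return basis


def q2_logit(eta):
    """q_2 for a Bernoulli response with canonical logit link: -mu * (1 - mu)."""
    mu = 1.0 / (1.0 + math.exp(-eta))
    return -mu * (1.0 - mu)


def design_row(x, g, betas, knots, degree):
    """D_i(beta): basis values at beta_l^T x, times 1 (l = 0) or G_i (l = 1)."""
    row = []
    for l, beta in enumerate(betas):
        index = sum(b * xj for b, xj in zip(beta, x))
        weight = 1.0 if l == 0 else g
        row.extend(weight * value for value in bspline_basis(index, knots, degree))
    return row


def u_hat(rows, etas):
    """U_hat(beta) = n^{-1} D^T W_{q2} D, accumulated observation by observation."""
    n, p = len(rows), len(rows[0])
    out = [[0.0] * p for _ in range(p)]
    for row, eta in zip(rows, etas):
        w = q2_logit(eta) / n
        for a in range(p):
            for b in range(p):
                out[a][b] += w * row[a] * row[b]
    return out


random.seed(0)
degree, lower, upper = 3, 0.0, 1.4
interior = [lower + (upper - lower) * k / 4 for k in (1, 2, 3)]
knots = [lower] * (degree + 1) + interior + [upper] * (degree + 1)
betas = [(0.6, 0.8), (0.8, 0.6)]  # hypothetical unit-norm index loadings
n = 200
X = [(random.random(), random.random()) for _ in range(n)]
G = [random.choice((0, 1, 2)) for _ in range(n)]  # genotype coded 0/1/2
D = [design_row(x, g, betas, knots, degree) for x, g in zip(X, G)]
etas = [0.5 * x[0] - 0.3 * g * x[1] for x, g in zip(X, G)]  # placeholder predictor
U = u_hat(D, etas)
print(len(D), len(D[0]))
```

With three interior knots and cubic splines there are $J_n=7$ basis functions per index, so each row of `D` has $2J_n=14$ entries. Because $q_2<0$ under the canonical link, `U` as defined in (A.1) has non-positive diagonal entries in this sketch.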

Proof of Theorem 1: Because the observations $V_1,\ldots,V_n$ are independent, by the Lindeberg-Feller central limit theorem, it is easy to prove that

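$$
n^{-1/2}\sum_{i=1}^{n}\begin{pmatrix}\tilde{Z}_i-\tilde{P}(Z_i)\\ X_{m,i}-\tilde{P}(X_i)\end{pmatrix}q_1\{\tilde{\eta}_i\}\xrightarrow{d}N(\mathbf{0},\tilde{\Sigma}),
$$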

where $\tilde{\Sigma}=E[\rho_2\{\eta(V,\boldsymbol{\theta}_0)\}\phi(V,\boldsymbol{\beta}_0)^{\otimes 2}]$. Combining this with Lemma S.6 in the Supplementary material, the proof of Theorem 1 is completed by Slutsky's theorem.

Proof of Theorem 2: Because, for any $u_l\in[a_l,b_l]$, the functions $B_{s,l}(u_l)$, $s=1,\ldots,J_n$, $l=0,1$, have bounded first derivatives, by (S.9) and (S.7) in the Supplementary material and Theorem 1 we have, for any $u_l\in[a_l,b_l]$,

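$$
\begin{aligned}
|\tilde{m}_l(u_l,\hat{\boldsymbol{\beta}})-\tilde{m}_l(u_l,\boldsymbol{\beta}_0)|
&=|D(\hat{\boldsymbol{\beta}})^T\hat{\lambda}(\hat{\boldsymbol{\beta}})-D(\boldsymbol{\beta}_0)^T\lambda(\boldsymbol{\beta}_0)|\\
&\le|D(\boldsymbol{\beta}_0)^T\{\hat{\lambda}(\hat{\boldsymbol{\beta}})-\lambda(\boldsymbol{\beta}_0)\}|+|\{D(\hat{\boldsymbol{\beta}})-D(\boldsymbol{\beta}_0)\}^T\hat{\lambda}(\hat{\boldsymbol{\beta}})|\\
&\le|n^{-1}D(\boldsymbol{\beta}_0)^T\hat{U}(\boldsymbol{\beta}_0)^{-1}D(\boldsymbol{\beta}_0)^Tq_1\{\tilde{\eta}\}|+O_p(n^{-1/2})\\
&=O_p(\sqrt{N/n}).
\end{aligned}
$$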

Then, combining Lemma S.4 in the Supplementary material, we have

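$$
\sup_{u_l\in[a_l,b_l]}|\tilde{m}_l(u_l,\hat{\boldsymbol{\beta}})-m_l(u_l)|
\le\sup_{u_l\in[a_l,b_l]}|\tilde{m}_l(u_l,\hat{\boldsymbol{\beta}})-\tilde{m}_l(u_l,\boldsymbol{\beta}_0)|
+\sup_{u_l\in[a_l,b_l]}|\tilde{m}_l(u_l,\boldsymbol{\beta}_0)-m_l(u_l)|
=O_p(\sqrt{N/n}+N^{-r}).
$$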

This completes the proof of Theorem 2.
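To see what this rate implies, one can balance its two terms. The worked calculation below reads the rate in Theorem 2 as $O_p(\sqrt{N/n}+N^{-r})$ and assumes that $N$ is the number of spline basis functions, growing polynomially in $n$; neither point is restated in this appendix, so both should be treated as assumptions.

$$
\sqrt{N/n}\asymp N^{-r}
\iff N^{2r+1}\asymp n
\iff N\asymp n^{1/(2r+1)},
\qquad\text{so that}\qquad
\sqrt{N/n}+N^{-r}\asymp n^{-r/(2r+1)}.
$$

For example, with $r=2$ the balanced choice is $N\asymp n^{1/5}$ and the uniform rate is $n^{-2/5}$.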

Proof of Theorem 4: Since $nh_l^5=O(1)$, we have $\sqrt{nh_l}\,n^{-2/5}=O(1)$. By Theorem 3, we have

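$$
\sqrt{nh_l}\{\hat{m}_l(u_l,\hat{\boldsymbol{\beta}})-m_l(u_l)-b_l(u_l)h_l^2\}
=\sqrt{nh_l}\{\hat{m}_l^{O}(u_l,\hat{\boldsymbol{\beta}})-m_l(u_l)-b_l(u_l)h_l^2\}+o_p(1).
$$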

Thus Theorem 4 can be shown straightforwardly by Lemma S.7 in the Supplementary material.
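As a quick check of the orders involved, take the hypothetical bandwidth $h_l=c\,n^{-1/5}$ with a constant $c>0$, which satisfies $nh_l^5=O(1)$. The constant $c$ and this specific choice are assumptions for illustration, and the display reads the normalizing factor in the proof as $\sqrt{nh_l}$:

$$
\sqrt{nh_l}=c^{1/2}n^{2/5},\qquad
\sqrt{nh_l}\,n^{-2/5}=c^{1/2},\qquad
\sqrt{nh_l}\,b_l(u_l)h_l^2=c^{5/2}b_l(u_l).
$$

Under this choice the scaled bias $c^{5/2}b_l(u_l)$ stays bounded but does not vanish, which is why $b_l(u_l)h_l^2$ is subtracted inside the braces before the normal limit is taken.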

Proof of Theorem 5: It is easy to see that

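$$
\ell_{n1}(H_0)-\ell_{n1}^{O}(H_0)
=\sum_{i=1}^{n}q_1(\hat{\eta}_{H_0}^{O}(V_i;\hat{\boldsymbol{\theta}}))\{\hat{\eta}_{H_0}(V_i;\hat{\boldsymbol{\theta}})-\hat{\eta}_{H_0}^{O}(V_i;\hat{\boldsymbol{\theta}})\}
-\frac{1}{2}\sum_{i=1}^{n}q_2(\hat{\eta}_{H_0}^{O}(V_i;\hat{\boldsymbol{\theta}}))\{\hat{\eta}_{H_0}(V_i;\hat{\boldsymbol{\theta}})-\hat{\eta}_{H_0}^{O}(V_i;\hat{\boldsymbol{\theta}})\}^2
=O_p(1),
$$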

and similarly

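$$
\ell_{n1}(H_1)-\ell_{n1}^{O}(H_1)
=\sum_{i=1}^{n}q_1(\hat{\eta}_{H_1}^{O}(V_i;\hat{\boldsymbol{\theta}}))\{\hat{\eta}_{H_1}(V_i;\hat{\boldsymbol{\theta}})-\hat{\eta}_{H_1}^{O}(V_i;\hat{\boldsymbol{\theta}})\}
-\frac{1}{2}\sum_{i=1}^{n}q_2(\hat{\eta}_{H_1}^{O}(V_i;\hat{\boldsymbol{\theta}}))\{\hat{\eta}_{H_1}(V_i;\hat{\boldsymbol{\theta}})-\hat{\eta}_{H_1}^{O}(V_i;\hat{\boldsymbol{\theta}})\}^2
=O_p(1).
$$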

As in the proof of Lemma S.12 in the Supplementary material, $T_1=T_1^{O}+o(1)$ follows from the claim that

$$
\ell_{n1}(H_1)-\ell_{n1}(H_0)=\ell_{n1}^{O}(H_1)-\ell_{n1}^{O}(H_0)+O_p(1).
$$

Thus, $T_1$ and $T_1^{O}$ have the same distribution, which completes the proof of Theorem 5 by Lemma S.12 in the Supplementary material.

Proof of Theorem 7: This proof is similar to that in Liu et al. (2016), so we only provide a sketch here. Let $B_n=A\Gamma_n^{-1}A^T$, where $A$ is defined in Section 3.2 and

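$$
\Gamma_n=\sum_{i=1}^{n}q_2(\eta(V_i;\boldsymbol{\theta}_0))\hat{\boldsymbol{\Psi}}_i(X,Z)\hat{\boldsymbol{\Psi}}_i(X,Z)^T,
\quad\text{with}\quad
\hat{\boldsymbol{\Psi}}_i(X,Z)=\begin{pmatrix}X_{\hat{m},i}-\tilde{P}_n(X_i)\\ \tilde{Z}_i-\tilde{P}_n(Z_i)\end{pmatrix}.
$$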

By Lemma S.5 in the Supplementary material, $T_3$ can be decomposed as

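$$
\begin{aligned}
2(\ell_{n3}(H_1)-\ell_{n3}(H_0))
&=2\sum_{i=1}^{n}q_1(\hat{\eta}_{3,H_0}(V_i;\hat{\boldsymbol{\theta}}))\{\hat{\eta}_{3,H_1}(V_i;\hat{\boldsymbol{\theta}})-\hat{\eta}_{3,H_0}(V_i;\hat{\boldsymbol{\theta}})\}\\
&\quad-\sum_{i=1}^{n}q_2(\hat{\eta}_{3,H_0}(V_i;\hat{\boldsymbol{\theta}}))\{\hat{\eta}_{3,H_1}(V_i;\hat{\boldsymbol{\theta}})-\hat{\eta}_{3,H_0}(V_i;\hat{\boldsymbol{\theta}})\}^2+o_p(1)\\
&=2\sum_{i=1}^{n}q_1(\hat{\eta}_{3,H_0}^{O}(V_i;\hat{\boldsymbol{\theta}}))\{\hat{\eta}_{3,H_1}^{O}(V_i;\hat{\boldsymbol{\theta}})-\hat{\eta}_{3,H_0}^{O}(V_i;\hat{\boldsymbol{\theta}})\}\\
&\quad-\sum_{i=1}^{n}q_2(\hat{\eta}_{3,H_0}^{O}(V_i;\hat{\boldsymbol{\theta}}))\{\hat{\eta}_{3,H_1}^{O}(V_i;\hat{\boldsymbol{\theta}})-\hat{\eta}_{3,H_0}^{O}(V_i;\hat{\boldsymbol{\theta}})\}^2+o_p(1)\\
&=(\hat{\boldsymbol{\theta}}_{H_0}-\hat{\boldsymbol{\theta}}_{H_1})^T\Gamma_n(\hat{\boldsymbol{\theta}}_{H_0}-\hat{\boldsymbol{\theta}}_{H_1})+o_p(1),
\end{aligned}
$$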

where

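$$
\begin{aligned}
\hat{\eta}_{3,H_0}(V_i;\hat{\boldsymbol{\theta}})&=\hat{m}_0(\hat{\boldsymbol{\beta}}_{0,H_0}^TX_i,\hat{\boldsymbol{\theta}}_{H_0})+\hat{\boldsymbol{\alpha}}_{0,H_0}^TZ_i+\bigl(\hat{m}_1(\hat{\boldsymbol{\beta}}_{1,H_0}^TX_i,\hat{\boldsymbol{\theta}}_{H_0})+\hat{\boldsymbol{\alpha}}_{1,H_0}^TZ_i\bigr)G_i,\\
\hat{\eta}_{3,H_1}(V_i;\hat{\boldsymbol{\theta}})&=\hat{m}_0(\hat{\boldsymbol{\beta}}_{0,H_1}^TX_i,\hat{\boldsymbol{\theta}}_{H_1})+\hat{\boldsymbol{\alpha}}_{0,H_1}^TZ_i+\bigl(\hat{m}_1(\hat{\boldsymbol{\beta}}_{1,H_1}^TX_i,\hat{\boldsymbol{\theta}}_{H_1})+\hat{\boldsymbol{\alpha}}_{1,H_1}^TZ_i\bigr)G_i,\\
\hat{\eta}_{3,H_0}^{O}(V_i;\hat{\boldsymbol{\theta}})&=\hat{m}_0^{O}(\hat{\boldsymbol{\beta}}_{0,H_0}^TX_i,\hat{\boldsymbol{\theta}}_{H_0})+\hat{\boldsymbol{\alpha}}_{0,H_0}^TZ_i+\bigl(\hat{m}_1^{O}(\hat{\boldsymbol{\beta}}_{1,H_0}^TX_i,\hat{\boldsymbol{\theta}}_{H_0})+\hat{\boldsymbol{\alpha}}_{1,H_0}^TZ_i\bigr)G_i,\\
\hat{\eta}_{3,H_1}^{O}(V_i;\hat{\boldsymbol{\theta}})&=\hat{m}_0^{O}(\hat{\boldsymbol{\beta}}_{0,H_1}^TX_i,\hat{\boldsymbol{\theta}}_{H_1})+\hat{\boldsymbol{\alpha}}_{0,H_1}^TZ_i+\bigl(\hat{m}_1^{O}(\hat{\boldsymbol{\beta}}_{1,H_1}^TX_i,\hat{\boldsymbol{\theta}}_{H_1})+\hat{\boldsymbol{\alpha}}_{1,H_1}^TZ_i\bigr)G_i.
\end{aligned}
$$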

As the estimators for $\boldsymbol{\theta}$ under the null and alternative hypotheses have the relation

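$$
\hat{\boldsymbol{\theta}}_{H_0}=\hat{\boldsymbol{\theta}}_{H_1}+\Gamma_n^{-1}A^TB_n^{-1}(\boldsymbol{\gamma}-A\hat{\boldsymbol{\theta}}_{H_1}),
$$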

we have

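$$
\begin{aligned}
T_3&=(\boldsymbol{\gamma}-A\hat{\boldsymbol{\theta}}_{H_1})^TB_n^{-1}A\Gamma_n^{-1}\sum_{i=1}^{n}\hat{\boldsymbol{\Psi}}_i(X,Z)^{\otimes 2}\,\Gamma_n^{-1}A^TB_n^{-1}(\boldsymbol{\gamma}-A\hat{\boldsymbol{\theta}}_{H_1})+o_p(1)\\
&=(\boldsymbol{\gamma}-A\hat{\boldsymbol{\theta}}_{H_1})^TB_n^{-1}(\boldsymbol{\gamma}-A\hat{\boldsymbol{\theta}}_{H_1})+o_p(1).
\end{aligned}
$$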

Therefore, under the null hypothesis, $T_3\xrightarrow{d}\chi_k^2$, and under the alternative hypothesis $T_3$ asymptotically follows a noncentral $\chi^2$ distribution with $k$ degrees of freedom and noncentrality parameter $\phi$. This completes the proof of Theorem 7.
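The quadratic form at the end of the proof can be evaluated directly once $\hat{\boldsymbol{\theta}}_{H_1}$ and $B_n$ are available. The sketch below computes $(\boldsymbol{\gamma}-A\hat{\boldsymbol{\theta}}_{H_1})^TB_n^{-1}(\boldsymbol{\gamma}-A\hat{\boldsymbol{\theta}}_{H_1})$ and a $\chi_k^2$ tail probability using only the Python standard library. It assumes the null hypothesis has the form $A\boldsymbol{\theta}=\boldsymbol{\gamma}$ with $k$ rows in $A$; the helper names and all numerical inputs are hypothetical and are not results from the paper.

```python
import math


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        residual = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = residual / aug[r][r]
    return x


def chi2_sf(x, k):
    """P(chi2_k > x) for integer k, via Q(a+1, y) = Q(a, y) + y^a e^{-y} / Gamma(a+1)."""
    y = x / 2.0
    if k % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < k / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def wald_statistic(gamma, A, theta_hat, B_n):
    """(gamma - A theta_hat)^T B_n^{-1} (gamma - A theta_hat)."""
    d = [g - sum(a * t for a, t in zip(row, theta_hat)) for g, row in zip(gamma, A)]
    return sum(di * xi for di, xi in zip(d, solve(B_n, d)))


# Hypothetical inputs: test that the last two components of theta are zero (k = 2).
A = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
gamma = [0.0, 0.0]
theta_hat = [0.5, -0.2, 0.3, 0.1]
B_n = [[0.02, 0.005], [0.005, 0.01]]

T3 = wald_statistic(gamma, A, theta_hat, B_n)
p_value = chi2_sf(T3, k=len(A))
print(T3, p_value)
```

Solving $B_nx=d$ instead of forming $B_n^{-1}$ explicitly keeps the sketch short; in practice a numerical linear algebra library would normally be used for both the solve and the $\chi^2$ tail probability.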

References

Cai, Z., J. Fan and R. Li (2000): “Efficient estimation and inferences for varying-coefficient models,” J. Am. Stat. Assoc., 95, 888–902. doi:10.1080/01621459.2000.10474280

Carroll, R. J., J. Fan, I. Gijbels and M. P. Wand (1997): “Generalized partially linear single-index models,” J. Am. Stat. Assoc., 92, 477–489. doi:10.1080/01621459.1997.10474001

Carroll, R. J., D. Ruppert and A. H. Welsh (1998): “Local estimating equations,” J. Am. Stat. Assoc., 93, 214–227. doi:10.1080/01621459.1998.10474103

Carpenter, D. O., K. Arcaro and D. C. Spink (2002): “Understanding the human health effects of chemical mixtures,” Environ. Health. Perspect., 110(suppl 1), 25–42. doi:10.1289/ehp.02110s125

Cheverud, J. (2001): “A simple correction for multiple comparisons in interval mapping genome scans,” Heredity, 87, 52–58. doi:10.1046/j.1365-2540.2001.00901.x

Colditz, G. A. and S. E. Hankinson (2005): “The Nurses’ Health Study: lifestyle and health among women,” Nat. Rev. Cancer, 5, 388–396. doi:10.1038/nrc1608

Cornelis, M. C., A. Agrawal, J. W. Cole, N. N. Hansel, K. C. Barnes, T. H. Beaty, S. N. Bennett, L. J. Bierut, E. Boerwinkle, K. F. Doheny, B. Feenstra, E. Feingold, M. Fornage, C. A. Haiman, E. L. Harris, M. G. Hayes, J. A. Heit, F. B. Hu, J. H. Kang, C. C. Laurie, H. Ling, T. A. Manolio, M. L. Marazita, R. A. Mathias, D. B. Mirel, J. Paschall, L. R. Pasquale, E. W. Pugh, J. P. Rice, J. Udren, R. M. van Dam, X. Wang, J. L. Wiggs, K. Williams, K. Yu and GENEVA Consortium (2010): “The Gene, Environment Association Studies consortium (GENEVA): maximizing the knowledge obtained from GWAS by collaboration across studies of multiple conditions,” Genet. Epidemiol., 34, 364–372. doi:10.1002/gepi.20492

de Boor, C. (1978): A practical guide to splines, Springer, New York. doi:10.1007/978-1-4612-6333-3

Falconer, D. S. (1952): “The problem of environment and selection,” Am. Naturalist, 86, 293–299. doi:10.1086/281736

Fan, J. and J. Jiang (2005): “Nonparametric inferences for additive models,” J. Am. Stat. Assoc., 100, 890–907. doi:10.1198/016214504000001439

Fan, J., C. Zhang and J. Zhang (2001): “Generalized likelihood ratio statistics and Wilks phenomenon,” Ann. Stat., 29, 153–193.

Guo, C., H. Yang and J. Lv (2016): “Generalized varying index coefficient models,” J. Comput. Appl. Math., 300, 1–17. doi:10.1016/j.cam.2015.11.025

Li, Y., N. Wang and R. J. Carroll (2010): “Generalized functional linear models with semiparametric single-index interactions,” J. Am. Stat. Assoc., 105, 621–633. doi:10.1198/jasa.2010.tm09313

Liang, H., X. Liu, R. Li and C. L. Tsai (2010): “Estimation and testing for partially linear single-index models,” Ann. Stat., 38, 3811–3836. doi:10.1214/10-AOS835

Liu, X., Y. Cui and R. Li (2016): “Partial linear varying multi-index coefficient model for integrative gene-environment interactions,” Stat. Sinica, 26, 1037–1060. doi:10.5705/ss.202015.0114

Liu, X., H. Jiang and Y. Zhou (2014): “Local empirical likelihood inference for varying-coefficient density-ratio models based on case-control data,” J. Am. Stat. Assoc., 109, 635–646. doi:10.1080/01621459.2013.858629

Ma, S. and P. X. Song (2015): “Varying index coefficient models,” J. Am. Stat. Assoc., 110, 341–356. doi:10.1080/01621459.2014.903185

Ma, S. and S. Xu (2015): “Semiparametric nonlinear regression for detecting gene and environment interactions,” J. Stat. Plan. Inference, 156, 31–47. doi:10.1016/j.jspi.2014.08.005

Ma, S., L. Yang, R. Romero and Y. Cui (2011): “Varying coefficient model for gene–environment interaction: a non-linear look,” Bioinformatics, 27, 2119–2126. doi:10.1093/bioinformatics/btr318

Rimm, E. B., E. L. Giovannucci, W. C. Willett, G. A. Colditz, A. Ascherio, B. Rosner and M. J. Stampfer (1991): “Prospective study of alcohol consumption and risk of coronary disease in men,” Lancet, 338, 464–468. doi:10.1016/0140-6736(91)90542-W

Ruppert, D. (1997): “Empirical-bias bandwidths for local polynomial nonparametric regression and density estimation,” J. Am. Stat. Assoc., 92, 1049–1062. doi:10.1080/01621459.1997.10474061

Ruppert, D., S. J. Sheather and M. P. Wand (1995): “An effective bandwidth selector for local least squares regression,” J. Am. Stat. Assoc., 90, 1257–1270. doi:10.1080/01621459.1995.10476630

Ross, C. A. and W. W. Smith (2007): “Gene–environment interactions in Parkinson’s disease,” Parkinsonism Relat. Disord., 13, S309–S315. doi:10.1016/S1353-8020(08)70022-1

Perry, J. R. B., B. F. Voight, L. Yengo, N. Amin, J. Dupuis, M. Ganser, H. Grallert, P. Navarro, M. Li, L. Qi, V. Steinthorsdottir, R. A. Scott, P. Almgren, D. E. Arking, Y. Aulchenko, B. Balkau, R. Benediktsson, R. N. Bergman, E. Boerwinkle, L. Bonnycastle, N. P. Burtt, H. Campbell, G. Charpentier, F. S. Collins, C. Gieger, T. Green, S. Hadjadj, A. T. Hattersley, C. Herder, A. Hofman, A. D. Johnson, A. Kottgen, P. Kraft, Y. Labrune, C. Langenberg, A. K. Manning, K. L. Mohlke, A. P. Morris, B. Oostra, J. Pankow, A. K. Petersen, P. P. Pramstaller, I. Prokopenko, W. Rathmann, W. Rayner, M. Roden, I. Rudan, D. Rybin, L. J. Scott, G. Sigurdsson, R. Sladek, G. Thorleifsson, U. Thorsteinsdottir, J. Tuomilehto, A. G. Uitterlinden, S. Vivequin, M. N. Weedon, A. F. Wright; MAGIC; DIAGRAM Consortium; GIANT Consortium, F. B. Hu, T. Illig, L. Kao, J. B. Meigs, J. F. Wilson, K. Stefansson, C. van Duijn, D. Altschuler, A. D. Morris, M. Boehnke, M. I. McCarthy, P. Froguel, C. N. Palmer, N. J. Wareham, L. Groop, T. M. Frayling and S. Cauchi (2012): “Stratifying type 2 diabetes cases by BMI identifies genetic risk variants in LAMA1 and enrichment for risk variants in lean compared to obese cases,” PLoS Genet., 8, e1002741. doi:10.1371/journal.pgen.1002741

Sepanski, J. H., R. Knickerbocker and R. J. Carroll (1994): “A semiparametric correction for attenuation,” J. Am. Stat. Assoc., 89, 1366–1373. doi:10.1080/01621459.1994.10476875

Sexton, K. and D. Hattis (2007): “Assessing cumulative health risks from exposure to environmental mixtures – three fundamental questions,” Environ. Health. Perspect., 115, 825–832. doi:10.1289/ehp.9333

Wang, L. and L. Yang (2007): “Spline-backfitted kernel smoothing of nonlinear additive autoregression model,” Ann. Stat., 35, 2474–2503. doi:10.1214/009053607000000488

Wu, C. and Y. Cui (2013): “A novel method for identifying nonlinear gene-environment interactions in case-control association studies,” Hum. Genet., 132, 1413–1425. doi:10.1007/s00439-013-1350-z

Xia, Y. C. and W. K. Li (1999): “On single-index coefficient regression models,” J. Am. Stat. Assoc., 94, 1275–1285. doi:10.1080/01621459.1999.10473880

Zimmet, P., K. Alberti and J. Shaw (2001): “Global and societal implications of the diabetes epidemic,” Nature, 414, 782–787. doi:10.1038/414782a


Supplemental Material

The online version of this article (DOI: 10.1515/sagmb-2016-0045) offers supplementary material, available to authorized users.


Published Online: 2016-12-19
Published in Print: 2017-3-1

©2017 Walter de Gruyter GmbH, Berlin/Boston
