Abstract
As a promising tool for dissecting the genetic basis of common diseases, expression quantitative trait loci (eQTL) studies have attracted increasing research interest. Traditional eQTL methods test associations between individual single-nucleotide polymorphisms (SNPs) and gene expression traits. A major drawback of this approach is that it cannot model the joint effect of a set of SNPs on a set of genes, which may correspond to a biological pathway. To address this limitation, in this chapter we propose geQTL, a sparse regression method that can detect both group-wise and individual associations between SNPs and expression traits. geQTL can also correct for the effects of potential confounders. The method employs computationally efficient techniques and is therefore able to handle large-scale studies. Moreover, it can automatically infer the appropriate number of group-wise associations. We perform extensive experiments on both simulated and yeast datasets to demonstrate the effectiveness and efficiency of the proposed method. The results show that geQTL can effectively detect both individual and group-wise signals and outperforms state-of-the-art methods by a large margin. This chapter illustrates that decoupling individual and group-wise associations improves both eQTL mapping accuracy and the inference of individual and group-wise associations.
Copyright information
© 2020 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Cheng, W., Zhang, X., Wang, W. (2020). Sparse Regression Models for Unraveling Group and Individual Associations in eQTL Mapping. In: Shi, X. (eds) eQTL Analysis. Methods in Molecular Biology, vol 2082. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0026-9_8
DOI: https://doi.org/10.1007/978-1-0716-0026-9_8
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-0025-2
Online ISBN: 978-1-0716-0026-9
eBook Packages: Springer Protocols