
GP-Pi: Using Genetic Programming with Penalization and Initialization on Genome-Wide Association Study

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7895)


Abstract

The advancement of chip-based technology has enabled the measurement of millions of DNA sequence variations across the human genome. Experiments have revealed that high-order interactions of single nucleotide polymorphisms (SNPs), rather than individual SNPs, are responsible for complex diseases such as cancer. The challenge of genome-wide association studies (GWASs) is to sift through high-dimensional datasets to find the particular combinations of SNPs that are predictive of these diseases. Genetic Programming (GP) has been widely applied in GWASs, where it serves two purposes: attribute selection and/or discriminative modeling. One advantage of discriminative modeling over attribute selection lies in interpretability. However, existing discriminative modeling algorithms do not scale well as the SNP dimension increases. Here, we present GP-Pi, which adds two components to GP: a penalizing term in the fitness function that penalizes trees containing common SNPs, and an initializer that uses expert knowledge to seed the population with good attributes. Experimental results on simulated data suggest that GP-Pi outperforms GPAS with statistical significance. GP-Pi was further evaluated on a real GWAS dataset of rheumatoid arthritis obtained from the North American Rheumatoid Arthritis Consortium. Our results, which include potential new discoveries, are consistent with the literature.
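
The two components named in the abstract can be illustrated with a minimal sketch. This is not the formulation used in the paper, which the abstract does not give. It assumes that "common SNPs" means SNPs shared by many trees in the current population, that the penalty is subtracted linearly from classification accuracy, and that expert knowledge is available as a non-negative score per SNP. Trees are reduced to plain lists of SNP identifiers, and all names here (`snp_frequencies`, `penalized_fitness`, `seed_population`, `weight`, the `rs…` labels) are hypothetical.

```python
import random
from collections import Counter


def snp_frequencies(population):
    """Return, for each SNP, the fraction of trees in the population that use it."""
    counts = Counter(snp for tree in population for snp in set(tree))
    return {snp: n / len(population) for snp, n in counts.items()}


def penalized_fitness(accuracy, tree, freqs, weight=0.5):
    """Accuracy minus a penalty that grows with how common the tree's SNPs are.

    The linear form and the default weight are assumptions for illustration.
    """
    snps = set(tree)
    if not snps:
        return accuracy
    commonness = sum(freqs.get(s, 0.0) for s in snps) / len(snps)
    return accuracy - weight * commonness


def seed_population(scores, pop_size, tree_size, rng):
    """Build initial trees by drawing SNPs with probability proportional to
    an expert-knowledge score (sampling with replacement)."""
    snps = list(scores)
    weights = [scores[s] for s in snps]
    return [rng.choices(snps, weights=weights, k=tree_size) for _ in range(pop_size)]


# Hypothetical expert-knowledge scores; a score of 0 means "never seed this SNP".
scores = {"rs1": 5.0, "rs2": 1.0, "rs3": 1.0, "rs4": 0.0}
population = seed_population(scores, pop_size=6, tree_size=2, rng=random.Random(0))
freqs = snp_frequencies(population)
for tree in population:
    print(tree, round(penalized_fitness(0.8, tree, freqs), 3))
```

Under these assumptions, two trees with the same accuracy receive different fitness values when one of them relies on SNPs that most of the population already uses, which pushes the search toward less-explored SNP combinations, while the seeding step biases the starting population toward SNPs that prior knowledge rates highly.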



Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sze-To, HY. et al. (2013). GP-Pi: Using Genetic Programming with Penalization and Initialization on Genome-Wide Association Study. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2013. Lecture Notes in Computer Science, vol 7895. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38610-7_31

  • DOI: https://doi.org/10.1007/978-3-642-38610-7_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38609-1

  • Online ISBN: 978-3-642-38610-7

  • eBook Packages: Computer Science, Computer Science (R0)
