Identification of Individualized Feature Combinations for Survival Prediction in Breast Cancer: A Comparison of Machine Learning Techniques

  • Conference paper
Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (EvoBIO 2010)

Abstract

The ability to accurately classify cancer patients into risk classes, i.e. to predict the outcome of the pathology on an individual basis, is a key ingredient in making therapeutic decisions. In recent years, gene expression data have been successfully used to complement the clinical and histological criteria traditionally used in such prediction. Many “gene expression signatures” have been developed, i.e. sets of genes whose expression values in a tumor can be used to predict the outcome of the pathology. Here we investigate the use of several machine learning techniques to classify breast cancer patients using one such signature, the well-established 70-gene signature. We show that Genetic Programming performs significantly better than Support Vector Machines, Multilayer Perceptron and Random Forest in classifying patients from the NKI breast cancer dataset, and slightly better than the scoring-based method originally proposed by the authors of the 70-gene signature. Furthermore, Genetic Programming performs automatic feature selection. Since the performance of Genetic Programming can likely be improved beyond the out-of-the-box approach used here, and given the biological insight potentially provided by the Genetic Programming solutions, we conclude that Genetic Programming methods are worth further investigation as a tool for cancer patient classification based on gene expression data.
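The abstract does not spell out how the scoring-based method works. As a rough illustration only, the sketch below assumes that such a method correlates a patient's signature expression profile with the average profile of good-outcome training patients and applies a cutoff. The function names, the four-gene toy profiles, and the threshold of 0.4 are all hypothetical choices made for this example; they are not taken from the paper or from the NKI dataset.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def good_prognosis_template(profiles):
    """Average expression per gene over good-outcome training patients."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]

def classify(profile, template, threshold=0.4):
    """Label a patient by correlation with the template.

    The threshold is an arbitrary illustrative value (an assumption).
    """
    score = pearson(profile, template)
    return "low risk" if score > threshold else "high risk"

# Hypothetical toy data: 4 genes instead of 70, invented expression values.
good_training = [
    [0.9, -0.8, 0.5, -0.4],
    [1.1, -1.0, 0.3, -0.6],
]
template = good_prognosis_template(good_training)

patient_a = [0.8, -0.7, 0.6, -0.3]   # resembles the good-outcome template
patient_b = [-0.9, 1.0, -0.2, 0.5]   # roughly the opposite pattern

print(classify(patient_a, template))
print(classify(patient_b, template))
```

A score of this kind weights every signature gene equally, whereas a learned classifier such as Genetic Programming can, as the abstract notes, select a subset of the genes automatically.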




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vanneschi, L., Farinaccio, A., Giacobini, M., Mauri, G., Antoniotti, M., Provero, P. (2010). Identification of Individualized Feature Combinations for Survival Prediction in Breast Cancer: A Comparison of Machine Learning Techniques. In: Pizzuti, C., Ritchie, M.D., Giacobini, M. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2010. Lecture Notes in Computer Science, vol 6023. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12211-8_10


  • DOI: https://doi.org/10.1007/978-3-642-12211-8_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12210-1

  • Online ISBN: 978-3-642-12211-8

  • eBook Packages: Computer Science, Computer Science (R0)
