Skip to main content

Characterising the Influence of Rule-Based Knowledge Representations in Biological Knowledge Extraction from Transcriptomics Data

  • Conference paper
  • First Online:
Applications of Evolutionary Computation (EvoApplications 2017)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10199))

Included in the following conference series:

Abstract

Currently, there is a wealth of biotechnologies (e.g. sequencing, proteomics, lipidomics) able to generate a broad range of data types out of biological samples. However, the knowledge gained from such data sources is constrained by the limitations of the analytics techniques. The state-of-the-art machine learning algorithms are able to capture complex patterns with high prediction capacity. However, often it is very difficult if not impossible to extract human-understandable knowledge out of these patterns. In recent years evolutionary machine learning techniques have shown that they are competent methods for biological/biomedical data analytics. They are able to generate interpretable prediction models and, beyond just prediction models, they are able to extract useful knowledge in the form of biomarkers or biological networks.

The focus of this paper is to thoroughly characterise the impact that a core component of the evolutionary machine learning process, its knowledge representations, has in the process of extracting biologically-useful knowledge out of transcriptomics datasets. Using the FuNeL evolutionary machine learning-based network inference method, we evaluate several variants of rule knowledge representations on a range of transcriptomics datasets to quantify the volume and complementarity of the knowledge that each of them can extract. Overall we show that knowledge representations, often considered a minor detail, greatly impact on the downstream biological knowledge extraction process.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

References

  1. Bacardit, J., Burke, E.K., Krasnogor, N.: Improving the scalability of rule-based evolutionary learning. Memet. Comput. 1, 55–67 (2009)

    Article  Google Scholar 

  2. Bacardit, J., Garrell, J.M.: Bloat control and generalization pressure using the minimum description length principle for a Pittsburgh approach learning classifier system. In: Kovacs, T., Llorà, X., Takadama, K., Lanzi, P.L., Stolzmann, W., Wilson, S.W. (eds.) IWLCS 2003–2005. LNCS (LNAI), vol. 4399, pp. 59–79. Springer, Heidelberg (2007). doi:10.1007/978-3-540-71231-2_5

    Chapter  Google Scholar 

  3. Bacardit, J., Krasnogor, N.: Empirical evaluation of ensemble techniques for a Pittsburgh learning classifier system. In: Bacardit, J., Bernadó-Mansilla, E., Butz, M.V., Kovacs, T., Llorà, X., Takadama, K. (eds.) IWLCS 2006–2007. LNCS (LNAI), vol. 4998, pp. 255–268. Springer, Heidelberg (2008). doi:10.1007/978-3-540-88138-4_15

    Chapter  Google Scholar 

  4. Bacardit, J., Stout, M., Hirst, J.D., Valencia, A., Smith, R.E., Krasnogor, N.: Automated alphabet reduction for protein datasets. BMC Bioinform. 10, 6 (2009)

    Article  Google Scholar 

  5. Bacardit, J., Widera, P., Márquez-Chamorro, A., Divina, F., Aguilar-Ruiz, J.S., Krasnogor, N.: Contact map prediction using a large-scale ensemble of rule sets and the fusion of multiple predicted structural features. Bioinformatics 28(19), 2441–2448 (2012)

    Article  Google Scholar 

  6. Bassel, G.W., Glaab, E., Marquez, J., Holdsworth, M.J., Bacardit, J.: Functional network construction in Arabidopsis using rule-based machine learning on large-scale data sets. Plant Cell 23(9), 3101–3116 (2011)

    Article  Google Scholar 

  7. Beer, D.G., Kardia, S.L.R., Huang, C.C., Giordano, T.J., Levin, A.M., Misek, D.E., Lin, L., Chen, G., Gharib, T.G., Thomas, D.G., Lizyness, M.L., Kuick, R., Hayasaka, S., Taylor, J.M.G., Iannettoni, M.D., Orringer, M.B., Hanash, S.: Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat. Med. 8(8), 816–824 (2002)

    Article  Google Scholar 

  8. Chowdary, D., Lathrop, J., Skelton, J., Curtin, K., Briggs, T., Zhang, Y., Yu, J., Wang, Y., Mazumder, A.: Prognostic gene expression signatures can be measured in tissues collected in RNAlater preservative. J. Mol. Diagn.: JMD 8(1), 31–39 (2006)

    Article  Google Scholar 

  9. Fainberg, H.P., Bodley, K., Bacardit, J., Li, D., Wessely, F., Mongan, N.P., Symonds, M.E., Clarke, L., Mostyn, A.: Reduced neonatal mortality in Meishan piglets: a role for hepatic fatty acids? PLoS One 7(11), 1–9 (2012)

    Article  Google Scholar 

  10. Glaab, E., Bacardit, J., Garibaldi, J.M., Krasnogor, N.: Using rule-based machine learning for candidate disease gene prioritization and sample classification of cancer gene expression data. PLoS One 7(7), e39932 (2012)

    Article  Google Scholar 

  11. Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D., Lander, E.S.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439), 531–537 (1999)

    Article  Google Scholar 

  12. Gordon, G.J., Jensen, R.V., Hsiao, L.L., Gullans, S.R., Blumenstock, J.E., Ramaswamy, S., Richards, W.G., Sugarbaker, D.J., Bueno, R.: Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res. 62(17), 4963–4967 (2002)

    Google Scholar 

  13. Hemberg, E., Veeramachaneni, K., Dernoncourt, F., Wagy, M., O’Reilly, U.M.: Efficient training set use for blood pressure prediction in a large scale learning classifier system. In: Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO 2013 Companion, pp. 1267–1274. ACM, New York (2013)

    Google Scholar 

  14. Lazzarini, N., Widera, P., Williamson, S., Heer, R., Krasnogor, N., Bacardit, J.: Functional networks inference from rule-based machine learning models. BioData Min. 9(1), 28 (2016)

    Article  Google Scholar 

  15. Marcozzi, M., Divina, F., Aguilar-Ruiz, J.S., Vanhoof, W.: A novel probabilistic encoding for EAs applied to biclustering of microarray data. In: Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, GECCO 2011, pp. 339–346. ACM, New York (2011)

    Google Scholar 

  16. Martinez-Ballesteros, M., Nepomuceno-Chamorro, I.A., Riquelme, J.C.: Discovering gene association networks by multi-objective evolutionary quantitative association rules. J. Comput. Syst. Sci. 80, 118–136 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  17. Mi, H., Poudel, S., Muruganujan, A., Casagrande, J.T., Thomas, P.D.: Panther version 10: expanded protein families and functions, and analysis tools. Nucleic Acids Res. 44(D1), D336–D342 (2016)

    Article  Google Scholar 

  18. Pomeroy, S.L., Tamayo, P., Gaasenbeek, M., Sturla, L.M., Angelo, M., McLaughlin, M.E., Kim, J.Y.H., Goumnerova, L.C., Black, P.M., Lau, C., Allen, J.C., Zagzag, D., Olson, J.M., Curran, T., Wetmore, C., Biegel, J.A., Poggio, T., Mukherjee, S., Rifkin, R., Califano, A., Stolovitzky, G., Louis, D.N., Mesirov, J.P., Lander, E.S., Golub, T.R.: Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415(6870), 436–442 (2002)

    Article  Google Scholar 

  19. Rissanen, J.: Modeling by shortest data description. Automatica 14, 465–471 (1978)

    Article  MATH  Google Scholar 

  20. Shipp, M.A., Ross, K.N., Tamayo, P., Weng, A.P., Kutok, J.L., Aguiar, R.C.T., Gaasenbeek, M., Angelo, M., Reich, M., Pinkus, G.S., Ray, T.S., Koval, M.A., Last, K.W., Norton, A., Lister, T.A., Mesirov, J., Neuberg, D.S., Lander, E.S., Aster, J.C., Golub, T.R.: Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat. Med. 8(1), 68–74 (2002)

    Article  Google Scholar 

  21. Singh, D., Febbo, P.G., Ross, K., Jackson, D.G., Manola, J., Ladd, C., Tamayo, P., Renshaw, A.A., D’Amico, A.V., Richie, J.P., Lander, E.S., Loda, M., Kantoff, P.W., Golub, T.R., Sellers, W.R.: Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1(2), 203–209 (2002)

    Article  Google Scholar 

  22. Swan, A.L., Stekel, D.J., Hodgman, C., Allaway, D., Alqahtani, M.H., Mobasheri, A., Bacardit, J.: A machine learning heuristic to identify biologically relevant and minimal biomarker panels from omics data. BMC Genom. 16(1), S2 (2015)

    Article  Google Scholar 

  23. Urbanowicz, R.J., Granizo-Mackenzie, A., Moore, J.H.: An analysis pipeline with statistical and visualization-guided knowledge discovery for Michigan-style learning classifier systems. IEEE Comp. Int. Mag. 7(4), 35–45 (2012)

    Article  Google Scholar 

  24. Urbanowicz, R.J., Andrew, A.S., Karagas, M.R., Moore, J.H.: Role of genetic heterogeneity and epistasis in bladder cancer susceptibility and outcome: a learning classifier system approach. J. Am. Med. Inform. Assoc. 20(4), 603612 (2013)

    Article  Google Scholar 

  25. Venturini, G.: SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts. In: Brazdil, P.B. (ed.) ECML 1993. LNCS, vol. 667, pp. 280–296. Springer, Heidelberg (1993). doi:10.1007/3-540-56602-3_142

    Chapter  Google Scholar 

  26. Yagi, T., Morimoto, A., Eguchi, M., Hibi, S., Sako, M., Ishii, E., Mizutani, S., Imashuku, S., Ohki, M., Ichikawa, H.: Identification of a gene expression signature associated with pediatric AML prognosis. Blood 102(5), 1849–1856 (2003)

    Article  Google Scholar 

Download references

Acknowledgments

This work was supported by the Engineering and Physical Sciences Research Council [EP/N031962/1]. We are grateful to the School of Computing Science of Newcastle University for the access to its High Performance Computing Cluster. We thank the anonymous reviewers for the valuable feedback received.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jaume Bacardit .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Baron, S., Lazzarini, N., Bacardit, J. (2017). Characterising the Influence of Rule-Based Knowledge Representations in Biological Knowledge Extraction from Transcriptomics Data. In: Squillero, G., Sim, K. (eds) Applications of Evolutionary Computation. EvoApplications 2017. Lecture Notes in Computer Science(), vol 10199. Springer, Cham. https://doi.org/10.1007/978-3-319-55849-3_9

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-55849-3_9

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55848-6

  • Online ISBN: 978-3-319-55849-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics