Support vector inductive logic programming outperforms the naive Bayes classifier and inductive logic programming for the classification of bioactive chemical compounds

Journal of Computer-Aided Molecular Design

Abstract

We investigate the classification performance of circular fingerprints in combination with the Naive Bayes Classifier (MP2D), Inductive Logic Programming (ILP) and Support Vector Inductive Logic Programming (SVILP) on a standard molecular benchmark dataset comprising 11 activity classes and about 102,000 structures. The Naive Bayes Classifier treats features independently, whereas ILP combines structural fragments to create new features with higher predictive power. SVILP is a recently introduced method that adds a support vector machine after the usual ILP procedures. We evaluate the methods with several statistical measures: recall, specificity, precision, F-measure, Matthews Correlation Coefficient, area under the Receiver Operating Characteristic (ROC) curve and enrichment factor (EF). According to the F-measure, which takes both recall and precision into account, SVILP is the superior method for seven of the 11 classes. The Bayes Classifier gives the best recall for eight of the 11 targets, but has much lower precision, specificity and F-measure. The SVILP model, on the other hand, has the highest recall for only three of the 11 classes, but generally far superior specificity and precision. To evaluate the statistical significance of the superiority of SVILP, we employ McNemar’s test, which shows that SVILP performs significantly (p < 0.05) better than both other methods for six of the 11 activity classes, and is superior with less significance for three of the remaining classes. Although the Bayes Classifier was previously shown to perform very well in molecular classification studies, these results suggest that SVILP can extract additional knowledge from the data, thus improving classification results further.
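
As an illustration of the measures named in the abstract, the following minimal Python sketch computes recall, specificity, precision, F-measure and the Matthews Correlation Coefficient from the four confusion-matrix counts of one activity class, together with a McNemar statistic for comparing two classifiers on the same test set. The function names, the zero-denominator convention, the choice of the continuity-corrected form of McNemar's test and the counts in the usage lines are all assumptions made for illustration and are not taken from the study. ROC area and EF are left out because they depend on ranking details that the abstract does not give.

```python
import math


def _ratio(num, den):
    # Return num / den, or 0.0 when the denominator is zero
    # (a convention assumed here, not one stated in the paper).
    return num / den if den else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Standard measures from confusion-matrix counts for one activity class."""
    recall = _ratio(tp, tp + fn)        # fraction of actives retrieved
    specificity = _ratio(tn, tn + fp)   # fraction of inactives rejected
    precision = _ratio(tp, tp + fp)     # fraction of predicted actives that are active
    # F-measure: harmonic mean of precision and recall.
    f_measure = _ratio(2 * precision * recall, precision + recall)
    # Matthews Correlation Coefficient, ranging from -1 to 1.
    mcc = _ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,
    }


def mcnemar(b, c):
    """Continuity-corrected McNemar test on the discordant pairs.

    b: test cases classifier A got right and classifier B got wrong.
    c: test cases classifier B got right and classifier A got wrong.
    Returns (chi-square statistic, p-value for 1 degree of freedom).
    """
    if b + c == 0:
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, not results from the study.
metrics = classification_metrics(tp=40, fp=10, tn=940, fn=10)
stat, p_value = mcnemar(b=25, c=10)
print(metrics)
print(f"McNemar chi-square = {stat:.2f}, p = {p_value:.4f}")
```

Only the cases on which the two classifiers disagree enter McNemar's test, which is why it suits a paired comparison of methods evaluated on the same structures. A p-value below 0.05 corresponds to the significance threshold quoted in the abstract.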

Acknowledgements

E.O. Cannon, R.C. Glen and J.B.O. Mitchell thank Unilever plc and the EPSRC for funding. A. Bender thanks the Education Office of the Novartis Institutes for BioMedical Research for a postdoctoral fellowship. A. Amini, M.J.E. Sternberg and S.H. Muggleton thank the BBSRC for funding.

Author information

Corresponding author

Correspondence to John B. O. Mitchell.

Electronic supplementary material

Below is the link to the electronic supplementary material

ESM1 (PDF 28 kb)

About this article

Cite this article

Cannon, E.O., Amini, A., Bender, A. et al. Support vector inductive logic programming outperforms the naive Bayes classifier and inductive logic programming for the classification of bioactive chemical compounds. J Comput Aided Mol Des 21, 269–280 (2007). https://doi.org/10.1007/s10822-007-9113-3

  • DOI: https://doi.org/10.1007/s10822-007-9113-3
