Protein Sequence Classification Involving Data Mining Technique: A Review

  • Conference paper
Smart Computing Paradigms: New Progresses and Challenges

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 767)

Abstract

In bioinformatics, the size of biological databases is increasing at an exponential rate, and traditional data analysis procedures fail to classify data at this scale. Many data mining-based classification techniques are now used to classify biological data such as protein sequences. This paper reviews the most popular of these techniques, including the neural network-based classifier, the fuzzy ARTMAP-based classifier, and the rough set classifier, together with their limitations. The accuracy and computational time of each technique are also analyzed. Finally, an idea is proposed that can increase accuracy with low computational overhead.

Corresponding author

Correspondence to Suprativ Saha.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Saha, S., Bhattacharya, T. (2020). Protein Sequence Classification Involving Data Mining Technique: A Review. In: Elçi, A., Sa, P., Modi, C., Olague, G., Sahoo, M., Bakshi, S. (eds) Smart Computing Paradigms: New Progresses and Challenges. Advances in Intelligent Systems and Computing, vol 767. Springer, Singapore. https://doi.org/10.1007/978-981-13-9680-9_17
