DOI: 10.1145/2952744.2952753
research-article

Benchmark of feature selection techniques with machine learning algorithms for cancer datasets

Published: 13 July 2016

ABSTRACT

Classification is a machine learning technique that assigns each item in a data set to one of a set of predefined classes or groups. It is widely used in the medical field to classify medical data. To produce better classification results, feature selection has been applied in many classification studies as a preprocessing step, in which a subset of features is used rather than the full feature set of a particular dataset. Feature selection eliminates irrelevant attributes to obtain high-quality features that may enhance the classification process and yield better results. This study focuses on feature selection techniques as a means of helping classifiers achieve better performance with the most significant features. In the experiments, benchmark feature selection methods are compared on three cancer datasets using four well-recognized machine learning algorithms. The paper then analyzes the performance of all classifiers with and without feature selection in terms of ROC and F-Measure. The study found that although no single feature selection method suits all datasets, the results still support the claim that feature selection improves classifier performance while using a minimal number of features.
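The two evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the evaluation code, so everything below is an assumption for illustration: the function names (`f_measure`, `roc_auc`, `evaluate`), the 0.5 decision threshold, and the toy labels and scores are hypothetical and are not results from the paper. F-Measure is computed here as the harmonic mean of precision and recall, and the area under the ROC curve as the fraction of positive/negative pairs that the classifier ranks correctly.

```python
def f_measure(y_true, y_pred, positive=1):
    """F-Measure (F1): harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise ranking (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true, scores, threshold=0.5):
    """Return (F-Measure, ROC AUC) for scores thresholded at an assumed cut-off."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return f_measure(y_true, y_pred), roc_auc(y_true, scores)


# Toy, made-up values (not data from the study): the same six cases scored by
# a hypothetical classifier trained on all features and on a selected subset.
y_true = [1, 1, 1, 0, 0, 0]
scores_all_features = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
scores_selected = [0.9, 0.8, 0.6, 0.55, 0.3, 0.2]

f_all, auc_all = evaluate(y_true, scores_all_features)
f_sel, auc_sel = evaluate(y_true, scores_selected)
print(f"all features:      F={f_all:.3f}  AUC={auc_all:.3f}")
print(f"selected features: F={f_sel:.3f}  AUC={auc_sel:.3f}")
```

Reporting both measures is useful because F-Measure depends on the chosen threshold, whereas ROC AUC summarizes ranking quality across all thresholds.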


    • Published in

      ICAIR-CACRE '16: Proceedings of the International Conference on Artificial Intelligence and Robotics and the International Conference on Automation, Control and Robotics Engineering
      July 2016
      150 pages
      ISBN:9781450342353
      DOI:10.1145/2952744

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States
