research-article

Benchmark of feature selection techniques with machine learning algorithms for cancer datasets

Authors:
Munirah Mohd Yusof, Universiti Tun Hussein Onn Malaysia
Rozlini Mohamed, Universiti Tun Hussein Onn Malaysia
Noorhaniza Wahid, Universiti Tun Hussein Onn Malaysia

ICAIR-CACRE '16: Proceedings of the International Conference on Artificial Intelligence and Robotics and the International Conference on Automation, Control and Robotics Engineering, July 2016, Article No. 18, Pages 1–5. https://doi.org/10.1145/2952744.2952753

Published: 13 July 2016

ABSTRACT

Classification is a machine learning technique that assigns each item in a dataset to one of a set of predefined classes or groups. It is widely used in the medical field to classify medical data. To produce better classification results, feature selection has been applied in many classification studies as a preprocessing step, in which a subset of features is used rather than all the features of a particular dataset. Feature selection eliminates irrelevant attributes to obtain high-quality features that may enhance the classification process and produce better classification results. This study focuses on feature selection techniques as a means of helping classifiers achieve better classification performance with the most significant features. In the experiments, benchmark feature selection methods are compared on three cancer datasets using four well-recognized machine learning algorithms. The paper then analyzes the performance of all classifiers with and without feature selection in terms of ROC and F-Measure. The study found that although no single feature selection method suits all datasets, the results still support the claim that feature selection helps increase classifier performance with a minimal number of features.
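
The abstract names ROC and F-Measure as the evaluation criteria but does not say how they were computed. As a rough illustration only, the two measures could be sketched in plain Python as below. The function names, the binary 0/1 label encoding, and the toy labels and scores are all assumptions made for this example; none of them come from the paper, and the toy numbers are not results.

```python
from typing import Sequence


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F-Measure (harmonic mean of precision and recall) for the class labelled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero, so the F-Measure is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive example receives
    the higher score; tied pairs count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and classifier scores, invented for illustration.
    y_true = [1, 1, 1, 0, 0, 0]
    scores_all_features = [0.9, 0.4, 0.6, 0.5, 0.2, 0.7]
    scores_selected_features = [0.9, 0.7, 0.6, 0.5, 0.2, 0.3]

    for name, scores in [
        ("all features", scores_all_features),
        ("selected features", scores_selected_features),
    ]:
        # Threshold the scores at 0.5 to obtain hard predictions.
        y_pred = [1 if s >= 0.5 else 0 for s in scores]
        print(name, "F-Measure:", f_measure(y_true, y_pred), "AUC:", roc_auc(y_true, scores))
```

Running the same two functions on a classifier's output before and after feature selection gives the kind of side-by-side comparison the abstract describes; a real study would compute them per dataset and per classifier, typically under cross-validation.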

Index Terms

Benchmark of feature selection techniques with machine learning algorithms for cancer datasets
1. Information systems
  1. Information systems applications
    1. Data mining
      1. Clustering

Published in
ICAIR-CACRE '16: Proceedings of the International Conference on Artificial Intelligence and Robotics and the International Conference on Automation, Control and Robotics Engineering
July 2016
150 pages
ISBN:9781450342353
DOI:10.1145/2952744
Conference Chairs:
Dan Zhang, York University, Canada
Xiaoping Liu, Carleton University, Canada
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
classification
data mining
feature selection
Qualifiers
- research-article
Article Metrics
- Total Citations: 3
- Total Downloads: 256
- Downloads (Last 12 months): 18
- Downloads (Last 6 weeks): 2