ABSTRACT
Classification is a machine learning technique that assigns each item in a dataset to one of a set of predefined classes or groups. It is widely used in the medical field to classify medical data. To produce better classification results, feature selection has been applied in many classification studies as a preprocessing step, in which a subset of features is used rather than the full feature set of a particular dataset. Feature selection eliminates irrelevant attributes to obtain high-quality features that may enhance the classification process and produce better results. This study focuses on feature selection techniques as a means of helping classifiers achieve better classification performance with the most significant features. In the experiments, benchmark feature selection methods are compared on three cancer datasets using four well-recognized machine learning algorithms. The paper then analyzes the performance of all classifiers with and without feature selection in terms of ROC and F-measure. The study found that, although no single feature selection method is best for all datasets, the results still support the claim that feature selection helps increase classifier performance while using a minimal number of features.
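The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = positive, 0 = negative), the standard F1 definition of the F-measure, and that "ROC" refers to the area under the ROC curve; the abstract does not state these details, and the function names and toy data are hypothetical.

```python
from typing import Sequence


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only, not taken from the paper).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]                 # hard class predictions
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]     # classifier confidence for class 1

f1 = f_measure(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"F-measure = {f1:.3f}, ROC area = {auc:.3f}")
```

Under these assumptions, a with-versus-without comparison amounts to computing both numbers twice for each classifier, once on the full feature set and once on the selected subset.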
Index Terms
- Benchmark of feature selection techniques with machine learning algorithms for cancer datasets