ABSTRACT
The goal of feature selection (FS) in machine learning is to find the best subset of features for building efficient models for a learning task. Different FS methods are used to assess feature relevance. An efficient feature selection method should select relevant and non-redundant features in order to improve learning performance and training efficiency on large data. However, in the case of non-independent features, we observe that existing feature selection methods inappropriately remove redundancy, which leads to performance loss. In this article we propose a new criterion for feature redundancy analysis. Using this criterion, we design an efficient feature redundancy analysis method that eliminates redundant features and optimizes classifier performance. We experimentally compare the efficiency and performance of our method against existing methods that remove redundant features. The results show that our method is effective in maximizing performance while reducing redundancy.
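To make the relevance/redundancy trade-off concrete, the sketch below implements a generic correlation-based redundancy filter of the kind this line of work builds on (e.g. CFS- or mRMR-style baselines). It is not the criterion proposed in this paper; the function name, the use of Pearson correlation for both relevance and redundancy, and the `corr_threshold` value are all illustrative assumptions.

```python
import numpy as np

def greedy_redundancy_filter(X, y, corr_threshold=0.9):
    """Illustrative baseline: rank features by |Pearson correlation| with
    the target y, then greedily keep a feature only if its |correlation|
    with every already-kept feature stays below corr_threshold.
    Returns the indices of the kept features."""
    n_features = X.shape[1]
    # Relevance score: absolute correlation between each feature and y.
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    order = np.argsort(-relevance)  # most relevant feature first
    kept = []
    for j in order:
        # A candidate is redundant if it is highly correlated with any
        # feature already selected.
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) >= corr_threshold
            for k in kept
        )
        if not redundant:
            kept.append(j)
    return kept
```

This pairwise-correlation baseline is exactly the kind of method the abstract critiques: when features are non-independent but jointly informative, a hard threshold can discard a "redundant" feature that still carries complementary information about the target.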