Abstract
Feature reduction refers to the problem of deleting those input features that are less predictive of a given outcome, a problem encountered in many areas such as pattern recognition, machine learning and data mining. In particular, it has been successfully applied to tasks involving datasets with huge numbers of features. Rough set theory has been used as such a dataset preprocessor with much success, but current methods are inadequate for reducing numerical features. Because the classical rough set model can only evaluate categorical features, we introduce a neighborhood rough set model that handles numerical datasets by defining a neighborhood relation. However, this model alone does not reliably find optimal feature subsets. In this paper, we propose a new feature reduction mechanism based on the fish swarm algorithm (FSA) to address this limitation, and apply it to the problem of finding optimal feature subsets in the neighborhood rough set reduction process. We define three foraging behaviors of fish to search for optimal subsets and a fitness function to identify the best solutions. We construct the FSA-based neighborhood feature reduction algorithm and design experiments comparing it with a heuristic neighborhood feature reduction method. Experimental results show that the FSA-based neighborhood reduction method is suitable for numerical data and is more likely to find an optimal reduct.
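The abstract does not spell out the neighborhood relation, the three foraging behaviors or the fitness function, so the following Python sketch is a hypothetical illustration, not the authors' method. It assumes a Euclidean δ-neighborhood on features already normalized to [0, 1], uses the dependency degree γ_B(D) = |POS_B(D)| / |U| from the neighborhood rough set literature as the quality measure, and plugs it into an assumed weighted fitness α·γ + (1 - α)·(1 - |B|/|C|). The three behaviors are assumed to be bit-flip analogues of the prey, swarm and follow behaviors of the standard artificial fish swarm algorithm. All function names (`dependency`, `fitness`, `fsa_reduct`, etc.) and parameter defaults are hypothetical.

```python
import math
import random

# Hypothetical sketch: neighborhood rough set fitness + binary fish swarm search.
# The neighborhood metric, fitness weights and behaviors are assumptions.

def dependency(X, y, feats, delta):
    """gamma_B(D): fraction of samples whose Euclidean delta-neighborhood on
    the feature subset `feats` contains only samples with the same label."""
    def dist(i, j):
        return math.sqrt(sum((X[i][f] - X[j][f]) ** 2 for f in feats))
    pos = sum(1 for i in range(len(X))
              if all(y[j] == y[i] for j in range(len(X)) if dist(i, j) <= delta))
    return pos / len(X)

def fitness(X, y, mask, delta, alpha=0.9):
    """Assumed fitness: reward dependency, lightly penalise subset size."""
    feats = [f for f, bit in enumerate(mask) if bit]
    return alpha * dependency(X, y, feats, delta) + (1 - alpha) * (1 - len(feats) / len(mask))

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

def step_toward(a, target, rng):
    """Binary 'move': copy one randomly chosen bit on which a differs from target."""
    diff = [k for k in range(len(a)) if a[k] != target[k]]
    b = a[:]
    if diff:
        k = rng.choice(diff)
        b[k] = target[k]
    return b

def prey(a, fa, fit, visual, tries, rng):
    """Prey: probe random masks within `visual` bit flips; step toward the
    first better one, otherwise flip one random bit."""
    for _ in range(tries):
        cand = a[:]
        for k in rng.sample(range(len(a)), rng.randint(1, min(visual, len(a)))):
            cand[k] ^= 1
        if fit(cand) > fa:
            return step_toward(a, cand, rng)
    b = a[:]
    b[rng.randrange(len(a))] ^= 1
    return b

def fsa_reduct(X, y, delta=0.15, n_fish=10, iters=30, visual=2,
               crowd=0.8, tries=5, alpha=0.9, seed=0):
    """Return the indices of the best feature subset found by the swarm."""
    rng = random.Random(seed)
    m = len(X[0])
    cache = {}  # memoise fitness per bit mask

    def fit(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(X, y, mask, delta, alpha)
        return cache[key]

    fish = [[rng.randint(0, 1) for _ in range(m)] for _ in range(n_fish)]
    best = max(fish, key=fit)[:]
    for _ in range(iters):
        for idx in range(n_fish):
            a = fish[idx]
            fa = fit(a)
            mates = [b for j, b in enumerate(fish) if j != idx and hamming(a, b) <= visual]
            roomy = 0 < len(mates) < crowd * n_fish  # visible mates, not overcrowded
            # Swarm: head for the majority-vote centre of the visible mates.
            centre = ([int(2 * sum(b[k] for b in mates) >= len(mates)) for k in range(m)]
                      if mates else a)
            s = (step_toward(a, centre, rng) if roomy and fit(centre) > fa
                 else prey(a, fa, fit, visual, tries, rng))
            # Follow: head for the fittest visible mate.
            leader = max(mates, key=fit) if mates else a
            f = (step_toward(a, leader, rng) if roomy and fit(leader) > fa
                 else prey(a, fa, fit, visual, tries, rng))
            fish[idx] = max(s, f, key=fit)  # keep the better of the two moves
            if fit(fish[idx]) > fit(best):
                best = fish[idx][:]
    return [k for k, bit in enumerate(best) if bit]
```

On a toy dataset where only feature 0 separates the two classes (features 1 and 2 take identical values for paired samples of opposite class), `dependency` gives γ = 1 for `[0]` and γ = 0 for `[1, 2]`, and the search settles on the reduct `[0]`. The size penalty (1 - α) only breaks ties among subsets with equal dependency, which is why the single-feature subset beats its supersets.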
Acknowledgments
This study was funded by Open Fund Project of State International S&T Cooperation Base of Networked Supporting Software (Nos. NSS1404, NSS1405), National Natural Science Foundation of China (No. 61573297), Postdoctoral Science Foundation of China (No. 2014M562306) and Natural Science Foundation of Fujian Province (No. 2015J01277).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. This article does not contain any studies with animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Communicated by A. Di Nola.
About this article
Cite this article
Chen, Y., Zeng, Z. & Lu, J. Neighborhood rough set reduction with fish swarm algorithm. Soft Comput 21, 6907–6918 (2017). https://doi.org/10.1007/s00500-016-2393-6