Abstract
Feature selection is an important data analysis technique that is used to reduce feature redundancy and to exploit hidden information in high-dimensional data. In this paper we propose Fesim, a feature selection method based on a similarity metric. We use the Euclidean distance to measure the similarity among all features, and then apply the density-based DBSCAN algorithm to cluster relevant features. Moreover, we present a strategy for accurately choosing representative features from each cluster. We conducted comprehensive experiments to evaluate the proposed approach, and the results on different datasets demonstrate its superiority.
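The pipeline outlined in the abstract can be sketched as follows: compute pairwise Euclidean distances between features, run DBSCAN over those distances, and keep one representative per cluster. This is a minimal, stdlib-only Python illustration under stated assumptions, not the authors' Fesim implementation. The parameter names `eps` and `min_pts` and the helper names are hypothetical. Two further choices are also assumptions, because the abstract does not describe the selection strategy: the cluster medoid (the feature with the smallest summed distance to the other members) serves as the representative, and every DBSCAN noise feature is kept as its own representative. Features are compared on their raw values, so on real data one would likely standardize them first.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

NOISE = -1  # DBSCAN label for items that belong to no cluster


def feature_distance_matrix(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise Euclidean distances between the *columns* (features) of X."""
    n_features = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_features)]
    D = [[0.0] * n_features for _ in range(n_features)]
    for i in range(n_features):
        for j in range(i + 1, n_features):
            D[i][j] = D[j][i] = math.dist(cols[i], cols[j])
    return D


def dbscan(D: List[List[float]], eps: float, min_pts: int) -> List[int]:
    """Classic DBSCAN on a precomputed distance matrix; one label per item."""
    n = len(D)
    labels: List[Optional[int]] = [None] * n
    cluster = 0
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbors = [q for q in range(n) if D[p][q] <= eps]  # includes p itself
        if len(neighbors) < min_pts:
            labels[p] = NOISE  # may later be relabelled as a border point
            continue
        labels[p] = cluster  # p is a core point: start a new cluster
        seeds = [q for q in neighbors if q != p]
        i = 0
        while i < len(seeds):
            q = seeds[i]
            i += 1
            if labels[q] == NOISE:
                labels[q] = cluster  # border point: joins, but is not expanded
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = [r for r in range(n) if D[q][r] <= eps]
            if len(q_neighbors) >= min_pts:  # q is also a core point
                seeds.extend(q_neighbors)
        cluster += 1
    return labels  # type: ignore[return-value]


def select_features(
    X: Sequence[Sequence[float]], eps: float, min_pts: int
) -> Tuple[List[int], List[int]]:
    """Return (selected feature indices, DBSCAN label of every feature)."""
    D = feature_distance_matrix(X)
    labels = dbscan(D, eps, min_pts)
    selected: List[int] = []
    clusters: Dict[int, List[int]] = {}
    for f, lab in enumerate(labels):
        if lab == NOISE:
            selected.append(f)  # ASSUMPTION: noise features are kept as-is
        else:
            clusters.setdefault(lab, []).append(f)
    for members in clusters.values():
        # ASSUMPTION: the representative is the cluster medoid
        medoid = min(members, key=lambda f: sum(D[f][g] for g in members))
        selected.append(medoid)
    return sorted(selected), labels


if __name__ == "__main__":
    # Toy data: rows are samples, columns are features f0..f5
    X = [
        [1, 1.1, 1, 10, 10, -5],
        [2, 2, 2.1, 10, 10.1, 0],
        [3, 3, 3, 10, 10, 5],
        [4, 4, 4, 10, 10, 20],
    ]
    selected, labels = select_features(X, eps=0.5, min_pts=2)
    print(labels)    # [0, 0, 0, 1, 1, -1]
    print(selected)  # [0, 3, 5]
```

With `eps=0.5` and `min_pts=2`, the near-duplicate features 0, 1 and 2 form one cluster, features 3 and 4 form another, and feature 5 is labelled noise, so the sketch keeps features 0, 3 and 5. Working on a precomputed feature-by-feature distance matrix keeps the clustering step independent of how similarity is measured.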
Acknowledgements
This research is supported by the National Natural Science Foundation of China (Grant No. 61462012, No. 61562010, No. U1531246), the Innovation Team of the Data Analysis and Cloud Service of Guizhou Province (Grant No. [2015]53), and the Science and Technology Project of the Department of Science and Technology in Guizhou Province (Grant No. LH [2016]7427).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, Y., Li, H., Chen, M., Dai, Z., Li, H., Zhu, M. (2019). Enhancing Feature Selection with Density Cluster for Better Clustering. In: Silhavy, R., Silhavy, P., Prokopova, Z. (eds) Computational and Statistical Methods in Intelligent Systems. CoMeSySo 2018. Advances in Intelligent Systems and Computing, vol 859. Springer, Cham. https://doi.org/10.1007/978-3-030-00211-4_15
DOI: https://doi.org/10.1007/978-3-030-00211-4_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00210-7
Online ISBN: 978-3-030-00211-4
eBook Packages: Intelligent Technologies and Robotics (R0)