Feature selection based on probability and mathematical expectation

Deng, Zhixuan; Li, Tianrui; Liu, Keyu; Zhang, Pengfei; Deng, Dayong

doi:10.1007/s13042-023-01920-8

Original Article
Published: 13 July 2023

Volume 15, pages 477–491, (2024)

International Journal of Machine Learning and Cybernetics

Zhixuan Deng ORCID: orcid.org/0000-0002-3374-3666^1,2,
Tianrui Li^1,2,
Keyu Liu^1,2,
Pengfei Zhang^1,2 &
…
Dayong Deng^3,4

236 Accesses
1 Citation

Abstract

Many kinds of information entropy are employed for feature selection, but they lack corresponding probabilities that would give them a statistical interpretation. Moreover, although many statistical indicators are used in feature selection, neither probability nor mathematical expectation has been applied to feature selection directly. To address these two problems, this article redefines three kinds of probabilities and their corresponding mathematical expectations from the perspective of granular computing and investigates their properties. These novel probabilities and mathematical expectations extend the meanings of classical probability and mathematical expectation and provide a statistical interpretation for their corresponding information entropies. Attribute reducts based on probabilities and mathematical expectations are then defined and proved to be equivalent to those based on the corresponding information entropies. A framework of feature selection algorithms based on probabilities and mathematical expectations (ARME) is designed after their properties are presented. Moreover, a novel form of definition for feature selection is proposed, and another feature selection algorithm based on the mathematical expectation of conditional probability (ARMEC) is designed to reduce features that have a negative effect on classification. Theoretical analysis and experimental results show that probabilities and mathematical expectations are more efficient than their corresponding information entropies when used as criteria for feature selection. The novel method therefore has an advantage over many state-of-the-art algorithms.
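The abstract gives no formulas, so the following is only a rough sketch of the general idea, not the ARME or ARMEC algorithms themselves. It assumes a categorical decision table, uses the classical Shannon conditional entropy over equivalence classes as the entropy criterion, and assumes one plausible reading of "mathematical expectation of conditional probability": the average, over all objects, of the fraction of the object's equivalence class that shares its label. The function names, the toy data and the greedy forward search are all illustrative assumptions.

```python
from collections import defaultdict
from math import log2


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def label_counts(block, labels):
    """Count how often each decision label occurs inside one block."""
    counts = defaultdict(int)
    for i in block:
        counts[labels[i]] += 1
    return counts


def conditional_entropy(rows, labels, attrs):
    """Shannon conditional entropy H(D | attrs); lower is better."""
    n = len(rows)
    h = 0.0
    for block in partition(rows, attrs):
        for c in label_counts(block, labels).values():
            h -= (c / n) * log2(c / len(block))
    return h


def expected_conditional_probability(rows, labels, attrs):
    """Assumed criterion: mean over objects x of P(label of x | block of x)."""
    n = len(rows)
    total = 0.0
    for block in partition(rows, attrs):
        # Each of the c objects with a given label contributes c / |block|.
        total += sum(c * c / len(block)
                     for c in label_counts(block, labels).values())
    return total / n


def greedy_select(rows, labels, score, tol=1e-12):
    """Forward search: add the best-scoring attribute until the subset
    scores as well as the full attribute set (higher score is better)."""
    all_attrs = list(range(len(rows[0])))
    target = score(rows, labels, all_attrs)
    chosen = []
    while score(rows, labels, chosen) < target - tol:
        best = max((a for a in all_attrs if a not in chosen),
                   key=lambda a: score(rows, labels, chosen + [a]))
        chosen.append(best)
    return chosen


# Hypothetical toy table: attribute 0 determines the label, 1 and 2 are noisy.
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 1), (1, 1, 1)]
labels = [0, 0, 1, 1, 0, 1]

by_expectation = greedy_select(rows, labels, expected_conditional_probability)
by_entropy = greedy_select(
    rows, labels, lambda r, l, a: -conditional_entropy(r, l, a))
print(by_expectation, by_entropy)
```

On this toy table both criteria select the same single attribute, which is in the spirit of the equivalence between the two kinds of reducts that the abstract claims. The expectation-based score needs only counts, multiplications and divisions, whereas the entropy score also evaluates a logarithm per term; whether that accounts for the efficiency gain reported in the article cannot be confirmed from the abstract alone.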

Notes

https://archive.ics.uci.edu/ml/datasets.php.

Acknowledgements

This work was partially supported by the National Key R&D Program of China (2019YFB2101802), the National Science Foundation of China (615732920), and the Zhejiang Provincial Science and Technology Plan Project of China (2023C35089).

Author information

Authors and Affiliations

School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, 611756, China
Zhixuan Deng, Tianrui Li, Keyu Liu & Pengfei Zhang
Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Chengdu, 611756, China
Zhixuan Deng, Tianrui Li, Keyu Liu & Pengfei Zhang
Xingzhi College, Zhejiang Normal University, Lanxi, 321100, China
Dayong Deng
Key Laboratory of Intelligent Education Technology and Application of Zhejiang Province, Jinhua, 321004, China
Dayong Deng

Corresponding author

Correspondence to Dayong Deng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Deng, Z., Li, T., Liu, K. et al. Feature selection based on probability and mathematical expectation. Int. J. Mach. Learn. & Cyber. 15, 477–491 (2024). https://doi.org/10.1007/s13042-023-01920-8

Received: 14 November 2022
Accepted: 02 July 2023
Published: 13 July 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s13042-023-01920-8
