Feature selection via normative fuzzy information weight with application into tumor classification
Introduction
Rough set theory (hereinafter referred to as RS) [1], [2] is an effective mathematical tool for handling uncertain information. The idea of RS is to describe imprecise or uncertain information using known information. This idea lends itself to feature selection (also called attribute reduction), which selects a feature subset that retains the discriminating ability of the original data. RS therefore plays an important role in feature selection and has attracted many scholars to study its models and applications in depth [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18].
Mutual information is an effective criterion in feature selection. The mutual information of a feature $a$ about the decision $D$ is described as
$$I(a; D) = H(D) - H(D \mid a), \tag{1}$$
where $H(\cdot)$ and $H(\cdot \mid \cdot)$ express the entropy and conditional entropy, respectively. It describes the information shared between two variables. Battiti [19] initially used mutual information to select features. Yu et al. [20] showed that feature relevance alone is not enough for efficient feature selection on high-dimensional data. These methods focus on whether a candidate feature has an impact on classification, while ignoring the possible impact on the already selected features when the candidate feature is added.
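As a minimal sketch of the mutual information criterion for discrete (symbolic) features, the following Python code estimates entropy from value frequencies and computes the mutual information as the entropy of the decision minus its conditional entropy given the feature. The function names and the toy data are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete symbols."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(target, given):
    """H(target | given) = H(target, given) - H(given)."""
    return entropy(list(zip(target, given))) - entropy(given)


def mutual_information(feature, decision):
    """I(feature; decision) = H(decision) - H(decision | feature)."""
    return entropy(decision) - conditional_entropy(decision, feature)


# Hypothetical toy data: one feature that determines the decision
# and one that is statistically independent of it.
decision = [0, 0, 1, 1]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]

print(mutual_information(informative, decision))
print(mutual_information(uninformative, decision))
```

On this toy data the informative feature shares one bit with the decision, while the independent feature shares none.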
To solve this issue, Fleuret [21] proposed a feature selection method using conditional mutual information, defined as
$$I(a_k; D \mid a_s) = I(a_k; D) - \{ I(a_k; a_s) - I(a_k; a_s \mid D) \}. \tag{2}$$
In Eq. (2), $I(a_k; D \mid a_s)$, namely the conditional mutual information, quantifies the classification information supplied by the candidate feature $a_k$ when the feature $a_s$ is already selected, and $I(a_k; a_s) - I(a_k; a_s \mid D)$ indicates the redundant information. Wang et al. [22] presented independent classification information (ICI for short) to integrate redundancy and classification information. However, these methods do not consider what proportion of a feature's own information is new classification information, and they tend to select features that are highly redundant with the features already selected.
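The role of conditional mutual information can be illustrated with a small sketch for discrete features. It uses the standard identity $I(x; y \mid z) = H(x, z) + H(y, z) - H(x, y, z) - H(z)$; the function names and the toy data are assumptions made for illustration and do not reproduce the method of [21] or [22].

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (bits) of one or more aligned discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(x, y):
    """I(x; y) = H(x) + H(y) - H(x, y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    """I(x; y | z) = H(x, z) + H(y, z) - H(x, y, z) - H(z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


# Hypothetical toy data with a four-class decision.
decision = [0, 1, 2, 3]
selected = [0, 0, 1, 1]        # feature that is already selected
duplicate = [0, 0, 1, 1]       # candidate that copies the selected feature
complementary = [0, 1, 0, 1]   # candidate that carries new information

for name, cand in [("duplicate", duplicate), ("complementary", complementary)]:
    print(name,
          mutual_information(cand, decision),
          conditional_mutual_information(cand, decision, selected))
```

Both candidates have the same mutual information with the decision, so relevance alone cannot separate them. Conditioning on the selected feature reduces the contribution of the duplicate to zero, while the complementary candidate keeps its full contribution.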
Traditional rough set theory can only deal with symbolic data and cannot handle numerical data directly, although numerical data are abundant in the real world. To handle numerical or fuzzy data, fuzzy rough set (hereinafter referred to as FRS) models have been constructed. Dubois and Prade [23] combined rough sets with fuzzy sets and proposed the fuzzy rough set. It depicts fuzzy concepts through a fuzzy similarity relation and extends the application of RS from crisp data to fuzzy data. Fuzzy rough set models have been widely used in attribute reduction [24], [25], [26], [27], [28] and rule reasoning [29], [30], [31]. Jensen et al. [25], [32] extended the positive region to process fuzzy events in FRS. Hu et al. [33] extended the crisp Shannon entropy to measure fuzzy information via a fuzzy equivalence relation. Tsang et al. [34] developed an algorithm that computes attribute reductions using a discernibility matrix. Dai et al. [28] further fuzzified mutual information to propose a criterion based on the information gain ratio, and obtained remarkable results when applying it to tumor classification. Ni et al. [35] proposed a positive-region-based feature selection accelerator for fuzzy rough sets.
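To make the step from crisp to fuzzy data concrete, the sketch below builds a fuzzy similarity relation on one numerical feature and evaluates a fuzzy extension of Shannon entropy on it. Both choices are assumptions made for illustration: the similarity $r_{ij} = 1 - |v_i - v_j|$ presumes values already scaled to $[0, 1]$, and the entropy $-\frac{1}{n}\sum_i \log_2 \frac{|[x_i]_R|}{n}$, with $|[x_i]_R| = \sum_j r_{ij}$, is one commonly used form. Neither is claimed to be the exact construction of [23] or [33].

```python
from math import log2


def fuzzy_similarity(values):
    """Fuzzy relation r[i][j] = 1 - |v_i - v_j| for values scaled to [0, 1]."""
    return [[1.0 - abs(a - b) for b in values] for a in values]


def fuzzy_entropy(relation):
    """-(1/n) * sum_i log2(|[x_i]| / n), where |[x_i]| is the i-th row sum."""
    n = len(relation)
    return -sum(log2(sum(row) / n) for row in relation) / n


# Hypothetical numerical feature, already normalized to [0, 1].
numeric = [0.0, 0.5, 1.0]
print(fuzzy_entropy(fuzzy_similarity(numeric)))

# With only the values 0 and 1 the relation is a crisp equivalence relation,
# and the measure reduces to the ordinary Shannon entropy of the partition.
crisp = [0.0, 0.0, 1.0, 1.0]
print(fuzzy_entropy(fuzzy_similarity(crisp)))
```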
Unfortunately, the fuzzy extension of Shannon's entropy is not monotone, and uncertainty measurement by Shannon's entropy is not suitable for analysing information correlation and redundancy in FRS. Inspired by the concept of knowledge granularity mentioned by Dai et al. [36], we propose a monotone fuzzy metric in the framework of FRS. We then define fuzzy independent classification information via fuzzy conditional mutual information, and construct the first feature selection method on this basis. To account for the proportion of new classification information that a selected attribute provides relative to its own information, and to avoid the tendency to select features that are highly redundant with those already selected, we introduce the concept of normative fuzzy information weight via symmetric uncertainty and propose the second feature selection method based on the normative fuzzy information weight in FRS. It should be noted that our study is carried out on complete information systems. We therefore assume that there are no missing values in the conditional features or the decision feature, and that the information systems are static.
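For orientation, the classical (crisp) symmetric uncertainty is $SU(X, Y) = 2\,I(X; Y) / (H(X) + H(Y))$, which normalizes mutual information to $[0, 1]$. The sketch below implements only this classical quantity on hypothetical toy data. It is not the normative fuzzy information weight, whose definition is not reproduced in this excerpt.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (bits) of one or more aligned discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def symmetric_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)); defined as 0 if both are constant."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mi = hx + hy - entropy(x, y)
    return 2.0 * mi / (hx + hy)


# Hypothetical toy data: both features determine the decision, but the
# ID-like feature carries extra information that is useless for classification.
decision = [0, 0, 1, 1]
binary = [0, 0, 1, 1]
id_like = [0, 1, 2, 3]

print(symmetric_uncertainty(binary, decision))
print(symmetric_uncertainty(id_like, decision))
```

Both features share one bit with the decision, yet the ID-like feature receives a lower score because only part of its own information is classification information. This is the kind of proportion that a normalized criterion takes into account.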
The remainder of this paper is organized as follows. We review some preliminaries in Section 2. In Section 3, the monotone fuzzy metric structure is defined, and we propose two feature selection methods based on fuzzy ICI and normative fuzzy information weight, respectively. Experiments and comparisons with other methods are conducted in Section 4. In Section 5, the second method is applied to the classification of tumors. Section 6 concludes the paper.
Section snippets
Some primary concepts in rough sets
In this subsection, we review some basic concepts of RS, which can be found in [1], [28], [37], [38], [39].
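Definition 1 [1]. $IS = (U, A, V, f)$ is an information system, where $U$ is a nonempty set of samples; $A$ is the attribute (or feature) set; $V = \bigcup_{a \in A} V_a$ is the attribute domain, with $V_a$ the domain of attribute $a$; and $f: U \times A \to V$ assigns to each object a specific value from the attribute domain ($f(x, a) \in V_a$). For any attribute subset $B \subseteq A$, there is an indiscernibility relation
$$IND(B) = \{ (x, y) \in U \times U \mid \forall a \in B,\ f(x, a) = f(y, a) \}.$$
Evidently, $IND(B)$ is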
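As an illustration of an indiscernibility relation on a symbolic information system, the following sketch groups objects that take identical values on a chosen attribute subset. The table, the attribute names, and the function name are hypothetical and serve only to show the partition induced by such a relation.

```python
from collections import defaultdict


def indiscernibility_classes(table, attributes):
    """Partition object ids into classes that agree on every given attribute."""
    classes = defaultdict(list)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].append(obj)
    return sorted(classes.values())


# Hypothetical information system with four objects and two attributes.
table = {
    "x1": {"color": "red", "size": "big"},
    "x2": {"color": "red", "size": "small"},
    "x3": {"color": "blue", "size": "big"},
    "x4": {"color": "red", "size": "big"},
}

print(indiscernibility_classes(table, ["color"]))
print(indiscernibility_classes(table, ["color", "size"]))
```

Adding an attribute can only split existing classes, so the partition for the two attributes is finer than the partition for the single attribute.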
Feature selection in fuzzy rough sets
A monotone fuzzy measure, called Fuzzy Granularity (simply, FG), is constructed in this section, and we propose two types of feature selection methods via FG.
Experiments and analyses
This section verifies the validity of the two proposed criteria through experiments. The first nine datasets are taken from UCI [50]; the CLL_SUB and GLI_85 datasets can be obtained from NCBI (https://www.ncbi.nlm.nih.gov/); and the remaining datasets are from the Kent Ridge Biomedical Dataset Repository (http://leo.ugr.es/elvira/DBCRepository/). The datasets are described in Table 2.
To evaluate the validity of the proposed criteria in the framework of the fuzzy rough set model, we
Application
In the previous section, we showed the validity of the two proposed methods. In this section, the second feature selection method is applied to select important features in tumor classification.
Conclusion
This work is motivated by two observations: first, the fuzzy extension of Shannon entropy is not monotone, and uncertainty measurement by Shannon entropy is not suitable for analysing information correlation and redundancy; second, many existing feature selection methods ignore what proportion of a selected feature's own information is new classification information, and tend to select features having high redundancy with selected
CRediT authorship contribution statement
Jianhua Dai: Conceptualization, Supervision, Methodology, Writing - original draft. Jiaolong Chen: Investigation, Methodology, Software, Writing - original draft.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (No. 61976089, No. 61473259, No. 61070074, No. 60703038), and the Hunan Provincial Science and Technology Project Foundation, China (2018TP1018, 2018RS3065).
References (53)
Rough sets and intelligent data analysis. Inform. Sci. (2002)
Uncertainty measurement for interval-valued decision systems based on extended conditional entropy. Knowl.-Based Syst. (2012)
Feature selection based on artificial bee colony and gradient boosting decision tree. Appl. Soft Comput. (2019)
Discrete particle swarm optimization approach for cost sensitive attribute reduction. Knowl.-Based Syst. (2016)
A rough set approach for selecting clustering attribute. Knowl.-Based Syst. (2010)
Attribute selection based on a new conditional entropy for incomplete decision systems. Knowl.-Based Syst. (2013)
A group incremental feature selection for classification using rough set theory based genetic algorithm. Appl. Soft Comput. (2018)
Wavelet neural network prediction method of stock price trend based on rough set attribute reduction. Appl. Soft Comput. (2018)
Prediction of service life of large centrifugal compressor remanufactured impeller based on clustering rough set and fuzzy bandelet neural network. Appl. Soft Comput. (2019)
Fuzzy rough set-based attribute reduction using distance measures. Knowl.-Based Syst. (2019)