Feature selection via normative fuzzy information weight with application into tumor classification

https://doi.org/10.1016/j.asoc.2020.106299

Highlights

  • We construct a monotone fuzzy granularity metric structure.

  • We propose a feature selection method based on Fuzzy Independent Classification Information.

  • We propose an improved feature selection method based on Normative Fuzzy Information Weight.

  • We verify the feasibility of the two proposed methods through experiments.

  • We apply the improved feature selection method to tumor classification.

Abstract

Feature selection via mutual information has been widely used in data analysis. Mutual information with monotonicity is an effective tool for analysing the correlation and redundancy of features. However, the mutual information adopted in most existing feature selection criteria cannot adequately explain the correlation and redundancy of features in fuzzy situations. Therefore, in this paper we propose a feature selection strategy via normative fuzzy information weight based on fuzzy conditional mutual information. Firstly, the monotone fuzzy metric structure is defined, and some theoretical properties are proved. Secondly, we put forward the concept of fuzzy independent classification information based on fuzzy conditional mutual information, and propose a feature selection method via fuzzy independent classification information. Thirdly, considering the proportion of new classification information provided by a selected feature relative to its own information, we introduce the concept of normative fuzzy information weight and propose an improved feature selection method. Finally, the effectiveness of the two proposed methods is tested by comparative experiments, and the improved feature selection method is applied to tumor classification. This work provides an alternative strategy for feature selection in real-world data applications.

Introduction

Rough set (hereinafter referred to as RS) [1], [2] is an effective mathematical tool for handling uncertain information. The idea of RS is to describe imprecise or uncertain information by means of known information. This idea naturally supports feature selection (also called attribute reduction), which selects a feature subset that retains the discriminating ability of the original data. RS plays an indispensable role in feature selection and has attracted many scholars to study its models and applications in depth [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18].

Mutual information is an effective criterion in feature selection. The mutual information of feature $a_k$ about decision $D$ is defined as
$$I(D;a_k)=H(D)-H(D|a_k), \tag{1}$$
where $H(D)$ and $H(D|a_k)$ denote the entropy and the conditional entropy, respectively. It describes the information shared between two variables. Battiti [19] initially used mutual information to select features. Yu et al. [20] showed that feature relevance alone is not enough for efficient feature selection on high-dimensional data. These methods focus on whether a candidate feature has an impact on classification, while ignoring the possible impact on the already selected features when the candidate feature is added.
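Eq. (1) is straightforward to evaluate for discrete data. The following is a minimal, self-contained sketch (the helper names and the toy data are illustrative, not from the paper) showing how entropy, conditional entropy, and the mutual information of Eq. (1) fit together.

```python
# Minimal sketch of Eq. (1) for discrete data: I(D; a_k) = H(D) - H(D | a_k).
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy H(X) of a sequence of discrete symbols (in bits)."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def conditional_entropy(decision, feature):
    """H(D | a_k): entropy of the decision inside each feature-value group, weighted by group frequency."""
    n = len(decision)
    groups = {}
    for d, f in zip(decision, feature):
        groups.setdefault(f, []).append(d)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def mutual_information(decision, feature):
    """Eq. (1): I(D; a_k) = H(D) - H(D | a_k)."""
    return entropy(decision) - conditional_entropy(decision, feature)

# Toy example: the feature perfectly predicts the decision, so I(D; a_k) = H(D) = 1 bit.
decision = ['+', '+', '-', '-']
feature  = ['a', 'a', 'b', 'b']
print(mutual_information(decision, feature))  # 1.0
```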

To address this issue, Fleuret [21] proposed a feature selection method using conditional mutual information, defined as
$$I(D;a_k|a_j)=I(D;a_k)-I(D;a_k;a_j). \tag{2}$$
In Eq. (2), $I(D;a_k|a_j)$, namely the conditional mutual information, quantifies the classification information supplied by $a_k$ once $a_j$ has been selected, while $I(D;a_k;a_j)$ indicates the redundant information. Wang et al. [22] presented independent classification information (shortly, ICI) to integrate redundancy and classification information. However, these methods do not consider the proportion of new classification information provided by a selected feature relative to its own information, and therefore tend to select features that are highly redundant with the already selected features.
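For discrete data, the quantity in Eq. (2) can equivalently be computed as $I(D;a_k|a_j)=H(D|a_j)-H(D|a_k,a_j)$, i.e., by conditioning on the joint variable $(a_k,a_j)$. The sketch below reuses the helpers from the previous snippet; the names and toy data are again illustrative assumptions rather than the paper's code.

```python
# Minimal sketch of Eq. (2): I(D; a_k | a_j) measures the classification
# information that a_k still supplies once a_j is already selected.
# Reuses entropy() and conditional_entropy() from the previous sketch.

def conditional_mutual_information(decision, feature_k, feature_j):
    """I(D; a_k | a_j) = H(D | a_j) - H(D | a_k, a_j) for discrete data."""
    paired = list(zip(feature_k, feature_j))  # joint variable (a_k, a_j)
    return conditional_entropy(decision, feature_j) - conditional_entropy(decision, paired)

# If a_k merely duplicates a_j, it adds no new classification information:
decision  = ['+', '+', '-', '-']
feature_j = ['a', 'a', 'b', 'b']
feature_k = ['a', 'a', 'b', 'b']
print(conditional_mutual_information(decision, feature_k, feature_j))  # 0.0
```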

Traditional rough set theory can only deal with symbolic data, not numerical data, yet numerical data are abundant in the real world. To handle numerical or fuzzy data, fuzzy rough set (hereinafter referred to as FRS) models have been constructed. Dubois and Prade [23] combined rough sets with fuzzy sets and proposed the fuzzy rough set. It depicts fuzzy concepts through fuzzy similarity relations, and extends the application of RS from crisp data to fuzzy data. Fuzzy rough set models have been widely used in attribute reduction [24], [25], [26], [27], [28] and rule reasoning [29], [30], [31]. Jensen et al. [25], [32] extended the positive region to process fuzzy events in FRS. Hu et al. [33] extended crisp Shannon entropy to measure fuzzy information via fuzzy equivalence relations. Tsang et al. [34] developed an algorithm to compute attribute reduction via the discernibility matrix. Dai et al. [28] further fuzzified mutual information to propose a valid criterion based on the information gain ratio, and obtained remarkable results when applying it to tumor classification. Ni et al. [35] proposed a positive-region based feature selection accelerator for fuzzy rough sets.

Unfortunately, the fuzzy extension of Shannon entropy does not have monotonicity, and uncertainty measurement by Shannon entropy is not suitable for analysing information correlation and redundancy in FRS. Inspired by the concept of knowledge granularity introduced by Dai et al. [36], we propose a monotone fuzzy metric in the framework of FRS. Fuzzy independent classification information is then defined via fuzzy conditional mutual information, and the first feature selection method, based on fuzzy independent classification information, is constructed. Further, considering the proportion of new classification information provided by a selected attribute relative to its own information, and to avoid selecting features that are highly redundant with the already selected features, we introduce the concept of normative fuzzy information weight via symmetric uncertainty and propose the second feature selection method based on the normative fuzzy information weight in FRS. It should be noted that our study is carried out on complete information systems. Therefore, it is assumed that there are no missing values in the conditional features or the decision feature, and that the information systems are static.
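Both proposed methods can be viewed as greedy forward searches that repeatedly add the candidate feature maximizing a criterion (fuzzy independent classification information in the first method, normative fuzzy information weight in the second). The skeleton below shows only that generic loop; the `score` callback and all names are assumptions for illustration, since the paper's exact criteria are defined in Section 3.

```python
# Generic greedy forward-selection skeleton (an assumption for illustration,
# not the paper's exact algorithm): at each step, add the candidate feature
# that maximizes a user-supplied score(candidate, selected, decision).

def greedy_forward_selection(features, decision, score, k):
    """features: dict mapping feature name -> list of values;
    score: callable(candidate_values, selected_value_lists, decision) -> float;
    returns the names of the k selected features."""
    selected = []
    remaining = list(features.keys())
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(features[f],
                                                  [features[s] for s in selected],
                                                  decision))
        selected.append(best)
        remaining.remove(best)
    return selected

# Example score that ignores the selected set (plain mutual information from the
# first sketch); the paper's criteria additionally account for redundancy:
# chosen = greedy_forward_selection(features, decision,
#     lambda cand, sel, d: mutual_information(d, cand), k=10)
```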

This paper consists of the following parts. We review some preliminaries in Section 2. In Section 3, the monotone fuzzy metric structure is defined, and we propose two feature selection methods based on fuzzy ICI and normative fuzzy information weight, respectively. Experiments and comparisons with other methods are conducted in Section 4. In Section 5, the second method is applied to the classification of tumors. The conclusion is given in Section 6.

Section snippets

Some primary concepts in rough sets

In this subsection, we review some basic concepts of RS, which can be found in the references [1], [28], [37], [38], [39].

Definition 1

[1]

An information system is a quadruple $IS=(U,A,V,f)$, where $U=\{x_1,\ldots,x_n\}$ is a nonempty set of samples; $A$ is the attribute (or feature) set; $V$ is the union of the attribute domains, $V=\bigcup_{a\in A}V_a$; and $f:U\times A\to V$ is an information function that assigns to each object $x$ a value $f(a,x)$ from the domain of attribute $a$. For any attribute subset $B\subseteq A$, there is an indiscernibility relation $IND(B)$:
$$IND(B)=\{(x,y)\in U\times U \mid \forall a\in B,\ f(a,x)=f(a,y)\}.$$

Evidently, $IND(B)$ is an equivalence relation on $U$.
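For a symbolic data table, the partition $U/IND(B)$ induced by Definition 1 can be computed by grouping objects on their value tuples over $B$. The sketch below is a minimal illustration; the table layout and names are assumptions, not the paper's implementation.

```python
# Minimal sketch of Definition 1: objects are indiscernible under B when they
# agree on every attribute in B; the resulting classes form the partition U/IND(B).
from collections import defaultdict

def partition(table, B):
    """Group object indices by their value tuple on the attribute subset B."""
    classes = defaultdict(list)
    for i, row in enumerate(table):          # table: list of dicts {attribute: value}
        key = tuple(row[a] for a in B)
        classes[key].append(i)
    return list(classes.values())

# Toy table with attributes a1, a2:
U = [{'a1': 0, 'a2': 'x'},
     {'a1': 0, 'a2': 'y'},
     {'a1': 0, 'a2': 'x'},
     {'a1': 1, 'a2': 'x'}]
print(partition(U, ['a1']))        # [[0, 1, 2], [3]]
print(partition(U, ['a1', 'a2']))  # [[0, 2], [1], [3]]
```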

Feature selection in fuzzy rough sets

A monotone fuzzy measure, called Fuzzy Granularity (simply, FG), is constructed in this section, and we propose two types of feature selection methods via FG.
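The paper's exact FG definition appears in the full text of this section and is not reproduced in the snippet above. Purely as a hedged illustration of the idea, the sketch below computes one common granularity-style quantity from the knowledge-granularity literature: the average membership of a fuzzy similarity matrix, which decreases (granules become finer) as the relation becomes more discriminating. Both the similarity relation and the measure here are assumptions, not the paper's FG.

```python
# Illustrative granularity-style measure on numerical data (an assumption,
# not the paper's Fuzzy Granularity definition).
import numpy as np

def fuzzy_similarity_matrix(X):
    """A simple fuzzy similarity relation: R[i, j] = 1 - mean absolute
    difference over min-max normalized features (assumed form)."""
    X = np.asarray(X, dtype=float)
    rng = X.max(axis=0) - X.min(axis=0)
    rng[rng == 0] = 1.0                      # avoid division by zero on constant features
    Xn = (X - X.min(axis=0)) / rng
    diff = np.abs(Xn[:, None, :] - Xn[None, :, :]).mean(axis=2)
    return 1.0 - diff

def granularity(R):
    """Average membership of the relation matrix; smaller values mean finer granules."""
    return R.mean()

X = [[0.1, 5.0], [0.2, 5.1], [0.9, 1.0]]
print(granularity(fuzzy_similarity_matrix(X)))
```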

Experiments and analyses

This section verifies the validity of the two proposed criteria through experiments. The first nine datasets are taken from the UCI repository [50], the CLL_SUB and GLI_85 datasets can be obtained from NCBI (https://www.ncbi.nlm.nih.gov/), and the remaining datasets are from the Kent Ridge Biomedical Dataset Repository (http://leo.ugr.es/elvira/DBCRepository/). The datasets are described in Table 2.

To evaluate the validity of the proposed criteria within the framework of the fuzzy rough set model, we

Application

In the previous section, we showed the validity of the two proposed methods. In this section, the second feature selection method, NGFMRI, is applied to select important features for tumor classification.

Conclusion

The work of this paper is motivated by two observations: first, the fuzzy extension of Shannon entropy does not have monotonicity, and uncertainty measurement by Shannon entropy is not suitable for analysing information correlation and redundancy; second, many existing feature selection methods ignore the proportion of new classification information provided by the selected features relative to their own information, and tend to select features having high redundancy with the already selected features.

CRediT authorship contribution statement

Jianhua Dai: Conceptualization, Supervision, Methodology, Writing - original draft. Jiaolong Chen: Investigation, Methodology, Software, Writing - original draft.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (No. 61976089, No. 61473259, No. 61070074, No. 60703038), and the Hunan Provincial Science and Technology Project Foundation, China (2018TP1018, 2018RS3065).

References (53)

  • Jain, I., et al., Correlation feature selection based improved-binary particle swarm optimization for gene selection and cancer classification, Appl. Soft Comput. (2018).

  • Chen, D., et al., Parameterized attribute reduction with Gaussian kernel based fuzzy rough sets, Inform. Sci. (2011).

  • Dai, J.H., et al., Attribute selection based on information gain ratio in fuzzy rough set theory with application to tumor classification, Appl. Soft Comput. (2013).

  • Liu, G., et al., Fuzzy rule-based oversampling technique for imbalanced and incomplete data learning, Knowl.-Based Syst. (2018).

  • Jurado, S., et al., Fuzzy inductive reasoning forecasting strategies able to cope with missing data: A smart grid application, Appl. Soft Comput. (2017).

  • Ni, P., et al., PARA: A positive-region based attribute reduction accelerator, Inform. Sci. (2019).

  • Dai, J.H., et al., Fuzzy rough set model for set-valued data, Fuzzy Sets and Systems (2013).

  • Dai, J.H., et al., Entropy measures and granularity measures for set-valued information systems, Inform. Sci. (2013).

  • Jing, Y., et al., An incremental attribute reduction approach based on knowledge granularity with a multi-granulation view, Inform. Sci. (2017).

  • She, Y.H., et al., An axiomatic approach of fuzzy rough sets based on residuated lattices, Comput. Math. Appl. (2009).

  • Wu, W., et al., Generalized fuzzy rough sets, Inform. Sci. (2003).

  • Zadeh, L.A., Similarity relations and fuzzy ordering, Inform. Sci. (1971).

  • Moser, B., On the t-transitivity of kernels, Fuzzy Sets and Systems (2006).

  • Hu, Q., et al., Gaussian kernel based fuzzy rough sets: Model, uncertainty measures and applications, Internat. J. Approx. Reason. (2010).

  • Iizuka, N., et al., Oligonucleotide microarray for prediction of early intrahepatic recurrence of hepatocellular carcinoma after curative resection, Lancet (2003).

  • Pawlak, Z., Rough set theory and its applications to data analysis, Cybern. Syst.: Int. J. (1998).