1 Introduction

Feature selection is an important technique for dimensionality reduction, widely used in the fields of data mining and machine learning. It is a crucial preprocessing stage in Knowledge Discovery in Databases (KDD). Feature selection, or feature subset selection, is the process of selecting a subset of features by removing redundant features without incurring information loss. Pawlak [13] developed Rough Set Theory (RST), which has become a popular soft computing methodology for knowledge discovery amidst uncertainty. A reduct denotes the subset of features selected using RST.

Classical RST is used for reduct computation in complete symbolic decision systems. However, it is inappropriate for reduct computation in real-valued or hybrid decision systems (HDS). The fuzzy rough set model was introduced by Dubois and Prade [3] in 1990 for dealing with hybrid decision systems. Several extensions to the Dubois and Prade fuzzy-rough set model have been introduced in the literature, such as Radzikowska and Kerre's model [14], Hu's model [5] and the Gaussian kernel-based fuzzy rough set model [6]. In parallel, reduct computation approaches have been developed in these models, primarily Sequential Forward Selection (SFS) based algorithms [4, 6, 7, 9, 16,17,18,19,20].

Computing all possible reducts of a given decision system, or computing a minimum-length optimal reduct, is proved to be an NP-hard problem [14]. Hence researchers have developed heuristic algorithms for near-optimal reduct computation. Two important aspects of a reduct computation algorithm are the dependency measure (heuristic) used for assessing the quality of the reduct, and the control strategy used for attribute selection. The two important control strategies for reduct computation in the literature are the SFS and Sequential Backward Elimination (SBE) approaches. In SFS, the reduct is initialized to the empty set, and in every iteration the attribute with the optimal heuristic measure is added to the reduct until the stopping condition is reached. In SBE, the reduct is initialized to the full attribute set, and each attribute is tested to check whether its omission leads to information loss; if an attribute is found to be redundant, it is removed from the reduct. SBE approaches always result in a reduct without redundancy, whereas SFS approaches cannot guarantee a redundancy-free reduct.

The complexity of reduct computation is much higher in fuzzy rough sets than in classical rough sets. While many SFS and SBE approaches are available for classical rough sets, in our literature exploration we have not come across any SBE approach for fuzzy rough sets.

This paper presents an efficient and effective SBE based reduct computation algorithm for Gaussian kernel-based fuzzy rough sets (GK-FRS). The proposed methodology acquires significance as its theoretical time complexity is of third order (\(O(|C||U|^2)\)), significantly better than existing fuzzy rough set reduct algorithms having fourth-order complexity (\(O(|C|^2|U|^2)\)). Here |C| and |U| denote the cardinality of the conditional attribute set and the universe of objects, respectively.

The rest of this paper is organized as follows. Section 2 discusses the basic concepts of rough sets and fuzzy rough sets. Section 3 describes the fundamentals of GK-FRS. Section 4 details the proposed approach for SBE reduct computation in GK-FRS. Experiments and results are provided in Sect. 5, followed by the conclusion.

2 Theoretical Background

2.1 Rough Sets and Fuzzy-Rough Sets

Rough Set Theory is a useful tool to discover data dependencies and to reduce the dimensionality of data using the data alone. Fuzzy-rough sets form a hybrid model of rough sets and fuzzy sets with the ability to deal with quantitative data. Basic rough set theory and fuzzy-rough set theory are described in [4, 18]. The concepts of GK-FRS are described below using Hybrid Decision Systems (HDS):

An HDS is represented by \((U, C \cup \{d\}, V, f)\), wherein U is the set of objects, C is the collection of heterogeneous attributes (quantitative, qualitative, logical, set-valued, interval-based, etc.), and ‘d’ is the qualitative decision attribute.

2.2 Gaussian Kernel Function

The Gaussian function is a very popular kernel, extensively used in SVMs and RBF neural networks. The similarity between two objects \(u_i, u_j \in U\) is computed using the Gaussian kernel function \(k(u_i, u_j)\) given in Eq. (1)

$$\begin{aligned} k(u_i, u_j) = exp\left( - \frac{||u_i - u_j||^2}{2\delta ^2} \right) \end{aligned}$$
(1)

The distance between objects \(u_i\) and \(u_j\) is given by \(||u_i - u_j||\), and a user-controlled parameter \(\delta \) influences the resulting quality of approximation. \(||u_i - u_j||\) is computed as [20]:

$$\begin{aligned} ||u_i - u_j|| = \frac{|a(u_i) - a(u_j)|}{4\delta _a} \end{aligned}$$
(2)

where \(a\in C\) is a quantitative conditional attribute and \(\delta _a\) represents the standard deviation of a.
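As an illustrative sketch (not the authors' implementation), Eqs. (1) and (2) can be coded for a single quantitative attribute as follows; the function name and the default value of \(\delta \) are our assumptions:

```python
import numpy as np

def single_attribute_similarity(a_values, delta=0.1):
    """Fuzzy similarity matrix R_G^{a} for one quantitative attribute.

    Pairwise distances are scaled by 4*sigma_a as in Eq. (2) before
    applying the Gaussian kernel of Eq. (1); delta is the
    user-controlled parameter.
    """
    sigma_a = np.std(a_values)
    # ||u_i - u_j|| of Eq. (2): normalized pairwise distances.
    dist = np.abs(a_values[:, None] - a_values[None, :]) / (4.0 * sigma_a)
    # Gaussian kernel of Eq. (1), applied element-wise.
    return np.exp(-dist ** 2 / (2.0 * delta ** 2))

vals = np.array([0.2, 0.4, 0.9])
R = single_attribute_similarity(vals)
```

The result is a symmetric |U| \(\times \) |U| matrix with ones on the diagonal and entries in (0, 1].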

3 Gaussian Kernel Based Fuzzy Rough Sets

The kernel method and rough set theory are two important aspects of pattern recognition: a kernel function maps the data into a high-dimensional space, whereas rough sets approximate the space. Qinghua Hu et al. [6] introduced Gaussian kernel-based fuzzy rough sets (GK-FRS) by incorporating the Gaussian kernel function into fuzzy-rough sets. The Gaussian kernel based fuzzy lower and upper approximations [6, 20] of a decision system are calculated as:

$$\begin{aligned} \underline{R_G}d_i(x)= & {} inf_{y\notin d_i} \sqrt{1-R_G^2(x,y)} \end{aligned}$$
(3)
$$\begin{aligned} \overline{R_G}d_i(x)= & {} sup_{y\in d_i} R_G(x,y) \end{aligned}$$
(4)

where \(d_i \in U/\{d\}\) and \(x\in U\). For a given \(B \subseteq C\), \(R_G^B\) denotes Gaussian Kernel-based fuzzy similarity relation expressed as a matrix of order |U| \(\times \) |U|. For any \(x,y \in U\), \(R_G^B(x,y)\) represents the fuzzy similarity between the object x and object y based on B attributes. Based on Proposition 3 in [20], \(R_G^{\{a\}\cup \{b\}}\) can be calculated using \(R_G^{\{a\}}\) and \(R_G^{\{b\}}\) by element-wise matrix multiplication as given in Eq. (5)

$$\begin{aligned} R_G^{\{a\}\cup \{b\}}(x, y) = R_G^{\{a\}}(x,y) \times R_G^{\{b\}}(x,y) \qquad \quad \forall x,y \in U \end{aligned}$$
(5)
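Note that the product in Eq. (5) is element-wise (a Hadamard product), not a matrix product; a minimal sketch with two hypothetical single-attribute similarity matrices:

```python
import numpy as np

# Eq. (5): the similarity matrix of {a, b} is the element-wise
# product of the single-attribute similarity matrices.
R_a = np.array([[1.0, 0.8],
                [0.8, 1.0]])
R_b = np.array([[1.0, 0.5],
                [0.5, 1.0]])
R_ab = R_a * R_b  # element-wise; NOT the matrix product R_a @ R_b
```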

The indiscernible classes based on the decision attribute form \(U/\{d\}\) = \(\{d_1,d_2,...,d_l\}\), a partition of U. The fuzzy positive region is computed as:

$$\begin{aligned} POS_{B}(\{d\}) = \bigcup _{i=1}^l \underline{R_G^B}d_i. \end{aligned}$$
(6)

The measure of dependency of ‘d’ on \(B\subseteq C\) is given by

$$\begin{aligned} \gamma _B(\{d\}) = \frac{|POS_B(\{d\})|}{|U|} = \frac{|\bigcup _{i=1}^l \underline{R_G^B} d_i|}{|U|} \end{aligned}$$
(7)

where \(|\bigcup _{i=1}^l \underline{R_G^B} d_i| = \sum _i \sum _{x\in d_i} \underline{R_G^B} d_i(x)\).
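A direct, unoptimized reading of Eqs. (3), (6) and (7) can be sketched as below; the function name and input conventions are our assumptions:

```python
import numpy as np

def gamma_measure(R_B, labels):
    """Dependency gamma_B({d}) of Eq. (7).

    R_B: |U| x |U| fuzzy similarity matrix over attribute subset B.
    labels: NumPy array with the decision value of each object.
    """
    n = len(labels)
    pos = 0.0
    for i in range(n):
        # Eq. (3): lower approximation membership of object i in its
        # own decision class -- infimum over objects outside the class.
        outside = labels != labels[i]
        if outside.any():
            pos += np.min(np.sqrt(1.0 - R_B[i, outside] ** 2))
        else:
            pos += 1.0
    return pos / n  # Eq. (7): |POS_B({d})| / |U|
```

With a crisp similarity matrix (identity), every object belongs fully to the lower approximation of its class and the dependency is 1.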

In 2010, Qinghua Hu et al. [6] proposed the feature selection algorithm FS-GKA, which is based on computing the Dependency with Gaussian Kernel Approximation (DGKA). Later, Zeng et al. [20] used the DGKA algorithm and proposed the FRSA-NFS-HIS algorithm [20] for feature selection in HDS. In 2016, Ghosh et al. [4] proposed an Improved DGKA (IDGKA) algorithm for dependency computation and developed the MFRSA-NFS-HIS algorithm using the IDGKA algorithm.

4 Proposed Backward Elimination Approach for Feature Selection

The nature of SBE based reduct computation requires |C| iterations irrespective of the size of the reduct Red, whereas SFS based reduct computation benefits from the fact that its number of iterations is limited by the size of the reduct |Red|. In addition to this observation, the primary reason for the computational complexity of SBE stems from the fact that the relational representation (indiscernibility for classical RST, fuzzy similarity relation for fuzzy rough sets) of \(C-\{a\}\) (\(\forall a \in \) C) cannot be obtained from the relational representation of C. For example, in fuzzy rough sets, the fuzzy similarity matrix \(R^{C-\{a\}}\) is usually not derivable directly from \(R^C\). This necessitates recomputation of \(R^{C-\{a\}}\) from the similarity matrices of the attributes of \(C-\{a\}\).

The proposed SBE reduct computation algorithm BEA-GK-FRFS using GK-FRS emerged from the identification of the possibility of deriving \(R_G^{C-\{a\}}\) directly from \(R_G^C\) and \(R_G^{\{a\}}\), as described below:

Equation (5), originally from the literature [20], can be recast as Eq. (8)

$$\begin{aligned} R_G^C = R_G^{C-\{a\}} * R_G^{\{a\}} \end{aligned}$$
(8)

where the operator * represents element-wise matrix multiplication. Hence, from Eq. (8), the required \(R_G^{C-\{a\}}\) is obtained by

$$\begin{aligned} R_G^{C-\{a\}} = R_G^{C}/R_G^{\{a\}} \end{aligned}$$
(9)

Equation (9) is well defined only when the atomic component \(R_G^{\{a\}}\) does not contain zeros. \(R_G^{\{a\}}\) is free from zeros owing to the Gaussian kernel adaptation. However, zeros that may still occur due to system limitations are addressed by thresholding to \(\epsilon \) (an infinitesimal number). Experimental verification found reduct computation to be insensitive to this infinitesimal modification.

The composite component \(R_G^C\), resulting from the multiplication of matrices, is expected to have extremely small values \((>0)\), increasingly so as |C| grows. Such infinitesimally small values are represented as exact zeros due to the system's limitation in representing numerical precision. These zeros are detrimental to carrying out the computation in Eq. (9). A logarithmic transformation, aptly engineered to overcome this ill-conditioning scenario, is shown in Eqs. (10)–(13).

$$\begin{aligned}&R_G^{C} = e^{log_e(R_G^C)} \end{aligned}$$
(10)
$$\begin{aligned}&where \qquad log_e(R_G^C) = \sum \nolimits _{b \in C} log_e(R_G^{\{b\}}) \end{aligned}$$
(11)

Here operations of log, exp and \(\sum \) are defined as element-wise matrix operations. The required \(R_G^{C-\{a\}}\) is computed as:

$$\begin{aligned}&R_G^{C-\{a\}} = e^{log_e(R_G^{C-\{a\}})} \end{aligned}$$
(12)
$$\begin{aligned}&where \qquad log_e(R_G^{C-\{a\}}) = log_e(R_G^C) - log_e(R_G^{\{a\}}) \end{aligned}$$
(13)

Equation (13) follows from Eq. (11).
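The log-domain computation of Eqs. (11)–(13) can be sketched as follows; the \(\epsilon \) floor mirrors the thresholding described above, and all names are our assumptions:

```python
import numpy as np

EPS = 1e-300  # tiny floor so log() never sees an exact zero

def remove_attribute(log_R_C, R_a):
    """R_G^{C-{a}} via Eqs. (12)-(13): subtract the log of the atomic
    component from the accumulated log-domain matrix, then exponentiate."""
    return np.exp(log_R_C - np.log(np.maximum(R_a, EPS)))

# Eq. (11): accumulate in the log domain instead of multiplying the
# atomic matrices directly, so long products do not underflow to zero.
R_a = np.array([[1.0, 0.5],
                [0.5, 1.0]])
R_b = np.array([[1.0, 0.2],
                [0.2, 1.0]])
log_R_C = np.log(R_a) + np.log(R_b)
R_without_a = remove_attribute(log_R_C, R_a)  # recovers R_b
```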

Algorithm 1. BEA-GK-FRFS

The proposed algorithm BEA-GK-FRFS is given in Algorithm 1. The order of checking redundancy in SBE reduct algorithms has an influence on the resulting reduct [8]. BEA-GK-FRFS (line 4) tests attributes in ascending order of the gamma measure. The time complexity of sequential forward selection based reduct computation algorithms is \(O(|C|^2|U|^2)\) [20]. The time complexity of traditional SBE based algorithms is also \(O(|C|^2|U|^2)\), since in each iteration computing \(R_G^{Red-\{a\}}\) requires O(|C|) matrix operations. In BEA-GK-FRFS, using Eqs. (12) and (13), \(R_G^{Red-\{a\}}\) requires only two matrix operations. Hence, the time complexity of BEA-GK-FRFS is \(O(|C||U|^2)\).
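As the pseudocode figure of Algorithm 1 is not reproduced here, the following is a minimal sketch of the backward elimination loop as we read it from the description above; the helper `gamma`, the stopping conditions and the input conventions are our assumptions:

```python
import numpy as np

def gamma(R_B, labels):
    """Dependency measure of Eq. (7) (illustrative reimplementation)."""
    n = len(labels)
    total = 0.0
    for i in range(n):
        outside = labels != labels[i]
        if outside.any():
            total += np.min(np.sqrt(np.maximum(1.0 - R_B[i, outside] ** 2, 0.0)))
        else:
            total += 1.0
    return total / n

def bea_gk_frfs(atomic, labels, eps=1e-300):
    """Backward elimination over log-domain similarity matrices.

    atomic: dict mapping each attribute to its |U| x |U|
            single-attribute fuzzy similarity matrix.
    """
    # Eq. (11): log-domain representation of R_G^C.
    log_R = sum(np.log(np.maximum(R, eps)) for R in atomic.values())
    target = gamma(np.exp(log_R), labels)  # dependency of the full set C
    red = set(atomic)
    # Line 4 of Algorithm 1: test attributes in ascending gamma order.
    for a in sorted(atomic, key=lambda a: gamma(atomic[a], labels)):
        if len(red) == 1:
            break
        trial = log_R - np.log(np.maximum(atomic[a], eps))  # Eq. (13)
        if gamma(np.exp(trial), labels) >= target:  # no information loss
            red.discard(a)  # attribute a is redundant
            log_R = trial
    return red
```

Each elimination test costs only two element-wise matrix operations (the subtraction of Eq. (13) plus the exponentiation of Eq. (12)), which is the source of the \(O(|C||U|^2)\) complexity.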

5 Experiments, Results and Analysis

The configuration of the system used for the experiments is: CPU: Intel(R) Core i5, clock speed: 2.66 GHz, RAM: 4 GB, OS: Ubuntu 16.04 LTS 64-bit, and software: RStudio Version 1.0.136. Nine benchmark quantitative decision systems from the UCI Machine Learning Repository [10] were used in the experiments. Of these, four datasets (6–9 in Table 1) were so large that the memory required to represent their fuzzy similarity matrices exceeded the system limit. Hence, for these datasets, stratified random sampling based sub-datasets were used in our experiments. The original size of each dataset is indicated in brackets.

5.1 Comparative Experiments with MFRSA-NFS-HIS and FRSA-SBE Algorithms

The proposed BEA-GK-FRFS is implemented in the R environment. Its performance is compared with the R implementation of MFRSA-NFS-HIS by Ghosh et al. [4], which was established to be an efficient fuzzy-rough set based reduct computation algorithm using the SFS strategy. To aptly illustrate the relevance of the proposed SBE algorithm, FRSA-SBE was implemented in the R environment following the traditional SBE approach. FRSA-SBE is identical to BEA-GK-FRFS except for the \(6^{th}\) step in Algorithm 1, wherein \(R_G^{Red-\{a\}}\) is computed from the atomic components \(R_G^{\{b\}}, \forall b \in (Red-\{a\})\). The results are summarized in Table 1, reporting reduct length and computation time in seconds. Table 1 also reports the percentage computational gain obtained by BEA-GK-FRFS over MFRSA-NFS-HIS and FRSA-SBE.

Table 1. Comparison of BEA-GK-FRFS, MFRSA-NFS-HIS and FRSA-SBE algorithms

Analysis of Results

Based on the results in Table 1, BEA-GK-FRFS is in general computationally efficient in comparison to the other algorithms. MFRSA-NFS-HIS performed better than FRSA-SBE, which explains the precedence given to SFS approaches over SBE approaches to date. On all decision systems, and especially on large scale datasets such as Web, DNA, batch1cifar and Spambase, BEA-GK-FRFS obtained significant computational gains (greater than 34%) over MFRSA-NFS-HIS, empirically validating the improvement in the theoretical time complexity of BEA-GK-FRFS. On small scale datasets such as German, Image Segmentation (small scale owing to small |C|) and Sona_Mines_Rocks, no significant gains were obtained with respect to FRSA-SBE. This is because, when |C| is small, direct matrix multiplication performs similarly to the exponential and logarithmic operations.

5.2 Comparative Experiments with L-FRFS and B-FRFS Algorithms

The R package “RoughSets” [15] is a collaborative effort by several researchers to bring together established rough set and fuzzy rough set algorithms in a unified framework. L-FRFS and B-FRFS are SFS based fuzzy-rough reduct algorithms [9] made available in the RoughSets package. The reduct computation experiment was performed with B-FRFS and L-FRFS using the package implementation, on the same system used for the proposed algorithm. It is observed from Table 2 that BEA-GK-FRFS achieved highly significant computational gains (greater than 95%) compared to L-FRFS and B-FRFS.

Table 2. Comparison of algorithm L-FRFS and B-FRFS available in R package with BEA-GK-FRFS algorithm

We have also analysed the performance of the reducts in the construction of classifier models, using 10-fold cross validation. No significant differences were observed in the classifier analysis between BEA-GK-FRFS and MFRSA-NFS-HIS.

6 Conclusion

Researchers of fuzzy rough sets have preferred SFS based reduct computation over SBE based approaches owing to the increased computational requirements of SBE. This work presented a novel SBE based reduct computation algorithm, BEA-GK-FRFS, in GK-FRS. The time complexity of BEA-GK-FRFS is of third order \((O(|C||U|^2))\), in comparison to existing fuzzy rough set reduct algorithms having fourth-order time complexity \((O(|C|^2|U|^2))\). Experiments conducted on benchmark datasets validated the computational efficiency of BEA-GK-FRFS in comparison to existing fuzzy rough set reduct algorithms.