Bireducts with tolerance relations☆
Introduction
Fuzzy Set Theory (FST), introduced by Zadeh [44], and Rough Set Theory (RST), proposed by Pawlak [35], are complementary approaches to treating imperfect knowledge: the first allows elements to belong to a set with a certain degree of truth, whereas the second provides approximations of concepts when the available information is incomplete. Specifically, in the absence of exact information about a set, the set is represented by a pair of sets: its lower approximation and its upper approximation.
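As an illustration of the pair of approximations described above, the following minimal sketch computes them for the classical case, where indiscernibility is an equivalence relation. The function name `approximations` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def approximations(universe, key, target):
    """Return the (lower, upper) approximations of `target`.

    `key` maps each object to its description; objects with equal
    descriptions are indiscernible, which yields an equivalence relation.
    """
    classes = defaultdict(set)
    for obj in universe:
        classes[key(obj)].add(obj)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # class lies entirely inside the target set
            lower |= block
        if block & target:    # class overlaps the target set
            upper |= block
    return lower, upper

# Hypothetical toy data: six objects described by a single attribute.
colour = {1: "red", 2: "red", 3: "blue", 4: "blue", 5: "green", 6: "green"}
target = {1, 2, 3}
lower, upper = approximations(colour.keys(), colour.get, target)
```

Here object 3 cannot be separated from object 4 by the available attribute, so it is excluded from the lower approximation while both objects enter the upper one.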
In the original version proposed by Pawlak, the approximations were classical sets, which corresponds to using equivalence relations to build them. Extended variants were later introduced in which the approximations can be fuzzy sets, and the concepts were further extended to tolerance relations and granules. A first definition of fuzzy rough sets was given by Fariñas del Cerro and Prade [10], and numerous hybrid models have been introduced since.
Information about the elements of the sets that we want to approximate is often expressed by the values of some attributes. An important task is to decrease the complexity related to the number of attributes without losing the ability to produce good approximations. To this end, various types of so-called reducts, which are minimal subsets of attributes preserving various kinds and levels of information, have been presented and studied in the RST-related literature [8], [17], [29].
Attribute reduction is helpful not only for deriving efficient approximations, but also for supporting the construction of clear data-based classification and knowledge representation models. Therefore, there have been several attempts to combine RST with other approaches in this area. One useful tool in this field, which has become an appealing research topic from both the theoretical and applied perspectives, is Formal Concept Analysis (FCA), introduced by Wille in [43]. As in RST, some fuzzy-set-based extensions have been proposed [4], [5], [30]. Moreover, a number of connections between FCA and RST have been identified. In particular, applying rough set attribute reduction methods can potentially lead toward more powerful FCA models [29], [42].
In this paper, we consider bireducts, which extend the classical RST-based notions of reducts in order to provide more flexibility in operating with subsets of attributes and the subsets of objects that those attributes can efficiently describe [18], [26], [39], [40]. Given a data set organized in tabular form, the main objective of bireducts is to reduce the number of attributes while preventing incompatibilities and eliminating existing noise in the original data.
Throughout the work, we refer to information reducts and information bireducts, as well as to decision reducts and decision bireducts. According to RST nomenclature, data collected within the information systems are used to describe general knowledge with respect to a number of attributes, while the decision systems contain a distinguished decision column in which an action, prediction or classification to be taken is determined depending on the multivalued attributes which each of the objects has.
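To make the notion of a decision bireduct concrete, the sketch below checks a candidate pair (attributes, objects) in the classical, equality-based setting. It assumes the usual formulation from the bireduct literature: the attributes must discern every pair of selected objects with different decisions, no attribute can be dropped, and no object can be added. The function names and the toy table are hypothetical, and the tolerance-based definitions introduced later in the paper are not reproduced here.

```python
from itertools import combinations

def discerns(table, decision, attrs, objs):
    """True if `attrs` tell apart every pair in `objs` with different decisions."""
    return all(
        any(table[x][a] != table[y][a] for a in attrs)
        for x, y in combinations(sorted(objs), 2)
        if decision[x] != decision[y]
    )

def is_decision_bireduct(table, decision, attrs, objs):
    attrs, objs = set(attrs), set(objs)
    if not discerns(table, decision, attrs, objs):
        return False
    # Attribute minimality: dropping any single attribute must break discernibility.
    # Single removals suffice because discernibility is monotone in the attributes.
    if any(discerns(table, decision, attrs - {a}, objs) for a in attrs):
        return False
    # Object maximality: adding any single outside object must break discernibility.
    outside = set(table) - objs
    return not any(discerns(table, decision, attrs, objs | {x}) for x in outside)

# Hypothetical toy decision table; objects 1 and 4 are inconsistent
# (same attribute values, different decisions).
table = {
    1: {"outlook": "sunny", "wind": "weak"},
    2: {"outlook": "sunny", "wind": "strong"},
    3: {"outlook": "rainy", "wind": "weak"},
    4: {"outlook": "sunny", "wind": "weak"},
}
decision = {1: "yes", 2: "no", 3: "yes", 4: "no"}

ok = is_decision_bireduct(table, decision, {"wind"}, {1, 2, 3})
```

In this toy table the pair ({wind}, {1, 2, 3}) passes the check: object 4 is left out because it conflicts with object 1, which is how a bireduct sets aside inconsistent rows instead of adding attributes.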
Similarity and tolerance relations have been considered in the literature in order to provide a natural notion of distance or proximity among the elements of the framework [1], [7], [12], [19], [23]. In several cases, a tolerance relation is more appropriate because, for instance, the transitivity constraints imposed by similarity relations may conflict with the user's specifications, or the exclusive use of similarity relations may model vague information incorrectly.
It is worth emphasizing that both theoretical and practical results related to bireducts were, up to now, focused on equivalence relations modeling categorical data. This paper is the first comprehensive attempt to extend those results towards the cases of tolerance relations and, eventually, structures generated by fuzzy similarities.
Obtaining bireducts in an environment that considers tolerance relations is more complex, but also more applicable to real-world scenarios. Therefore, in this paper we study representations of bireducts both in the classical case and in situations where the notion of equality is weakened towards tolerance. Distance-based tolerance relations are the usual choice. The results presented in this paper are more general and also hold for other tolerance relations, such as those defined from fuzzy tolerance relations and different thresholds, which will also be considered throughout the paper. This generality also makes it possible to draw a stronger relationship between the tolerance-based extensions developed within RST and FCA.
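The following minimal sketch shows a distance-based tolerance relation with a threshold and checks that it is reflexive and symmetric but not transitive. The helper names, the numeric values and the threshold `eps` are hypothetical choices made only for illustration.

```python
from itertools import product

def distance_tolerance(values, eps):
    """Relate x and y whenever |values[x] - values[y]| <= eps."""
    return {(x, y) for x, y in product(values, repeat=2)
            if abs(values[x] - values[y]) <= eps}

def is_reflexive(rel, universe):
    return all((x, x) in rel for x in universe)

def is_symmetric(rel):
    return all((y, x) in rel for x, y in rel)

def is_transitive(rel):
    return all((x, w) in rel for x, y in rel for z, w in rel if y == z)

# Hypothetical numeric attribute measured on three objects.
temperature = {"a": 20, "b": 22, "c": 24}
rel = distance_tolerance(temperature, eps=2)
```

With these values, `a` is close to `b` and `b` is close to `c`, yet `a` and `c` differ by more than the threshold. Forcing transitivity would therefore merge objects that the threshold is meant to keep apart.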
There are other mechanisms for reducing data sets given in tabular form. For example, the association between the objects and attributes of a database can be captured by biclustering, a mechanism that groups data according to some measure of similarity or distance. Specifically, biclustering clusters the rows and columns of a two-dimensional matrix simultaneously. Indeed, the problem of obtaining all non-extendable exact biclusters of a database can be considered equivalent to the FCA problem of obtaining all concepts of the concept lattice associated with a formal context. Both theories have been applied to the analysis of gene expression data with interesting results [21], [27], and some works, such as [20], relate biclustering to FCA. However, it is worth noting that biclusters and bireducts are, in a sense, dual concepts, like closures and their minimal generators. Biclusters (just like concepts in FCA) attempt to describe maximal areas of the data (with respect to both objects and attributes). Bireducts, on the other hand, attempt to describe maximally heterogeneous areas of the data (in terms of objects) using a minimal number of attributes.
The paper is organized as follows: in Section 2, we summarize several definitions we will use throughout the paper. Then, we generalize the notions of reduct and bireduct considering tolerance relations over the conditional attributes and over the decision attribute in Section 3, introducing an extra flexibility level since a general family of tolerance relations is considered in these definitions. Section 4 presents the corresponding characterizations of the new reducts and bireducts based on a generalization of the discernibility function. The introduced results are applied to a general decision system in Section 5. Finally, the conclusion and future work section is included.
Section snippets
Preliminaries
In this paper the classical theory of propositional logic will be considered in order to interpret the expression of the discernibility function. Hence, several basic notions of propositional logic will be recalled.
First of all, an alphabet is formed by a numerable set of symbols or propositional variables: as well as the constant symbols ⊤ and ⊥, the symbols ¬, ∧, ∨, → and ↔, which are called connectives or logical operators, and the punctuation symbols
(Bi)Reducts over tolerance-based conditional attributes
As we previously mentioned, all the results corresponding to bireducts have hitherto been presented considering equivalence relations. In this work, we will extend those results by using tolerance and fuzzy similarity relations. Therefore, in this section, we will introduce the new necessary definitions corresponding to reducts and bireducts in the proposed framework with tolerance relations.
First of all, the definition of information reduct is introduced.
Definition 8 The set is called -information
Characterizing reducts and bireducts
In this section, in order to ease the calculation of decision reducts and bireducts, we will present some results to characterize these notions. The discernibility function will be the main tool we will use for computing both decision reducts and bireducts, considering fuzzy relations. This function is based on the elements of the discernibility matrix of (U, where and it is defined in this framework, for i and j in as follows:
A worked example
Finally, the following example applies the results to a general decision system in which the value set associated with the decision attribute is not Boolean. We adopt the decision system presented in Example 3, with the same set of objects and attributes, but here the decision attribute is replaced by the kind of activity each object performs. The following table shows the relationship between objects and attributes.
| | Outlook | Temp. | Humid. | Wind | Activity? |
|---|---|---|---|---|---|
| 1 | sunny | hot | high | weak | run |
| 2 | sunny | hot | high | strong | |
Conclusions and future work
We have studied reducts and bireducts in the classical environment of RST considering tolerance relations. We have generalized the classical notion of discernibility function, from which we have characterized the reducts and bireducts in these environments, providing a linear procedure for computing one reduct or bireduct and a mechanism for computing all of them. The computation of all reducts and bireducts is NP-hard, but the relation to RDNFs also provides the possibility of using various
References (44)
- et al. Automated prover for attribute dependencies in data with grades. Int. J. Approx. Reason. (2016)
- et al. Formal concept analysis and linguistic hedges. Int. J. Gen. Syst. (2012)
- et al. Relations of reduction between covering generalized rough sets and concept lattices. Inf. Sci. (Ny) (2015)
- et al. Similarity relations in fuzzy attribute-oriented concept lattices. Fuzzy Sets Syst. (2015)
- et al. Attribute selection with fuzzy decision reducts. Inf. Sci. (Ny) (2010)
- et al. Rough Sets, Twofold Fuzzy Sets and Modal Logic—Fuzziness in Indiscernibility and Partial Information.
- Ontology-based concept similarity in formal concept analysis. Inf. Sci. (Ny) (2006)
- et al. A formal concept analysis approach to rough data tables. Lect. Notes Comput. Sci. (2009)
- et al. Unsupervised similarity learning from textual data. Fundam. Inf. (2012)
- Should fuzzy equality and similarity satisfy transitivity? Comments on the paper by M. De Cock and E. Kerre. Fuzzy Sets Syst. (2003)
- On databases with incomplete information. J. ACM
- Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans. Comput. Biol. Bioinf.
- Multi-adjoint property-oriented and object-oriented concept lattices. Inf. Sci. (Ny)
- Information systems theoretical foundations. Inf. Syst.
- Rough sets. Int. J. Comput. Inf. Sci.
- Rough sets and boolean reasoning. Inf. Sci. (Ny)
- Recent advances in decision bireducts: complexity, heuristics and streams. Lect. Notes Comput. Sci.
- Decision bireducts and decision reducts–a comparison. Int. J. Approx. Reason.
- Relation between concept lattice reduction and rough set reduction. Knowl. Based Syst.
- An efficient reasoning method for dependencies over similarity and ordinal data. Lect. Notes Comput. Sci.
- Fast factorization by similarity in formal concept analysis of data with fuzzy attributes. J. Comput. Syst. Sci.
- Construction of the L-fuzzy concept lattice. Fuzzy Sets Syst.
☆ Partially supported by the Spanish Science Ministry project TIN2016-76653-P.