Elsevier

Information Sciences

Volume 435, April 2018, Pages 26-39

Bireducts with tolerance relations

https://doi.org/10.1016/j.ins.2017.12.037

Abstract

Reducing the number of attributes by preventing the occurrence of incompatibilities and eliminating existing noise in the original data is an important goal in different frameworks, such as those focused on modelling and processing incomplete information in information systems. Bireducts were introduced in Rough Set Theory (RST) as one of the successful solutions to the problem of achieving a balance between the elimination of attributes and the characterization of the objects that the remaining attributes can still distinguish. This paper considers bireducts in a general framework in which attributes induce tolerance relations over the available objects. In order to compute the new reducts and bireducts, a characterization based on a general discernibility function is given.

Introduction

Fuzzy Set Theory (FST), introduced by Zadeh [44], and Rough Set Theory (RST), proposed by Pawlak [35], are complementary approaches to treating imperfect knowledge: the former allows elements to belong to a set with a given degree of truth, whereas the latter provides approximations of concepts when the available information is incomplete. Specifically, in the absence of exact information about a set, the set is represented by a pair of sets: its lower approximation and its upper approximation.
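To make the lower and upper approximations concrete, the following is a minimal Python sketch of Pawlak's classical construction. It assumes a table stored as a dictionary from object identifiers to attribute-value dictionaries (a hypothetical representation chosen for illustration). Objects with identical values on the chosen attributes form an equivalence class; the lower approximation collects the classes contained in the target set, and the upper approximation collects the classes that intersect it.

```python
from collections import defaultdict


def indiscernibility_classes(table, attrs):
    """Group object ids by their value vector on attrs (Pawlak's equivalence classes)."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())


def approximations(table, attrs, target):
    """Return the (lower, upper) approximations of the object set `target`."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, attrs):
        if cls <= target:
            lower |= cls  # the class lies entirely inside the target
        if cls & target:
            upper |= cls  # the class overlaps the target
    return lower, upper


# Hypothetical data: objects 1 and 2 cannot be told apart by "color".
table = {
    1: {"color": "red"},
    2: {"color": "red"},
    3: {"color": "blue"},
}
lower, upper = approximations(table, ["color"], {1, 3})
print(sorted(lower), sorted(upper))  # [3] [1, 2, 3]
```

Object 1 belongs to the target set but is indiscernible from object 2, which does not, so it lands in the upper approximation only.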

In the original version proposed by Pawlak, the approximations were classical sets, which corresponds to the use of equivalence relations when building them. Several extended variants have since been introduced in which the approximations can be fuzzy sets, and the concepts were later extended to tolerance relations and granules. A first definition of this kind, that of fuzzy rough sets, was given by Fariñas del Cerro and Prade [10], and numerous hybrid models have been introduced since.
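When the equivalence relation is replaced by a tolerance relation, a common choice (assumed here, not taken from this paper) is to use the tolerance class T(x) = {y : (x, y) ∈ R} in place of the equivalence class: x is in the lower approximation of X if T(x) ⊆ X, and in the upper approximation if T(x) ∩ X ≠ ∅. The sketch below uses a hypothetical numeric attribute and a tolerance threshold of 1.0.

```python
def tolerance_class(objects, related, x):
    """T(x): every object y such that (x, y) belongs to the tolerance relation."""
    return {y for y in objects if related(x, y)}


def tolerance_approximations(objects, related, target):
    """Lower and upper approximations of `target` built from tolerance classes."""
    lower = {x for x in objects if tolerance_class(objects, related, x) <= target}
    upper = {x for x in objects if tolerance_class(objects, related, x) & target}
    return lower, upper


# Hypothetical temperatures; readings within 1.0 degree are treated as tolerant.
temp = {"a": 20.0, "b": 20.8, "c": 21.5, "d": 25.0}


def close(x, y):
    return abs(temp[x] - temp[y]) <= 1.0


lower, upper = tolerance_approximations(set(temp), close, {"a", "b"})
print(sorted(lower), sorted(upper))  # ['a'] ['a', 'b', 'c']
```

Unlike equivalence classes, the tolerance classes T(a) = {a, b} and T(b) = {a, b, c} overlap without coinciding, which is why b falls outside the lower approximation even though it belongs to the target set.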

Information about the elements of the sets we want to approximate is often expressed by the values of some attributes. An important task is to reduce the complexity related to the number of attributes without losing the ability to produce good approximations. To this end, various types of so-called reducts, which are minimal subsets of attributes preserving various kinds and levels of information, have been presented and studied in the RST literature [8], [17], [29].
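For the classical (equivalence-based) case, an information reduct can be read as a minimal subset of attributes inducing the same partition of the objects as the full attribute set. The sketch below enumerates subsets in order of increasing size and keeps those that preserve the partition and contain no previously found reduct. It is a brute-force illustration on a hypothetical table (exponential in the number of attributes), not an efficient algorithm.

```python
from itertools import combinations


def partition(table, attrs):
    """Partition of the objects into classes of equal values on `attrs`."""
    groups = {}
    for obj, row in table.items():
        groups.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
    return {frozenset(g) for g in groups.values()}


def information_reducts(table, attrs):
    """All minimal attribute subsets inducing the same partition as `attrs`."""
    full = partition(table, attrs)
    reducts = []
    for k in range(len(attrs) + 1):
        for subset in combinations(attrs, k):
            if any(set(r) <= set(subset) for r in reducts):
                continue  # contains a smaller reduct, so it cannot be minimal
            if partition(table, subset) == full:
                reducts.append(subset)
    return reducts


table = {
    1: {"a": 0, "b": 0, "c": 1},
    2: {"a": 0, "b": 1, "c": 0},
    3: {"a": 1, "b": 1, "c": 1},
}
print(information_reducts(table, ["a", "b", "c"]))
# [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Scanning by increasing size works because preserving the partition is monotone: any superset of a preserving set also preserves it, so skipping supersets of found reducts never discards a minimal one.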

Attribute reduction is helpful not only for deriving efficient approximations, but also as a support for the construction of clear data-based classification and knowledge representation models. There have therefore been several attempts to combine RST with other approaches in this area. One useful tool in this field, which has become an appealing research topic from both theoretical and applied perspectives, is Formal Concept Analysis (FCA), introduced by Wille [43]. As in RST, several fuzzy-set-based extensions of FCA have been proposed [4], [5], [30]. Moreover, a number of connections between FCA and RST have been identified. In particular, applying rough set attribute reduction methods can potentially lead to more powerful FCA models [29], [42].

In this paper, we consider bireducts, which extend the classical RST notion of a reduct in order to provide more flexibility when operating with subsets of attributes and the subsets of objects that those attributes can efficiently describe [18], [26], [39], [40]. Given a data set organized in tabular form, the main objective of bireducts is to reduce the number of attributes by preventing the occurrence of incompatibilities and eliminating existing noise in the original data.
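In the classical setting, following the definition commonly used in the bireduct literature (an assumption here, since this passage does not restate it), a pair (B, X) with B ⊆ A and X ⊆ U is a decision bireduct if B discerns every pair of objects in X with different decisions, no attribute can be dropped from B, and no object can be added to X. Because discernibility only improves as B grows and only gets harder as X grows, checking single-element removals and additions suffices. The following Python sketch checks these conditions on a hypothetical table.

```python
def discerns(table, dec, B, X):
    """True if every pair in X with different decisions differs on some attribute in B."""
    X = sorted(X)
    for i, x in enumerate(X):
        for y in X[i + 1:]:
            if dec[x] != dec[y] and all(table[x][a] == table[y][a] for a in B):
                return False
    return True


def is_decision_bireduct(table, dec, B, X):
    B, X = set(B), set(X)
    if not discerns(table, dec, B, X):
        return False
    # Irreducibility of B: removing any attribute must break discernibility on X.
    if any(discerns(table, dec, B - {a}, X) for a in B):
        return False
    # Non-extendability of X: adding any other object must break discernibility.
    return not any(discerns(table, dec, B, X | {x}) for x in table if x not in X)


table = {1: {"a": 0, "b": 0}, 2: {"a": 0, "b": 1}, 3: {"a": 1, "b": 1}}
dec = {1: "yes", 2: "no", 3: "yes"}
print(is_decision_bireduct(table, dec, {"b"}, {1, 2}))          # True
print(is_decision_bireduct(table, dec, {"a", "b"}, {1, 2, 3}))  # True
print(is_decision_bireduct(table, dec, {"b"}, {1, 2, 3}))       # False
```

The first pair shows the extra flexibility: attribute b alone cannot separate objects 2 and 3, but it suffices once object 3 is left out of X, whereas a classical decision reduct must cover all objects.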

Throughout this work, we refer to information reducts and information bireducts, as well as to decision reducts and decision bireducts. In RST nomenclature, the data collected in an information system describe general knowledge with respect to a number of attributes, whereas a decision system contains a distinguished decision column that determines the action, prediction or classification to be taken, depending on the values of the (possibly multivalued) attributes of each object.

Similarity and tolerance relations have been considered in the literature to provide a natural notion of distance or proximity among the elements under consideration [1], [7], [12], [19], [23]. In some cases, a tolerance relation is more appropriate since, for instance, the transitivity constraints imposed by similarity relations may conflict with the user's specifications, or the exclusive use of similarity relations may lead to incorrect modeling of vague information.
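The failure of transitivity is easy to see for a distance-based tolerance relation. This minimal Python sketch (the threshold eps = 0.5 and the sample values are illustrative assumptions) checks the three properties on a small set: 1.0 is close to 1.5 and 1.5 is close to 2.0, but 1.0 is not close to 2.0.

```python
from itertools import product


def is_reflexive(U, R):
    return all(R(x, x) for x in U)


def is_symmetric(U, R):
    return all(R(x, y) == R(y, x) for x, y in product(U, repeat=2))


def is_transitive(U, R):
    return all(R(x, z) for x, y, z in product(U, repeat=3) if R(x, y) and R(y, z))


def close(x, y, eps=0.5):
    """Distance-based tolerance: values at most eps apart are related."""
    return abs(x - y) <= eps


U = [1.0, 1.5, 2.0]
print(is_reflexive(U, close), is_symmetric(U, close), is_transitive(U, close))
# True True False
```

A relation that is reflexive and symmetric but not transitive is exactly a tolerance relation that is not an equivalence relation.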

It is worth emphasizing that, up to now, both theoretical and practical results related to bireducts have focused on equivalence relations modeling categorical data. This paper is the first comprehensive attempt to extend those results to tolerance relations and, eventually, to structures generated by fuzzy similarities.

Obtaining bireducts in an environment that considers tolerance relations is more complex, but also more applicable to real-world scenarios. Therefore, in this paper we study representations of bireducts both in the classical case and in situations in which the notion of equality is weakened towards tolerance. Distance-based tolerance relations are the ones usually considered. The results presented in this paper are more general and can also apply to other tolerance relations, such as those defined from fuzzy tolerance relations and different thresholds, which will also be considered throughout this paper. This generality also allows us to draw a stronger connection between the tolerance-based extensions developed within RST and FCA.
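One common way to obtain a crisp tolerance relation from a fuzzy one, assumed here for illustration, is to take its δ-cut: the pairs whose degree of relationship is at least a threshold δ. For any δ ≤ 1, a reflexive and symmetric fuzzy relation yields a reflexive and symmetric cut, but not necessarily a transitive one. The fuzzy relation below, which decreases linearly with the distance between two values, is a hypothetical example.

```python
def fuzzy_tolerance(x, y, scale=10.0):
    """Hypothetical fuzzy tolerance: degree 1 when x == y, decreasing with |x - y|."""
    return max(0.0, 1.0 - abs(x - y) / scale)


def delta_cut(values, delta, scale=10.0):
    """Crisp relation {(x, y) : R(x, y) >= delta}."""
    return {
        (x, y)
        for x in values
        for y in values
        if fuzzy_tolerance(x, y, scale) >= delta
    }


values = [20, 24, 28]
loose = delta_cut(values, 0.5)
strict = delta_cut(values, 0.7)
print((20, 24) in loose, (24, 28) in loose, (20, 28) in loose)  # True True False
print(sorted(strict))  # [(20, 20), (24, 24), (28, 28)]
```

Raising δ makes the relation stricter: with δ = 0.7 only identical values remain related, while with δ = 0.5 the cut relates 20 to 24 and 24 to 28 but not 20 to 28.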

There are other mechanisms for reducing data sets given in tabular form. For example, the association between the objects and attributes of a database can be reflected by biclustering, which aims to group data according to some measure of similarity or distance. Specifically, biclustering enables the simultaneous clustering of the rows and columns of a two-dimensional matrix. Indeed, the problem of obtaining all non-extendable exact biclusters of a database can be considered equivalent to the FCA problem of obtaining all concepts of the concept lattice associated with a formal context. Both theories have been applied to the analysis of gene expression data with interesting results [21], [27], and there are works, such as [20], that relate biclustering to FCA. However, it is worth noting that biclusters and bireducts are, in a sense, dual concepts, like closures and their minimal generators. Biclusters (just like concepts in FCA) attempt to describe maximal areas of the data (with respect to both objects and attributes). Bireducts, on the other hand, attempt to describe maximally heterogeneous areas of the data (in terms of objects) using a minimal number of attributes.

The paper is organized as follows. Section 2 summarizes several definitions used throughout the paper. Section 3 generalizes the notions of reduct and bireduct by considering tolerance relations over the conditional attributes and over the decision attribute; since a general family of tolerance relations is considered in these definitions, this introduces an extra level of flexibility. Section 4 presents the corresponding characterizations of the new reducts and bireducts based on a generalization of the discernibility function. The results are applied to a general decision system in Section 5. Finally, the paper closes with conclusions and future work.

Section snippets

Preliminaries

In this paper, classical propositional logic is used to interpret the expression of the discernibility function. Hence, several basic notions of propositional logic are recalled.

First of all, an alphabet A is formed by a countable set of symbols or propositional variables, $\Pi = \{p_1, q_1, r_1, p_2, q_2, r_2, \ldots, p_n, q_n, r_n, \ldots\}$, as well as the constant symbols ⊤ and ⊥, the symbols ¬, ∧, ∨, → and ↔, which are called connectives or logical operators, and the punctuation symbols
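Since a discernibility function contains no negations, it can be written as a conjunction of disjunctions of attribute variables. Identifying a truth assignment with the set of variables it makes true, evaluating such a formula reduces to a set computation. The sketch below, with hypothetical variables a, b and c, represents a formula in conjunctive normal form as a list of clauses (sets of variable names).

```python
def satisfies(cnf, true_vars):
    """Evaluate a negation-free CNF under the assignment making `true_vars` true.

    Each clause is a set of variable names and is satisfied when at least one of
    its variables is true; the whole formula needs every clause satisfied.
    """
    return all(bool(clause & true_vars) for clause in cnf)


# f = (a ∨ b) ∧ (b ∨ c) ∧ (a ∨ c)
f = [{"a", "b"}, {"b", "c"}, {"a", "c"}]
print(satisfies(f, {"a", "b"}))  # True
print(satisfies(f, {"a"}))       # False: clause (b ∨ c) is not satisfied
```

Because the formula is monotone, any superset of a satisfying set also satisfies it; the minimal satisfying sets ({a, b}, {a, c} and {b, c} here) correspond to the prime implicants of f.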

(Bi)Reducts over tolerance-based conditional attributes

As previously mentioned, all results concerning bireducts have hitherto been presented for equivalence relations. In this work, we extend those results to tolerance and fuzzy similarity relations. Therefore, in this section we introduce the necessary new definitions of reducts and bireducts in the proposed framework with tolerance relations.

First, we introduce the definition of an information reduct.

Definition 8

The set $B \subseteq A$ is called E-information

Characterizing reducts and bireducts

In this section, in order to ease the calculation of decision reducts and bireducts, we present some results that characterize these notions. The discernibility function is the main tool we use for computing both decision reducts and bireducts, considering fuzzy relations. This function is based on the elements of the discernibility matrix of $(U, A \cup \{d\})$, where $U = \{x_1, \ldots, x_n\}$, which in this framework are defined, for $i, j \in \{1, \ldots, n\}$, as follows: $O(x_i, x_j) = \emptyset$ if $(d(x_i), d(x_j)) \in R_d$, and $O(x_i, x_j) = \{a \in A \mid (a(x_i), \ldots\}$ …
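The following Python sketch computes the entries O(x_i, x_j) and then, by brute force, the minimal attribute sets that meet every nonempty entry (the prime implicants of the discernibility function). The second case of the definition is cut off in this snippet; the code assumes it is the set of attributes a whose values on x_i and x_j are not related by the tolerance relation R_a. The data, the threshold of 2 for temp, and the use of equality for outlook and for the decision are hypothetical.

```python
from itertools import combinations


def discernibility_entries(U, rel, rel_d):
    """Entries O(x_i, x_j), i < j, skipping pairs whose decisions are tolerant.

    Assumed second case: attributes a with (a(x_i), a(x_j)) not in R_a.
    """
    entries = []
    for x, y in combinations(U, 2):
        if rel_d(x["d"], y["d"]):
            continue  # O(x_i, x_j) is empty for tolerant decisions
        entries.append(frozenset(a for a in rel if not rel[a](x[a], y[a])))
    return entries


def minimal_hitting_sets(entries, attrs):
    """Minimal B that intersects every nonempty entry (exponential brute force)."""
    clauses = [e for e in entries if e]
    found = []
    for k in range(len(attrs) + 1):
        for combo in combinations(attrs, k):
            B = frozenset(combo)
            if any(f <= B for f in found):
                continue  # contains a smaller solution, so not minimal
            if all(c & B for c in clauses):
                found.append(B)
    return found


U = [
    {"outlook": "sunny", "temp": 30, "d": "no"},
    {"outlook": "sunny", "temp": 29, "d": "yes"},
    {"outlook": "rain", "temp": 20, "d": "yes"},
]
rel = {
    "outlook": lambda u, v: u == v,
    "temp": lambda u, v: abs(u - v) <= 2,
}
entries = discernibility_entries(U, rel, lambda u, v: u == v)
print(entries)
print(minimal_hitting_sets(entries, list(rel)))
```

The first two objects have non-tolerant decisions but tolerant values on every attribute, so their entry is empty: under these relations no attribute subset can discern them, which is the kind of incompatibility that bireducts handle by leaving one of the objects out. The sketch simply ignores empty entries when computing hitting sets, giving {outlook} and {temp}.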

A worked example

Finally, the following example applies the results to a general decision system in which the value set of the decision attribute is not Boolean. We adopt the decision system $\mathcal{A} = (U, A \cup \{d\})$ presented in Example 3, with the same sets of objects and attributes, but here the decision attribute is replaced by the kind of activity each object performs. The following table shows the relationship between objects and attributes.

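| Object | Outlook | Temp. | Humid. | Wind   | Activity |
|--------|---------|-------|--------|--------|----------|
| 1      | sunny   | hot   | high   | weak   | run      |
| 2      | sunny   | hot   | high   | strong |          |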
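Only the first two rows of the table survive in this snippet, and they differ solely on Wind. Assuming, for illustration, that every conditional attribute is compared by equality, the short Python check below confirms this. If the activities of objects 1 and 2 were not related by R_d (the activity of object 2 is not shown here), then under the usual reading of the discernibility matrix their entry would be {Wind}, so Wind would have to appear in every decision reduct.

```python
ATTRS = ["Outlook", "Temp.", "Humid.", "Wind"]

# The two rows visible in the table (the activity of object 2 is omitted).
row1 = {"Outlook": "sunny", "Temp.": "hot", "Humid.": "high", "Wind": "weak"}
row2 = {"Outlook": "sunny", "Temp.": "hot", "Humid.": "high", "Wind": "strong"}


def differing_attributes(x, y, attrs):
    """Attributes on which x and y take different values (equality as tolerance)."""
    return [a for a in attrs if x[a] != y[a]]


print(differing_attributes(row1, row2, ATTRS))  # ['Wind']
```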

Conclusions and future work

We have studied reducts and bireducts in the classical RST environment, considering tolerance relations. We have generalized the classical notion of the discernibility function and, from it, characterized reducts and bireducts in these environments, providing a linear procedure for computing one reduct or bireduct and a mechanism for computing all of them. The computation of all reducts and bireducts is NP-hard, but the relation to RDNFs also provides the possibility of using various
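The paper's own procedure is not reproduced in this snippet. As a hedged illustration of how a single bireduct can be obtained in one pass, the sketch below adapts the permutation-based heuristic known from the bireduct literature for the classical case to tolerance relations. Starting from B = A and X = ∅, it walks through an ordering of attributes and objects, removing an attribute when the remaining ones still discern X, and adding an object when B still discerns the enlarged X. Two objects count as discerned by B if their values on some attribute a in B are not related by R_a, and only pairs with non-tolerant decisions must be discerned; this reading of the definitions, the data and the relations are all assumptions.

```python
import random


def discerns(U, rel, rel_d, B, X):
    """Every pair in X with non-tolerant decisions is separated by some a in B."""
    X = sorted(X)
    for i, x in enumerate(X):
        for y in X[i + 1:]:
            if rel_d(U[x]["d"], U[y]["d"]):
                continue  # tolerant decisions need not be discerned
            if all(rel[a](U[x][a], U[y][a]) for a in B):
                return False
    return True


def bireduct_from_order(U, rel, rel_d, order):
    """One pass over `order`, a sequence of ("attr", name) and ("obj", index) items."""
    B, X = set(rel), set()
    for kind, item in order:
        if kind == "attr" and discerns(U, rel, rel_d, B - {item}, X):
            B.discard(item)
        elif kind == "obj" and discerns(U, rel, rel_d, B, X | {item}):
            X.add(item)
    return B, X


def random_order(U, rel, seed=None):
    """A random permutation of all attributes and object indices."""
    order = [("attr", a) for a in rel] + [("obj", i) for i in range(len(U))]
    random.Random(seed).shuffle(order)
    return order


def same(u, v):
    return u == v


U = [
    {"outlook": "sunny", "temp": 30, "d": "no"},
    {"outlook": "sunny", "temp": 29, "d": "yes"},
    {"outlook": "rain", "temp": 20, "d": "yes"},
]
rel = {"outlook": same, "temp": lambda u, v: abs(u - v) <= 2}

order = [("obj", 0), ("attr", "temp"), ("obj", 1), ("obj", 2), ("attr", "outlook")]
print(bireduct_from_order(U, rel, same, order))  # ({'outlook'}, {0, 2})
print(bireduct_from_order(U, rel, same, random_order(U, rel, seed=1)))
```

Each attribute and object is visited once, but every step runs the pairwise check, so this sketch is not linear overall; different orders generally yield different bireducts. The output is a bireduct because discernibility is monotone in B and anti-monotone in X, so a decision taken earlier in the pass is never invalidated by later steps.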

References (44)

  • W. Lipski

    On databases with incomplete information.

    J. ACM

    (1981)
  • S.C. Madeira et al.

    Biclustering algorithms for biological data analysis: a survey.

    IEEE/ACM Trans. Comput. Biol. Bioinf.

    (2004)
  • J. Medina

    Multi-adjoint property-oriented and object-oriented concept lattices.

    Inf. Sci. (Ny)

    (2012)
  • Z. Pawlak

    Information systems theoretical foundations.

    Inf. Syst.

    (1981)
  • Z. Pawlak

    Rough sets

    Int. J. Comput. Inf. Sci.

    (1982)
  • Z. Pawlak et al.

    Rough sets and boolean reasoning.

    Inf. Sci. (Ny)

    (2007)
  • S. Stawicki et al.

    Recent advances in decision bireducts: complexity, heuristics and streams.

    Lect. Notes Comput. Sci.

    (2013)
  • S. Stawicki et al.

    Decision bireducts and decision reducts–a comparison.

    Int. J. Approx. Reason.

    (2017)
  • L. Wei et al.

    Relation between concept lattice reduction and rough set reduction.

    Knowl. Based Syst.

    (2010)
  • R. Bělohlávek et al.

    An efficient reasoning method for dependencies over similarity and ordinal data.

    Lect. Notes Comput. Sci.

    (2012)
  • R. Bělohlávek et al.

    Fast factorization by similarity in formal concept analysis of data with fuzzy attributes.

    J. Comput. Syst. Sci.

    (2007)
  • A. Burusco et al.

    Construction of the L-fuzzy concept lattice.

    Fuzzy Sets Syst.

    (1998)
Partially supported by the Spanish Science Ministry project TIN2016-76653-P.