Elsevier

Information Sciences

Volume 376, 10 January 2017, Pages 39-53
Canonical dichotomous direct bases

https://doi.org/10.1016/j.ins.2016.10.004

Abstract

Closure systems are usually characterized in terms of implications. The directness property of implicational systems is a key issue in their computational usability. In this work we focus on this property, studying its connection with the structure of implicational systems and the design of methods for transforming any implicational system into an equivalent direct implicational system. We introduce a new paradigm based on the bipartition of the implicational sets into two components, according to their behavior with respect to the closure. In addition, we present the notions of two new direct bases, named DD-basis and canonical DD-basis, and provide methods to compute each of them. The advantages of the dichotomous approach are shown from both the theoretical and the empirical points of view.

Introduction

Closure operators play an outstanding role in a wide range of research areas including algebra, topology, logic and computer science. It is well-known that a closure system can be dually presented as a set of implications, called an implicational system. These notions are the pillars of several disciplines such as formal concept analysis (FCA) and lattice theory. Closure operators are used systematically in some important problems that are solved by costly, often exponential, methods. Closure computation has a direct impact on the performance of these methods.

For example, in the key finding problem, i.e. the enumeration of all minimal keys [7], an exponential number of closures are exhaustively computed. This problem has been stated in different areas such as artificial intelligence, databases, logic programming and rough set theory. Minimal keys are related to several notions in different areas: minimal transversals [12] in graph theory, key sets [2] in data mining, key box [25] in description logic and minimal generators [14] in FCA.

In [11], Duquenne motivates the connections among bases of implications, closure operators, reductions and redundancies. Bertet suggested a line of work in [4] where implicational systems are highlighted as convenient tools to handle a closure system. Rudolph states in [24] that “one central task when dealing with closure operators is to represent them in a succinct way while still allowing for their efficient computational usage”. This author highlights that some algorithms in FCA require an exponential number of closure computations. Although the theoretical computational cost cannot be avoided, the practical performance can be improved by managing closure operators efficiently. Moreover, Rudolph poses an open question: “if variants of standard FCA algorithms can be improved by adding the option of working with alternative closure operator representations”. In [5], Bertet et al. established some specific properties of such representations to approach this aim. They maintain that a desirable feature of implicational systems is to be direct and optimal. These two properties ensure efficient management and storage. Directness guarantees that the closure can be computed in one traversal of the implication set, while optimality seeks succinct expressions of the direct implication sets: no implication can be removed without losing this property. Thus, they propose a balance between the cardinality of the set of implications and its efficient management as a closure system.
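The one-traversal idea can be illustrated with a minimal Python sketch. It is not taken from the paper: the implications a→b, b→c, a→c and the helper names `imp`, `one_pass` and `closure` are hypothetical, chosen only to contrast a non-direct system (which needs repeated passes) with an equivalent direct one (where a single pass already yields the closure).

```python
from typing import FrozenSet, Iterable, List, Tuple

Implication = Tuple[FrozenSet[str], FrozenSet[str]]


def imp(premise: str, conclusion: str) -> Implication:
    """Build an implication premise -> conclusion from two strings of attributes."""
    return frozenset(premise), frozenset(conclusion)


def one_pass(sigma: Iterable[Implication], x: Iterable[str]) -> FrozenSet[str]:
    """Single traversal: add every conclusion whose premise is contained in x."""
    x = frozenset(x)
    result = set(x)
    for premise, conclusion in sigma:
        if premise <= x:  # premises are tested against the original set only
            result |= conclusion
    return frozenset(result)


def closure(sigma: List[Implication], x: Iterable[str]) -> FrozenSet[str]:
    """Repeat single traversals until a fixpoint is reached."""
    current = frozenset(x)
    while True:
        nxt = one_pass(sigma, current)
        if nxt == current:
            return current
        current = nxt


# Hypothetical non-direct system and an equivalent direct one.
chain = [imp("a", "b"), imp("b", "c")]
direct = chain + [imp("a", "c")]
```

With `chain`, one pass over {a} stops at {a, b} and a second pass is needed to reach {a, b, c}; with `direct`, the first pass already returns the full closure.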

Developing this line, Adaricheva et al. [1] affirm that “there is an apparent trade-off between the number of implications in the basis and the number of iterations one needs to compute the closures of subsets”. In this sense, these authors propose the so-called D-basis as an alternative to the basis proposed in [5].

Despite the merits shared by the existing direct basis definitions, their main handicap is the inherent cost of their computation. Thus, a hot topic in this line is the definition of new direct bases whose associated transformation methods perform better. In this paper, to deal with this problem, we introduce a new basis definition together with two methods to compute it.

In the design process to define a new direct basis, we carried out a study of the set of implications, rendering a dichotomous partition of the whole set of implications according to their behavior with respect to the closure operator. To avoid an expensive classification cost, we accompany the method with a very efficient criterion to classify each implication. The new basis definition, called dichotomous basis, is strongly based on the separate treatment of both kinds of implications. This basis, as we shall see, belongs to the family of direct bases, which implies that closures are computed in just one traversal of the implicational set. A fundamental result in our approach is the proof that each component of the implicational system behaves very differently in closure systems.

In addition, regarding the hot topic of direct bases computation, dichotomousness also provides an improvement in the building of the new direct basis by dividing the original problem into two smaller, separate ones. Thus, one part will support the hard computation of this construction, whereas the other is carried out almost instantaneously.

As mentioned above, the literature considers a further step in the study of direct bases: the design of a direct basis that also takes the optimality issue into account. Thus, the culmination of this work is the introduction of the canonical definition in the set of dichotomous bases. Such a canonical basis is the one with the least size and cardinality among all the dichotomous direct bases.

The work is organized as follows: after the background and related works summarized in Sections 2 and 3, respectively, we introduce the notion of quasi-key (Section 4), which is the kernel of a new definition of implicational system called dichotomous implicational system (Section 5). We show that the new implicational system preserves the directness property and, in this way, the low cost of closure methods is retained. Moreover, we introduce the notion of direct dichotomous basis (DD-basis) and illustrate its advantages (Section 6). In Section 7, we present the framework to develop a well-founded method to compute a dichotomous direct basis.

Finally, in Section 8, we introduce the definition of canonical DD-basis, showing its uniqueness and optimality and providing a quadratic method to compute such a basis. In Section 9, we present an empirical study, and we conclude the paper with a Conclusions and Future Works section.

Background

In this section, we present several definitions and results that will be used throughout the paper.

A closure operator on a set S is a mapping φ: 2^S → 2^S that is extensive (i.e. A ⊆ φ(A) for each A ⊆ S), isotone (φ(A1) ⊆ φ(A2) for all A1 ⊆ A2 ⊆ S) and idempotent (φ∘φ = φ). A closure system is a pair (S, φ) where φ is a closure operator on S.
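The three axioms can be checked by brute force on a small set. The following Python sketch is only an illustration under stated assumptions: the set S = {a, b, c}, the mapping `phi` (which adds b whenever a is present) and the helper names are hypothetical and do not come from the paper.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List


def powerset(s: Iterable[str]) -> List[FrozenSet[str]]:
    """All subsets of s as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_closure_operator(phi: Callable[[FrozenSet[str]], FrozenSet[str]],
                        s: Iterable[str]) -> bool:
    """Check extensivity, isotonicity and idempotency over every subset of s."""
    subsets = powerset(s)
    extensive = all(a <= phi(a) for a in subsets)
    isotone = all(phi(a) <= phi(b)
                  for a in subsets for b in subsets if a <= b)
    idempotent = all(phi(phi(a)) == phi(a) for a in subsets)
    return extensive and isotone and idempotent


S = frozenset("abc")


def phi(a: FrozenSet[str]) -> FrozenSet[str]:
    """Hypothetical operator: add 'b' whenever 'a' is present."""
    a = frozenset(a)
    return a | {"b"} if "a" in a else a


def complement(a: FrozenSet[str]) -> FrozenSet[str]:
    """A mapping that is not extensive, used as a counterexample."""
    return S - frozenset(a)
```

The exhaustive check is exponential in |S| and is meant only for tiny examples such as this one.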

Furthermore, a ∩-subsemilattice of (2^S, ⊆) containing S is named a Moore family.
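In concrete terms, a Moore family is a collection of subsets of S that contains S itself and is closed under intersection. A minimal Python sketch of this check follows; the function name `is_moore_family` and the two example families are hypothetical illustrations, not taken from the paper.

```python
from typing import FrozenSet, Iterable, Set


def is_moore_family(family: Iterable[Iterable[str]], s: Iterable[str]) -> bool:
    """True if the family contains s and is closed under pairwise intersection."""
    fam: Set[FrozenSet[str]] = {frozenset(x) for x in family}
    if frozenset(s) not in fam:
        return False
    # For a finite family, closure under pairwise intersection suffices.
    return all((x & y) in fam for x in fam for y in fam)


# Hypothetical families over S = {a, b, c}; each string lists the elements of one set.
chain_family = ["abc", "ab", "a", ""]
broken_family = ["abc", "ab", "bc"]  # ab ∩ bc = b is missing
```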

Related works

As mentioned in the introduction, we are interested in the progress of the direct basis issue. Now, we summarize the works which appeared in the literature providing closure operators for some implicational systems defined to be direct.

Quasi-key implications

Before presenting the new kind of direct basis, as a first contribution, we characterize the behavior of each implication with respect to the execution of the closure operator. Moreover, we need an efficient criterion to identify the implications that will be stored in each part of the dichotomous set to be built in the next section.

The kernel of our criterion is the notion of quasi-key, which is based on the concept of key [6], [17].

Definition 15

Let (S, φ) be a closure system. A set A ⊆ S is a key for φ if φ(A) = S.
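The key condition can be sketched in Python for the case where the closure operator is induced by a set of implications. Everything concrete here is an assumption made for illustration: the set S = {a, b, c, d}, the implications a→b and b→cd, and the helper names `closure` and `is_key`.

```python
from typing import FrozenSet, Iterable, List, Tuple

Implication = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(sigma: List[Implication], x: Iterable[str]) -> FrozenSet[str]:
    """Closure of x under sigma, computed by iterating to a fixpoint."""
    closed = set(x)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in sigma:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return frozenset(closed)


def is_key(sigma: List[Implication], s: Iterable[str], a: Iterable[str]) -> bool:
    """A is a key when its closure is the whole set S."""
    return closure(sigma, a) == frozenset(s)


# Hypothetical implicational system on S = {a, b, c, d}.
S = frozenset("abcd")
sigma = [(frozenset("a"), frozenset("b")),
         (frozenset("b"), frozenset("cd"))]
```

Here {a} is a key because its closure is all of S, whereas {b} is not, since its closure is {b, c, d}.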

In

Directness and dichotomous sets of implications

This section can be considered as the kernel of this work. Once we have introduced the quasi-key implications, we justify here that these implications have a different behavior with respect to the others when closure computation is carried out. Such a characteristic allows us to make a separate treatment of both kinds of implications and, therefore, to split the implicational system in two well-defined subsets.

Basis and directness: DD-basis.

In the previous section we have focused on the directness property for dichotomous sets of implications. Now, we introduce an alternative direct ‘basis’ definition in this framework.

Definition 20

A dichotomous set of implications ⟨Σ*, Σk⟩ is said to be a dichotomous direct basis, briefly DD-basis, if the following conditions hold:

  1. σ_{Σ*,Σk} is idempotent (i.e. it is a closure operator).

  2. For all A→B, C→D ∈ Σ*, if A ⊊ C then B ∩ D = ∅.
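The second condition can be checked mechanically. The Python sketch below assumes the reading "for all A→B, C→D in Σ*, if A ⊊ C then B ∩ D = ∅"; the function name `satisfies_condition_2` and the two small example sets are hypothetical. Condition 1 is not covered, since it depends on the definition of the operator σ, which is not reproduced here.

```python
from typing import FrozenSet, List, Tuple

Implication = Tuple[FrozenSet[str], FrozenSet[str]]


def satisfies_condition_2(sigma_star: List[Implication]) -> bool:
    """Whenever one premise is a proper subset of another,
    the two conclusions must be disjoint."""
    return all(not (b & d)
               for a, b in sigma_star
               for c, d in sigma_star
               if a < c)  # '<' on frozensets is proper inclusion


# Hypothetical examples: a->b with ac->d passes; a->b with ac->bd fails.
good = [(frozenset("a"), frozenset("b")),
        (frozenset("ac"), frozenset("d"))]
bad = [(frozenset("a"), frozenset("b")),
       (frozenset("ac"), frozenset("bd"))]
```

In `bad`, the attribute b is derived both from {a} and from its proper superset {a, c}, which is the kind of overlap the condition rules out.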

Example 6

Consider the following implicational system on S={a,b,c,d,e,g}: Σ={ad,ceg,cg

A straight method to compute the DD-basis

As previously mentioned, the algorithms that transform any implicational system to a direct basis have non-polynomial cost with respect to the size of the original implicational system. The definition of DD-basis and the following theorem allow a reduction in the size of the input and this fact has a huge repercussion on the cost of the transformation. This is the main advantage of the proposed DD-basis with respect to both alternative direct bases: our approach reduces the size of the subset

Canonical DD-basis

As stated in Corollary 1, there are different equivalent DD-bases and all of them share the same first component of the dichotomous set of implications. In the following example, three equivalent DD-bases are presented to illustrate this situation:

Example 9

The following DD-bases are equivalent:

  • ⟨{abc, bc, aed, ced, bed}, {abcdegh}⟩.

  • ⟨{abc, bc, aed, ced, bed}, {aegbcdh, acegbdh}⟩.

  • ⟨{abc, bc, aed, ced, bed}, {abegcdh, acegbdh}⟩.

The aim of this

Experimental results

In this work we have introduced two notions of direct basis, DD-basis and canonical DD-basis, and the corresponding methods to compute them. In this section, we compare three methods that compute three direct bases: the direct optimal basis (by means of the most efficient algorithm that has appeared in the literature [22]), the DD-basis (using Algorithm 1) and the canonical DD-basis (using Algorithm 1 and then Algorithm 2).

Since there is no implicational system benchmark in

Conclusions and future works

Algorithms for computing closures of attribute sets are a building block used exhaustively in complex tasks that solve significant problems in several areas. Although there are algorithms that solve the closure problem with linear cost, closures are computed so often in some NP-algorithms that even a minor gain in closure performance entails a major advantage for these complex methods.

A successful way to tackle this problem is the study of directness of the bases so that the closure can be

Acknowledgment

Supported by project TIN2014-59471-P of the Science and Innovation Ministry of Spain, co-funded by the European Regional Development Fund (ERDF).

References (26)

  • P. Cordero et al. Computing left-minimal direct basis of implications. Concept Lattices and Their Applications (CLA), 2013.
  • P. Cordero et al. SLFD Logic: Elimination of Data Redundancy in Knowledge Representation. Lecture Notes in Computer Science, 2527, 2002.
  • V. Duquenne. Some variations on Alan Day’s algorithm for calculating canonical basis of implications. Concept Lattices and Their Applications (CLA), Volume 331 of CEUR Workshop Proceedings, CEUR-WS, 2007.