Canonical dichotomous direct bases
Introduction
Closure operators play an important role in a wide range of research areas, including algebra, topology, logic and computer science. It is well known that a closure system can be dually presented as a set of implications, called an implicational system. These notions are pillars of several disciplines such as formal concept analysis (FCA) and lattice theory. Closure operators are used systematically in some important problems that are solved with methods of exponential cost, and closure computation has a direct impact on the performance of these methods.
For example, in the key finding problem, i.e. the enumeration of all minimal keys [7], an exponential number of closures are exhaustively computed. This problem has been stated in different areas such as artificial intelligence, databases, logic programming and rough set theory. Minimal keys are related to several notions in different areas: minimal transversals [12] in graph theory, key sets [2] in data mining, key box [25] in description logic and minimal generators [14] in FCA.
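To make the cost of key finding concrete, the following is a minimal brute-force sketch, not the method of [7]. It assumes implications are given as (premise, conclusion) pairs of frozensets; the names `closure`, `minimal_keys` and the toy system are illustrative only. Every candidate subset triggers one closure computation, which is where the exponential number of closures comes from.

```python
from itertools import combinations

def closure(attrs, implications):
    """Fixpoint closure of `attrs` under (premise, conclusion) pairs."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return frozenset(closed)

def minimal_keys(universe, implications):
    """Enumerate minimal keys by increasing cardinality (exponential search)."""
    universe = frozenset(universe)
    keys = []
    for size in range(len(universe) + 1):
        for candidate in combinations(sorted(universe), size):
            candidate = frozenset(candidate)
            # A superset of a key already found cannot be minimal.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, implications) == universe:
                keys.append(candidate)
    return keys

# Hypothetical toy system on {a, b, c, d}: a -> b, b -> a, ac -> d.
toy = [
    (frozenset("a"), frozenset("b")),
    (frozenset("b"), frozenset("a")),
    (frozenset("ac"), frozenset("d")),
]
keys = minimal_keys("abcd", toy)
print([sorted(k) for k in keys])
```

On this toy system the search finds the two minimal keys {a, c} and {b, c}, after computing a closure for almost every subset of the attribute set.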
In [11], Duquenne motivates the connections among bases of implications, closure operators, reductions and redundancies. Bertet suggested a line of work in [4] where implicational systems are highlighted as convenient tools to handle a closure system. Rudolph states in [24] that “one central task when dealing with closure operators is to represent them in a succinct way while still allowing for their efficient computational usage”. This author highlights that some algorithms in FCA require an exponential number of closure computations. Although the theoretical computational cost cannot be avoided, the real performance can be improved by managing closure operators efficiently. Moreover, Rudolph poses an open question: “if variants of standard FCA algorithms can be improved by adding the option of working with alternative closure operator representations”. In [5], Bertet et al. established some specific properties of such representations to approach this aim. They maintain that a desirable feature of implicational systems is to be direct and optimal. These two properties ensure efficient management and storage. Directness guarantees that the closure can be computed in one traversal of the implications, while optimality seeks succinct expressions of direct implication sets: no implication can be removed without losing this property. Thus, they propose a balance between the cardinality of the set of implications and its efficient management as a closure system.
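The difference between a direct and a non-direct implicational system can be illustrated with a short sketch. It assumes the usual reading of directness, namely that one traversal which only fires implications whose premise lies in the input set already yields the closure. The function names and the two tiny systems are illustrative assumptions, not taken from [5].

```python
from itertools import chain, combinations

def one_pass(attrs, implications):
    """Single traversal: fire only implications whose premise is in the input."""
    attrs = frozenset(attrs)
    result = set(attrs)
    for premise, conclusion in implications:
        if premise <= attrs:
            result |= conclusion
    return frozenset(result)

def fixpoint_closure(attrs, implications):
    """Repeat traversals until nothing new is added."""
    closed = frozenset(attrs)
    while True:
        extended = one_pass(closed, implications)
        if extended == closed:
            return closed
        closed = extended

def is_direct(universe, implications):
    """Brute-force check: one traversal equals the closure on every subset."""
    items = sorted(universe)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return all(
        one_pass(s, implications) == fixpoint_closure(s, implications)
        for s in subsets
    )

# Hypothetical systems on {a, b, c} with the same closure operator.
chained = [(frozenset("a"), frozenset("b")), (frozenset("b"), frozenset("c"))]
direct = [(frozenset("a"), frozenset("bc")), (frozenset("b"), frozenset("c"))]

print(is_direct("abc", chained))  # a -> b, b -> c needs two traversals for {a}
print(is_direct("abc", direct))   # a -> bc, b -> c needs only one
```

The second system is larger in size than the first, which is the trade-off between succinctness and number of traversals discussed above.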
Developing this line, Adaricheva et al. [1] affirm that “there is an apparent trade-off between the number of implications in the basis and the number of iterations one needs to compute the closures of subsets”. In this sense, these authors propose the so-called D-basis as an alternative to the basis proposed in [5].
Despite the shared merits of the direct basis definitions, their main handicap is the inherent cost of their computation. Thus, a hot topic in this line is the definition of new direct bases whose associated transformation methods perform better. In this paper, to deal with this problem, we introduce a new basis definition together with two methods to compute it.
In the design process to define a new direct basis, we carried out a study of the set of implications, rendering a dichotomous partition of the whole set of implications according to their behavior with respect to the closure operator. To avoid an expensive classification cost, we accompany the method with a very efficient criterion to classify each implication. The new basis definition, called dichotomous basis, is strongly based on the separate treatment of both kinds of implications. This basis, as we shall see, belongs to the family of direct bases, which implies that closures are computed in just one traversal of the implicational set. A fundamental result in our approach is the proof that each component of the implicational system behaves very differently in closure systems.
In addition, regarding the hot topic of direct bases computation, dichotomousness also provides an improvement in the building of the new direct basis by dividing the original problem into two smaller, separate ones. Thus, one part will support the hard computation of this construction, whereas the other is carried out almost instantaneously.
As mentioned above, in the literature, we can consider a further step in the direct basis study: the design of a direct basis also taking into account the optimality issue. Thus, the culmination of this work is the introduction of the canonical definition in the set of dichotomous bases. The canonical basis is the one with the least size and cardinality among all the dichotomous direct bases.
The work is organized as follows: after background and related works summarized in Sections 2 and 3 (respectively), we introduce the notion of quasi-key (Section 4) being the kernel of a new definition of implicational system called dichotomous implicational system (Section 5). We show that the new implicational system preserves the directness property and, in this way, the low cost of closure methods is retained. Moreover, we introduce the notion of direct dichotomous basis (DD-basis) and illustrate its advantages (Section 6). In Section 7, we present the framework to develop a well-founded method to compute a dichotomous direct basis.
Finally, in Section 8, we introduce the definition of canonical DD-basis, showing its uniqueness and optimality and providing a quadratic method to compute such a basis. In Section 9, we present an empirical study and finalize the paper with a Conclusion and Future Works section.
Section snippets
Background
In this section, we present several definitions and results that will be used throughout the paper.
A closure operator on a set S is a mapping φ: 2^S → 2^S that is extensive (i.e. A ⊆ φ(A) for each A ⊆ S), isotone (φ(A1) ⊆ φ(A2) for all A1 ⊆ A2 ⊆ S) and idempotent (φ(φ(A)) = φ(A) for each A ⊆ S). A closure system is a pair (S, φ) where φ is a closure operator on S.
Furthermore, a ∩-subsemilattice of (2^S, ⊆) containing S is named a Moore family.
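As a small illustration of these definitions, the sketch below checks the three closure operator properties by brute force and verifies that the closed sets form a Moore family. The implicational system {a → b, b → c} on S = {a, b, c} and all function names are assumptions chosen for the example.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of `universe` as frozensets."""
    items = sorted(universe)
    return [
        frozenset(c)
        for r in range(len(items) + 1)
        for c in combinations(items, r)
    ]

def closure(attrs, implications):
    """Fixpoint closure of `attrs` under (premise, conclusion) pairs."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return frozenset(closed)

def is_closure_operator(phi, universe):
    """Check extensivity, isotonicity and idempotency on every subset."""
    subsets = powerset(universe)
    extensive = all(a <= phi(a) for a in subsets)
    isotone = all(phi(a) <= phi(b) for a in subsets for b in subsets if a <= b)
    idempotent = all(phi(phi(a)) == phi(a) for a in subsets)
    return extensive and isotone and idempotent

def is_moore_family(family, universe):
    """A family containing S that is closed under intersection."""
    family = set(family)
    return frozenset(universe) in family and all(
        (x & y) in family for x in family for y in family
    )

S = "abc"
sigma = [(frozenset("a"), frozenset("b")), (frozenset("b"), frozenset("c"))]
phi = lambda attrs: closure(attrs, sigma)

closed_sets = {phi(a) for a in powerset(S)}
print(is_closure_operator(phi, S), is_moore_family(closed_sets, S))
```

Here the closed sets are ∅, {c}, {b, c} and {a, b, c}, a chain that contains S and is therefore closed under intersection.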
Related works
As mentioned in the introduction, we are interested in the progress of the direct basis issue. Now, we summarize the works which appeared in the literature providing closure operators for some implicational systems defined to be direct.
Quasi-key implications
Before presenting the new kind of direct basis, as a first contribution, we have to characterize the behavior of each implication with respect to the execution of the closure operator. Moreover, we need an efficient criterion to identify the implications that will be stored in each part of the dichotomous set to be built in the next section.
The kernel of our criterion is the notion of quasi-key, which is based on the concept of key [6], [17].
Definition 15 Let (S, φ) be a closure system. A set A ⊆ S is a key for φ if φ(A) = S. In
Directness and dichotomous sets of implications
This section can be considered as the kernel of this work. Once we have introduced the quasi-key implications, we justify here that these implications have a different behavior with respect to the others when closure computation is carried out. Such a characteristic allows us to make a separate treatment of both kinds of implications and, therefore, to split the implicational system in two well-defined subsets.
Basis and directness: DD-basis.
In the previous section we have focused on the directness property for dichotomous sets of implications. Now, we introduce an alternative direct ‘basis’ definition in this framework.
Definition 20 A dichotomous set of implications ⟨Σ*, Σk⟩ is said to be a dichotomous direct basis, briefly DD-basis, if the following conditions hold:
is idempotent (i.e. it is a closure operator).
For all A → B, C → D ∈ Σ*, if A ⊊ C then .
Example 6
Consider the following implicational system on :
A straight method to compute the DD-basis
As previously mentioned, the algorithms that transform any implicational system into a direct basis have non-polynomial cost with respect to the size of the original implicational system. The definition of DD-basis and the following theorem allow a reduction in the size of the input, and this has a major impact on the cost of the transformation. This is the main advantage of the proposed DD-basis with respect to both alternative direct bases: our approach reduces the size of the subset
Canonical DD-basis
As stated in Corollary 1, there are different equivalent DD-bases and all of them share the same first component of the dichotomous set of implications. In the following example, three equivalent DD-bases are presented to illustrate this situation:
Example 9 The following DD-bases are equivalent:
⟨{a → bc, b → c, ae → d, ce → d, be → d}, {abcdeg → h}⟩.
⟨{a → bc, b → c, ae → d, ce → d, be → d}, {aeg → bcdh, aceg → bdh}⟩.
⟨{a → bc, b → c, ae → d, ce → d, be → d}, {abeg → cdh, aceg → bdh}⟩.
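The equivalence in Example 9 can be checked mechanically on this small case. The sketch below makes two assumptions: the attribute set is {a, b, c, d, e, g, h}, inferred from the attributes that appear in the example, and equivalence is tested with the standard fixpoint closure over the union of both components, which may differ from the operator the paper associates with a dichotomous set. The helper names are illustrative.

```python
from itertools import combinations

def parse(text):
    """Turn 'ae->d' into a (premise, conclusion) pair of frozensets."""
    premise, conclusion = text.split("->")
    return frozenset(premise.strip()), frozenset(conclusion.strip())

def closure(attrs, implications):
    """Fixpoint closure of `attrs` under (premise, conclusion) pairs."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return frozenset(closed)

def equivalent(universe, sigma1, sigma2):
    """Two systems are equivalent if they close every subset identically."""
    items = sorted(universe)
    return all(
        closure(s, sigma1) == closure(s, sigma2)
        for r in range(len(items) + 1)
        for s in combinations(items, r)
    )

# Shared first component and the three second components of Example 9.
first = [parse(t) for t in ("a->bc", "b->c", "ae->d", "ce->d", "be->d")]
seconds = [
    [parse("abcdeg->h")],
    [parse("aeg->bcdh"), parse("aceg->bdh")],
    [parse("abeg->cdh"), parse("aceg->bdh")],
]
systems = [first + second for second in seconds]

all_equivalent = all(
    equivalent("abcdegh", systems[0], other) for other in systems[1:]
)
print(all_equivalent)
```

Under these assumptions the three systems agree on all 128 subsets: in each of them h is derived exactly when a, e and g are present.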
The aim of this
Experimental results
In this work we have introduced two notions of direct basis, DD-basis and canonical DD-basis, together with the methods to compute them. In this section, we compare three methods that compute three direct bases: the direct optimal basis (by means of the most efficient algorithm that has appeared in the literature [22]), the DD-basis (using Algorithm 1) and the canonical DD-basis (using Algorithm 1 and then Algorithm 2).
Since there is no implicational system benchmark in
Conclusions and future works
Algorithms for computing closures of attribute sets can be considered a building block that is used exhaustively in complex tasks to solve significant problems in several areas. Although there are algorithms that solve the closure problem with linear cost, its exhaustive use in some NP-algorithms means that a minor gain in closure performance entails a major advantage for these complex methods.
A successful way to tackle this problem is the study of directness of the bases so that the closure can be
Acknowledgment
Supported by project TIN2014-59471-P of the Science and Innovation Ministry of Spain, co-funded by the European Regional Development Fund (ERDF).
References (26)
- et al., Ordered direct implicational basis of a finite closure system, Discrete Appl. Math. (2013)
- et al., The multiple facets of the canonical direct unit implicational basis, Theor. Comput. Sci. (2010)
- et al., Knowledge discovery in social networks by using a logic-based treatment of implications, Knowl.-Based Syst. (2015)
- et al., Some decision and counting problems of the Duquenne-Guigues basis of implications, Discrete Appl. Math. (2008)
- On minimal sets of graded attribute implications, Inf. Sci. (2015)
- et al., Fast computation of concept lattices using data mining techniques, Proceedings Seventh International Workshop on Knowledge Representation Meets Databases (2000)
- et al., A logic of graded attributes, Arch. Math. Logic (2015)
- et al., Efficient algorithms on the Moore family associated to an implicational system, Discrete Math. Theor. Comput. Sci. (2004)
- et al., About keys of formal context and conformal hypergraph, Proceedings of the 6th International Conference on Formal Concept Analysis (2008)
- et al., A tableaux-like method to infer all minimal keys, Logic J. IGPL (2014)
- Computing left-minimal direct basis of implications, Concept lattice and their applications (CLA)
- SLFD Logic: Elimination of Data Redundancy in Knowledge Representation, Lecture Notes in Computer Science, 2527
- Some variations on Alan Day’s algorithm for calculating canonical basis of implications, Concept lattice and their applications (CLA): Volume 331 of CEUR Workshop Proceedings, CEUR-WS