The complexity and generality of learning answer set programs



Introduction
Over the last two decades there has been a growing interest in Inductive Logic Programming (ILP) [1], where the goal is to learn a logic program, called a hypothesis, which together with a given background knowledge base explains a set of examples. The main advantage that ILP has over traditional statistical machine learning approaches is that the learned hypotheses can be easily expressed in plain English and explained to a human user, thus facilitating a closer interaction between humans and machines. Traditional ILP frameworks have focused on learning definite logic programs [1][2][3][4][5][6] and normal logic programs [7,8]. Answer Set Programming (ASP) [9], on the other hand, is a powerful language for knowledge representation and reasoning. ASP is closely related to other declarative paradigms such as SAT, SMT and Constraint Programming, which have each been used for inductive reasoning [10][11][12]. Compared with these other paradigms, due to its non-monotonicity, ASP is particularly suited to common-sense reasoning [13][14][15]. Because of its expressiveness and efficient solving, ASP is also increasingly gaining attention in industry [16]; for example, in decision support systems [17], in e-tourism [18] and in product configuration [19]. Consequently, the scope of ILP has recently been extended to learning answer set programs from examples of partial solutions of a given problem, with the intention of providing algorithms that support automated learning of complex declarative knowledge. Learning ASP programs allows us to learn a variety of declarative, non-monotonic, common-sense theories, including for instance Event Calculus [20] theories [21], as well as theories for scheduling problems and agents' preference models, learned both from real user data [22] and from synthetic data [23,24].
Learning ASP programs has several advantages over learning Prolog programs. Firstly, when learning Prolog programs, the goal-directed SLDNF procedure of Prolog must be taken into account. Specifically, when learning programs with negation, it must be ensured that the programs are stratified, or otherwise the learned program may loop under certain queries. As ASP is declarative, no such consideration is needed when learning ASP programs. A second, more fundamental advantage of learning ASP programs is that the learned theory can be expressed using extra types of rules that are not available in Prolog, such as choice rules and weak constraints. Learning choice rules allows us to learn non-deterministic concepts; for instance, we may learn that a coin may non-deterministically land on either heads or tails, but never both. This could be achieved by learning the simple choice rule 1{heads, tails}1. Learning choice rules is different from probabilistic ILP settings such as [25][26][27], where, in similar coin problems, the focus would be on learning the probabilities of the two outcomes of a coin. Learning weak constraints enables a natural extension of ILP to preference learning [23], which has proved effective in problem domains such as learning preference models for scheduling [23] and for urban mobility [24].
Several algorithms aimed at learning under the answer set semantics, and different frameworks for learning ASP programs, have recently been introduced in the literature. [28] presented the notions of brave induction (ILP_b) and cautious induction (ILP_c), based respectively on the well established notions of brave entailment (when an atom is true in at least one answer set) and cautious entailment (when an atom is true in all answer sets) under the answer set semantics [13,29]. In brave induction, at least one answer set must cover the examples, whereas in cautious induction, every answer set must cover the examples. Brave induction is actually a special case of an earlier learning framework, called induction of stable models (ILP_sm) [30], in which examples are partial interpretations. A hypothesis is a solution of an induction of stable models task if, for each of the example partial interpretations, there is an answer set of the hypothesis combined with the background knowledge that covers that partial interpretation. Brave induction is equivalent to induction of stable models with exactly one (partial interpretation) example.
Each of the above frameworks for learning ASP programs is unable to learn some types of ASP programs [31]; for example, brave induction alone cannot learn programs containing hard constraints. In [31], we presented a learning framework, called Learning from Answer Sets (ILP_LAS), which unifies brave and cautious induction and is able to learn ASP programs containing normal rules, choice rules and hard constraints. In spite of the increased expressivity, none of the above approaches can learn weak constraints, which are needed to capture preference learning. Informally, learning weak constraints consists of identifying conditions for ordering answer sets. The learning task in this case requires examples of orderings over partial interpretations. To tackle this aspect of learning ASP programs, we have extended the Learning from Answer Sets framework to Learning from Ordered Answer Sets (ILP_LOAS) [23] and demonstrated that our algorithm is able to learn preferences in a scheduling domain. More recently, we have extended the ILP_LOAS framework to ILP_LOAS^context, with context-dependent examples, which come together with extra contextual information [24].
In this paper, we explore both the expressive power and the computational complexity of each framework. The former is important, as it allows us to identify the class of problems that each framework can solve, whereas the latter gives an indication of the price paid for using each framework. We characterise the expressive power of a framework in terms of new notions called one-to-one-distinguishability, one-to-many-distinguishability and many-to-many-distinguishability. The intuition of one-to-one-distinguishability is that, given some fixed background knowledge B and sufficient examples, the framework should be able to distinguish a target hypothesis H_1 from another, unwanted, hypothesis H_2. This means that there should be at least one task T (of the given framework) with background knowledge B such that H_1 is a solution of T and H_2 is not. We characterise the one-to-one-distinguishability class of a framework F (written D_1^1(F)) as the set of tuples ⟨B, H_1, H_2⟩ for such B's, H_1's and H_2's, and say that a framework F_1 is more D_1^1-general than another F_2 if F_2's one-to-one-distinguishability class is a strict subset of F_1's one-to-one-distinguishability class.
One-to-many-distinguishability relates to the task of finding a single target hypothesis from within a set of possible hypotheses. It upgrades the notion of one-to-one-distinguishability classes to one-to-many-distinguishability classes. These are tuples of the form ⟨B, H, S⟩ for which a framework has at least one task that accepts H, and none of the (unwanted) hypotheses in S, as an inductive solution. Many-to-many-distinguishability upgrades this notion to many-to-many-distinguishability classes. These contain tuples of the form ⟨B, S_1, S_2⟩, where S_1 is a set of target hypotheses, for which a framework must have a task that accepts each hypothesis in S_1, and no hypothesis in S_2, as an inductive solution. We show that, under these three measures, ILP_LOAS^context is more general than ILP_LOAS, which is more general than ILP_LAS. We also show that ILP_LAS is more general than both ILP_sm and ILP_c. Although ILP_sm is equally D_1^1-general to ILP_b, we show that ILP_sm is more general than ILP_b under the one-to-many and many-to-many generality measures. Despite the different generalities of ILP_c, ILP_LAS, ILP_LOAS and ILP_LOAS^context, we show that the computational complexity of all four frameworks is the same, both for the decision problem of verifying that a given hypothesis is a solution of a given learning task, and for the problem of deciding whether a given learning task has any solutions. Similarly, we also show that ILP_sm and ILP_b have the same computational complexity for both decision problems, despite the former being more general than the latter under two of our generality measures.
We begin, in Section 2, by reviewing the background material necessary for the rest of the paper. In Section 3 we recall the definitions of each of the learning frameworks and in Sections 4 and 5 we prove the complexities and generalities (respectively) of each learning framework. We conclude the paper with a discussion of the related and future work.

Answer Set Programming
In this section we introduce the concepts needed in the paper. Given any atoms h, h_1, ..., h_k, b_1, ..., b_n, c_1, ..., c_m: h :- b_1, ..., b_n, not c_1, ..., not c_m is called a normal rule, with h as the head and b_1, ..., b_n, not c_1, ..., not c_m (collectively) as the body ("not" represents negation as failure); a rule :- b_1, ..., b_n, not c_1, ..., not c_m, with an empty head, is a hard constraint; a choice rule is a rule l{h_1, ..., h_k}u :- b_1, ..., b_n, not c_1, ..., not c_m (where l and u are integers), and its head l{h_1, ..., h_k}u is called an aggregate. A rule R is safe if each variable in R occurs in at least one positive literal in the body of R. In this paper we will use ASP^ch to denote the set of choice programs P, which are programs composed of safe normal rules, choice rules and hard constraints. Given a rule R, we will write head(R) to denote the head of R, body(R) to denote the body of R and body^+(R) (resp. body^-(R)) to denote the atoms that occur positively (resp. negatively) in the body of R. Given a program P, we will also write Atoms(P) to denote the atoms in P. We will also extend this notation to fragments of a program.
The Herbrand Base of any program P ∈ ASP^ch, denoted HB_P, is the set of variable-free (ground) atoms that can be formed from the predicates and constants in P. The subsets of HB_P are called the (Herbrand) interpretations of P. A ground aggregate l{h_1, ..., h_k}u is satisfied by an interpretation I iff l ≤ |I ∩ {h_1, ..., h_k}| ≤ u.
As we restrict our programs to sets of normal rules, (hard) constraints and choice rules, we can use the simplified definition of the reduct for choice rules presented in [33]. Given a program P and a Herbrand interpretation I ⊆ HB_P, the reduct P^I is constructed from ground(P) (the set of ground instances of rules in P) in 4 steps: firstly, remove any rule whose body contains the negation of an atom in I; secondly, remove all negative literals from the remaining rules; thirdly, replace the head of any hard constraint, or of any choice rule whose head is not satisfied by I, with ⊥ (where ⊥ ∉ HB_P); and finally, replace any remaining choice rule l{h_1, ..., h_m}u :- b_1, ..., b_n with the set of rules {h_i :- b_1, ..., b_n | h_i ∈ I ∩ {h_1, ..., h_m}}. Any I ⊆ HB_P is an answer set of P if it is the minimal model of the reduct P^I. Throughout the paper we denote the set of answer sets of a program P by AS(P).
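As a concrete illustration of the reduct-based definition, the following minimal Python sketch (our own illustration, not the algorithm used by ASP solvers, which rely on far more efficient techniques) enumerates the answer sets of a ground normal program by checking every interpretation against the minimal model of its reduct. The representation is an assumption of this sketch: each rule is a (head, positive_body, negative_body) triple.

```python
from itertools import chain, combinations

def reduct(rules, interp):
    """First two steps of the reduct: drop rules whose negative body
    intersects interp, then drop the negative literals themselves."""
    return [(h, pos) for (h, pos, neg) in rules if not (neg & interp)]

def minimal_model(positive_rules):
    """Least model of a negation-free program, via fixpoint iteration."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model

def answer_sets(rules, atoms):
    """All I ⊆ atoms such that I is the minimal model of the reduct P^I."""
    subsets = chain.from_iterable(combinations(sorted(atoms), r)
                                  for r in range(len(atoms) + 1))
    return [set(s) for s in subsets
            if minimal_model(reduct(rules, set(s))) == set(s)]

# p :- not q.   q :- not p.   (two answer sets: {p} and {q})
rules = [("p", set(), {"q"}), ("q", set(), {"p"})]
print(answer_sets(rules, {"p", "q"}))  # → [{'p'}, {'q'}]
```

This brute-force enumeration is exponential in the number of atoms, so it is only feasible for tiny propositional programs, but it follows the definition above step by step.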
We say a program P bravely entails an atom a (written P |=_b a) if there is at least one answer set A of P such that a ∈ A. Similarly, P cautiously entails a (written P |=_c a) if, for every answer set A of P, a ∈ A.
Unlike hard constraints in ASP, weak constraints do not affect what is, or is not, an answer set of a program P; hence the above definitions also apply to programs with weak constraints. Weak constraints create an ordering over AS(P), specifying which answer sets are "preferred" to others. A weak constraint is of the form :~ b_1, ..., b_n, not c_1, ..., not c_m.[w@l, t_1, ..., t_k], where b_1, ..., b_n, c_1, ..., c_m are atoms, w and l are terms specifying the weight and the level, and t_1, ..., t_k are terms. A weak constraint W is safe if every variable in W occurs in at least one positive literal in the body of W. At each priority level l, the aim is to discard any answer set which does not minimise the sum of the weights of the ground weak constraints with level l whose bodies are true. The higher levels are minimised first. The terms t_1, ..., t_k specify which ground weak constraints should be considered unique [34]. For any program P and interpretation A, weak(P, A) is the set of tuples (w, l, t_1, ..., t_k) for which there is some :~ b_1, ..., b_n, not c_1, ..., not c_m.[w@l, t_1, ..., t_k] in ground(P) such that A satisfies b_1, ..., b_n, not c_1, ..., not c_m.
For each level l, the score of the interpretation A is the sum of the weights of the tuples with level l; formally, P_A^l = Σ_{(w,l,t_1,...,t_k) ∈ weak(P,A)} w. For A_1, A_2 ∈ AS(P), A_1 dominates A_2 (written A_1 ≺_P A_2) iff ∃l such that P_{A_1}^l < P_{A_2}^l and ∀m > l, P_{A_1}^m = P_{A_2}^m. An answer set A ∈ AS(P) is optimal if it is not dominated by any A_2 ∈ AS(P). Consider, for example, a program whose answer sets are the subsets of {p(1), p(2), p(3)}, together with either the weak constraint :~ p(X).[1@1] (the first) or the weak constraint :~ p(X).[1@1, X] (the second). The first weak constraint states that if any of the p atoms is true then a penalty of one must be paid; as all of its ground instances share the same tuple (1, 1), this penalty is only paid once, regardless of whether 1, 2 or 3 of the p atoms are true. Conversely, the second weak constraint says that a penalty of 1 must be paid for each of the p atoms that is true. In both cases, ∅ is the only optimal answer set; however, in the first case none of the remaining answer sets dominate each other, whereas in the second case the answer sets with only one p atom dominate those with 2 p atoms, which in turn each dominate the single answer set with 3 p atoms.
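The dominance relation can be made concrete with a short Python sketch (names are our own). Here each answer set is represented directly by its set of weak(P, A) tuples (grounding is assumed to have happened already), and dominates implements the definition above: the answer set with tuples weak1 dominates the one with weak2 iff, at some level, its score is strictly smaller while the scores at all higher levels are equal.

```python
def scores(weak_tuples):
    """Map each level l to the summed weight of the (unique) tuples at l."""
    by_level = {}
    for (w, l, *terms) in weak_tuples:
        by_level[l] = by_level.get(l, 0) + w
    return by_level

def dominates(weak1, weak2):
    """True iff the answer set with tuples weak1 dominates the one with weak2."""
    s1, s2 = scores(weak1), scores(weak2)
    for l in sorted(set(s1) | set(s2), reverse=True):  # higher levels first
        v1, v2 = s1.get(l, 0), s2.get(l, 0)
        if v1 != v2:
            return v1 < v2  # first differing level decides
    return False

# With a weak constraint that pays 1 at level 1 per true p atom (distinct
# tuples per atom), one true p atom dominates two true p atoms.
one_p = {(1, 1, 1)}              # weak(P, A) for an A with one p atom
two_p = {(1, 1, 1), (1, 1, 2)}   # weak(P, A) for an A with two p atoms
print(dominates(one_p, two_p))   # → True
```

Under the single-tuple variant, all non-empty answer sets would produce the same weak(P, A) set, so no pair of them dominates the other, matching the discussion above.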
Note that the definition of weak constraints used in this paper is in line with the recent ASP standard established in [34].
The syntax of some previous definitions of weak constraints, such as [13], does not include the terms t_1, ..., t_k; there, every ground instance of every weak constraint is considered individually. This semantics can be recovered using the notion of weak constraints in [34]: any weak constraint :~ body.[w@l] can be rewritten as :~ body.[w@l, t_1, ..., t_k], where t_1, ..., t_k are the atoms in body, so that distinct ground instances yield distinct tuples. Unless otherwise stated, when we refer to an ASP program in this paper, we mean a program consisting of a finite set of normal rules, choice rules, hard constraints and weak constraints.
We now introduce some extra notation which will be useful in later sections. Given a set of interpretations S, the set ord(P, S) captures the ordering of the interpretations given by the weak constraints in P. It generalises the dominates relation: it not only includes ⟨A_1, A_2, <⟩ if A_1 ≺_P A_2, but also includes analogous tuples for the other binary comparison operators. Given an ASP program P, we write ord(P) as a shorthand for ord(P, AS(P)). Two ASP programs P and Q are strongly equivalent (written P ≡_s Q) if for every ASP program R, AS(P ∪ R) = AS(Q ∪ R).
We now recall the splitting set theorem from [35], which we use in proofs throughout the paper. The theorem relies on the notions of a splitting set and the partial evaluation of a logic program. Given a program P, a set U ⊆ HB_P is a splitting set of P if and only if, for every rule R ∈ ground(P) such that Atoms(head(R)) ∩ U ≠ ∅, Atoms(R) ⊆ U. Given a ground rule R and a set of atoms U, we write R\U to denote the rule R with all (positive or negative) occurrences of atoms in U removed from the body of R. Given a program P, a splitting set U of P and a set X ⊆ U, the partial evaluation of P with respect to U and X, written e_U(P, X), is the program consisting of the rules R\U for each R ∈ ground(P) such that body^+(R) ∩ U ⊆ X and body^-(R) ∩ U ∩ X = ∅.

Theorem 1. Given any ground ASP program P and splitting set U of P, AS(P) = {X ∪ Y | X ∈ AS(bot_U(P)) and Y ∈ AS(e_U(P \ bot_U(P), X))}, where bot_U(P) denotes the set of rules R ∈ P such that Atoms(R) ⊆ U.
The intuition behind the splitting set theorem is that if a set of atoms U is known to split the program P , then we can find the answer sets of the subprogram that defines the atoms in U first. For each of these answer sets X , we can partially evaluate P using X and solve this partially evaluated program for answer sets. The splitting set theorem then guarantees that for each answer set Y of the partially evaluated program, X ∪ Y is an answer set of P . Furthermore, every answer set of P can be constructed in this way.

Complexity theory
We assume the reader is familiar with the fundamental concepts of complexity, such as Turing machines and reductions; for a detailed explanation, see [36].
Many of the decision problems for ASP are known to be complete for classes in the polynomial hierarchy [37]. The classes of the polynomial hierarchy are defined as follows: P is the class of all problems which can be solved in polynomial time by a Deterministic Turing Machine (DTM); Δ^P_0 = Σ^P_0 = Π^P_0 = P; Δ^P_{k+1} = P^{Σ^P_k} is the class of all problems which can be solved by a DTM in polynomial time with a Σ^P_k oracle; Σ^P_{k+1} = NP^{Σ^P_k} is the class of all problems which can be solved by a non-deterministic Turing Machine in polynomial time with a Σ^P_k oracle; finally, Π^P_{k+1} = co-NP^{Σ^P_k} is the class of all problems whose complement can be solved by a non-deterministic Turing Machine in polynomial time with a Σ^P_k oracle. Σ^P_1 and Π^P_1 are NP and co-NP (respectively), where NP is the class of problems which can be solved by a non-deterministic Turing machine in polynomial time and co-NP is the class of problems whose complement is an NP problem.
D^P is the class of problems D that can be mapped to a pair of problems D_1 and D_2 such that D_1 ∈ NP, D_2 ∈ co-NP, and for each instance I of D, I answers "yes" if and only if both of the mapped instances I_1 and I_2 (of D_1 and D_2, respectively) answer "yes". It is well known [36] that the following inclusions hold: P ⊆ NP ⊆ D^P ⊆ Δ^P_2 ⊆ Σ^P_2 and P ⊆ co-NP ⊆ D^P ⊆ Δ^P_2 ⊆ Π^P_2.

Learning frameworks
In this section, we give the definitions of the six learning frameworks we analyse in this paper. The first three - brave induction, cautious induction and induction of stable models - are not our own. We reformulate, but preserve the meaning of, the original definitions for easier comparison with our own. It is common in ILP for a task to have a hypothesis space (the set of all rules which can appear in hypotheses). The purpose of the hypothesis space is two-fold: firstly, it allows the task to be restricted to those solutions which are in some way interesting; secondly, it aids the computational search for inductive solutions. Tasks for brave and cautious induction and for induction of stable models were originally presented with no hypothesis space [28,30], as they were mainly studied theoretically, without the specification of efficient algorithms for computing them. The only publicly available algorithms for brave induction [38,39] make use of a hypothesis space defined by mode declarations [40]. In this paper, we therefore "upgrade" each of brave induction, cautious induction and induction of stable models with a hypothesis space S_M.

Notation and terminology
An ILP learning framework F defines what a learning task of F is and what an inductive solution is for a given learning task of F. For each framework, a task is a tuple ⟨B, S_M, E⟩, where B is an ASP program called the background knowledge, S_M is a set of ASP rules called the hypothesis space, and E is a tuple called the examples. The structure of E depends on the type of ILP framework. Each of the papers [28], [30], [31] and [23] presented learning frameworks with different languages for B and S_M; for example, induction of stable models was presented only for normal logic programs. It would be unfair to say that induction of stable models is not general enough to learn programs with choice rules simply because they were not considered in the original paper (in fact, induction of stable models is general enough to learn some programs with choice rules). For a fair comparison, we therefore assume in this paper that every learning framework has a background knowledge B and hypothesis space S_M that consist of normal rules, choice rules, hard constraints and weak constraints.
Given a framework F and a learning task T_F = ⟨B, S_M, E⟩ of F, a hypothesis is any subset of the hypothesis space S_M.
In Section 5, we consider tasks with unrestricted hypothesis spaces (written ⟨B, E⟩), in which case any ASP program can be called a hypothesis. An inductive solution is a hypothesis that, together with the background knowledge B, satisfies some conditions on E (given by the particular learning framework F). We write ILP_F(T_F) to denote the set of all inductive solutions of T_F. Throughout the paper, we use the term covers to apply to any kind of example: i.e. given a task ⟨B, S_M, E⟩ of F, we say that a hypothesis H covers an example e (any element of any component of E) if it meets the particular conditions that the framework F puts on H and e.

Framework definitions
Brave induction (ILP_b), first presented in [28], defines an inductive task in which all examples are ground atoms that should be covered in at least one answer set, i.e. entailed under brave entailment in ASP. The original definition did not consider atoms which should not be present in an answer set, namely negative examples. The two publicly available algorithms that realise brave induction, on the other hand, do allow for negative examples. We therefore upgrade the definition in this paper to allow negative examples, as follows.
Cautious induction (ILP_c) was also first presented in [28]. It defines an inductive task where all of the examples should be covered in every answer set (i.e. entailed under cautious entailment in ASP) and where B ∪ H should be satisfiable (have at least one answer set). Similarly to brave induction, the original definition did not consider negative examples, but in Definition 2 we upgrade the framework to include them.
Brave induction alone can only reason about what should be true (or false) in a single answer set of B ∪ H. It cannot express conditions such as requiring that two atoms each be bravely entailed, but not necessarily in the same answer set. Induction of stable models (ILP_sm) [30], on the other hand, generalises the notion of brave induction, as shown in Definition 4. The following terminology is first introduced.

Definition 3.
A partial interpretation e is a pair of sets of ground atoms ⟨e^inc, e^exc⟩. An interpretation I is said to extend e iff e^inc ⊆ I and e^exc ∩ I = ∅. Note that a brave induction task can be thought of as a special case of induction of stable models, with exactly one (partial interpretation) example.
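The extension check is straightforward to implement; the following Python sketch (the function name is our own) tests whether an interpretation extends a partial interpretation ⟨e^inc, e^exc⟩.

```python
def extends(interp, e_inc, e_exc):
    """I extends <e_inc, e_exc> iff e_inc ⊆ I and e_exc ∩ I = ∅."""
    return e_inc <= interp and not (e_exc & interp)

# {p, q} extends <{p}, {r}> but not <{p}, {q}>.
print(extends({"p", "q"}, {"p"}, {"r"}))  # → True
print(extends({"p", "q"}, {"p"}, {"q"}))  # → False
```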
We now consider the Learning from Answer Sets framework introduced in [31]. This is the first framework capable of unifying the concepts of brave and cautious induction. The idea is to use examples of partial interpretations which should or should not be extended by answer sets of B ∪ H .
Note that this definition combines properties of both the brave and cautious semantics: the positive examples must each be bravely entailed, whereas the negation of each negative example must be cautiously entailed.

Example 2.
Consider an ILP_LAS learning task whose background knowledge B contains definitions of the structure of a 4x4 Sudoku board; i.e. definitions of cell, same_row, same_col and same_block (where same_row, same_col and same_block are true only for two different cells in the same row, column or block).
cell((1, 1)). cell((1, 2)). ... cell((4, 4)).
same_row((X1, Y), (X2, Y)) :- cell((X1, Y)), cell((X2, Y)), X1 != X2.
same_col((X, Y1), (X, Y2)) :- cell((X, Y1)), cell((X, Y2)), Y1 != Y2.
block((1, 1), 1). block((1, 2), 1). block((2, 1), 1). block((2, 2), 1).
block((3, 1), 2). block((3, 2), 2). block((4, 1), 2). block((4, 2), 2).
We need to be able to say that there should be at least one answer set that assigns a value to a cell, as otherwise the empty hypothesis would be sufficient. This is captured by our positive example, which forces at least one of the choice rules to be part of a solution in order to be covered. Our first three negative examples require the three constraints to also be included in a solution; without each one of these negative examples, at least one constraint could be left out of the solution. The fourth negative example means that the upper bound of the counting aggregate in the choice rule must be 1, as otherwise there would be answer sets in which cell (1, 1) was assigned both 1 and 2. Finally, the fifth negative example forces the lower bound of the choice rule to be 1, as otherwise there would be answer sets in which (1, 1) was not assigned any of the values between 1 and 4. Hence, one possible inductive solution consists of such a choice rule together with the three constraints. Note that we need ILP_LAS's combination of brave and cautious induction to separate the correct hypothesis from the incorrect hypotheses.
Any such answer set is also an answer set of B ∪ H; hence, H is also a solution of the task. We show that this extension allows us to learn a wider class of programs. We now define the notion of Learning from Ordered Answer Sets (ILP_LOAS).

Definition 7. A Learning from Ordered Answer Sets task is a tuple T = ⟨B, S_M, E⟩, where B is the background knowledge, S_M is the hypothesis space and the examples E include positive and negative examples (as in Learning from Answer Sets) together with a set of ordering examples over pairs of positive examples.
Note that the orderings are only over positive examples. We chose to make this restriction as there does not appear to be any scenario where a hypothesis would need to respect orderings between partial interpretations which are not extended by any pair of answer sets of B ∪ H. Consider, for example, a task where:
• B = {0{p, q}2.}
• S_M is unrestricted (i.e. S_M is the set of all normal rules, choice rules and hard and weak constraints).
where e^+_1 = ⟨{p}, ∅⟩ and e^+_2 = ⟨∅, {p}⟩. The positive examples of this task are already satisfied by the background knowledge, which has the answer sets ∅, {p}, {q} and {p, q}. As there are no negative examples, it remains to find a set of weak constraints such that there is at least one answer set containing p which is preferred to at least one answer set not containing p, and all answer sets containing p are equally optimal.
When examples are given with empty contexts, they are equivalent to examples in ILP_LOAS. Note also that contexts do not contain weak constraints. In fact, the operator ≺_P defines the ordering over two answer sets based on the weak constraints in a single program P. So, given a CDOE ⟨⟨e_1, C_1⟩, ⟨e_2, C_2⟩⟩ such that C_1 and C_2 contain different weak constraints, it is not clear which program should be considered when computing the ordering of answer sets, i.e. whether they should be checked against the weak constraints in P, P ∪ C_1, P ∪ C_2 or P ∪ C_1 ∪ C_2.
We now present a formal definition of the ILP_LOAS^context framework.
In [24], we showed that context-dependent examples could be used to simplify the encoding of certain tasks, by splitting the background knowledge into contexts that are only relevant to particular examples. Although any ILP_LOAS^context task can be transformed into an ILP_LOAS task, in general this requires parts of the examples to be encoded in the background knowledge. Example 4 shows such a transformation: in that example, the hypothesis combined with the context of the first example would have at least one answer set containing beep, and when combined with the context of the second example would have at least one answer set not containing beep. If we were expressing the same task in ILP_LOAS, both scenarios would have to be represented within a single background knowledge.

Table 2
A summary of the complexity of the various learning frameworks.

Framework   Complexity of verification   Complexity of deciding satisfiability

Context-dependent examples allow us instead to separate information that is truly background knowledge, which applies in all scenarios, from information that is part of a particular example.

Systems for learning under the answer set semantics
The current publicly available systems for ILP can be categorised according to the six frameworks presented in this section (Table 1). It should be noted that although there are no systems which directly solve ILP_c or ILP_sm tasks, both can be simply translated into ILP_LAS tasks, and can therefore be solved by the ILASP system.
The ILED [21] system is an incremental extension of XHAIL, which is specifically targeted at learning Event Calculus [20] theories. The underlying mechanism is based on brave induction, but each of its examples is expressed in terms of two sequential time points.

Complexity
In this section, we discuss the complexity of each of the learning frameworks presented in Section 3 with respect to two decision problems: verification, deciding whether a given hypothesis H is an inductive solution of a task T; and satisfiability, deciding whether a learning task T has any inductive solutions. A summary of the results is shown in Table 2. To aid readability, the proofs of the propositions stated in this section are given in the appendix. All complexities discussed in this section are for propositional versions of the frameworks (both the background knowledge and the hypothesis space of each learning task are ground).

Learning from answer sets with stratified summing aggregates
As there are existing results on the complexity of solving aggregate stratified programs, it is useful to introduce a new learning framework, ILP_LAS^s, which is a generalisation of ILP_LAS that allows summing aggregates in the bodies of rules, as long as they are stratified. The existing results on the complexity of these programs then allow us to prove the complexity of ILP_LAS^s. Hence, as we can show that ILP_LOAS reduces to ILP_LAS^s, this is helpful in proving the complexity of ILP_LOAS.
A summing aggregate s is of the form l#sum{a_1 = w_1, ..., a_n = w_n}u, where l, u and w_1, ..., w_n are integers and a_1, ..., a_n are atoms. s is satisfied by an interpretation I if and only if l ≤ Σ{w_i | i ∈ [1..n], a_i ∈ I} ≤ u. We now recall the definition of aggregate stratification from [44]. We slightly simplify the definition by considering only propositional programs without disjunction.
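The satisfaction condition for summing aggregates can be sketched directly in a few lines of Python (a hypothetical helper of ours, not part of any system described here):

```python
def sum_aggregate_satisfied(l, u, weighted_atoms, interp):
    """l #sum{a1=w1, ..., an=wn} u is satisfied by interp iff
    l <= (sum of weights of the atoms true in interp) <= u."""
    total = sum(w for (a, w) in weighted_atoms if a in interp)
    return l <= total <= u

# 1 #sum{p=2, q=-1} 2 under I = {p, q}: total = 2 + (-1) = 1, so satisfied.
print(sum_aggregate_satisfied(1, 2, [("p", 2), ("q", -1)], {"p", "q"}))  # → True
```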

Definition 10.
A propositional logic program P, in which aggregates occur only in the bodies of rules, is stratified on an aggregate agg if there is a level mapping || · || from Atoms(P) to ordinals such that for each rule R ∈ P, the following holds:
1. ∀b ∈ Atoms(body(R)): ||b|| ≤ ||head(R)||
2. If agg ∈ body(R), then ∀b ∈ Atoms(agg): ||b|| < ||head(R)||
P is said to be aggregate stratified if it is stratified on every aggregate in P.
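For finite propositional programs, integer levels suffice, so the existence of a witnessing level mapping can be checked by brute force. The following Python sketch is our own illustration (the rule representation and names are assumptions, and the search is exponential, so it is only usable on tiny programs):

```python
from itertools import product

def stratified_on(rules, agg_atoms, atoms):
    """Search for a level mapping ||.|| : atoms -> {0, ..., n} witnessing
    stratification on the aggregate whose atoms are agg_atoms.
    Each rule is (head, body_atoms, body_contains_the_aggregate)."""
    atoms = sorted(atoms)
    for levels in product(range(len(atoms) + 1), repeat=len(atoms)):
        lvl = dict(zip(atoms, levels))
        ok = True
        for head, body, has_agg in rules:
            # condition 1: every body atom at a level <= the head's level
            if any(lvl[b] > lvl[head] for b in body):
                ok = False
                break
            # condition 2: every aggregate atom strictly below the head
            if has_agg and any(lvl[b] >= lvl[head] for b in agg_atoms):
                ok = False
                break
        if ok:
            return True
    return False

# q :- p.   r :- q, 0#sum{p=1}1.   (no recursion through the aggregate)
print(stratified_on([("q", {"p"}, False), ("r", {"q"}, True)],
                    {"p"}, {"p", "q", "r"}))              # → True
# p :- 0#sum{p=1}1.   (recursion through the aggregate)
print(stratified_on([("p", set(), True)], {"p"}, {"p"}))  # → False
```

The second call fails precisely because condition 2 would require ||p|| < ||p||, which is the recursion-through-aggregates situation the definition forbids.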
The intuition is that aggregate stratification forbids recursion through aggregates. In general, aggregate stratified programs have a lower complexity than non-aggregate stratified programs. Aggregate stratification has nothing to do with negation as failure, and therefore whether a program is aggregate stratified is unrelated to whether it is stratified in the usual sense. Note that constraints and choice rules can be added to any aggregate stratified program without breaking stratification, so long as no atom in the head of a choice rule is on a lower level than any atom in the body. For simplicity, we will therefore refer to programs with choice rules and constraints as also being aggregate stratified.

Lemma 1. [44] Deciding whether an aggregate stratified propositional program without disjunction cautiously entails an atom is co-NP-complete.

Corollary 1. Deciding whether an aggregate stratified propositional program without disjunction bravely entails an atom is NP-complete.
Proof. We first show that deciding whether an aggregate stratified propositional program without disjunction bravely entails an atom is in NP. We do this by showing that there is a polynomial reduction from this problem to the complement of the problem in Lemma 1 (which, by definition of co-NP, must be in NP). The complement of the problem in Lemma 1 is deciding whether a non-disjunctive aggregate stratified program does not cautiously entail an atom. Take any non-disjunctive aggregate stratified program P and any atom a, and let neg_a be an atom that does not occur in P. Then P |=_b a if and only if P ∪ {neg_a :- not a.} does not cautiously entail neg_a. So the decision problem is in NP.
It remains to show that deciding whether an aggregate stratified propositional program without disjunction bravely entails an atom is NP-hard. We do this by showing that any problem in NP can be reduced in polynomial time to deciding whether an aggregate stratified propositional program without disjunction bravely entails an atom.
Consider an arbitrary NP problem D. The complement of D, written D̄, must be in co-NP (by definition of co-NP). Hence, by Lemma 1, there is a polynomial reduction from D̄ to deciding whether an aggregate stratified propositional program without disjunction cautiously entails an atom. We define the polynomial reduction from D to deciding whether an aggregate stratified propositional program without disjunction bravely entails an atom as follows: for any instance I of D, let P and a be the program and atom given by applying the polynomial reduction from D̄ to I, and define P' as the program P ∪ {neg_a :- not a.} (where neg_a is a new atom). I is a yes-instance of D if and only if P ⊭_c a, if and only if P' ⊨_b neg_a. Hence, as P' is still aggregate stratified (the new atom neg_a can be put in the top stratum), this is a polynomial reduction from D to deciding whether an aggregate stratified propositional program without disjunction bravely entails an atom. Hence, the decision problem is NP-hard. □

We can now introduce our extra learning task, Learning from Answer Sets with Stratified Aggregates (ILP_sLAS). It is the same as Learning from Answer Sets, except for allowing summing aggregates in the bodies of the rules in B and S_M, as long as the resulting programs remain aggregate stratified.
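Both directions of this proof rest on the same one-rule construction. A minimal sketch, treating programs as lists of rule strings (the `neg_` prefixing is only an assumption of this sketch for obtaining a fresh atom):

```python
def add_complement_rule(program, atom):
    """Extend P with  neg_a :- not a.  for a fresh atom neg_a.
    Then: P bravely entails `atom` iff the extended program does NOT
    cautiously entail neg_a; and P does not cautiously entail `atom`
    iff the extended program bravely entails neg_a."""
    neg_atom = "neg_" + atom
    # The equivalences require neg_atom not to occur anywhere in P.
    assert not any(neg_atom in rule for rule in program), "atom not fresh"
    return program + [f"{neg_atom} :- not {atom}."], neg_atom
```

For example, `add_complement_rule(["p :- not q."], "p")` yields the extended program together with the fresh atom `neg_p` to test for entailment.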

Relationships between the learning tasks
In this section we prove, for both decision problems, that ILP_b and ILP_sm polynomially reduce to each other. We also show that for both decision problems there is a chain of polynomial reductions from ILP_c to ILP_LAS to ILP_LOAS to ILP_LOAS^context to ILP_sLAS. This chain of reductions is then used in proving that all four tasks share the same complexity for both decision problems: by proving that ILP_c is O-hard and ILP_sLAS is in O for some complexity class O, we prove that all four tasks are O-complete. Similarly, as ILP_b and ILP_sm polynomially reduce to each other for both decision problems, if ILP_b is O-complete for one of the problems then so is ILP_sm. The chains of reductions are shown in Fig. 1.
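Writing $\le_p$ for "polynomially reduces to", the reductions just described can be summarised, for each of the two decision problems separately, as:

```latex
\mathit{ILP}_b \le_p \mathit{ILP}_{sm} \le_p \mathit{ILP}_b, \qquad
\mathit{ILP}_c \le_p \mathit{ILP}_{\mathit{LAS}} \le_p \mathit{ILP}_{\mathit{LOAS}}
\le_p \mathit{ILP}^{\mathit{context}}_{\mathit{LOAS}} \le_p \mathit{ILP}_{s\mathit{LAS}}
```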
Proposition 1 shows that the complexities of ILP_b and ILP_sm coincide for both decision problems.

Proposition 1.

Proposition 2 shows that there is a chain of polynomial reductions from ILP_c to ILP_LAS to ILP_LOAS to ILP_LOAS^context to ILP_sLAS for both decision problems.
Proposition 2.

Complexity of deciding verification and satisfiability for each framework
For each of the learning frameworks, we prove the complexity of deciding verification and satisfiability. We start with the ILP_b and ILP_sm frameworks, for which both decision problems are NP-complete.

Corollary 3. Deciding the satisfiability of a general ILP_sm task is NP-complete.
We have now proven the complexity of deciding verification and satisfiability for ILP_b and ILP_sm, proving the corresponding entries in Table 2. It remains to show the complexities for ILP_c, ILP_LAS, ILP_LOAS and ILP_LOAS^context. As we have shown that ILP_c reduces to ILP_LAS, which in turn reduces to ILP_LOAS, which reduces to ILP_LOAS^context, and that ILP_LOAS^context reduces to ILP_sLAS (all in polynomial time), to prove the complexity of verifying a hypothesis for each framework it suffices to show that ILP_c is DP-hard (thus also proving the hardness for each of the other frameworks) and that ILP_sLAS is a member of DP (thus proving membership for the other frameworks). This shows that each framework is both a member of DP and DP-hard, and therefore must be DP-complete.

Proposition 5.
Deciding verification for ILP_sLAS is a member of DP.

Proposition 6. Deciding verification for ILP_c is DP-hard.
We can now prove the complexity of deciding verification for ILP_c, ILP_LAS, ILP_LOAS and ILP_LOAS^context. This proves the corresponding entries in Table 2.

Theorem 2. Deciding whether a given H is a solution of any ILP_c, ILP_LAS, ILP_LOAS or ILP_LOAS^context task is DP-complete in each case.
Proof. By Proposition 6, deciding verification for ILP_c is DP-hard. By Proposition 2, deciding verification for ILP_c reduces to deciding verification for ILP_LAS which, in turn, reduces to deciding verification for ILP_LOAS, which reduces to deciding verification for ILP_LOAS^context, which again reduces to deciding verification for ILP_sLAS; and by Proposition 5, deciding verification for ILP_sLAS is a member of DP. Deciding verification for each of these learning frameworks must therefore be both a member of DP and DP-hard. Hence, deciding verification for each framework is DP-complete. □

Similarly, to show that deciding satisfiability is Σ^p_2-complete for each framework, we only need to show that ILP_sLAS is a member of Σ^p_2 and ILP_c is Σ^p_2-hard.

Proposition 7.
Deciding satisfiability for ILP_sLAS is in Σ^p_2.

Proposition 8. Deciding satisfiability for ILP_c is Σ^p_2-hard.
We can now prove the complexity of deciding satisfiability for ILP_c, ILP_LAS, ILP_LOAS and ILP_LOAS^context. This proves the remaining entries in Table 2.

Theorem 3. Deciding the satisfiability of any ILP_c, ILP_LAS, ILP_LOAS or ILP_LOAS^context task is Σ^p_2-complete in each case.
Proof. (similar to the proof of Theorem 2) By Proposition 8, deciding satisfiability for ILP_c is Σ^p_2-hard. By Proposition 2, deciding satisfiability for ILP_c reduces to deciding satisfiability for ILP_LAS which, in turn, reduces to deciding satisfiability for ILP_LOAS, which reduces to deciding satisfiability for ILP_LOAS^context, which again reduces to deciding satisfiability for ILP_sLAS. By Proposition 7, deciding satisfiability for ILP_sLAS is in Σ^p_2. Deciding satisfiability for each of these learning frameworks is therefore both a member of Σ^p_2 and Σ^p_2-hard. Hence, deciding satisfiability for each framework is Σ^p_2-complete. □

Considering noisy examples
Although the frameworks considered in this paper were originally presented under the assumption that all examples were perfectly labeled (i.e. there is no noise in the examples), some of the systems for solving these tasks do consider noise when searching for an optimal solution.
A common approach, used by both XHAIL [42] and ILASP [32], is to penalise hypotheses for the examples they do not cover. In ILASP, some examples can be labeled with a penalty that must be paid if a hypothesis does not cover the example. Any example that is not labeled with a penalty must be covered by any inductive solution. Given a set of examples E, the score of a hypothesis H is defined as |H| + p(H, E), where |H| is the length of the hypothesis and p(H, E) is the sum of the penalties of all penalised examples that are not covered by H. As a hypothesis is an inductive solution if and only if it covers all the examples that are not labeled with a penalty, the two decision problems of verification and satisfiability can be reduced to the corresponding decision problems for non-noisy tasks (by simply removing every example with a penalty).
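A minimal sketch of this scoring scheme follows; the representation of examples as `(example, penalty)` pairs with `penalty=None` marking a mandatory example is an assumption of this sketch, not ILASP syntax:

```python
def score(hypothesis_length, examples, covers):
    """Return |H| + p(H, E), or None if H is not an inductive solution.

    examples: list of (example, penalty) pairs; penalty None marks an
              example that every inductive solution must cover.
    covers:   predicate deciding whether the hypothesis covers an example.
    """
    penalty_total = 0
    for example, penalty in examples:
        if covers(example):
            continue
        if penalty is None:       # an un-penalised example must be covered
            return None
        penalty_total += penalty  # pay for the uncovered example
    return hypothesis_length + penalty_total
```

For instance, a hypothesis of length 3 that misses one example carrying penalty 5 scores 3 + 5 = 8, while missing an un-penalised example disqualifies it entirely.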

Generality
In this section, we present a new notion of the generality of a learning framework. The aim is to get a sense of which class of ASP programs a framework is capable of learning, given sufficient examples. Language biases tend, in general, to impose their own restrictions on the classes of program that can be learned. They are primarily used to aid the performance of the computation, rather than to capture intrinsic properties of a learning framework. In this section we therefore consider learning tasks with unrestricted hypothesis spaces: hypotheses can be constructed from any set of (first order) normal rules, choice rules and hard and weak constraints. We assume each learning framework F to have tasks consisting of a pair ⟨B, E_F⟩, where B is the (first order ASP) background knowledge and E_F is a tuple consisting of the examples for this framework; for example, E_LAS = ⟨E^+, E^-⟩, where E^+ and E^- are sets of partial interpretations.
Allowing an unrestricted hypothesis space raises the question of whether a learning framework is general enough to define tasks that lead to a particular set of hypotheses as the inductive solutions. As a first attempt, we could say that a framework F is general enough to learn a hypothesis H if there is at least one task T_F in this framework such that H is an inductive solution of T_F. However, as shown in Example 6, such a "loose notion" of generality would make the trivial learning framework, whose learning tasks have no examples, the most general framework possible.

Example 6. Consider the trivial learning framework ILP whose learning tasks are pairs ⟨B, E⟩, where E is the empty tuple and B is an ASP program. ILP(⟨B, E⟩) is then the set of all ASP programs; i.e., every ASP program is a solution of every ILP task. Although for every hypothesis H and any background knowledge B there is clearly a set of examples E such that H ∈ ILP(⟨B, E⟩), every other possible hypothesis is also a solution of this same task, making it impossible to distinguish any hypothesis from another.
It is clearly not sufficient to say that a framework is general enough to learn some target hypothesis (denoted from now on as H_T) if we can find at least one learning task with H_T as a solution. What this definition lacks is a way to express that H_T is a solution of a task T, but that some other (unwanted) hypothesis is not a solution of T. To capture this property of a learning framework, we should be able to say that a task T can distinguish the hypothesis H_T from the unwanted hypothesis. Pairs of target and unwanted hypotheses, which can be distinguished from each other, are an interesting starting point when considering the generality of a learning framework. But this again may not be the only property of generality. Frameworks such as brave induction can distinguish the target hypothesis H_T from two (or more) unwanted hypotheses, e.g., H_1 and H_2, in two separate learning tasks, but they may not have a single learning task accepting H_T as an inductive solution but neither H_1 nor H_2. Consider for instance the following example.

Example 7.
Imagine the scenario where we are observing a coin being tossed several times. Obviously there are two outcomes, and we would like to learn an ASP program whose answer sets correspond to these two different outcomes. Consider the background knowledge B to be empty, and the atoms heads and tails to be true when the coin lands on heads or tails respectively.

A more general notion of the generality of a learning framework can be considered, which looks at distinguishing a target hypothesis H_T from a set of unwanted hypotheses S. In Section 5.2 we introduce the notion of the one-to-many-distinguishability class of a learning framework. This corresponds to the class of pairs of a single hypothesis H_T and a set S of hypotheses for which the learning framework has at least one task that distinguishes H_T from each hypothesis in S. Informally, this notion expresses the generality of a framework in finding a single target hypothesis in the presence of many unwanted hypotheses. In Section 5.3, we extend the one-to-many-distinguishability class of a learning framework to many-to-many-distinguishability, which in turn captures the notion of distinguishing a set of target hypotheses S_1 from another set of unwanted hypotheses S_2 with a single task.
In the remainder of this section we explore these three new measures of generality, expressed as three different learning problems. One-to-one-distinguishability determines the hypotheses that a framework is general enough to learn while ruling out another unwanted hypothesis; one-to-many-distinguishability determines the hypotheses that can be learned from within a space of unwanted hypotheses; and finally, many-to-many-distinguishability determines exactly which sets of hypotheses can be learned. We will prove properties of our three generality classes making use of a definition of strong reduction from one framework to another. Strong reduction is different from the concept of reduction presented in [45]. Definitions 11 and 12 present, respectively, a reformulation of the notion of reduction introduced in [45] and our new concept of strong reduction.
Example 8. Consider the ILP_b and ILP_c learning frameworks. ILP_b →_r ILP_c, as any ILP_b task ⟨B, E^+, E^-⟩ maps to the ILP_c task ⟨B ∪ {:- not e. | e ∈ E^+} ∪ {:- e. | e ∈ E^-}, ∅, ∅⟩. ILP_c does not, however, reduce to ILP_b. Consider, for instance, the ILP_c task T_c = ⟨∅, {p}, ∅⟩ and assume that there is an ILP_b task T_b with the same inductive solutions. The hypothesis H_1 = {p.} is in ILP_c(T_c) and, given the assumption, H_1 is also in ILP_b(T_b). But consider now the hypothesis H_2 = {0{p}1.}: every answer set of the background of T_b together with H_1 is also an answer set of the background together with H_2, so H_2 ∈ ILP_b(T_b), whereas H_2 alone has an answer set that does not contain p, so H_2 ∉ ILP_c(T_c). Hence, ILP_c does not reduce to ILP_b, and ILP_c is more r-general than ILP_b.
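The first mapping in this example is purely syntactic: positive examples become constraints forbidding their absence, negative examples become constraints forbidding their presence. A minimal sketch, with programs and examples represented as strings (the function name is ours):

```python
def brave_task_to_cautious_task(background, pos, neg):
    """Map an ILP_b task <B, E+, E-> to an ILP_c task with no examples:
    each positive example e becomes the constraint ':- not e.' and each
    negative example e becomes ':- e.', so that every remaining answer
    set of B ∪ H covers the examples."""
    constraints = [f":- not {e}." for e in sorted(pos)]
    constraints += [f":- {e}." for e in sorted(neg)]
    return background + constraints, set(), set()
```

For example, `brave_task_to_cautious_task(["p :- q."], {"p"}, {"r"})` produces the augmented background `["p :- q.", ":- not p.", ":- r."]` with empty example sets.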
We discuss the relationship between reductions and our own measures of generality in Section 6. Our notion of strong reduction differs from the above notion of reduction in that the reduced task must have the same background knowledge as the original task.
Proposition 9 shows the strong reduction relations between the frameworks considered in this paper. Note that although ILP_c is more r-general than ILP_b (as shown in Example 8), it is not more sr-general than ILP_b. This is because, without changing the background knowledge, ILP_c cannot represent the same ILP_b tasks.

Proposition 9.

Distinguishability
A one-to-one-distinguishability class captures those pairs of hypotheses H_1 and H_2 that can be distinguished from each other with respect to a given background knowledge.

Definition 13.
The one-to-one-distinguishability class of a learning framework F (denoted D^1_1(F)) is the set of tuples ⟨B, H_1, H_2⟩ of ASP programs for which there is at least one task T_F, with background knowledge B, that distinguishes H_1 from H_2. Given two frameworks F_1 and F_2, we say that F_1 is at least as (resp. more) D^1_1-general as (resp. than) F_2 if and only if D^1_1(F_2) ⊆ D^1_1(F_1) (resp. D^1_1(F_2) ⊂ D^1_1(F_1)).

Note that the one-to-one-distinguishability relationship is not symmetric; i.e., there are pairs of hypotheses H_1 and H_2 such that, given a background knowledge B, H_1 can be distinguished from H_2, but H_2 cannot be distinguished from H_1. This is illustrated by Example 9.
In fact, Proposition 10 generalises Example 9, showing that ILP_b cannot distinguish any program containing a constraint from the same program without the constraint.

Proof (sketch). Assume for contradiction that there is a hypothesis H' = H ∪ {C}, where C is a constraint, and an ILP_b task T_b that distinguishes H' from H. Any answer set of B ∪ H ∪ {C} is also an answer set of B ∪ H; hence any answer set witnessing that H' is a brave solution of T_b also witnesses that H is, contradicting the assumption that T_b distinguishes H' from H. □

One useful property is that if there is a strong reduction from one framework F_1 to another framework F_2, then D^1_1(F_1) ⊆ D^1_1(F_2). Note that F_2 is not guaranteed to be more D^1_1-general than F_1, even in the case when there is no reduction from F_2 to F_1.

Proposition 11.
For any two frameworks F_1 and F_2: if F_1 →_sr F_2, then D^1_1(F_1) ⊆ D^1_1(F_2).

Proof. Assume that F_1 →_sr F_2. Take any ⟨B, H_1, H_2⟩ ∈ D^1_1(F_1). There must be some task T_F1, with background knowledge B, such that H_1 ∈ ILP_F1(T_F1) and H_2 ∉ ILP_F1(T_F1). Hence, as F_1 →_sr F_2, there must be some task T_F2, with background knowledge B, such that H_1 ∈ ILP_F2(T_F2) and H_2 ∉ ILP_F2(T_F2); so ⟨B, H_1, H_2⟩ ∈ D^1_1(F_2). □

As there are clear strong reductions (shown by Proposition 9), an ordering of the one-to-one-distinguishability classes of the frameworks emerges (shown in Corollary 4).
While this does give us information about the ordering of the power of the frameworks to distinguish between hypotheses, it does not tell us, for example, what the relationship is between the distinguishability classes of ILP_b and ILP_c. Nor does it tell us which of the ⊆'s are strict (in fact, D^1_1(ILP_b) = D^1_1(ILP_sm), but the rest are strict subset relations). For each framework, Table 3 shows the necessary and sufficient condition needed to be able to distinguish hypotheses. In the case of the cautious induction framework, the condition makes use of a new notation, which is introduced alongside the conditions in Table 3. To aid readability, the proofs are in the appendix rather than the main paper.
Interestingly, although ILP_sm does not strongly reduce to ILP_b, D^1_1(ILP_b) = D^1_1(ILP_sm). This is shown by Proposition 14. The reason is that if ILP_sm can distinguish one hypothesis H_1 from another hypothesis H_2, then there must be some task T_sm such that H_1 is a solution of T_sm and H_2 is not. This means that H_1 must cover all of the examples of T_sm and there must be at least one (partial interpretation) example of T_sm that is not covered by H_2. This partial interpretation example can be given as the set of positive and negative examples of an ILP_b task. This ILP_b task will then distinguish H_1 from H_2.
To better compare the conditions for ILP_b and ILP_c, we can express the necessary and sufficient condition of ILP_b in terms of the notation E_b(P). Specifically, in ILP_b, for one hypothesis H_1 to be distinguishable from another hypothesis H_2 (with respect to a background knowledge B) it is both necessary and sufficient for E_b(B ∪ H_1) to contain at least one conjunction that is not in E_b(B ∪ H_2). So the one-to-one-distinguishability condition for ILP_b can also be expressed in terms of E_b, as shown in Proposition 15.
We now prove the one-to-one-distinguishability classes of our own frameworks, ILP_LAS and ILP_LOAS. D^1_1(ILP_LAS) contains both D^1_1(ILP_b) and D^1_1(ILP_c), as ILP_LAS can distinguish any two hypotheses which, combined with the background knowledge, have different answer sets.
As shown in Theorem 4, ILP_LOAS is more D^1_1-general than ILP_LAS. This is because ILP_LOAS is able to use its ordering examples to distinguish any two hypotheses that, when combined with the background knowledge, order their answer sets differently, even if the two programs have the same answer sets.
Note that we assume ILP_LOAS to be able to give ordering examples with any of the binary ordering operators. The slightly more restrictive version of ILP_LOAS presented in [23], where the only operator is <, has a smaller one-to-one-distinguishability class. This is shown in Example 11.
Example 11. Consider the heads and tails problem again, where B = {1{heads, tails}1.}, and two potential hypotheses.

ILP_LOAS can distinguish any two hypotheses that, when combined with a fixed background knowledge, behave differently. It cannot distinguish hypotheses that are different but behave the same with respect to the background knowledge. This means that there are some hypotheses that are not strongly equivalent (when combined with the background knowledge), but which ILP_LOAS cannot distinguish from each other. We now show that ILP_LOAS^context can distinguish between any two hypotheses H_1 and H_2 that, when combined with the background knowledge, are not strongly equivalent, or for which there is at least one program C ∈ ASP_ch (consisting of normal rules, choice rules and hard constraints) such that B ∪ H_1 ∪ C and B ∪ H_2 ∪ C order their answer sets differently.

Now that we have proven the distinguishability classes for each learning framework, we can strengthen the statement of Corollary 4 and more precisely state the relationship between the distinguishability classes of the frameworks. Apart from the case of ILP_b and ILP_sm, each of the subset relations in Corollary 4 is in fact strict.

Proof.
1. The fact that D^1_1(ILP_b) = D^1_1(ILP_sm) was shown in Proposition 14. By Corollary 4, D^1_1(ILP_sm) ⊆ D^1_1(ILP_LAS), and there is a tuple ⟨B, H_1, H_2⟩ that does not satisfy the condition, given in Table 3, necessary for it to be in D^1_1(ILP_sm), but which does satisfy the condition for it to be in D^1_1(ILP_LAS). Hence, by the conditions in Table 3, the inclusion is strict.

The one-to-many-distinguishability class of a learning framework
In practice an ILP task has a search space of possible hypotheses, and it is important to know the cases in which one particular hypothesis can be distinguished from the rest. In what follows, we analyse the conditions under which a learning framework can distinguish a hypothesis from a set of other hypotheses. As mentioned at the beginning of Section 5, this corresponds to the new notion we call the one-to-many-distinguishability class of a learning framework, which is a generalisation of the notion of the one-to-one-distinguishability class described above.

Definition 14.
The one-to-many-distinguishability class of a learning framework F (denoted D^1_m(F)) is the set of all tuples ⟨B, H, {H_1, …, H_n}⟩ such that there is a task T_F which distinguishes H from each H_i with respect to B. Given two frameworks F_1 and F_2, we say that F_1 is at least as (resp. more) D^1_m-general as (resp. than) F_2 if and only if D^1_m(F_2) ⊆ D^1_m(F_1) (resp. D^1_m(F_2) ⊂ D^1_m(F_1)).

The one-to-many-distinguishability class tells us the circumstances in which a framework is general enough to distinguish some target hypothesis from a set of unwanted hypotheses. Note that, although the tuples in a one-to-many-distinguishability class that have a singleton set as third argument correspond to the tuples in the one-to-one-distinguishability class of that framework, it is not always the case that if F_1 is more D^1_m-general than F_2 then F_1 is also more D^1_1-general than F_2. For example, we will see that ILP_sm is more D^1_m-general than ILP_b, but we have already shown in Proposition 14 that ILP_b and ILP_sm are equally D^1_1-general. Proposition 19 shows, however, that if F_1 is at least as D^1_m-general as F_2 then F_1 is at least as D^1_1-general as F_2.

Proposition 19.
For any two frameworks F_1 and F_2 such that F_1 is at least as D^1_m-general as F_2, F_1 is at least as D^1_1-general as F_2.

Proof. Assume that F_1 is at least as D^1_m-general as F_2 and let ⟨B, H_1, H_2⟩ ∈ D^1_1(F_2). To show that F_1 is at least as D^1_1-general as F_2, we must show that ⟨B, H_1, H_2⟩ ∈ D^1_1(F_1). As ⟨B, H_1, {H_2}⟩ ∈ D^1_m(F_2) ⊆ D^1_m(F_1), there is an F_1 task with background knowledge B that distinguishes H_1 from H_2; hence ⟨B, H_1, H_2⟩ ∈ D^1_1(F_1). □

We have already seen that if there is a strong reduction from F_1 to F_2 then F_2 is at least as D^1_1-general as F_1. Proposition 20 shows that a similar result holds for D^1_m-generality. Similarly to D^1_1-generality, however, a strong reduction from F_1 to F_2 does not imply that F_2 is more D^1_m-general than F_1, even in the case that there is no strong reduction from F_2 to F_1.

Proposition 20.
For any two frameworks F_1 and F_2: if F_1 →_sr F_2, then D^1_m(F_1) ⊆ D^1_m(F_2).

Proof. Assume that F_1 →_sr F_2. Take any ⟨B, H, S⟩ ∈ D^1_m(F_1). There must be some task T_F1, with background knowledge B, such that H ∈ ILP_F1(T_F1) and S ∩ ILP_F1(T_F1) = ∅. Hence, as F_1 →_sr F_2, there must be some F_2 task T_F2, with background knowledge B, such that H ∈ ILP_F2(T_F2) and S ∩ ILP_F2(T_F2) = ∅; so ⟨B, H, S⟩ ∈ D^1_m(F_2). □

Due to the strong reductions shown in Proposition 9, an ordering of the one-to-many-distinguishability classes of the frameworks emerges (shown in Corollary 5). This time, we will see that each of the ⊆'s in Corollary 5 can be upgraded to a strict ⊂. Rather than proving the one-to-many-distinguishability classes from scratch, we now present a useful result. For some frameworks, the one-to-one-distinguishability class of a learning framework can be used to construct the one-to-many-distinguishability class. This is the case when the framework has closed one-to-many-distinguishability (formalised by Definition 15). Proposition 21 and Corollary 6 show how the one-to-many-distinguishability class of a framework can be constructed from its one-to-one-distinguishability class if it has closed one-to-many-distinguishability.

For any framework F, D^1_m(F) is contained in the closure of D^1_1(F). The equality holds if and only if F has closed one-to-many-distinguishability.
Note that not all learning frameworks have closed one-to-many-distinguishability; for instance, Example 12 shows that brave induction does not. We will show that induction of stable models, on the other hand, does have closed one-to-many-distinguishability.

Example 12. Let B = ∅, H = {1{heads, tails}1.}, H_1 = {heads.} and H_2 = {tails.}. Both ⟨B, H, {H_1}⟩ and ⟨B, H, {H_2}⟩ are in D^1_m(ILP_b); hence, to show that ILP_b does not have closed one-to-many-distinguishability, it is sufficient to show that ⟨B, H, {H_1, H_2}⟩ ∉ D^1_m(ILP_b); i.e., that there is no task T_b that distinguishes H from both H_1 and H_2. Assume for contradiction that there is such a task T_b = ⟨B, E^+, E^-⟩. As H ∈ ILP_b(T_b) and AS(B ∪ H) = {{heads}, {tails}}, E^+ ⊂ {heads, tails} and E^- ⊂ {heads, tails} (neither can be equal to {heads, tails}, or H would not be a solution).

Case 1: E^+ = ∅.
Case a: E^- = ∅. Then H_1 and H_2 would both be inductive solutions. This is a contradiction, as T_b must distinguish H from both H_1 and H_2.
Case b: E^- = {heads}. Then H_2 would be an inductive solution of T_b. Contradiction.
Case c: E^- = {tails}. Then H_1 would be an inductive solution of T_b. Contradiction.
Case 2: E^+ = {heads}. As H is a solution, heads ∉ E^-; hence the answer set {heads} of B ∪ H_1 covers the examples, and H_1 would be an inductive solution of T_b. Contradiction.
Case 3: E^+ = {tails}. Symmetrically, H_2 would be an inductive solution of T_b. Contradiction.
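This case analysis can also be checked exhaustively, since with B = ∅ only the atoms heads and tails matter. The following sketch makes that assumption explicit (a fresh atom in E^+ would rule H out, and a fresh atom in E^- has no effect, so restricting the examples to subsets of {heads, tails} loses no generality here):

```python
from itertools import combinations, product

def powerset(atoms):
    atoms = sorted(atoms)
    return [set(c) for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)]

def brave_solution(answer_sets, pos, neg):
    # H is a brave solution iff some answer set of B ∪ H contains every
    # positive example and no negative example.
    return any(pos <= A and not (neg & A) for A in answer_sets)

AS = {                               # answer sets of B ∪ H_i, with B = ∅
    "H":  [{"heads"}, {"tails"}],    # H  = { 1{heads, tails}1. }
    "H1": [{"heads"}],               # H1 = { heads. }
    "H2": [{"tails"}],               # H2 = { tails. }
}

tasks = list(product(powerset({"heads", "tails"}), repeat=2))
# No single brave task accepts H while rejecting both H1 and H2 ...
both = [t for t in tasks if brave_solution(AS["H"], *t)
        and not brave_solution(AS["H1"], *t)
        and not brave_solution(AS["H2"], *t)]
# ... although each of H1, H2 can be ruled out separately.
one = [t for t in tasks if brave_solution(AS["H"], *t)
       and not brave_solution(AS["H1"], *t)]
```

Here `both` comes out empty, while `one` contains, for instance, the task with E^+ = {tails} and E^- = ∅.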
In contrast to ILP_b, ILP_sm (which, as we will see, does have closed one-to-many-distinguishability) can distinguish H from H_1 and H_2 with the task ⟨B, {⟨{heads}, ∅⟩, ⟨{tails}, ∅⟩}⟩. Note that this is a combination of the two brave tasks which distinguish H from H_1 and from H_2. We will show that the ability to combine tasks in this way is a sufficient condition for a framework to have closed one-to-many-distinguishability. Proposition 22 shows the one-to-many-distinguishability class of ILP_b.

Proof.

… ∪ AS(B ∪ h_m). □
For a framework F to have closed one-to-many-distinguishability it is sufficient (but not necessary) that for every two F tasks there is a third F task whose solutions are exactly those hypotheses which are solutions of both of the original tasks. This is formalised and proved in Lemma 2. This condition is not necessary in general, but it holds for the frameworks considered in this paper that have closed one-to-many-distinguishability.

Lemma 2. For any learning framework F to have closed one-to-many-distinguishability, it is sufficient that for every pair of learning tasks T^1_F and T^2_F with the same background knowledge there is a learning task T^3_F such that ILP_F(T^3_F) = ILP_F(T^1_F) ∩ ILP_F(T^2_F).
Proof. Assume that for every pair of learning tasks T^1_F and T^2_F it is possible to construct a new learning task T^3_F with ILP_F(T^3_F) = ILP_F(T^1_F) ∩ ILP_F(T^2_F). We proceed by induction on the size of the set of unwanted hypotheses. As ⟨B, H, S_1 ∪ … ∪ S_k⟩ ∈ D^1_m(F) (by the inductive hypothesis), there must be a learning task T^1_F such that H ∈ ILP_F(T^1_F) and (S_1 ∪ … ∪ S_k) ∩ ILP_F(T^1_F) = ∅. As ⟨B, H, S_{k+1}⟩ ∈ D^1_m(F), there must also be a learning task T^2_F such that H ∈ ILP_F(T^2_F) and S_{k+1} ∩ ILP_F(T^2_F) = ∅. By our initial assumption, there is a learning task T^3_F with ILP_F(T^3_F) = ILP_F(T^1_F) ∩ ILP_F(T^2_F); hence H ∈ ILP_F(T^3_F) and (S_1 ∪ … ∪ S_{k+1}) ∩ ILP_F(T^3_F) = ∅. □

Corollary 7. Given two frameworks F_1 and F_2 with closed one-to-many-distinguishability:

Even if two frameworks F_1 and F_2 both have closed one-to-many-distinguishability, it might not be the case that their combination has closed one-to-many-distinguishability. Example 13 shows, for example, that this is not the case for ILP_sm and ILP_c. We first define what we mean by the combination framework constructed from two given frameworks.

Definition 16.
Given two frameworks F_1 and F_2, the combination framework comb(F_1, F_2) allows any task ⟨B, 1, E_1⟩, where ⟨B, E_1⟩ is an F_1 task, and any task ⟨B, 2, E_2⟩, where ⟨B, E_2⟩ is an F_2 task.

The many-to-many-distinguishability class of a learning framework
So far, we have considered two main classes to define how general a learning framework is. Firstly, we discussed the one-to-one-distinguishability class, which is made up of tuples ⟨B, H, H'⟩ such that the framework can distinguish H from H' with respect to B. We showed that this has limitations and cannot separate ILP_b and ILP_sm, even though ILP_b is clearly a special case of ILP_sm. This motivated upgrading the notion of a one-to-one-distinguishability class, changing the third element of each tuple from a single hypothesis to a set of hypotheses, giving the notion of a one-to-many-distinguishability class.
This naturally leads to the question of whether it is possible to upgrade the generality classes by allowing the second element of the tuple to also be a set of hypotheses. Each tuple would then be of the form ⟨B, S_1, S_2⟩, where B is a background knowledge, and S_1 and S_2 are sets of hypotheses. For each tuple in this new class, a framework would be required to have at least one task T with the background knowledge B such that every hypothesis in S_1 is an inductive solution of T, and no hypothesis in S_2 is an inductive solution of T. Definition 17 formalises this many-to-many-distinguishability class.
Definition 17. The many-to-many-distinguishability class of a learning framework F (denoted D^m_m(F)) is the set of all tuples ⟨B, S_1, S_2⟩, where B is a program and S_1 and S_2 are sets of hypotheses, for which there is a task T_F, with background knowledge B, such that S_1 ⊆ ILP_F(T_F) and S_2 ∩ ILP_F(T_F) = ∅. Given two frameworks F_1 and F_2, we say that F_1 is at least as (resp. more) D^m_m-general as (resp. than) F_2 if and only if D^m_m(F_2) ⊆ D^m_m(F_1) (resp. D^m_m(F_2) ⊂ D^m_m(F_1)).

We have already seen that, for any two frameworks F_1 and F_2, a strong reduction from F_1 to F_2 implies the corresponding subset relation between their distinguishability classes. We have also seen that, for D^1_1-generality and D^1_m-generality, these subset relations are not necessarily strict even when there is no corresponding strong reduction from F_2 to F_1. Proposition 24 and Corollary 8 show that D^m_m-generality is, in contrast, equivalent to strong reductions.

Proposition 24. For any two learning frameworks F_1 and F_2: F_1 →_sr F_2 if and only if D^m_m(F_1) ⊆ D^m_m(F_2).
Proof.

Assume that D^m_m(F_1) ⊆ D^m_m(F_2). Let T_F1 be an arbitrary F_1 task. We must show that there is an F_2 task with the same background knowledge and the same inductive solutions. Let B be the background knowledge of T_F1, S_1 = ILP_F1(T_F1), and S_2 be the (possibly infinite) set of ASP programs which are not in S_1. ⟨B, S_1, S_2⟩ ∈ D^m_m(F_1), and hence ⟨B, S_1, S_2⟩ ∈ D^m_m(F_2). Therefore, there must be at least one task T_F2 with the background knowledge B such that ILP_F2(T_F2) = S_1; so F_1 →_sr F_2.
Theorem 7 shows that one framework being more D^1_m-general than another implies that it is also more D^m_m-general if there is a strong reduction from the second framework to the first; without such a reduction, there may be no D^m_m-generality relation between the two. In the next section, we discuss relationships between, and the relative merits of using, each measure of generality.

Table 4 summarises the relationships between the different measures of generality presented in this paper. It shows that equal one-to-one-distinguishability is weaker than equal one-to-many-distinguishability, which is weaker than equal many-to-many-distinguishability. This can be seen from the first section of the table, as equal many-to-many-distinguishability implies equal one-to-many-distinguishability, which implies equal one-to-one-distinguishability, but the converse implications do not hold in general. On the other hand, different one-to-one-distinguishability is stronger than different one-to-many-distinguishability, which in turn is stronger than different many-to-many-distinguishability. This means that many-to-many-distinguishability (resp. one-to-many-distinguishability) will be able to "separate" frameworks that one-to-many-distinguishability (resp. one-to-one-distinguishability) cannot; but there are more frameworks that are incomparable under many-to-many-distinguishability (resp. one-to-many-distinguishability) than under one-to-many-distinguishability (resp. one-to-one-distinguishability).

Discussion
The different notions of generality will never be inconsistent, in the sense that one will never say that F_1 is more general than F_2 while another says that F_2 is more general than F_1. It is useful, however, to explain the tasks that the different measures of generality correspond to.
1. One-to-one-distinguishability describes how general a framework is at distinguishing one hypothesis from another.
2. One-to-many-distinguishability describes how general a framework is at the task of identifying one target hypothesis within a space of unwanted hypotheses.
3. Many-to-many-distinguishability describes how general a framework is for the task of identifying a set of target hypotheses: for any background knowledge B and set of hypotheses S, there is a task T_F with background knowledge B whose inductive solutions are exactly S.

In practice, as ILP usually addresses the task of finding a single target hypothesis from a space of other hypotheses, one-to-many-distinguishability is likely to be the most useful measure; however, one-to-one-distinguishability classes are useful for finding the one-to-many-distinguishability classes of frameworks, and many-to-many-distinguishability is interesting as a theoretical property.

More general learning frameworks
We have shown in this section that ILP_LOAS^context is more general (under every measure) than any of the other frameworks presented for learning under the answer set semantics. The obvious question is whether it is possible to go further and define even more general learning tasks.
The most D^1_1-general learning task possible would be able to distinguish between any two different ASP programs H_1 and H_2 with respect to any background knowledge B. This would require the learning task to distinguish between programs that are strongly equivalent, such as {p. q:-p.} and {p:-q. q.}. We would argue that this level of one-to-one-distinguishability is unnecessary: in ILP, we aim to learn programs whose output explains the examples. As two strongly equivalent programs will always have the same output, even when combined with additional programs providing "context", we cannot see any reason for going further under D^1_1-generality. As ILP_LOAS^context has closed one-to-many-distinguishability, the same argument can be made for D^1_m-generality.
One outstanding question is whether it is worth going any further under D^m_m-generality. Note that it is possible to define the notion of the closure of many-to-many-distinguishability classes; however, none of the frameworks considered in this paper has closed many-to-many-distinguishability. It is unclear whether having closed many-to-many-distinguishability is even a desirable property for a framework. Closed one-to-many-distinguishability means that a framework can distinguish a target hypothesis H from any set of hypotheses S such that it can distinguish H from each element of S: the sets of examples that distinguish H from each element of S can be combined into a single set of examples, ruling out every element of S. For a framework to have closed many-to-many-distinguishability, however, given two (or more) target hypotheses h_1, h_2 that can each be distinguished from an undesirable hypothesis h_3, it would need to be able to find a single task that distinguishes both h_1 and h_2 from h_3. For example, as both ⟨∅, {heads.}, {1{heads, tails}1.}⟩ and ⟨∅, {tails.}, {1{heads, tails}1.}⟩ are in D^1_1(ILP_LAS), for ILP_LAS to have closed many-to-many-distinguishability it would need to be able to find a task with an empty background knowledge that distinguishes both {heads.} and {tails.} from {1{heads, tails}1.}. It is difficult to imagine a scenario, however, in which we should learn either the hypothesis that a coin always lands heads or that it always lands tails, when the choice rule is not a desirable hypothesis.
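As a concrete check of the coin programs above, a brute-force answer set enumerator (our own illustrative encoding, with the choice rule 1{heads, tails}1 written as the standard pair of negation-as-failure rules) confirms that {heads.} has the single answer set {heads}, while the choice rule program has both {heads} and {tails}:

```python
from itertools import combinations

# Rules are triples (head, positive_body, negative_body); head None would
# encode a constraint. The enumerator checks every candidate interpretation
# against its Gelfond-Lifschitz reduct. (Encoding and names are ours.)

def least_model(definite_rules):
    """Least model of a set of (head, positive_body) rules, by fixpoint."""
    model, changed = set(), True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model

def answer_sets(program):
    atoms = sorted({a for h, p, n in program
                    for a in ({h} if h else set()) | p | n})
    models = []
    for r in range(len(atoms) + 1):
        for cand in map(set, combinations(atoms, r)):
            # reject candidates violating a constraint (head None)
            if any(h is None and p <= cand and not (n & cand)
                   for h, p, n in program):
                continue
            # reduct: drop rules blocked by the candidate, strip negation
            reduct = [(h, p) for h, p, n in program
                      if h is not None and not (n & cand)]
            if least_model(reduct) == cand:
                models.append(frozenset(cand))
    return models

heads_only = [("heads", set(), set())]            # {heads.}
coin_choice = [("heads", set(), {"tails"}),       # 1{heads, tails}1.
               ("tails", set(), {"heads"})]

assert answer_sets(heads_only) == [frozenset({"heads"})]
assert set(answer_sets(coin_choice)) == {frozenset({"heads"}),
                                         frozenset({"tails"})}
```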

The generality of noisy frameworks
As discussed in Section 4.4, some learning systems are able to solve tasks in which the examples are potentially noisy; in this case, not all examples should necessarily be covered, and there is a trade-off between maximising coverage and not over-fitting the examples. One method, used by the XHAIL [42] and ILASP [32] systems, is to penalise a hypothesis for each example that is not covered. Each example is given a positive integer penalty, which must be paid if the example is not covered.
The three measures of generality presented in this section could be extended to cover noisy tasks. For instance, in the case of one-to-one-distinguishability, we could define the "noisy" one-to-one-distinguishability class of a learning framework as the set of tuples ⟨B, H_1, H_2⟩ for which there is a set of examples E such that p(H_1, E) < p(H_2, E), where p(H, E) is the total penalty paid by a hypothesis H (together with the background knowledge B) over the examples E. In fact, we now show that this extended notion of one-to-one-distinguishability class would be equivalent to the standard "non-noisy" one-to-one-distinguishability class.
As all penalties are positive, p(H_1, E) < p(H_2, E) implies that at least one example in E is covered by H_1 but not by H_2; this single example is itself a set of examples covered in full by H_1 but not by H_2, so any tuple ⟨B, H_1, H_2⟩ that would be in the "noisy" one-to-one-distinguishability class is in the standard "non-noisy" one-to-one-distinguishability class. Similarly, for any tuple ⟨B, H_1, H_2⟩ in the standard one-to-one-distinguishability class, there is a set of examples E such that H_1 covers every example in E and H_2 does not; hence p(H_1, E) < p(H_2, E), and so ⟨B, H_1, H_2⟩ would be in the "noisy" one-to-one-distinguishability class.
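The penalty-based comparison can be sketched as follows; representing hypotheses by the set of examples they cover, and all names, are illustrative assumptions of ours rather than any system's API:

```python
# p(H, E): each example carries a positive integer penalty, paid whenever
# H (together with B) fails to cover it.
def penalty(covered_examples, examples):
    """Total penalty over `examples` (a dict: example -> penalty)."""
    return sum(pen for ex, pen in examples.items()
               if ex not in covered_examples)

examples = {"e1": 2, "e2": 3}

# H1 covers every example, H2 covers none:
assert penalty({"e1", "e2"}, examples) == 0   # H1 pays nothing
assert penalty(set(), examples) == 5          # H2 pays 2 + 3

# penalty(H1) < penalty(H2) forces some example covered by H1 but not H2,
# which alone distinguishes the two hypotheses in the non-noisy setting.
```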
A similar argument holds for the one-to-many-distinguishability class; it is worth noting, however, that it does not hold for the many-to-many-distinguishability class. If we upgrade the many-to-many-distinguishability class in the same way, then there are some tuples which are in the "noisy" many-to-many-distinguishability class for a framework, but not in the standard many-to-many-distinguishability class; one instance is the example discussed in the previous section.

Related work
The complexity of ILP_b and ILP_c for verification and satisfiability was investigated in [28]. However, in that work, the results on satisfiability are for deciding whether or not a task has any solutions with no restrictions on the hypothesis space. This means that for both ILP_b and ILP_c, deciding whether a task is satisfiable is equivalent to checking whether there is a model of B in which the examples are covered (a simpler decision problem). For this reason, the complexity of satisfiability for ILP_c in [28] was NP-complete, rather than Σ^P_2-complete. The complexities of verification of a hypothesis given in [28] are also different from the ones in this paper, as they consider a different language for B ∪ H: they consider disjunctive logic programs, whereas we investigated the complexity of learning programs without disjunction. The reason we chose not to consider disjunctive logic programs is that the systems available for ILP under the answer set semantics do not allow disjunction. For example, the systems for ILP_b [38,39] do not allow disjunction, and allowing it would raise the complexity beyond that of the tasks actually solved in practice by the existing systems. As discussed in Section 5, the generality of a learning framework has been investigated before. In [45], the author defined generality in terms of reductions: one framework F_1 was said to be more general than another framework F_2 if and only if F_2 reduces to F_1 but F_1 does not reduce to F_2. We showed in Section 5 that our final notion of generality (many-to-many-distinguishability) coincides with a similar notion of strong reductions. The difference between strong reductions and the reductions in [45] is that strong reductions do not allow the background knowledge to be modified as part of the reduction. We showed in Example 8 that ILP_b reduces to ILP_c, but ILP_b does not strongly reduce to ILP_c.
This is because any reduction from ILP_b to ILP_c must encode the examples in the background knowledge, which, we would argue, abuses the purpose of the background knowledge. Aside from the differences between strong reductions and reductions, we discussed in Section 5 that one-to-many-distinguishability is more relevant when comparing the generalities of frameworks with respect to the task of finding a single hypothesis within a space of hypotheses. The reductions of [45] are closer to the notion of many-to-many-distinguishability, because they compare the sets of solutions.
One key advantage of using our three notions of generality, rather than strong reductions or reductions, is for comparing the relative generalities of frameworks that do not strongly reduce to one another. For instance, we have seen that ILP_b and ILP_c are incomparable under D^1_1-generality, but we can still reason that ILP_b is never D^1_1-general enough to distinguish a hypothesis containing a constraint from the same hypothesis without the constraint. On the other hand, ILP_c may be D^1_1-general enough to do so; for example, ILP_c can distinguish {:-p.} from ∅ with respect to the background knowledge {0{p}1.}, with the task ⟨{0{p}1.}, ∅, {p}⟩.
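The distinguishing task above can be checked mechanically. Below, the answer sets of B ∪ H are computed by hand from B = {0{p}1.} (H_1 adds the constraint, H_2 is empty), and only the standard cautious coverage test is coded; all names are illustrative:

```python
# Hand-computed answer sets of B ∪ H, for B = {0{p}1.}:
answer_sets_h1 = [set()]           # H1 = {:- p.}: p can never be chosen
answer_sets_h2 = [set(), {"p"}]    # H2 = {}: p may or may not be chosen

def cautious_solution(answer_sets, e_plus, e_minus):
    # H solves a cautious task iff B ∪ H is satisfiable and every answer
    # set contains all of E+ and none of E-.
    return bool(answer_sets) and all(
        e_plus <= a and not (e_minus & a) for a in answer_sets)

# The task <{0{p}1.}, E+ = {}, E- = {p}> accepts H1 but rejects H2:
assert cautious_solution(answer_sets_h1, set(), {"p"})
assert not cautious_solution(answer_sets_h2, set(), {"p"})
```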

Other learning frameworks
Traditional ILP aims to learn Prolog-style logic programs, often restricted to definite programs (with no negation as failure). For the shared subset of the languages learned by these ILP frameworks and the ASP frameworks (definite rules, not including lists), a definite learning task can be expressed as either a brave or a cautious task with the same examples as the definite task, and with the hypothesis space restricted to definite logic programs. As these frameworks do not support features such as choice rules, constraints or negation, and the ASP frameworks do not support lists, a comparison of their generality is not very informative. A review of early efforts to extend ILP to learn normal logic programs was presented in [8]. The techniques discussed in [8] that operate under the stable model (or answer set) semantics require that all examples are covered in all stable models (or answer sets). This corresponds to cautious induction.
We have already discussed most of the other frameworks for ILP under the answer set semantics, and shown in Sections 4 and 5 how their complexity and generality compare to our own frameworks. In particular, we have shown that although the complexities of our three learning frameworks (ILP_LAS, ILP_LOAS and ILP_LOAS^context) are the same as that of cautious induction, there are some learning problems which can be represented in learning from answer sets but cannot be represented in either brave or cautious induction. One example is learning the rules of Sudoku: brave induction cannot incentivise learning the constraints in the rules of Sudoku, and there are no useful examples that can be given to a cautious learner about the values of cells, since no cell has the same value in every valid Sudoku board.
Another early work on learning frameworks under the answer set semantics is Induction from Answer Sets [46]. In that paper, two learning algorithms, IAS_pos and IAS_neg, are presented. The task of IAS_pos is to learn a hypothesis that cautiously entails a set of examples; this corresponds to the task of cautious induction. IAS_neg, on the other hand, aims to find a hypothesis that does not cautiously entail each of a set of examples (i.e. for each example there should be at least one answer set that does not contain it). This is, in some sense, reversed brave induction. As shown in the paper, the IAS_pos and IAS_neg procedures cannot in general be combined to compute a correct hypothesis.
Another framework, under the supported model semantics rather than the answer set semantics, is Learning from Interpretation Transitions (LFIT) [47]. In LFIT, the examples are pairs of interpretations ⟨I, J⟩ where J is the set of immediate consequences of I given B ∪ H. In [24], we presented a mapping from any LFIT task to an ILP_LOAS^context task. This shows that the complexity of deciding both satisfiability and verification for LFIT is at most Σ^P_2. The generality, on the other hand, would be different from that of the tasks we have considered, since there are programs that are strongly equivalent under the answer set semantics but have different supported models. Example 15 considers the pair of programs P_1 = {p.} and P_2 = {p:-p. p:-not p.}.
For both programs P_1 and P_2, the immediate consequences of any interpretation I are the set {p}. This means that no example could possibly distinguish P_1 from P_2 with respect to an empty background knowledge. Under the answer set semantics, however, P_1 has one answer set, {p}, while P_2 has no answer sets. ILP_LAS can therefore distinguish P_1 from P_2 (with respect to the empty background knowledge), with the positive example ⟨{p}, ∅⟩.
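The immediate consequence computation for Example 15 can be sketched directly; the rule representation (head, positive body, negative body) is our own:

```python
# Immediate consequence operator used by LFIT-style learning.
def t_p(program, interpretation):
    """Heads of all rules whose body is satisfied by `interpretation`."""
    return {h for h, pos, neg in program
            if pos <= interpretation and not (neg & interpretation)}

P1 = [("p", set(), set())]          # p.
P2 = [("p", {"p"}, set()),          # p :- p.
      ("p", set(), {"p"})]          # p :- not p.

# Both programs map every interpretation over {p} to {p} ...
for interp in [set(), {"p"}]:
    assert t_p(P1, interp) == {"p"}
    assert t_p(P2, interp) == {"p"}
# ... so no interpretation-transition example separates them, although
# under the answer set semantics P1 has answer set {p} and P2 has none.
```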

ILP_LAS, ILP_LOAS and ILP_LOAS^context have different distinguishability classes to LFIT, but none is either more or less D^1_1-general than LFIT. This is an interesting observation, as it demonstrates that even when two frameworks are incomparable under our measures of generality, we can still reason about their individual distinguishability classes and discuss hypotheses which one framework is powerful enough to distinguish between and another is not. For instance, ILP_LAS cannot distinguish between any two hypotheses that are strongly equivalent under the answer set semantics, but Example 15 shows that there are some cases where LFIT can.

Relation to probabilistic ILP
One of the advantages of learning ASP programs rather than Prolog programs is that ASP allows the modelling of non-determinism, either through unstratified negation or through choice rules. The latter can be seen in the coin examples throughout the paper, where we have shown that our ILP_LAS framework can learn that a coin can land either heads or tails, but not both.
Another method for achieving non-determinism in ILP is to add probabilities. Probabilistic Inductive Logic Programming (PILP) [48] is a combination of ILP with probabilistic reasoning; its aim is to learn a logic program annotated with probabilities. The task of PILP is often divided into structure learning, where the underlying logic program is learned, and parameter estimation (or weight learning), where the probabilities are learned. A key difference between ILP_LAS and PILP is that while both aim to learn non-deterministic programs, ILP_LAS aims to learn programs whose answer sets capture the set of possibilities, whereas PILP aims to learn a probability distribution over these possibilities.
Although there has been significant progress in the field of PILP [25,49,50,51,26] for learning annotated Prolog programs, PILP under the answer set semantics is still relatively young, and there are thus few approaches. PrASP [52,53,27] considers the problem of weight learning, and in fact uses a similar example of learning about coins. This example illustrates the difference between weight learning and standard ILP: in ILP, our task is to learn that there are exactly two possibilities (heads and tails), whereas in weight learning the goal is to estimate the probability of each possibility. PROBXHAIL [54] does attempt to combine structure learning and weight learning, but can only learn definite logic programs.
While the coin example used in this paper may be viewed as inherently probabilistic, there are situations in practice where we may wish to learn non-deterministic programs without considering probability; for instance, in policy learning. A policy may well permit many valid actions in a given scenario, and impose some constraints on these actions. The task is to learn a program whose answer sets reflect the set of valid options, rather than to estimate the probability of each action being taken.

Conclusion
In this paper we have investigated the complexity and generality of the state-of-the-art frameworks for learning answer set programs. For the two decision problems of verifying that a hypothesis is an inductive solution of a task and of deciding whether a given task is satisfiable, we have shown that brave induction (ILP_b) and induction of stable models (ILP_sm) have the same complexities, and that cautious induction (ILP_c), learning from answer sets (ILP_LAS), learning from ordered answer sets (ILP_LOAS) and context-dependent learning from ordered answer sets (ILP_LOAS^context) also have the same complexities as each other, but higher than those of ILP_b and ILP_sm. Studying the complexity of decision problems for the learning frameworks is important, as it gives a sense of the price paid for choosing a particular framework. Generality, in contrast, is important as it shows the advantages of choosing one framework over another, by specifying which hypotheses can be learned by each framework. When using ILP in practice, a trade-off must be made between the complexity and the generality of the framework. The generality classes presented in this paper can inform this decision, as it is likely to be influenced by the class of programs that must be learned.
We have introduced three new measures of generality (D^1_1-generality, D^1_m-generality and D^m_m-generality), and shown that, both under our own measures of generality and using the concept of strong reductions, there is an ordering of the generalities of the frameworks considered in this paper. Although ILP_c, ILP_LAS, ILP_LOAS and ILP_LOAS^context have the same computational complexities, ILP_c is less general than ILP_LAS, which is less general than ILP_LOAS, which is less general than ILP_LOAS^context, under each measure of generality. This ordering could have been established using strong reductions, but our measures go further. They allow us to reason about why one framework is more D^1_1-general than another, for example by studying the class of tuples which are in one framework's distinguishability class but not the other's. They also allow us to discuss the generalities of frameworks which are incomparable under strong reductions; for example, there is no strong reduction from ILP_c to ILP_b, or from ILP_b to ILP_c. Our measures allow us to show, however, that ILP_b is not D^1_1-general enough to distinguish a hypothesis containing a constraint from the same program without the constraint, whereas in some cases ILP_c is D^1_1-general enough to do so.
In this paper, most of the results we have presented address non-noisy learning frameworks. Our ILASP systems do, in general, support noise, by allowing examples to be labelled with a penalty. In this case, ILASP searches for a hypothesis that minimises the sum |H| + p(H, E), where p(H, E) is the sum of the penalties of all examples in a set E that are not covered by a hypothesis H. Such a hypothesis is called an optimal solution. For the two decision problems of verification and satisfiability, we have shown that the complexity results are unaffected. In current work, we are investigating whether the complexities of the non-noisy and noisy frameworks differ for the decision problem of verifying that a hypothesis is an optimal solution of a given task. In future work, we also hope to "upgrade" the propositional complexity results presented in this paper to apply to the learning of first-order answer set programs.

3. Deciding both verification and satisfiability for ILP_LOAS^context reduces polynomially to the corresponding ILP_LOAS decision problem.
4. Deciding both verification and satisfiability for ILP_LOAS reduces polynomially to the corresponding ILP_LAS^s decision problem.
Proof. In [24], we presented a mapping from any ILP_LOAS^context task to an ILP_LOAS task. The correctness of this mapping is proven in Theorem 1 of [24]. Given any ILP_LOAS^context task, we can decide its satisfiability by applying this mapping and checking the satisfiability of the resulting ILP_LOAS task. Similarly, given any hypothesis and ILP_LOAS^context task, we can verify that the hypothesis is an inductive solution of the task by using the mapping. Hence, both satisfiability and verification for ILP_LOAS^context reduce to satisfiability and verification (respectively) for ILP_LOAS.

We do this by translating an arbitrary ILP_LOAS task T_LOAS into an ILP_LAS^s task.
Before we do this, we define several new atoms used in our meta representation.
For i ∈ {1, 2}, let f_i be a function which maps each atom a in B ∪ S_M to a new atom a^i. We also extend this notation to sets of atoms and to rules (and parts of rules), by replacing each atom a in the set or rule with f_i(a). For each rule R ∈ S_M, define a new atom in_h_R. For each weak constraint W ∈ B ∪ S_M, let id_1(W) and id_2(W) be two new (propositional) atoms, let wt(W) be the weight of W and let priority(W) be the priority level of W. For any two terms t_1 and t_2, dominates(t_1, t_2) is defined as below.
Consider the task T^s whose background knowledge contains dominates(1, 2) ∪ dominates(2, 1) ∪ {dom :- dom(1, 2). dom :- dom(2, 1).}. By using the splitting set theorem [35], it can be shown that, for any H ∈ S_M, the rules in dominates(t_1, t_2) describe exactly the behaviour of the weak constraints in B ∪ H for two answer sets (with dom(t_1, t_2) being true if and only if the first answer set dominates the second). This means that we can check the satisfiability of any ILP_LOAS task (and similarly verify a solution) by mapping the task to an ILP_LAS^s task as above. Note that this is a well-defined ILP_LAS^s task, as B contains only stratified aggregates. As this mapping is polynomial in the size of the original task, verification and satisfiability for ILP_LOAS each reduce polynomially to the corresponding decision problem for ILP_LAS^s. □

Proposition 3. Verifying whether a given H is an inductive solution of a general ILP_b task is NP-complete.
Proof. Let T_b be any ILP_b task ⟨B, S_M, E^+, E^-⟩. For any H ⊆ S_M, H ∈ ILP_b(T_b) if and only if B ∪ H ∪ {:- not e^+. | e^+ ∈ E^+} ∪ {:- e^-. | e^- ∈ E^-} is satisfiable. As deciding the satisfiability of this program is NP-complete (B ∪ H contains only normal rules, choice rules and constraints), and the program can be constructed in polynomial time, deciding verification for ILP_b is in NP.
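The constraint-based construction in this proof can be sketched as follows, with rules represented as (head, positive body, negative body) triples and head None marking a constraint; the representation and names are our own:

```python
# H is a brave inductive solution iff B ∪ H, extended with one constraint
# per example, is satisfiable: ":- not e+." forces each positive example
# into the answer set, and ":- e-." forces each negative example out.
def brave_verification_program(b_union_h, e_plus, e_minus):
    constraints = ([(None, set(), {e}) for e in e_plus] +    # :- not e+.
                   [(None, {e}, set()) for e in e_minus])    # :- e-.
    return list(b_union_h) + constraints

prog = brave_verification_program([("q", set(), set())], {"q"}, {"r"})
assert (None, set(), {"q"}) in prog   # forces q into any answer set
assert (None, {"r"}, set()) in prog   # forces r out of any answer set
```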
It remains to show that deciding verification is NP-hard. We do this by showing that deciding satisfiability for any ASP program P containing normal rules, choice rules and constraints can be reduced polynomially to deciding verification for an ILP_b task. Consider the ILP_b task T_b = ⟨P, ∅, ∅, ∅⟩ and let H = ∅. H ∈ ILP_b(T_b) if and only if there is an answer set of P ∪ H, and hence if and only if P is satisfiable. □

Proposition 4. Deciding the satisfiability of a general ILP_b task is NP-complete.
Proof. First we show that deciding the satisfiability of a general ILP_b task is in NP. We do this by mapping an arbitrary task T = ⟨B, S_M, E^+, E^-⟩ to an ASP program whose answer sets can be mapped to the solutions of T. This program is satisfiable if and only if T is satisfiable, and as the program is aggregate-stratified, checking whether it is satisfiable is in NP. Hence, if we can construct such a program, then we will have proved that deciding satisfiability for ILP_b is in NP.
For each R_i ∈ S_M we define a new atom in_h_{R_i}. Also, let meta(R_i) be the rule R_i with the additional atom in_h_{R_i} added to the body.
We define the meta-encoding T_meta accordingly; the answer sets of T_meta correspond to the solutions of T. (This can be seen by using the splitting set theorem, with {in_h_{R_i} | R_i ∈ S_M} as the splitting set.) It remains to show that deciding the satisfiability of a general ILP_b task is NP-hard. Deciding the satisfiability of a normal logic program is NP-hard, so demonstrating that deciding the satisfiability of a normal program P can be mapped to an ILP_b task is sufficient.
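The guarding step meta(R_i) can be sketched as follows; the rule representation and naming scheme are illustrative assumptions:

```python
# meta(R_i): guard each rule of the hypothesis space with a fresh atom
# in_h_{R_i}, so an answer set of the meta-program selects a hypothesis
# via the guard atoms it contains.
def meta(rule, rule_id):
    head, pos, neg = rule
    return (head, pos | {f"in_h_{rule_id}"}, neg)

# S_M = { r1: p :- q.   r2: q. }
s_m = {"r1": ("p", {"q"}, set()), "r2": ("q", set(), set())}
guarded = {rid: meta(r, rid) for rid, r in s_m.items()}

assert guarded["r1"] == ("p", {"q", "in_h_r1"}, set())
assert guarded["r2"] == ("q", {"in_h_r2"}, set())
```

In the full encoding, a choice over the in_h atoms lets the solver pick any subset of S_M as the hypothesis.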
Let P be any normal logic program and let T be the ILP_b task ⟨P, ∅, ∅, ∅⟩. T is satisfiable if and only if ∃H ⊆ ∅ such that ∃A ∈ AS(P ∪ H) with ∅ ⊆ A and A ∩ ∅ = ∅. This is true if and only if P is satisfiable.
Hence, deciding the satisfiability of a general ILP_b task is NP-complete. □

Proposition 5. Deciding verification for ILP_LAS^s is in D^P.
Proof. Checking whether H is an inductive solution of an ILP_LAS^s task T = ⟨B, S_M, E^+, E^-⟩ can be achieved by mapping T to two aggregate-stratified ASP programs P^+ and P^-, such that H ∈ ILP_LAS(T) if and only if P^+ bravely entails an atom and P^- cautiously entails an atom.
For any integer i ∈ [1, n], let f_i be a function mapping the atoms a in B ∪ H to new atoms a^i. We extend the notation to allow f_i to act on ASP programs (substituting all atoms in the program). Let P^+ be the program:

Proof. To prove that verification that a hypothesis is a solution of an ILP_c task is D^P-hard, we must prove that any problem in D^P can be reduced to the verification task. Let D be any arbitrary decision problem in D^P. By the definition of D^P, this is the case if and only if there exist two decision problems D_1 and D_2 such that D_1 is in NP, D_2 is in co-NP, and D returns yes if and only if both D_1 and D_2 return yes.
By Lemma 1 and Corollary 1, this is the case if and only if there are two programs P_1 and P_2 and two atoms a_1 and a_2 such that P_1 ⊨_b a_1 and P_2 ⊨_c a_2 if and only if D returns yes. Without loss of generality, we can assume that the atoms in P_1 (together with a_1) are disjoint from the atoms in P_2 (together with a_2).
Take T_c to be the ILP_c task ⟨B, S_M, E^+, E^-⟩, where S_M = ∅, E^+ = {a_2}, E^- = ∅ and B = P_1 ∪ {:- not a_1.} ∪ append(P_2, a_3) ∪ {0{a_3}1. a_2 :- not a_3.} (where we assume a_3 to be a new atom, and append(P, a) adds the atom a to the body of all rules in P). ∅ ∈ ILP_c(T_c) if and only if P_1 ∪ {:- not a_1.} is satisfiable and append(P_2, a_3) ∪ {0{a_3}1. a_2 :- not a_3.} ⊨_c a_2. This is the case as the two subprograms P_1 ∪ {:- not a_1.} and append(P_2, a_3) ∪ {0{a_3}1. a_2 :- not a_3.} are disjoint, and the latter is guaranteed to be satisfiable (it always has the answer set {a_2}).
Hence ∅ ∈ ILP_c(T_c) if and only if P_1 ⊨_b a_1 and P_2 ⊨_c a_2; but this is the case if and only if D returns yes.
Hence any problem in D^P can be reduced to verifying that a hypothesis is an inductive solution of an ILP_c task.
Hence verification for ILP_c is D^P-hard. □

Hence, deciding the existence of a solution for an ILP_LAS^s task is in Σ^P_2. □

Proposition 8. Deciding satisfiability for ILP_c is Σ^P_2-hard.
Proof. We show this by reducing a known Σ^P_2-complete problem (deciding the existence of an answer set of a ground disjunctive logic program [55]) to an ILP_c task.
Take any ground disjunctive logic program P. We will define an ILP_c task T(P) which has a solution if and only if P has an answer set.
Let Atoms be the set of atoms in P. Let P' be the program constructed from P by replacing each negative literal not a with the literal not in_as(a) (where in_as is a new predicate) and replacing each head h_1 ∨ ... ∨ h_m with the counting aggregate 1{h_1, ..., h_m}m (empty heads are mapped to 1{}0, which is equivalent to ⊥).
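The syntactic rewriting described above can be sketched as a small function; the structured rule representation (heads, positive body, negative body) is our own:

```python
# Rewrite one disjunctive rule: the head h1 v ... v hm becomes the
# counting aggregate 1{h1,...,hm}m (1{}0 for an empty head, i.e. bottom),
# and each negative literal "not a" becomes "not in_as(a)".
def to_choice_rule(heads, pos, neg):
    head = "1{%s}%d" % (", ".join(heads), len(heads)) if heads else "1{}0"
    body = list(pos) + ["not in_as(%s)" % a for a in neg]
    return head + (" :- " + ", ".join(body) if body else "") + "."

assert to_choice_rule(["p", "q"], [], []) == "1{p, q}2."
assert to_choice_rule(["p"], ["q"], ["r"]) == "1{p}1 :- q, not in_as(r)."
assert to_choice_rule([], ["p"], []) == "1{}0 :- p."   # empty head
```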
∃A ⊆ Atoms such that A is a minimal model of P^A ⇔ ∃A ⊆ Atoms such that A is an answer set of P ⇔ P is satisfiable.
Hence, deciding whether a disjunctive logic program is satisfiable can be mapped to the decision problem of checking the existence of solutions of a cautious induction task.
Therefore, deciding the existence of solutions of a ground ILP_c task is Σ^P_2-hard. □

A.2. Proofs from Section 5
Proposition 12. For any programs P_1 and P_2, AS(P_1) ⊆ AS(P_2) if and only if E_b(P_1) ⊆ E_b(P_2).

Proof.
• Assume that AS(P_1) ⊆ AS(P_2), and let c = i_1 ∧ ... ∧ i_m ∧ not e_1 ∧ ... ∧ not e_n ∈ E_b(P_1). Then there must be an answer set A of P_1 which contains all of the i's and none of the e's. Hence A is also an answer set of P_2, so there is an answer set of P_2 which contains all of the i's and none of the e's, and c ∈ E_b(P_2).
• Conversely, assume that E_b(P_1) ⊆ E_b(P_2). Let A ∈ AS(P_1); we must show that A ∈ AS(P_2). Let L be the set HB_{P_1} ∪ HB_{P_2}.
As A ∈ AS(P_1), c = i_1 ∧ ... ∧ i_m ∧ not e_1 ∧ ... ∧ not e_n ∈ E_b(P_1), where the i's are the atoms in A and the e's are the atoms in L\A. As c ∈ E_b(P_1), c ∈ E_b(P_2), and hence there is an answer set A' of P_2 which contains each i ∈ A but no atom e ∈ L\A; hence, as HB_{P_2} ⊆ L, A' = A. Hence A ∈ AS(P_2). □

Proof. We prove this by showing that D^1_1(ILP_b) ⊆ D^1_1(ILP_sm) and D^1_1(ILP_sm) ⊆ D^1_1(ILP_b).
• First we show that D^1_1(ILP_b) ⊆ D^1_1(ILP_sm). Assume ⟨B, H_1, H_2⟩ ∈ D^1_1(ILP_b). Then there is a task T_b = ⟨B, E^+, E^-⟩ such that H_1 ∈ ILP_b(T_b) and H_2 ∉ ILP_b(T_b). Let T_sm = ⟨B, {⟨E^+, E^-⟩}⟩. H_1 ∈ ILP_sm(T_sm) but H_2 ∉ ILP_sm(T_sm). Hence, ⟨B, H_1, H_2⟩ ∈ D^1_1(ILP_sm).
• Next we show that D^1_1(ILP_b) ⊇ D^1_1(ILP_sm). Assume ⟨B, H_1, H_2⟩ ∈ D^1_1(ILP_sm). There must be a task T_sm = ⟨B, {⟨E^+_1, E^-_1⟩, ..., ⟨E^+_n, E^-_n⟩}⟩ such that H_1 ∈ ILP_sm(T_sm) and H_2 ∉ ILP_sm(T_sm). There must be at least one partial interpretation ⟨E^+_i, E^-_i⟩ that is covered by B ∪ H_1 but not by B ∪ H_2; hence H_1 ∈ ILP_b(⟨B, E^+_i, E^-_i⟩) but H_2 ∉ ILP_b(⟨B, E^+_i, E^-_i⟩), and so ⟨B, H_1, H_2⟩ ∈ D^1_1(ILP_b).

Proof.
• First we show that for any ⟨B, H_1, H_2⟩ ∈ D^1_1(ILP_c), AS(B ∪ H_1) ≠ ∅ and either AS(B ∪ H_2) = ∅ or E_c(B ∪ H_1) ⊄ E_c(B ∪ H_2). Let ⟨B, H_1, H_2⟩ be an arbitrary element of D^1_1(ILP_c); then ∃T_c = ⟨B, E^+, E^-⟩ such that H_1 ∈ ILP_c(T_c) and H_2 ∉ ILP_c(T_c). As H_1 ∈ ILP_c(T_c), AS(B ∪ H_1) ≠ ∅. Assume that E_c(B ∪ H_1) ⊆ E_c(B ∪ H_2); we must show that AS(B ∪ H_2) = ∅. As H_1 ∈ ILP_c(T_c), ∀A ∈ AS(B ∪ H_1): E^+ ⊆ A and E^- ∩ A = ∅; hence the conjunction E^+ ∧ {not e^- | e^- ∈ E^-} ∈ E_c(B ∪ H_1). By our initial assumption, the conjunction is also in E_c(B ∪ H_2); hence ∀A ∈ AS(B ∪ H_2): E^+ ⊆ A and E^- ∩ A = ∅. But as H_2 ∉ ILP_c(T_c), this means that AS(B ∪ H_2) = ∅.

Let C be the ASP^ch program append(C_1, a_1) ∪ append(C_2, a_2) ∪ {1{a_1, a_2}1.} (where a_1 and a_2 are new atoms and append(P, a) appends the atom a to the body of each rule in P). AS(B ∪ H_1 ∪ C) = {A ∪ {a_1} | A ∈ AS(B ∪ H_1 ∪ C_1)} ∪ {A ∪ {a_2} | A ∈ AS(B ∪ H_1 ∪ C_2)}, and hence t = ⟨A_1 ∪ {a_1}, A_2 ∪ {a_2}, op⟩ ∈ ord(B ∪ H_1 ∪ C), but t ∉ ord(B ∪ H_2 ∪ C). Hence ∃C ∈ ASP^ch such that ord(B ∪ H_1 ∪ C) ≠ ord(B ∪ H_2 ∪ C). Hence, in all cases, either B ∪ H_1 ≢_s B ∪ H_2 or ∃C ∈ ASP^ch such that ord(B ∪ H_1 ∪ C) ≠ ord(B ∪ H_2 ∪ C).

Case 1: B ∪ H_1 ≢_s B ∪ H_2. There must be a program C such that AS(B ∪ H_1 ∪ C) ≠ AS(B ∪ H_2 ∪ C).

Case i: ∃A ∈ AS(B ∪ H_1 ∪ C) such that A ∉ AS(B ∪ H_2 ∪ C). Let L be the set of atoms in the answer sets of B ∪ H_1 ∪ C and B ∪ H_2 ∪ C, and let e_A be the partial interpretation ⟨A, L\A⟩. Then B ∪ H_1 ∪ C has an answer set that extends e_A, but B ∪ H_2 ∪ C does not; hence H_1 ∈ ILP_LOAS^context(⟨B, S_M, {⟨e_A, C⟩}, ∅, ∅, ∅⟩) but H_2 is not.

Case ii: ∃A ∈ AS(B ∪ H_2 ∪ C) such that A ∉ AS(B ∪ H_1 ∪ C). Let L be the set of atoms in the answer sets of B ∪ H_1 ∪ C and B ∪ H_2 ∪ C, and let e_A be the partial interpretation ⟨A, L\A⟩. Then B ∪ H_2 ∪ C has an answer set that extends e_A, but B ∪ H_1 ∪ C does not; hence H_1 ∈ ILP_LOAS^context(⟨B, S_M, ∅, {⟨e_A, C⟩}, ∅, ∅⟩) but H_2 is not.

Case 2: B ∪ H_1 ≡_s B ∪ H_2, but ∃C ∈ ASP^ch such that ord(B ∪ H_1 ∪ C) ≠ ord(B ∪ H_2 ∪ C). Then ∃A_1, A_2 ∈ AS(B ∪ H_1 ∪ C) (which is equal to AS(B ∪ H_2 ∪ C)) and a binary operator op such that ⟨A_1, A_2, op⟩ ∈ ord(B ∪ H_1 ∪ C) but ⟨A_1, A_2, op⟩ ∉ ord(B ∪ H_2 ∪ C). Let e_1 = ⟨A_1, L\A_1⟩ and e_2 = ⟨A_2, L\A_2⟩ (where L is the set of atoms in the answer sets of B ∪ H_1 ∪ C).

Proof.
1. Consider any two ILP_c tasks, T_1 ... Hence, by Lemma 2, ILP_c has closed one-to-many-distinguishability.
2. ... Hence, by Lemma 2, ILP_sm has closed one-to-many-distinguishability.
3. For any tasks T_1 ... Hence, by Lemma 2, ILP_LAS has closed one-to-many-distinguishability.
4. For any tasks T_1 ... Hence, by Lemma 2, ILP_LOAS^context has closed one-to-many-distinguishability. □