RBF-HS: Recursive Best-First Hitting Set Search

Various model-based diagnosis scenarios require the computation of the most preferred fault explanations. Existing algorithms that are sound (i.e., output only actual fault explanations) and complete (i.e., can return all explanations), however, require exponential space to achieve this task. As a remedy, we propose two novel diagnostic search algorithms, called RBF-HS (Recursive Best-First Hitting Set Search) and HBF-HS (Hybrid Best-First Hitting Set Search), which build upon tried and tested techniques from the heuristic search domain. RBF-HS can enumerate an arbitrary predefined finite number of fault explanations in best-first order within linear space bounds, without sacrificing the desirable soundness or completeness properties. The idea of HBF-HS is to find a trade-off between runtime optimization and a restricted space consumption that does not exceed the available memory. In extensive experiments on real-world diagnosis cases we compared our approaches to Reiter's HS-Tree, a state-of-the-art method that gives the same theoretical guarantees and is as generally applicable as the suggested algorithms. For the computation of minimum-cardinality fault explanations, we find that (1) RBF-HS reduces memory requirements substantially in most cases, by up to several orders of magnitude, (2) in more than a third of the cases, both memory savings and runtime savings are achieved, and (3) where the runtime overhead is significant, using HBF-HS instead of RBF-HS reduces the runtime to values comparable with HS-Tree while keeping the used memory reasonably bounded. When computing most probable fault explanations, we observe that RBF-HS tends to trade memory savings more or less one-to-one for runtime overheads. Again, HBF-HS proves to be a reasonable remedy to cut down the runtime while complying with practicable memory bounds.


Introduction
Model-based diagnosis [4,5] is a popular, well-understood and domain-independent paradigm that has over the last decades found widespread adoption for troubleshooting systems as different as programs, circuits, physical devices, knowledge bases, spreadsheets, production plans, robots, vehicles, or aircraft [6,7,8,9,10,11,12,13,14,15]. The principle behind model-based diagnosis is to model the system to be diagnosed by means of a logical knowledge representation language. Besides general knowledge about the system, this system description includes a characterization of the normal behavior of all system components relevant to the diagnosis task. Logical theorem provers can then be used to verify if the predicted system behavior, deduced from the system description under the assumption that [...] runtime losses in most cases, where savings increase with increasing problem complexity, (2) saves both memory and runtime in more than a third of the cases, (3) scales to large numbers of computed leading diagnoses and to problems involving high-cardinality minimal diagnoses, and (4) in the rare cases where runtime overhead was significant, using HBF-HS instead of RBF-HS reduced the runtime to values comparable with HS-Tree while keeping the used memory reasonably bounded.
• Maximal probability first: When computing minimal diagnoses in descending order of probability, we find that RBF-HS tends to trade memory savings more or less one-to-one for runtime overheads (which has well-understood theoretical reasons that we discuss). Again, HBF-HS turns out to be a reasonable remedy to cut down the runtime while complying with practicable memory bounds.
The organization of the paper is as follows. We repeat fundamental concepts from the fields of model-based diagnosis and heuristic search in Sec. 2. The RBF-HS algorithm is introduced and discussed in Sec. 3, where we use a didactic approach which builds up RBF-HS from RBFS in a stepwise manner. In Sec. 4 we present and describe the HBF-HS algorithm. We comment on related works in Sec. 5. Finally, Sec. 6 presents our experiments and reviews the obtained results, whereas concluding remarks and pointers to future work are given in Sec. 7.

Preliminaries
First, we briefly characterize model-based diagnosis concepts used throughout this work, based on the framework of [9,34] which is (slightly) more general [35] than Reiter's theory [4]. The main reason for using this more general framework is its ability [35] to capture both classical model-based diagnosis problems involving, e.g., malfunctioning circuits or physical systems, and alternative problem types such as faulty knowledge bases which require the expression of negative measurements (things that must not be true for the diagnosed system) [34,36,37]. The quality assurance of knowledge-based systems is an important application domain of the algorithms presented in this work [9,20,28,38,39,94] and also the focus of our evaluations; however, the proposed algorithms are generally applicable to any model-based diagnosis problem. Second, we concisely review important notions from heuristic search and contrast classic path-finding with diagnosis search problems. This comparison should serve to facilitate the understanding of the development of the diagnosis computation procedure RBF-HS starting from the path-finding algorithm RBFS presented in Sec. 3.

Diagnosis Problem
We assume that the diagnosed system, consisting of a set of components {c_1, ..., c_k}, is described by a finite set of logical sentences K ∪ B, where K (possibly faulty sentences) includes knowledge about the behavior of the system components, and B (correct background knowledge) comprises any additional available system knowledge and system observations. More precisely, there is a one-to-one relationship between axioms ax_i ∈ K and components c_i, where ax_i describes (only) the normal behavior of c_i (weak fault model [40]). E.g., if c_i is an AND-gate in a circuit, then ax_i := out(c_i) = and(in1(c_i), in2(c_i)); B in this case might contain sentences stating, e.g., which components are connected by wires, or observed circuit outputs. The inclusion of a sentence ax_i in K corresponds to the (implicit) assumption that c_i is healthy. Evidence about the system behavior is captured by sets of positive (P) and negative (N) measurements [4,5,36]. Each measurement is a logical sentence; positive ones p ∈ P must be true and negative ones n ∈ N must not be true. The former can be, depending on the context, e.g., observations about the system, probes or required system properties. The latter model properties that must not hold for the system, e.g., if K is a knowledge base to be debugged, a negative test case might be "every bird can fly" (think of penguins). We call ⟨K, B, P, N⟩ a diagnosis problem instance (DPI).
Example 1 (Diagnosis Problem) Assume a DPI stated in propositional logic with K := {ax_1 : A → ¬B, ax_2 : A → B, ax_3 : A → ¬C, ax_4 : B → C, ax_5 : A → B ∨ C}. The "system" (the knowledge base K itself in this case) comprises five "components" c_1, ..., c_5, and the "normal behavior" of c_i is given by the respective sentence ax_i ∈ K. No background knowledge (B = ∅) or positive measurements (P = ∅) are given from the start. However, there is one negative measurement (N = {¬A}), which stipulates that ¬A must not be an entailment of the correct system (knowledge base). Note that K (i.e., the assumption that all "components" are normal) in this case does entail ¬A (e.g., due to the sentences ax_1, ax_2) and thus some sentence ("component") in K must be faulty.
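The entailment claimed in Example 1 can be checked mechanically. The following is a minimal sketch that stands in for a theorem prover by enumerating all truth assignments of the three atoms; the helper names `models` and `entails_not_A` are illustrative assumptions and not part of the paper.

```python
from itertools import product

# Axioms of Example 1 as Boolean functions of the atoms A, B, C.
K = {
    "ax1": lambda A, B, C: (not A) or (not B),   # A -> not B
    "ax2": lambda A, B, C: (not A) or B,         # A -> B
    "ax3": lambda A, B, C: (not A) or (not C),   # A -> not C
    "ax4": lambda A, B, C: (not B) or C,         # B -> C
    "ax5": lambda A, B, C: (not A) or B or C,    # A -> B or C
}

def models(axioms):
    """All truth assignments (A, B, C) that satisfy every named axiom."""
    return [v for v in product([False, True], repeat=3)
            if all(K[name](*v) for name in axioms)]

def entails_not_A(axioms):
    """True iff A is false in every model (vacuously true if there is no model)."""
    return all(not A for (A, _, _) in models(axioms))

print(entails_not_A(list(K)))                  # the full knowledge base
print(entails_not_A(["ax1", "ax2"]))           # the two sentences named in Example 1
print(entails_not_A(["ax2", "ax4", "ax5"]))    # a subset that leaves A satisfiable
```

Brute-force enumeration is only viable for toy examples; in the setting of the paper, the consistency check is delegated to a reasoner for the logic at hand.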

Diagnoses
Given that the system description along with the positive measurements (under the assumption K that all components are healthy) is inconsistent, i.e., K ∪ B ∪ P ⊨ ⊥, or some negative measurement is entailed, i.e., K ∪ B ∪ P ⊨ n for some n ∈ N, some assumption(s) about the normality of components, i.e., some sentences in K, must be retracted. We call such a set of sentences D ⊆ K a diagnosis for the DPI dpi = ⟨K, B, P, N⟩ iff (K \ D) ∪ B ∪ P ⊭ x for all x ∈ N ∪ {⊥}. We say that a diagnosis D is a minimal diagnosis for dpi iff there is no diagnosis D' ⊂ D for dpi. Moreover, we call a diagnosis D a minimum-cardinality diagnosis for dpi iff there is no diagnosis D' with |D'| < |D| for dpi. The set of minimal diagnoses is representative of all diagnoses under the weak fault model [41], i.e., the set of all diagnoses is equal to the set of all supersets of minimal diagnoses. Therefore, diagnosis approaches often restrict their focus to only minimal diagnoses. We furthermore denote by D* the (unknown) actual diagnosis which pinpoints the actually faulty axioms, i.e., all elements of D* are in fact faulty and all elements of K \ D* are in fact correct.
Example 2 (Diagnoses) For our DPI from Example 1 there are four minimal diagnoses, among them D_1 := [ax_1, ax_3] and D_4 := [ax_2, ax_5] (we will always denote diagnoses by square brackets). For instance, D_1 is a diagnosis as (K \ D_1) ∪ B ∪ P = {ax_2, ax_4, ax_5} is both consistent and does not entail the given negative measurement ¬A. That D_1 is a minimal diagnosis as well can be seen by observing that (K \ D'_1) ∪ B ∪ P ⊨ ¬A for any D'_1 ⊂ D_1, i.e., the diagnosis property is violated after removing any element from D_1.
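The definitions of diagnosis and minimal diagnosis can be turned into a small exhaustive check for the DPI of Example 1. The sketch below is illustrative only: it assumes B = P = ∅ and N = {¬A} as in Example 1, uses truth-table enumeration instead of a reasoner, and the function names are hypothetical.

```python
from itertools import combinations, product

K = {
    "ax1": lambda A, B, C: (not A) or (not B),
    "ax2": lambda A, B, C: (not A) or B,
    "ax3": lambda A, B, C: (not A) or (not C),
    "ax4": lambda A, B, C: (not B) or C,
    "ax5": lambda A, B, C: (not A) or B or C,
}
NAMES = sorted(K)

def violates(axioms):
    """True iff the axioms are inconsistent or entail the negative measurement not-A."""
    sat = [v for v in product([False, True], repeat=3)
           if all(K[a](*v) for a in axioms)]
    return all(not v[0] for v in sat)      # also True when sat is empty

def is_diagnosis(D):
    """D is a diagnosis iff K minus D is consistent and does not entail not-A."""
    return not violates([a for a in NAMES if a not in D])

def minimal_diagnoses():
    found = []
    for size in range(len(NAMES) + 1):     # increasing cardinality
        for D in map(frozenset, combinations(NAMES, size)):
            # Smaller diagnoses are found first, so a subset test suffices.
            if is_diagnosis(D) and not any(m <= D for m in found):
                found.append(D)
    return found

for D in minimal_diagnoses():
    print(sorted(D))
```

Enumerating all subsets of K is exponential, which is exactly what conflict-guided hitting set search avoids; the sketch only serves to make the definitions concrete.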

Diagnosis Probability Model
Component and Diagnosis Probabilities. In case useful meta information is available that allows one to assess the likelihood of failure for system components, the probability of diagnoses (of being the actual diagnosis) can be derived. Specifically, given a function pr that maps each sentence (system component) ax ∈ K to its failure probability 0 < pr(ax) < 1, the probability pr(X) of a diagnosis (candidate) X ⊆ K under the common assumption of independent component failure is computed [5] as the probability that all sentences in X are faulty, and all others are correct, i.e.,

pr(X) := \prod_{ax \in X} pr(ax) \prod_{ax \in K \setminus X} (1 - pr(ax))    (1)

Properties of the Probability Function. We call pr strictly antimonotonic iff pr(X) > pr(Y) whenever X ⊂ Y. Clearly, if pr is strictly antimonotonic, then each minimal diagnosis D has a higher probability pr(D) than any non-minimal diagnosis D' ⊃ D. That is, assuming a list that includes all subsets of K sorted by pr in descending order, iterating over this list implies that (i) diagnoses with higher probability are found earlier, and (ii) a non-minimal diagnosis can never be encountered before all minimal diagnoses that are subsets of it have been visited. Properties (i) and (ii) are material for search-based diagnosis computation methods, like well-known existing ones [4,9,19,21,42] and those discussed in this work, which are based on the systematic exploration of (relevant parts of) the subset space of K, and which aim at finding all and only minimal diagnoses (cf. Sec. 2.1.2) in the order from high to low probability. Thus, such approaches usually rely on the strict antimonotonicity of pr. For a probability function pr to be strictly antimonotonic it is sufficient that pr(ax) < 0.5 for all ax ∈ K. This can be easily seen from Eq. 1, where pr(X') < pr(X) for X' ⊃ X under this assumption since pr(X') = pr(X) \prod_{ax \in X' \setminus X} pr(ax)/(1 − pr(ax)) and each factor pr(ax)/(1 − pr(ax)) < 1 (see also [9, Lemma 4.14]).
Note that diagnosis applications usually involve components which are a priori much more likely to be normal than at fault, cf., e.g., [5,43,44,45,46]. Hence, strict antimonotonicity of pr will in most cases be satisfied by default. Moreover, an arbitrary function pr can be transformed to a strictly antimonotonic function pr' by choosing a fixed c ∈ (0, 0.5) and by setting pr'(ax) := c · pr(ax) for all ax ∈ K. Observe that this transformation does not affect the relative probabilities in that pr'(ax_i)/pr'(ax_j) = k whenever pr(ax_i)/pr(ax_j) = k, i.e., no information is lost in the sense that the mutual fault probability order and ratio between any two components will remain invariant.
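The probability model and the rescaling transformation can be illustrated with a few lines of code. This is a sketch under stated assumptions: the component probabilities are made-up values chosen so that one of them exceeds 0.5, c = 0.25 is an arbitrary choice from (0, 0.5), and the function names are hypothetical.

```python
from itertools import combinations
from math import isclose, prod

def pr_set(X, pr):
    """Eq. 1: sentences in X are faulty, all other sentences are correct."""
    return prod(p if ax in X else 1 - p for ax, p in pr.items())

def rescale(pr, c=0.25):
    """Transformation from the text: scale every component probability by c."""
    return {ax: c * p for ax, p in pr.items()}

def strictly_antimonotonic(pr):
    """Brute-force check that pr_set(X) > pr_set(Y) for every X properly inside Y."""
    subsets = [frozenset(s) for r in range(len(pr) + 1)
               for s in combinations(pr, r)]
    return all(pr_set(X, pr) > pr_set(Y, pr)
               for X in subsets for Y in subsets if X < Y)

pr = {"ax1": 0.7, "ax2": 0.2, "ax3": 0.1}   # hypothetical failure probabilities
pr_scaled = rescale(pr)

print(strictly_antimonotonic(pr))           # violated, since pr(ax1) > 0.5
print(strictly_antimonotonic(pr_scaled))    # holds, all values are below 0.5
print(isclose(pr["ax1"] / pr["ax2"], pr_scaled["ax1"] / pr_scaled["ax2"]))
```

With the original values, the empty set is less probable than [ax1], so a search ordered by pr could visit a superset before its subset; after rescaling this can no longer happen.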

Conflicts
Instrumental for diagnosis computation is the notion of a conflict [4,5]. A conflict is a set of healthiness assumptions for components c_i that cannot all hold given the current knowledge about the system. More formally, C ⊆ K is a conflict for the DPI dpi = ⟨K, B, P, N⟩ iff C ∪ B ∪ P ⊨ x for some x ∈ N ∪ {⊥}. We call a conflict C a minimal conflict for dpi iff there is no conflict C' ⊂ C for dpi.
Example 4 (Conflicts) For our dpi from Example 1 there are four minimal conflicts, given by C_1 := ⟨ax_1, ax_2⟩, C_2 := ⟨ax_2, ax_3, ax_4⟩, C_3 := ⟨ax_1, ax_3, ax_5⟩, and C_4 := ⟨ax_3, ax_4, ax_5⟩ (we will always denote conflicts by angle brackets). For instance, C_4, in CNF equal to (¬A ∨ ¬C) ∧ (¬B ∨ C) ∧ (¬A ∨ B ∨ C), is a conflict because adding the unit clause (A) to this CNF yields a contradiction, which is why the negative test case ¬A is an entailment of C_4. The minimality of the conflict C_4 can be verified by removing one axiom at a time from C_4 and checking that each subset obtained in this way is consistent and does not entail ¬A.
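The minimality check described at the end of Example 4 (drop one axiom at a time and test the remainder) can be sketched as follows. The code assumes the DPI of Example 1 with B = P = ∅ and N = {¬A}, and again replaces the reasoner by truth-table enumeration; `is_conflict` and `is_minimal_conflict` are illustrative names.

```python
from itertools import product

K = {
    "ax1": lambda A, B, C: (not A) or (not B),
    "ax2": lambda A, B, C: (not A) or B,
    "ax3": lambda A, B, C: (not A) or (not C),
    "ax4": lambda A, B, C: (not B) or C,
    "ax5": lambda A, B, C: (not A) or B or C,
}

def is_conflict(axioms):
    """True iff the axioms are inconsistent or entail not-A."""
    return all(not v[0] for v in product([False, True], repeat=3)
               if all(K[a](*v) for a in axioms))

def is_minimal_conflict(axioms):
    """A conflict is minimal iff removing any single axiom destroys the conflict.

    Checking single removals is enough because entailment is monotonic:
    every subset of a non-conflict is a non-conflict as well.
    """
    axioms = list(axioms)
    return is_conflict(axioms) and not any(
        is_conflict([a for a in axioms if a != removed]) for removed in axioms)

for C in (["ax1", "ax2"], ["ax2", "ax3", "ax4"],
          ["ax1", "ax3", "ax5"], ["ax3", "ax4", "ax5"]):
    print(C, is_minimal_conflict(C))
```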

Conflict Computation
The literature offers a variety of algorithms for conflict computation, e.g., [38,39,47,48,49,50,51,52,53,54,55]. Among those, we are in this work mainly interested in so-called black-box [56] techniques, such as QUICKXPLAIN [52,53] or PROGRESSION [54], which are independent of both the particular logic and the particular theorem prover used. This independence is pivotal for the out-of-the-box applicability of diagnosis computation algorithms in domains where many different logics are adopted to solve problems of interest, e.g., in ontology-based intelligent applications, as studied in our evaluations (Sec. 6). Given a DPI dpi = ⟨K, B, P, N⟩ as input, one execution of such a black-box algorithm repeatedly calls an (arbitrary) reasoner that is sound and complete for consistency checks over the logic by which dpi is expressed, and finally returns one minimal conflict for dpi. None of the available black-box algorithms has a worst-case time complexity lower than O(|K|) consistency checks [54]. Since the performance of diagnosis computation methods depends largely on (i) the complexity of consistency checking for the used logic and (ii) the number of consistency checks executed, and diagnostic algorithms have no influence on (i), it is important to minimize (ii) by keeping the number of conflict computations at a minimum.
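To make the black-box idea concrete, here is a naive deletion-based conflict minimizer. It is explicitly not QUICKXPLAIN or PROGRESSION, only the simplest procedure with the same interface: it sees the logic solely through a conflict test and needs |K| + 1 such tests. The oracle below is a truth-table stand-in for a reasoner over the DPI of Example 1, and the counter `checks` is an assumption added for illustration.

```python
from itertools import product

K = {
    "ax1": lambda A, B, C: (not A) or (not B),
    "ax2": lambda A, B, C: (not A) or B,
    "ax3": lambda A, B, C: (not A) or (not C),
    "ax4": lambda A, B, C: (not B) or C,
    "ax5": lambda A, B, C: (not A) or B or C,
}
checks = 0   # number of oracle calls, i.e., "consistency checks"

def is_conflict(axioms):
    """Oracle: True iff the axioms are inconsistent or entail not-A."""
    global checks
    checks += 1
    return all(not v[0] for v in product([False, True], repeat=3)
               if all(K[a](*v) for a in axioms))

def find_min_conflict(candidates):
    """Return one minimal conflict among the candidates, or None if there is none."""
    if not is_conflict(candidates):
        return None
    conflict = list(candidates)
    for ax in list(candidates):
        trial = [a for a in conflict if a != ax]
        if is_conflict(trial):     # ax is not needed for the conflict: drop it
            conflict = trial
    return conflict

result = find_min_conflict(list(K))
print(result, checks)
```

Which minimal conflict is returned depends on the order in which axioms are tried; divide-and-conquer methods such as QUICKXPLAIN typically need fewer checks when conflicts are small relative to K.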

Relationship between Conflicts and Diagnoses
Conflicts and diagnoses are closely related in terms of a hitting set and a duality property [4]:
Hitting Set Property Let dpi = ⟨K, B, P, N⟩ be a DPI. Then D is a (minimal) diagnosis for dpi iff D is a (minimal) hitting set of all minimal conflicts for dpi. (X is a hitting set of a collection of sets S iff X ⊆ ⋃_{S_i ∈ S} S_i and X ∩ S_i ≠ ∅ for all S_i ∈ S; X is minimal iff there is no other hitting set X' of S with X' ⊂ X.)
Duality Property Given a DPI dpi = ⟨K, B, P, N⟩, X is a diagnosis (or: contains a minimal diagnosis) for dpi iff K \ X is not a conflict (or: does not contain a minimal conflict) for dpi.
Example 5 (Conflicts vs. Diagnoses) Reconsider the DPI from Example 1. Regarding the Hitting Set Property, e.g., the minimal diagnosis D_1 (see Example 2) is a hitting set of all minimal conflict sets because each conflict (see Example 4) contains ax_1 or ax_3. It is moreover a minimal hitting set since the elimination of ax_1 implies an empty intersection with, e.g., C_1, and the elimination of ax_3 means that, e.g., C_4 is no longer hit. Thus, given the collection 𝐂 of all minimal conflicts, we can determine all the minimal diagnoses as the collection of minimal hitting sets of 𝐂.
Concerning the Duality Property, e.g., D_4 is a diagnosis as K \ D_4 = {ax_1, ax_3, ax_4} is not a conflict (this can be easily verified by checking that no minimal conflict in Example 4 is a subset of this set), or, equivalently, (K \ D_4) ∪ B ∪ P = {ax_1, ax_3, ax_4} is both consistent and does not entail ¬A. Inversely, e.g., C_2 is a conflict since K \ C_2 = {ax_1, ax_5} is not a diagnosis (again, this can be easily seen by verifying that no minimal diagnosis in Example 2 is a subset of this set), or, equivalently, (K \ (K \ C_2)) ∪ B ∪ P = C_2 ∪ B ∪ P = {ax_2, ax_3, ax_4} entails the negative measurement ¬A.
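Both properties can be checked exhaustively on the four minimal conflicts of Example 4 using plain set operations. The following sketch is illustrative; the helper names are assumptions, and the brute-force enumeration is only meant for this five-element example.

```python
from itertools import combinations

CONFLICTS = [frozenset(c) for c in (
    {"ax1", "ax2"}, {"ax2", "ax3", "ax4"},
    {"ax1", "ax3", "ax5"}, {"ax3", "ax4", "ax5"})]
K = frozenset().union(*CONFLICTS)

def subsets(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_hitting_set(X):
    """X intersects every minimal conflict."""
    return all(X & C for C in CONFLICTS)

def contains_conflict(X):
    """X is a conflict iff it is a superset of some minimal conflict."""
    return any(C <= X for C in CONFLICTS)

hitting_sets = [X for X in subsets(K) if is_hitting_set(X)]
minimal = [X for X in hitting_sets if not any(Y < X for Y in hitting_sets)]

# Duality Property: X is a diagnosis iff K \ X is not a conflict.
duality_holds = all(is_hitting_set(X) == (not contains_conflict(K - X))
                    for X in subsets(K))

print(sorted(sorted(m) for m in minimal))
print(duality_holds)
```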

Path-Finding Problem
A path-finding problem instance (PPI) [57] can be characterized as a tuple ⟨S_0, succ, goal, g⟩ where S_0 is a distinguished initial state, succ is a successor function that returns all directly reachable neighbor states of any given state, goal is a Boolean goal test that returns true iff a given state is a goal state, and g is a cost function that assigns a real-valued cost to any given sequence of states (called path). A solution to a PPI is a path from the initial state to some goal state, and the objective is often to find an optimal solution, i.e., one with the least cost among all solutions.

Path-Finding Search Algorithms
Basic Notions and Principle. Algorithms that tackle PPIs usually produce a systematic search tree. The root node n_0 of a search tree corresponds to the state S_0, and from a node n corresponding to state S there are |succ(S)| emanating edges to other nodes, each of which represents one of the states in succ(S). The creation of child nodes from a current leaf node n by means of succ is called expansion of n. Inversely, the creation of a child node n when its parent is expanded is called generation of n. Importantly, each generated node n stores a pointer to its parent to allow for the reconstruction of the path to n in case it is a goal. Note that one and the same state can occur multiple times in a search tree, depending on the used algorithm. In general, different ways of constructing the search tree, i.e., in which order nodes are selected for expansion and how much of the tree construction "history" (e.g., already expanded nodes) is stored, yield a variety of search methods with different properties regarding completeness (will a solution be found whenever one exists?), best-first property or optimality (will the best solution be found first?), as well as time and space complexity (how much time and memory will the algorithm need to find a solution?). Search algorithms that solve PPIs usually stop after the first path to a goal state is found.
(Un)Informed Search. If problem-specific information beyond the mere PPI is (not) available to an algorithm, the search is called (un)informed. If applicable, such problem-specific information is normally given as a heuristic function h which assigns to each node n a non-negative real value as an estimate of the cost of the best path from n's state to some goal state. This heuristic value h(n) can then be combined with the costs g(n) already incurred to reach n, in terms of f(n) := g(n) + h(n), which estimates the overall cost of the path from the start to some goal state via node n. The cost function f is called monotonic iff f(n) ≤ f(n') for all nodes n, n' where n' is a successor of n. Some search algorithms require a monotonic function in order to guarantee optimality of the search.
Example 6 (Search Algorithms) Important uninformed search strategies are depth-first, breadth-first, uniform-cost and iterative deepening search; popular informed search methods are A* and IDA* [57]. Each of them maintains a queue of nodes that is sorted in a specific way, where the first node of this queue is chosen for expansion at each step. Each expanded node is deleted from the queue and its generated successors are added to it in a way the defined sorting is preserved. Whenever a node is expanded whose state satisfies the goal test, the respective path is returned and the search terminates. Now, depth-first search maintains a LIFO queue, breadth-first search a FIFO queue, and uniform-cost search and A*, respectively, a queue sorted in ascending order by g and f . Iterative deepening and IDA* run in iterations, executing one depth-first search per iteration. At this, each iteration uses an incremented depth-limit l = 1, 2, . . . (iterative deepening) or an incremented cost-limit equal to the best known node from the last iteration that has not been expanded (IDA*). A depth-limit (cost-limit) k means that no successors are generated for any node at tree depth k (with cost > k).
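The queue-based scheme of Example 6 can be sketched for the cost-ordered case. The code below is a plain tree search that always expands the queued node with the least f = g + h; with h = 0 it behaves like uniform-cost search, with an informative h like A*. The toy graph, its edge costs, and all identifiers are hypothetical.

```python
import heapq
from itertools import count

def best_first(start, succ, goal, h=lambda state: 0):
    """Expand the queued node with least f = g + h (uniform-cost search if h is 0)."""
    tie = count()                               # tie-breaker: insertion order
    queue = [(h(start), next(tie), 0, [start])]
    while queue:
        _, _, g, path = heapq.heappop(queue)    # first node of the sorted queue
        state = path[-1]
        if goal(state):                         # goal test at expansion time
            return path, g
        for nxt, cost in succ(state):           # generate the successors
            g2 = g + cost
            heapq.heappush(queue, (g2 + h(nxt), next(tie), g2, path + [nxt]))
    return None

# Hypothetical toy graph: state -> list of (successor, edge cost).
GRAPH = {"S": [("A", 1), ("B", 4)], "A": [("B", 1), ("G", 5)],
         "B": [("G", 1)], "G": []}

print(best_first("S", lambda s: GRAPH[s], lambda s: s == "G"))
```

The queue can hold a number of nodes that grows exponentially with the search depth, which is the space problem that motivates linear-space alternatives such as IDA* and RBFS.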

Diagnosis Search Algorithms
Principle. Given a DPI ⟨K, B, P, N⟩, a diagnosis search algorithm is characterized by the definition of a node processing procedure. The latter is divided into two parts, node labeling and node assignment. A generic diagnosis search then works as follows:
• Start with a queue including only the root node ∅.
• While the queue is non-empty and not enough minimal diagnoses have been found, poll the first node n from the queue and process it. That is, compute a label L for n, and assign n (or potentially its successors) to an appropriate node class (e.g., solutions, non-solutions) based on L.
Different specific diagnosis search algorithms are obtained by (re)defining (i) the sorting of the queue and (ii) the node processing procedure.
A Prominent Example. The next example explains the workings of Reiter's seminal HS-Tree algorithm [4] (and of a uniform-cost variant thereof [9,Sec. 4.6]) based on the above generic characterization. HS-Tree is a widely used diagnosis computation technique, which is (still) the method of choice in domains where its distinguished combination of the features soundness (computation of only minimal diagnoses), completeness (generation of all minimal diagnoses), the best-first property (enumeration of diagnoses in a preference order), as well as the independence of the used logic and reasoning procedure, is vital. One such domain is the quality assurance of knowledge-based applications based on ontologies, which will also be the focus of our evaluations.
Example 7 (Reiter's HS-Tree) The sorting of the queue as well as the node labeling and assignment are implemented as follows by (uniform-cost) HS-Tree:
Sorting of the queue: Depending on the desired preference criterion to be optimized, either a FIFO-queue is used (breadth-first search; minimum-cardinality diagnoses first) or the queue is kept sorted in descending order of pr(n), cf. Eq. 1 (uniform-cost search; most probable diagnoses first).
Node labeling: The following checks are executed in the given order, and a label is returned as soon as the first check is positive:
• (non-minimality) Is n a superset of some already found diagnosis? If yes, return L = closed.
• (duplicate) Is there another node equal to n in the queue? If yes, return L = closed.
• (reuse label) Is there a conflict C among the already used node labels such that n ∩ C = ∅? If yes, return L = C.
• (compute label) Compute a minimal conflict for ⟨K \ n, B, P, N⟩. If some set C is computed, return L = C. If 'no conflict' is output, return L = valid.
Node assignment: If n's computed label
• L = ⟨ax_1, ..., ax_k⟩ (a minimal conflict), then k new successor nodes n_1, ..., n_k are generated and added to the queue, where n_i = n ∪ {ax_i}.
• L = valid, then n is a solution and added to the collection of minimal diagnoses.
• L = closed, then n is irrelevant or a proven non-solution and not added to any collection, i.e., it is discarded.
Note, apart from guiding the node assignment, there is no purpose of a node's label L. Thus, in the queue, only nodes are stored, but not the labels along their paths. In a separate collection, already used node labels are recorded due to the reuse label check.
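The labeling and assignment rules of Example 7 translate almost line by line into code. The sketch below implements the breadth-first (FIFO) variant only. As a loudly stated simplification, the black-box conflict computation is replaced by a lookup in the known minimal conflicts of Example 4, so `find_min_conflict` is a stand-in and not a real conflict search; all identifiers are hypothetical.

```python
from collections import deque

MIN_CONFLICTS = [("ax1", "ax2"), ("ax2", "ax3", "ax4"),
                 ("ax1", "ax3", "ax5"), ("ax3", "ax4", "ax5")]

def find_min_conflict(n):
    """Stand-in for a conflict search on <K minus n, B, P, N>."""
    return next((C for C in MIN_CONFLICTS if not set(C) & n), None)

def hs_tree(max_diagnoses=None):
    diagnoses, used_labels = [], []
    queue = deque([frozenset()])                    # root node is the empty set
    while queue and (max_diagnoses is None or len(diagnoses) < max_diagnoses):
        n = queue.popleft()                         # FIFO: breadth-first
        if any(d <= n for d in diagnoses):          # (non-minimality) -> closed
            continue
        if n in queue:                              # (duplicate) -> closed
            continue
        label = next((C for C in used_labels if not set(C) & n), None)  # (reuse label)
        if label is None:
            label = find_min_conflict(n)            # (compute label)
            if label is None:                       # 'no conflict' -> valid
                diagnoses.append(n)
                continue
            used_labels.append(label)
        for ax in label:                            # one successor per conflict element
            queue.append(n | {ax})
    return diagnoses

print([sorted(d) for d in hs_tree()])
```

Only the nodes themselves are kept in the queue, in line with the note above; the labels live in a separate list for reuse.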

Remarks:
1. In order for this algorithm to be sound, complete and best-first,
• the function for conflict computation used in (compute label) must be sound (if a set is returned, it is a conflict), complete (a conflict is returned whenever there is one), and must return only minimal conflicts (see Footnote 1), and
• (for uniform-cost search) pr needs to be strictly antimonotonic [9, Sec. 4.6] (see Footnote 2).
Footnote 1: If the minimality of computed conflicts is not guaranteed, HS-Tree becomes generally incomplete, and a directed acyclic graph version must be used to re-establish completeness, cf. [19].
Footnote 2: If pr violates this criterion, then either the transformation of pr described in Sec. 2.1.3 can be applied, or breadth-first HS-Tree can be used to first compute all (or a feasible set of) minimal diagnoses which can then be ordered by pr in a post-processing step before being returned.
2. Breadth-first search can be simulated by a uniform-cost search using pr(ax) := c for all ax ∈ K with any fixed c ∈ (0, 0.5) (cf. Eq. 1 and Sec. 2.1.3). That is, the minimum-cardinality-first computation of diagnoses is equivalent to a most-probable-first computation given small uniform component probabilities.
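Remark 2 follows directly from Eq. 1. A short derivation, assuming the uniform setting pr(ax) = c for all ax ∈ K:

```latex
\begin{align*}
pr(X) &= \prod_{ax \in X} c \prod_{ax \in K \setminus X} (1 - c)
       = c^{|X|} \, (1 - c)^{|K| - |X|}
       = (1 - c)^{|K|} \left( \frac{c}{1 - c} \right)^{|X|}
\end{align*}
```

Since 0 < c < 0.5 implies c/(1 − c) < 1, pr(X) depends only on |X| and strictly decreases as |X| grows. Sorting nodes by descending pr therefore orders them by ascending cardinality, with ties among nodes of equal size.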

Diagnosis Search vs. Path-Finding
Since the main aim of this work is to leverage ideas from classic path-finding search to derive a novel diagnosis computation approach, we next identify the main properties that distinguish diagnosis search (Sec. 2.2.3) from path-finding (Sec. 2.2.2) algorithms:
(I) PPI-formulation does not suffice as an input: The problem of searching for minimal diagnoses for a DPI can be stated as a PPI, where S_0 = ∅; succ gets a labeled node n with label L and returns the successors of n if L is a set, and ∅ else; goal(n) returns true iff n is a diagnosis; and g(n) := pr(n) as per Eq. 1. However, this characterization is not a sufficient basis to run a diagnosis search. What is missing is the definition of a node labeling and a node assignment strategy (see Sec. 2.2.3). Importantly, these missing building blocks determine the soundness, completeness and best-first property of the diagnosis search. By contrast, for path-finding, the PPI includes all relevant information for the problem to be directly solved by an off-the-shelf path-finding algorithm (cf. Example 6).
(II) States, nodes and paths coincide: In diagnosis search, the state of a search tree node n corresponds to n itself (i.e., to a set of ax_i-elements, cf. Example 7). So, no distinction between states and nodes is made. When the label ax_i is assumed to be assigned to the edge from any node n to its child node n ∪ {ax_i} [4], nodes (and states) can be seen as representatives of the (edge labels along the) paths in the search tree.
(III) Solutions are sets, not paths: Solutions to a diagnosis search problem are nodes (sets of edge labels along a tree path) which are minimal diagnoses for the given DPI. Unlike in path-finding problems, the order of labels along the path does not matter.
(IV) Multiple solutions are sought: In diagnosis search, it is usually of interest to find multiple solutions, i.e., after the first solution is determined, the search must be (correctly) continuable until sufficient solutions are found.
(V) Search for maximal-cost solutions: In diagnosis search, one wants to calculate the maximal-cost (i.e., most probable, cf. Remark 2 in Example 7) solutions whereas path-finding is usually about finding a minimal-cost solution.
(VI) Different conditions on cost function: Like for path-finding, the cost function used by diagnosis searches must fulfill certain criteria in order for desired properties to be guaranteed. While (informed) path-finding algorithms usually need a monotonic function f (see Sec. 2.2.2) for optimality, diagnostic searches as characterized in Sec. 2.2.3 usually require the (probability) function pr used to sort the queue to be strictly antimonotonic (cf. Sec. 2.1.3) in order to be sound, complete and best-first.
(VII) Soundness is not trivial: Whereas in path-finding any path whose end state satisfies the goal test is a valid solution to the PPI, in diagnosis search an appropriate combination of suitable goal test, node labeling, node assignment and cost function is necessary to ensure soundness, i.e., that each found solution is indeed a minimal diagnosis for the given DPI.

Korf's RBFS algorithm [33,58] provides the inspiration for RBF-HS. Historically, the main motivation that led to the engineering of RBFS was the problem that best-first searches at that time required exponential space. The idea behind RBFS is to trade (more) time for (much less) space. To this end, RBFS implements a scheme that can be synopsized as
• (complete and best-first): always expand the current globally best node while remembering the current globally second-best node,
• (undo and forget to keep space linear): backtrack and explore the second-best node if none of the child nodes of the best node is better than the second-best,
• (remember utility of forgotten subtrees to keep the search progressing): before deleting a subtree in the course of backtracking, store the cost of the subtree's best node,
• (restore utility at regeneration to avoid redundancy): whenever a subtree is reexplored, use this stored cost value to update node costs in the subtree.
As a result, RBFS is complete and best-first and works within linear-space bounds.

Algorithm 1 RBFS (excerpt)
    n0 ← MAKENODE(S0)    ▷ MAKENODE creates a tree node for the given state
    [...]
7:  if goal(STATE(n)) then    ▷ STATE returns the state associated with the given node
8:      solution ← GETPATHTO(n)    ▷ GETPATHTO returns sequence of states from root node to given node
9:      exit procedure
10: for Si ∈ succ(STATE(n)) do
    [...]
27: Child_Nodes ← INSERTSORTEDBYF(n1, Child_Nodes)    ▷ insert n1 s.t. sorting by F-value is preserved
28: n1 ← GETANDDELETEFIRSTNODE(Child_Nodes)    ▷ n1 ... best child
29: n2 ← GETFIRSTNODE(Child_Nodes)    ▷ n2 ... 2nd-best child
30: return F(n1)

RBFS: Briefly Explained
RBFS is presented by Alg. 1. In a nutshell, it works as follows [57]. Initial node costs are the f-values computed from g and h, and backed-up node costs are named F-values. Initially, all backed-up node costs are the nodes' initial costs. Starting from the root node corresponding to S_0, the principle is to follow the best (lowest F) path downwards (recursive RBFS'-calls, line 26). At each downward step, the variable bound is used to keep track of the (backed-up) cost of the best alternative path available from any ancestor of the current node (note, this is the globally best alternative path). If the current node exceeds bound, the recursion unwinds back to the alternative path. As the recursion unwinds, the cost of each node along the path is replaced with a (new) backed-up cost value, which is the best (backed-up) cost of its child nodes (cf. line 30). In this way, RBFS always remembers the backed-up cost of the best leaf in the forgotten subtree and can therefore decide whether it is worth reexpanding the subtree at some later time (this decision is made through the condition of the while-loop). Consider the expansion of a subtree rooted at a node n which has already been expanded and forgotten before (condition in line 16 is true). The initial cost (f-value) of n then appears more promising than n actually is, as the algorithm knows from a previous iteration through the stored backed-up cost F(n). In this case, the F-values of the child nodes n_i of n are not tediously learned again by RBFS, but directly updated by means of n's F-value (see line 17). If some node is recognized to correspond to a goal state, the path to this node is returned and RBFS' terminates (lines 7-9).
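The description above can be condensed into a compact recursive sketch. It follows the general RBFS scheme (backed-up F-values, a bound holding the best alternative, and inheritance of F on re-expansion) but is not a transcription of Alg. 1: the line structure differs, and the toy graphs and all identifiers are hypothetical.

```python
import math

def rbfs(start, succ, goal, h=lambda state: 0):
    """Recursive best-first search; returns a cheapest path to a goal or None."""
    def visit(path, g, F, bound):
        state = path[-1]
        if goal(state):
            return path, F
        f_node = g + h(state)
        children = []
        for nxt, cost in succ(state):
            g2 = g + cost
            f2 = g2 + h(nxt)
            # If F exceeds the initial f-value, this node was expanded and
            # forgotten before, so children inherit the backed-up value.
            children.append([max(F, f2) if f_node < F else f2, nxt, g2])
        if not children:
            return None, math.inf
        while True:
            children.sort(key=lambda c: c[0])       # best (lowest F) child first
            best = children[0]
            if best[0] > bound or best[0] == math.inf:
                return None, best[0]                # unwind, back up best child cost
            alt = children[1][0] if len(children) > 1 else math.inf
            solution, best[0] = visit(path + [best[1]], best[2], best[0],
                                      min(bound, alt))
            if solution is not None:
                return solution, best[0]

    solution, _ = visit([start], 0, h(start), math.inf)
    return solution

# Hypothetical toy graphs: state -> list of (successor, edge cost).
G1 = {"S": [("A", 1), ("B", 4)], "A": [("B", 1), ("G", 5)], "B": [("G", 1)], "G": []}
G2 = {"S": [("A", 1), ("B", 2)], "A": [("G", 10)], "B": [("G", 1)], "G": []}

print(rbfs("S", lambda s: G1[s], lambda s: s == "G"))
print(rbfs("S", lambda s: G2[s], lambda s: s == "G"))   # requires one backtrack
```

Only the current path and the children of the nodes along it are held in memory at any time, which is where the linear space bound comes from.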

From RBFS to RBF-HS: Necessary Modifications
In order to transform a path-finding into a diagnosis search algorithm, we have to make adequate amendments to the former with due regard to all differences between both paradigms discussed in Bullets (I)-(VII) in Sec. 2.2.4. Next, we list and explain the main modifications necessary to derive RBF-HS from RBFS (line numbers given refer to the respective locations of the changes in the RBF-HS algorithm, i.e., in Alg. 2).
(Mod1) A node labeling (line 12 and LABEL procedure) and a node assignment (lines 13-19) strategy have to be added. Importantly, the goal test (check whether a node is a minimal diagnosis, lines 39, 42 and 44) as well as the preparation of nodes for expansion (i.e., the provision of a minimal conflict, line 43 or 49) is part of these two code blocks. Justification: Bullet (I).
(Mod2) Differentiation between nodes, states and paths is no longer necessary, which is why the functions MAKENODE (generates node from state), STATE (extracts state from node), and GETPATHTO (returns path from root to node) can be omitted. This becomes evident in [...]
[...] Note, it is essential to return −∞ (i.e., the worst possible cost) as the backed-up F-cost of the solution node n in order to allow the search to continue in a well-defined and correct way. More precisely, this will cause the F-value of n's best sibling node to be propagated upwards. As a consequence, the backed-up value for any subtree including n will be the so-far found best cost over all nodes in this subtree except for n. In fact, any backed-up value F* := F(n) > −∞ would prevent RBF-HS' from terminating and thus would make it incomplete (intuitively, at some point all other nodes would have a value lower than F* and the algorithm would loop forever exploring n again and again). Justification: Bullet (IV).
(Mod6) To achieve soundness (only minimal diagnoses are added to the solutions 𝐃 in line 16), the following provisions are made. Successor nodes Child_Nodes are always sorted by a strictly antimonotonic function (line 28), which is why minimal diagnoses will be found prior to non-minimal ones. Moreover, the LABEL function is designed such that only nodes n can be labeled valid for which no already-found diagnosis exists which is a subset of n (goal test, part 1, line 39), and which are evidentially diagnoses (goal test, part 2, line 45). Finally, the node assignment ensures that only nodes labeled valid can be assigned to the solution list 𝐃 (line 16). Justification: Bullet (VII).

Inputs and Output
RBF-HS is depicted by Alg. 2. It accepts the following arguments: a DPI dpi = ⟨K, B, P, N⟩, a probability measure pr (see Sec. 2.1.3), and a stipulated number ld of minimal diagnoses ("leading diagnoses") to be returned. It outputs the ld (if existent) minimal diagnoses of maximal probability wrt. pr for dpi. Note, the computation of the ld diagnoses of minimal cardinality (instead of maximal probability) can be effectuated by specifying pr accordingly (cf. Remark 2 in Example 7).
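The remark on minimum-cardinality diagnoses can be illustrated with a small sketch. It rests on two assumptions that are not spelled out in this section: that pr(n) is the product of the fault probabilities of the axioms in n and of the complementary probabilities of all other axioms, and that a cardinality ordering is obtained by assigning every axiom the same fault probability below 0.5. The names pr, p_uniform and candidates are purely illustrative.

```python
def pr(node, p):
    """Probability of a node (set of axioms assumed faulty).

    Assumption: p maps every axiom to an independent fault probability.
    """
    prob = 1.0
    for ax in sorted(p):
        prob *= p[ax] if ax in node else 1.0 - p[ax]
    return prob


# Assumption: equal fault probabilities below 0.5 for all seven axioms.
p_uniform = {ax: 0.1 for ax in range(1, 8)}

# Hypothetical candidate nodes of different sizes.
candidates = [
    frozenset({2, 3, 5}),
    frozenset({1, 4}),
    frozenset({2, 4, 7}),
    frozenset({4, 5}),
]

# Sorting by descending probability then sorts by ascending cardinality.
ranked = sorted(candidates, key=lambda n: pr(n, p_uniform), reverse=True)
sizes = [len(n) for n in ranked]
```

Under these assumptions, every additional axiom multiplies the node probability by 0.1/0.9, so smaller nodes always rank before larger ones.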

Algorithm 2 (RBF-HS), lines 33-54 (excerpt):
33:         Child_Nodes ← INSERTSORTEDBYF(n1, Child_Nodes)
34:         n1 ← GETANDDELETEFIRSTNODE(Child_Nodes)      ▷ n1 . . . best child
35:         n2 ← GETFIRSTNODE(Child_Nodes)               ▷ n2 . . . 2nd-best child
36:     return F(n1)
37: procedure LABEL(n)
38:     for ni ∈ D do
39:         if n ⊇ ni then                               ▷ goal test, part 1 (is n non-minimal?)
40:             return closed                            ▷ n is a non-minimal diagnosis
41:     for C ∈ C do
42:         if C ∩ n = ∅ then                            ▷ cheap non-goal test (is n not a diagnosis?)
43:             return C                                 ▷ n is not a diagnosis; reuse C to label n
44:     L ← FINDMINCONFLICT(…)
45:     if L = 'no conflict' then                        ▷ goal test, part 2 (is n diagnosis?)
46:         return valid                                 ▷ n is a minimal diagnosis
47:     else                                             ▷ n is not a diagnosis
48:         …
52:     for e ∈ C do
53:         Succ_Nodes ← ADD(n ∪ {e}, Succ_Nodes)
54:     return Succ_Nodes

[...] for dpi, which implies that K \ ∅ = K is not a diagnosis for dpi by the Duality Property (cf. Sec. 2.1.6), which in turn means that no diagnosis can exist since every diagnosis is a subset of K and all supersets of diagnoses are diagnoses as well (weak fault model, cf. Sec. 2.1.1). The latter case holds iff there is no conflict at all for dpi, i.e., in particular, K is not a conflict, which is why K \ K = ∅ is a diagnosis by the Duality Property, and consequently no other minimal diagnosis can exist. If none of these trivial cases is given, the call of FINDMINCONFLICT (line 3) returns a non-empty minimal conflict C (line 8 is reached), which entails by the Hitting Set Property (cf. Sec. 2.1.6) that a non-empty (minimal) diagnosis will exist. For later reuse, C is added to the computed conflicts C, and then the recursive sub-procedure RBF-HS' is called (line 9). The arguments passed to RBF-HS' are the root node n_0 := ∅, its f-value, and the initial bound set to −∞.
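The interplay of the Duality Property and the Hitting Set Property used in the trivial-case argument can be checked on a toy problem. The following sketch assumes a weak fault model in which a set of axioms is a conflict exactly if it contains some minimal conflict; the function names are hypothetical, and the minimal conflicts are those of the RBF-HS example given later in this section.

```python
from itertools import combinations


def is_conflict(axioms, min_conflicts):
    # Assumption: conflicts are exactly the supersets of minimal conflicts.
    return any(c <= axioms for c in min_conflicts)


def is_diagnosis(candidate, K, min_conflicts):
    # Duality Property: candidate is a diagnosis iff K \ candidate is no conflict.
    return not is_conflict(K - candidate, min_conflicts)


def is_hitting_set(candidate, min_conflicts):
    # Hitting Set Property: a diagnosis intersects every minimal conflict.
    return all(candidate & c for c in min_conflicts)


K = frozenset(range(1, 8))
min_conflicts = [frozenset(c) for c in ({1, 2, 5}, {2, 4, 6}, {1, 3, 4}, {1, 5, 6, 7})]

# Both characterizations coincide on every subset of K.
subsets = [frozenset(s) for r in range(len(K) + 1) for s in combinations(sorted(K), r)]
agree = all(
    is_diagnosis(s, K, min_conflicts) == is_hitting_set(s, min_conflicts)
    for s in subsets
)
```

The two trivial cases correspond to min_conflicts containing the empty set (not even K is a diagnosis) and to min_conflicts being empty (the empty set is the only minimal diagnosis).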

Recursion: Principle
The basic principle of the recursion (RBF-HS' procedure) is very similar to the one sketched above for RBFS. That is, always explore the open node with the best F-value in a depth-first manner, until the best node has a worse cost than the globally best alternative node (whose cost is always stored in bound). Then backtrack and, at each backtracking step, propagate the best F-value among all child nodes upwards. Based on their latest known F-values, the child nodes at each tree level are re-sorted in best-first order. When re-exploring an already explored but later forgotten subtree, the cost of nodes in this subtree is, if necessary, updated through a cost inheritance from parent to children. In this vein, a relearning of already learned backed-up cost values, and thus repeated and redundant work, is avoided. Exploring a node in RBF-HS means labeling this node and assigning it to an appropriate collection of nodes based on the computed label (cf. Sec. 2.2.3 and Example 7). The recursion is executed until D comprises the desired number ld of minimal diagnoses or the hitting set tree has been explored in its entirety.
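The recursion principle can be made concrete with a compact sketch. It is not a transcription of Alg. 2 but an illustration under stated assumptions: all minimal conflicts are given up front, so that the hypothetical helper find_min_conflict can stand in for a theorem-prover-based FINDMINCONFLICT, and pr is assumed to be the product measure over independent axiom fault probabilities below 0.5. All function and variable names are illustrative.

```python
import math


def pr(node, p):
    # Assumed probability measure: independent axiom fault probabilities in p.
    prob = 1.0
    for ax in sorted(p):
        prob *= p[ax] if ax in node else 1.0 - p[ax]
    return prob


class _Done(Exception):
    """Raised to leave the recursion once ld diagnoses have been found."""


def rbf_hs(min_conflicts, p, ld):
    diagnoses, known_conflicts = [], []

    def find_min_conflict(node):
        # Stand-in for FINDMINCONFLICT: some minimal conflict not hit by node.
        for c in min_conflicts:
            if not (c & node):
                return c
        return None

    def label(node):
        if any(d <= node for d in diagnoses):   # goal test, part 1
            return "closed"
        for c in known_conflicts:               # cheap reuse of stored conflicts
            if not (c & node):
                return c
        c = find_min_conflict(node)             # goal test, part 2
        if c is None:
            return "valid"
        known_conflicts.append(c)
        return c

    def rec(node, F_node, bound):
        lab = label(node)
        if lab == "closed":
            return -math.inf
        if lab == "valid":
            diagnoses.append(node)
            if len(diagnoses) >= ld:
                raise _Done
            return -math.inf                    # lets the search continue elsewhere
        explored_before = pr(node, p) > F_node  # backed-up value was learned earlier
        children = []
        for ax in sorted(lab):
            child = node | {ax}
            f_child = pr(child, p)
            F_child = min(F_node, f_child) if explored_before else f_child
            children.append([F_child, child])
        if not children:
            return -math.inf
        children.sort(key=lambda c: c[0], reverse=True)
        while children[0][0] >= bound and children[0][0] > -math.inf:
            second = children[1][0] if len(children) > 1 else -math.inf
            children[0][0] = rec(children[0][1], children[0][0], max(bound, second))
            children.sort(key=lambda c: c[0], reverse=True)
        return children[0][0]                   # backed-up F-value of this subtree

    try:
        rec(frozenset(), pr(frozenset(), p), -math.inf)
    except _Done:
        pass
    return diagnoses
```

Only the children along the current path are held in memory at any time; the backed-up value returned by rec is all that survives of a discarded subtree.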

Recursion: Structure
To get a better impression of RBF-HS' on an abstract, structural level, it is instructive to look at RBF-HS' as a succession of the following blocks. An algorithm walkthrough with detailed descriptions of all these blocks is given in Appendix A.
The node labeling (function LABEL) can be further split into the following blocks:
Note, the LABEL function of RBF-HS' is equal to the one used in Reiter's HS-Tree (cf. Example 7), except that the duplicate check is obsolete in RBF-HS'. The reason for this is that there cannot ever be any duplicate (i.e., set-equal) nodes in memory at the same time during the execution of RBF-HS. This holds because for all potential duplicates n_i, n_j, we must have |n_i| = |n_j|, but equal-sized nodes must be siblings (depth-first tree exploration), which is why n_i and n_j must contain |n_i| − 1 equal elements (same path up to the parent of n_i, n_j) and one necessarily different element (label of the edge pointing from the parent to n_i and n_j, respectively).

RBF-HS Exemplification
The following example illustrates the workings of RBF-HS. In addition, let all minimal conflicts for dpi be ⟨ax_1, ax_2, ax_5⟩, ⟨ax_2, ax_4, ax_6⟩, ⟨ax_1, ax_3, ax_4⟩, and ⟨ax_1, ax_5, ax_6, ax_7⟩. Assume we want to use RBF-HS to find the ld := 4 most probable diagnoses for dpi (e.g., because we surmise the actual diagnosis to be amongst the most likely candidates). To this end, dpi, pr and ld are passed to RBF-HS (Alg. 2) as input arguments.
In the figures, we use the following notation. Axioms ax_i are simply referred to by i (in node and edge labels). Numbers k indicate the chronological node labeling (expansion) order. Recall that nodes in Alg. 2 are sets of (integer) edge labels along tree branches. E.g., node 9 in Fig. 1 corresponds to the node n = {ax_2, ax_4}, i.e., to the assumption that components c_2, c_4 are at fault whereas all others are working properly. The probability pr(n) (i.e., the original f-value) of a node n is shown by the black number from the interval (0, 1) that labels the edge pointing to n, e.g., the cost of node 9 is 0.18. We tag minimal conflicts ⟨. . .⟩ that label internal nodes by C if they are freshly computed (expensive; FINDMINCONFLICT call, line 44), and by R if they result from a reuse of some already computed and stored (see list C in Alg. 2) minimal conflict (cheap; reuse label check; lines 41-43). Leaf nodes are labeled as follows:
• "?" is used for open (i.e., generated, but not yet labeled) nodes;
• (D_i) for a node labeled valid, i.e., a minimal diagnosis named D_i, that is not yet stored in D;
• ×(Expl) for a node labeled closed, i.e., one that constitutes a non-minimal diagnosis or a diagnosis that has already been found and stored in D; Expl is an explanation for the non-minimality in the former, and for the redundancy of the node in the latter case, i.e., Expl names a minimal diagnosis in D that is a proper subset of the node, or it names a diagnosis in D which is equal to the node, respectively.
Whenever a new diagnosis is added to D (line 16), this is displayed in the figures by a box that shows the current state of D. For each expanded node, the value of the bound variable relevant to the subtree rooted at this node is denoted by a red-colored value above the node. By green color, we show the backed-up F-value returned in the course of each backtracking step (i.e., the best known probability of any node in the respective subtree). Further, f-values that have been updated by backed-up F-values are signalized by green-colored edge labels; see, e.g., in Fig. 1, the left edge emanating from the root node of the tree, whose label has been reduced from 0.41 (f-value) to 0.09 (F-value) after the first backtrack. Finally, F-values of parents inherited by child nodes (line 23) are indicated by brown color, see the edge between node 14 and node 15 in Fig. 2.
Discussion and Remarks. Initially, RBF-HS starts with an empty root node, labels it with the minimal conflict ⟨1, 2, 5⟩ at step 1, generates the three corresponding child nodes {1}, {2}, {5} shown by the edges originating from the root node, and recursively processes the best child node (left edge, f-value 0.41) at step 2. The bound for the subtree rooted at node 2 corresponds to the best edge label (F-value) of any open node other than node 2, which is 0.25 in this case. In a similar manner, the next recursive step is taken in that the best child node of node 2 with an F-value not less than bound = 0.25 is processed. This leads to the labeling of node {1, 4} with F-value 0.28 ≥ bound at step 3, which reveals the first (proven most probable) diagnosis D_1 := [1,4] with pr(D_1) = 0.28, which is added to the solution list D. Note that −∞ is at the same time returned for node 3 (which indicates that the node has already been explored and ensures that the next best node now has the highest F-value). After the next node has been processed and the second-most-probable minimal diagnosis D_2 := [1,6] with pr(D_2) = 0.27 has been detected, the by now best remaining child node of node 2 has an F-value of 0.09 (leftmost node). This value, however, is lower than bound. Due to the best-first property of RBF-HS, this node is not explored right away because bound suggests that there are more promising unexplored nodes elsewhere in the tree which have to be checked first. To keep the memory requirements linear, the current subtree rooted at node 2 is discarded before a new one is examined. Hence, the first backtrack is executed. This involves the storage of the best (currently known) F-value of any node in the subtree as the backed-up F-value of node 2. This newly "learned" F-value is signalized by the green number (0.09) that by now labels the left edge emanating from the root.
RBF-HS proceeds analogously for the other nodes, where the bound value used is always the best value among the bound value of the parent and the F-values of all siblings. Please also observe the F-value inheritance that takes place when node {2, 4} is generated for the third time (node 15, Fig. 2). The reason for this is that the original f-value of {2, 4} is 0.18 (see top of Fig. 1), but the meanwhile "learned" F-value of its parent {2} is 0.11 and thus smaller. This means that {2, 4} must have already been explored and the de-facto probability of any (minimal) diagnosis in the subtree rooted at {2, 4} must be less than or equal to 0.11.
Output. Finally, RBF-HS terminates as soon as the ld-th (in this case: fourth) minimal diagnosis D_4 is located and added to D. The list D of minimal diagnoses arranged in descending order of probability pr is returned.

RBF-HS Complexity
The next theorem states the complexity of RBF-HS, derived in Appendix B.
Theorem 1 (Complexity of RBF-HS). Let dpi = ⟨K, B, P, N⟩ be an arbitrary DPI, ld the finite positive natural number of diagnoses to be computed, n the number of nodes expanded by HS-Tree (without the duplicate criterion) for dpi and ld, t_CC the worst-case time of a consistency check for dpi, minC the set of all minimal conflicts for dpi, and C_max the conflict of maximal size for dpi. Further, let TPT := t_CC |K| (|minC| + |ld|) (theorem proving time). Finally, assume |minC| is in O(1), i.e., independent of the size of dpi. Then:

RBF-HS Correctness
The next theorem shows that RBF-HS is correct. A proof is given in Appendix C.
Theorem 2 (Correctness of RBF-HS). Let FINDMINCONFLICT be a sound and complete method for conflict computation, i.e., given a DPI, it outputs a minimal conflict for this DPI if a minimal conflict exists, and 'no conflict' otherwise. RBF-HS is sound, complete and best-first, i.e., it computes all and only minimal diagnoses in descending order of probability as per the strictly antimonotonic probability measure pr.

RBF-HS: Potential Impact and Synergies with Other Techniques
Beside RBF-HS's direct usage
• as a space-efficient alternative to (exponential-space) best-first diagnosis search algorithms such as HS-Tree [4], HST tree [17], DynamicHS [42], GDE [5], or StaticHS [21], or
• as a best-first alternative to sound and complete linear-space any-first searches like Inv-HS-Tree [20], or
• as a complete alternative to best-first, but incomplete algorithms like CDA* [31] or STACCATO [25],
several uses of RBF-HS combined with existing techniques can be conceived of. We briefly sketch some of them next, before we discuss a hybrid method that combines HS-Tree [4] and RBF-HS in more detail in the next section:
(A) Informed HS-Tree: The idea is to run RBF-HS as a preprocessor in order to provide more informed node probabilities, and to subsequently adopt HS-Tree using these "learned" probabilities as f-values. To this end, e.g., RBF-HS could be executed with a fixed time limit and modified to store backed-up F-values of (a subset of) the visited nodes, not only of the ones that are kept in memory after backtracking steps. Like a heuristic for classic A*, this additional "lookahead" information might lead to the finding of the preferred diagnoses by expanding significantly fewer nodes.
(B) RBF-HS as a Decision Heuristic: The rationale is to run RBF-HS for a certain limited time and to afterwards take the "learned" F-value(s) as an estimate of the hardness or some other relevant property of the diagnosis problem. Depending on how the node costs are set (cf. Sec. 3.2.1), the backed-up F-value can provide an estimate of the least depth of the search tree, i.e., of the least size of minimum-cardinality diagnoses, or an upper bound estimate of the probability of the minimal diagnoses. Such an estimate can then be used, e.g.:
• To decide which algorithm to use, e.g., whether to drop some nice-to-have requirement(s) on the adopted diagnosis computation algorithm (such as completeness or the best-first property) in order to keep performance reasonable (cf., e.g., [20]).
• For an informed selection of a limit for depth-limited or cost-limited search [57] (cf. Example 6). When using a suitable limit, the latter can be powerful linear-space strategies to find the preferred diagnoses, and might be substantially faster than iterative deepening, IDA* (hitting set) searches and RBF-HS.
(C) RBF-HS as a Plug-In: Given a diagnosis search method that uses a hitting set generation routine as a black-box, such as SDE [59], RBF-HS can be used as a plug-in, e.g., in case memory issues are faced when using other best-first algorithms.

Hybrid Best-First Hitting Set Search (HBF-HS)
In this section, we propose a hybrid technique called HBF-HS, shown by Alg. 3, that aims at combining the advantages of the more space-attractive RBF-HS with those of the more time-attractive HS-Tree.

HBF-HS Algorithm Description
The goal of HBF-HS is to allow for a diagnosis search that is as fast as possible while preserving soundness, completeness and best-firstness also in cases where state-of-the-art searches boasting these three properties (e.g., HS-Tree) run out of memory. Given a so-called switch criterion stop_HS and otherwise the same inputs as RBF-HS, i.e., a DPI dpi, a probability measure pr, and a number ld of leading diagnoses to be computed, the principle of HBF-HS is as follows: Initially (line 2), execute standard HS-Tree, as described in Example 7. If the switch criterion stop_HS (e.g., a maximal amount or fraction of memory consumed) is met, then the algorithm checks if sufficient (at least ld) or all (indicated by an empty queue) minimal diagnoses have already been computed by HS-Tree (line 3). If so, the collection of minimal diagnoses is returned (line 4). Otherwise, a switch to RBF-HS is prompted. The recursive procedure RBF-HS' of the latter then continues the search (line 8) while only consuming a linear amount of additional memory. In this vein, HS-Tree can utilize as much memory as it needs while executing (focus on time optimization), and, before the available memory is depleted, RBF-HS takes over (focus on space optimization) so that the problem remains solvable.
The transfer of control between HS-Tree and RBF-HS' is rather straightforward while guaranteeing the retention of soundness, completeness and best-first properties. The idea is to extract the relevant information from the search tree produced by HS-Tree and use it to set up a new search tree, on which RBF-HS' can start operating. Specifically, after HS-Tree is stopped, the switch process (lines 5-8) is as follows:
(S1) Create a virtual root node n_0 (line 5); set the bound of n_0 to −∞ (line 8). As the f-value of the root node is irrelevant to RBF-HS and thus arbitrary, set f(n_0) := 0.

Algorithm 3 (HBF-HS), lines 14-19 (excerpt):
14:     for e ∈ C do
15:         if ISSET(e) then                             ▷ n = virtual root node ⇒ C = Child_Nodes_root, each element of which is a set
16:             Succ_Nodes ← ADD(e, Succ_Nodes)
17:         else                                         ▷ n ≠ virtual root node ⇒ C is a min conflict, each element of which is not a set
18:             Succ_Nodes ← ADD(n ∪ {e}, Succ_Nodes)
19:     return Succ_Nodes
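The case distinction in the adapted expansion step can be sketched as follows. This is a minimal illustration, assuming nodes are represented as frozensets and minimal conflicts as tuples of single axioms; the function name expand and this representation are assumptions rather than part of Alg. 3.

```python
def expand(node, label):
    """Generate successor nodes of node from its label.

    At the virtual root, the label is the collection of open nodes taken
    over from HS-Tree (each element is a set); elsewhere, the label is a
    minimal conflict (each element is a single axiom).
    """
    succ_nodes = []
    for e in label:
        if isinstance(e, frozenset):
            # Virtual root: e is already a complete node.
            succ_nodes.append(e)
        else:
            # Regular node: extend the current path by the single axiom e.
            succ_nodes.append(node | {e})
    return succ_nodes


# Hypothetical open nodes handed over by HS-Tree at the switch.
root_children = expand(frozenset(), [frozenset({1, 2}), frozenset({5})])

# Below the root, expansion works exactly as in plain RBF-HS.
children_of_5 = expand(frozenset({5}), (2, 4, 6))
```

Because only the edges leaving the virtual root carry set-valued labels, the remainder of the recursion can stay unchanged.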

HBF-HS Exemplification
The following example illustrates the workings of HBF-HS.
Example 9 (HBF-HS) Let us reconsider the DPI introduced in Example 8 and have a look at how HBF-HS would proceed for it. Assume the switch criterion stop_HS is defined as "ten generated nodes". Specifically, this means: Execute HS-Tree until ten nodes are generated, then execute steps (S1)-(S3), and finally run RBF-HS'. Fig. 3 shows at the top the end state of HS-Tree before the switch is performed, and at the bottom the state of the transformed tree on which RBF-HS' will begin its operations. Observe the following things:
• At the time the switch takes place, ten nodes have been generated and seven nodes are currently maintained by HS-Tree, including, among others, five open nodes ("?") stored in the queue Q_HS. Note, two of these nodes, the leftmost and fourth-leftmost one, are equal (i.e., the path labels {1, 2} and {2, 1} coincide). Hence, one of them is a duplicate and does not need to be further considered (recall that diagnoses are sets of edge labels). Now, Step (S1) of the switch process prompts the construction of a new tree through the generation of a virtual root node with bound (red color) set to −∞. Step (S2) then effectuates the connection of this root node by one edge [...] Fig. 3. Observe that the labels of the edges emanating from the root node are now sets of elements from K. Still, all labels of other edges, which are not linked to the root node, are singletons, just as in plain RBF-HS (cf. Example 8). Note, in RBF-HS we do not use set notation for edge labels simply because all these labels are single elements from K (cf. Figs. 1 and 2).
• Until the switch, two minimal diagnoses have already been located by HS-Tree (nodes 3 and 4; collection D_HS), and three minimal conflicts have been computed (node labels 1, 2 and 5; collection C_HS). These are copied to the respective collections D and C maintained by RBF-HS in step (S3), as depicted on the right in the bottom part of Fig. 3.
• The execution of RBF-HS' works exactly as discussed in Example 8, with the difference that it starts with the partial hitting set tree displayed at the bottom of Fig. 3, where we have one root node, which has four elements in Child_Nodes. That is, the first explored node would be the rightmost one, {5}, with the maximal F-value 0.25 among Child_Nodes, and the bound for the processing of {5} would be 0.18, the second-best F-value (of node {2, 4}) among Child_Nodes. Intuitively, the RBF-HS' execution in the course of HBF-HS can be regarded as a warm-start version of RBF-HS with some conflicts and open nodes, and potentially also some diagnoses, provided from the outset.

HBF-HS Complexity
The complexity of HBF-HS depends largely on the used switch criterion stop_HS, i.e., on how long HS-Tree runs until RBF-HS' continues. Unsurprisingly, the worst-case complexity of HBF-HS corresponds to the worse complexity among RBF-HS and HS-Tree for both time and space. In other words, the worst-case time complexity of HBF-HS equals the one of RBF-HS (if the switch takes place immediately and HS-Tree does not run at all), whereas its space complexity coincides with the one of HS-Tree (if the switch does not take place until HS-Tree finishes executing). More formally:
Theorem 3 (Complexity of HBF-HS). Let T be the worst-case time complexity of RBF-HS, and S the worst-case space complexity of HS-Tree. Then, HBF-HS has a worst-case time complexity of T and a worst-case space complexity of S.
Of greater practical interest than these extreme cases are more reasonable settings of stop_HS. Since the bottleneck of HS-Tree is its (exponential) space complexity, and since it tends to exhibit better runtimes than RBF-HS (not only in the worst case, but also on average, as we shall see in Sec. 6), the (only) appropriate strategy seems to be to condition the switch criterion on a measure of space consumed by HS-Tree rather than time. Intuitively, the longer it is affordable (wrt. memory) to run HS-Tree, the faster termination we might expect from HBF-HS, since its time complexity in this case will tend more towards the time complexity of HS-Tree. The material question then is how much additional memory the execution of RBF-HS' will require after the switch, or how much memory one might safely concede to HS-Tree before the switch. This question is answered by the following theorem, a direct corollary of Theorem 1:
Theorem 4 (Space Complexity of HBF-HS after Switch). Let the conditions of Theorem 1 apply, and let S be the amount of memory consumed by HS-Tree until stop_HS is true. Then, the additional memory beyond S required by HBF-HS is in O(|K|), i.e., linear.
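Two ways of defining a space-oriented switch criterion can be sketched as follows. Both factory functions and their parameters are hypothetical; the second relies on Python's tracemalloc module, which only tracks memory allocated by the Python interpreter while tracing is active and thus serves merely as a stand-in for whatever memory measure an actual implementation would use.

```python
import tracemalloc


def node_limit_criterion(max_generated):
    """stop_HS is met once a given number of nodes has been generated."""
    def stop_hs(generated_nodes):
        return generated_nodes >= max_generated
    return stop_hs


def memory_limit_criterion(max_bytes):
    """stop_HS is met once the traced memory reaches max_bytes."""
    def stop_hs(_generated_nodes=0):
        current, _peak = tracemalloc.get_traced_memory()
        return current >= max_bytes
    return stop_hs


# Criterion in the spirit of "ten generated nodes".
stop_after_ten = node_limit_criterion(10)
```

By Theorem 4, a memory budget for HS-Tree can be chosen close to the available memory, as long as a reserve linear in |K| is left for RBF-HS'.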

HBF-HS Correctness
The next theorem shows that HBF-HS is correct, and is proven in Appendix D.
Theorem 5 (Correctness of HBF-HS). Let FINDMINCONFLICT be a sound and complete method for conflict computation, i.e., given a DPI, it outputs a minimal conflict for this DPI if a minimal conflict exists, and 'no conflict' otherwise. Further, let stop_HS be any predicate depending on the execution state of HS-Tree. Then, HBF-HS is sound, complete and best-first, i.e., it computes all and only minimal diagnoses in descending order of probability as per the strictly antimonotonic probability measure pr.

Classifying Diagnosis Computation Methods
Literature offers a wide variety of diagnosis computation algorithms, motivated by different diagnosis problems, domains and challenges. These algorithms can be compared along multiple dimensions, e.g.,
• best-first (minimal diagnoses are output in order, most-preferred first, as per a given preference criterion) [4,5,9,19,21,27] vs. any-first (no particular order on output diagnoses can be guaranteed) [18,20,60],
• complete (given sufficient runtime and memory, all minimal diagnoses are computed) [4,5,18,19,20,21,59,61,62] vs. incomplete (in general, not all minimal diagnoses are found) [25,31,40,63,64,65],
• conflict-based (minimal diagnoses are built as hitting sets of conflicts) [4,5,9,17,18,19,21,59,62,66] vs. direct (minimal diagnoses are built without reliance on conflicts, e.g., through divide-and-conquer or compilation techniques) [20,67,68,69,70],
• stateful (state of the search data structure is maintained and reused throughout a diagnosis session, even if the diagnosis problem changes through the acquisition of new information about the diagnosed system) [5,9,21,42,71] vs. stateless (whenever the diagnosis computation algorithm is called, it computes diagnoses by means of a fresh search data structure) [4,17,19,62,66],
• black-box (the used theorem prover is seen as a pure oracle for consistency checks, which makes the diagnosis search very general in that no dependency on any particular logic or reasoning mechanism is given) [4,9,17,19,20,21] vs. glass-box (the used theorem prover is internally optimized or modified for diagnostic purposes, which can bring performance gains, but makes the method reliant on one particular reasoning mechanism and on certain logics used to describe the diagnosed system) [38,39,72,94],
• on-the-fly (conflicts are computed on demand in the course of the diagnosis search) [4,5,17,19,21] vs. preliminary (set of minimal conflicts must be known in advance and given as input to the diagnosis search) [18,25,60,62,64,66],
• worst-case linear-space (memory requirements linear in the problem size, even in the worst case) [20,68] vs.

Towards Improving Existing Methods
Our study of these existing works suggests two different things. First, the best choice of algorithm, in general, depends largely on the particular tackled problem (domain and requirements). Consequently, there is little hope to find an algorithm that comes anywhere near improving all of the existing ones. Second, performance improvements for algorithms are often achieved at the cost of losing desirable properties (e.g., completeness or the best-first guarantee). Hence, it is particularly noteworthy that RBF-HS as well as HBF-HS aim to improve existing sound, complete and best-first diagnosis search while preserving all these favorable properties. Moreover, to the best of our knowledge, RBF-HS is the first linear-space diagnosis computation method that ensures soundness, completeness and the best-first property.
Nevertheless, we next discuss the (classes of the) diagnosis computation methods that are most closely related to RBF-HS and HBF-HS wrt. the categorization above.

Discussion of Related Works
In terms of the above-mentioned dimensions, RBF-HS and HBF-HS are best-first, complete, stateless, conflict-based, black-box, and on-the-fly. Moreover, RBF-HS is worst-case linear-space whereas HBF-HS is not. However, although no linear-space guarantee is given for HBF-HS, the latter is nevertheless meant to be an improved variant of RBF-HS which "is allowed" to consume more than a linear amount of memory in order to reduce computation time. The diagnosis algorithms most closely related to the ones proposed in this work can be divided into compilation-based, duality-based, and best-first ones, which we discuss in more detail next:

Compilation-Based Approaches
General differences to the proposed techniques: These approaches are not black-box, i.e., they depend on the logic used to represent the diagnosed system. They can be polynomial-space or linear-space, but only under certain circumstances.
Discussion: These techniques compile the diagnosis problem into some target representation such as SAT [69], OBDD [70] or DNNF [67]. Often, the generation of (minimum-cardinality; but not maximal-probability) diagnoses can be accomplished in worst-case polynomial time in the size of the respective compilation. For a polynomial-sized compilation, this implies polynomial-time diagnosis generation. However, the size of the compilation might be exponential in the size of the diagnosis problem for all these approaches, which means that no guarantee for polynomial-space (or polynomial-time), let alone linear-space, diagnosis generation can be given. Second, for these compilation approaches to be applicable to a DPI, the diagnosed system must be amenable to a propositional-logic description, which is not always the case [34,38,39]. Beyond that, compilation approaches usually do not allow one to influence the exact order in which diagnoses are output. In summary, these methods are in general not linear-space, not best-first, and not black-box.
A compilation-based approach that is based on abstraction techniques and especially suited for a sequential diagnosis [5] scenario is SDA [71]. One difference between RBF-HS and SDA is that only a single best diagnosis (instead of a set of best diagnoses) is output by SDA at the end of the sequential diagnosis process. Second, it is questionable if similar abstraction-techniques as used in SDA are applicable to logics more expressive than propositional logic and to systems that are structurally different from typical circuit topologies.
The authors of [73] present an approach that translates a circuit diagnosis problem into a constraint optimization problem. If this constraint problem is amenable to a tree representation, then the minimum-cardinality diagnoses can be generated in linear time and space. However, it is unclear if and how non-circuit-problems and more expressive or other types of logics can be addressed.

Duality-Based Approaches
General differences to the proposed techniques: These approaches are either not best-first or not worst-case linear-space.
Discussion: FastDiag [68] and its sequential diagnosis extension Inv-HS-Tree [20] perform a linear-space depth-first diagnosis search that is grounded on the relationship between diagnoses and conflicts according to the Duality Property (cf. Sec. 2.1.6). The soundness and completeness of the diagnosis computation despite the depth-first search is accomplished by interchanging the role of conflicts and diagnoses in the hitting set tree. That is, in these approaches the node labels correspond to minimal diagnoses and the tree paths represent conflicts. The computation of minimal diagnoses instead of minimal conflicts during the labeling process is achieved by a suitable adaptation [20] of the QuickXplain algorithm [52,53]. The main difference between these approaches and RBF-HS (and HBF-HS) is that the former cannot ensure that the diagnoses are computed in any particular (preference) order.
The authors of [59] present a sound and complete approach that interleaves conflict and diagnosis computation in a way that information from conflict computation aids the diagnosis computation and vice versa. However, unlike RBF-HS, this approach is not linear-space in general. In addition, it cannot compute most probable but only minimum-cardinality diagnoses.

Best-First-Search Approaches
General differences to the proposed techniques: Whenever these approaches are sound and complete, they are worst-case exponential-space.
Discussion: First and foremost, there are the seminal methods HS-Tree [4], along with its amended version HS-DAG proposed by [19], and GDE [5], which are sound, complete and best-first.
The works of [9,39] describe sound and complete uniform-cost search variants of HS-Tree which enumerate diagnoses in some order of preference. Here, [9] defines the preference order by means of a probability model over diagnoses (as characterized in Sec. 2.1.3) whereas [39] relies on a heuristic model that ranks single axioms based on their "importance". The sum over axioms included in a diagnosis is used to determine the rank of the diagnosis. The author of [28] goes one step further and incorporates a heuristic function (cf. Sec. 2.2.2) into the search, yielding a hitting set version of A*. To the best of our knowledge, the specification of a useful heuristic function, as suggested in [28] for an additive cost function, is an open problem in uniform-cost hitting set search with multiplicative costs, as in the case of our proposed methods.
[17] suggests a variant of HS-DAG which builds a hitting set tree based on a subset-enumeration strategy in order to improve the diagnosis computation time. The same objective is pursued by [16], who propose parallelization techniques for HS-Tree.
Further, there are sound, complete and best-first diagnosis searches that are particularly useful for fault isolation and sequential diagnosis [5]: StaticHS [21] and DynamicHS [42]. These are stateful in that they exploit a persistently stored and incrementally adapted (search) data structure to make the diagnostic process more efficient. More specifically, DynamicHS targets the minimization of the computation time, whereas StaticHS aims at the reduction of the number of interactions necessary from a user, e.g., to make system measurements or answer system-generated queries.
In contrast to RBF-HS and HBF-HS, all these best-first search approaches are not "memory-aware" in that they require exponential space in general.

Diagnosis Computation in the Knowledge-Base Debugging Domain
Due to their independence of the adopted system description language and theorem prover, RBF-HS and HBF-HS appear to be particularly attractive for application domains where different logics and reasoning engines are regularly used. One such field is knowledge-base debugging, which is also the focus of our evaluations.
Methods for knowledge-base debugging can be divided into model-based and heuristics-based approaches. The former, e.g., [9,34,74,75], can be seen as principled, theorem-proving-based methods which draw on the general theory of model-based diagnosis [4,5]. While these techniques usually allow for the finding of a precise and succinct explanation of all identified problems in a knowledge base, they can be comparably costly in terms of computation time and space due to the necessity of logical reasoning. Heuristics-based approaches, e.g., [76,77,78,79], in contrast, can be regarded as experience-based techniques that draw on empirical knowledge such as common fault patterns, rules of thumb, or best practices. They are a fast alternative in case model-based techniques are too slow, but are often incomplete (i.e., they can only identify diagnoses revealing bugs for which appropriate heuristics were defined) and sometimes unsound (i.e., they might return diagnoses that comprise correct axioms).
Model-based approaches can be further classified into glass-box and black-box ones, based on the way a logical reasoner is used in the debugging process (cf. Sec. 5.1). While glass-box approaches are highly optimized and performant for particular logics, black-box methods score with their flexible applicability to a multitude of logics and reasoning engines [39]. Orthogonal to their categorization as black-box or glass-box, model-based techniques usually focus on one of two main ways of locating the faulty axioms in a knowledge base. The first class are justification-based approaches, e.g., [38,39], which assume that users directly analyze conflicts to mentally reason about the faultiness of the axioms occurring in the conflicts. The second class of diagnosis-based approaches, e.g., [34,75,80], to which the methods presented in this work belong, takes the intermediate step of computing diagnoses in order to assist users in this cognitively complex task. To this end, two main ways of diagnosis computation were proposed for knowledge-based systems, either as hitting sets of conflicts using Reiter's HS-Tree [9,38,39,75,81], or by means of duality-based techniques [20,82] (cf. discussion in Sec. 5.3.2).

Objective
The goals of our evaluation are • to demonstrate the out-of-the-box general applicability of the proposed algorithms to diagnosis problems over different and highly expressive knowledge representation languages, • to understand their practical runtime, memory efficiency, and scalability, and • to compare the suggested methods against one of the most widely used algorithms with the same properties in terms of the classification discussed in Sec. 5.1. Importantly, the goal is not to show that the proposed algorithms are better than all or most algorithms in literature, which is pointless as discussed in Sec. 5.2. Rather, we intend to show the advantage of using the proposed techniques in a diagnosis scenario where the properties soundness, completeness, the best-first enumeration of diagnoses, and the general applicability are of interest or even required.
One such domain is ontology and knowledge base 7 debugging, where practitioners and experts from the field usually 8 , and especially in critical applications of ontologies such as medicine [23], want a debugger to output exactly the faulty axioms that really explain the observed faults in the ontology (soundness and completeness) at the end of a debugging session. In addition, experts often wish to perpetually monitor the most promising fault explanations throughout the debugging process (best-first property) with the intention to stop the session early if they recognize the fault. As was recently studied by [82], the use of best-first algorithms often also involves efficiency gains in debugging as opposed to other strategies. Apart from that, it is a big advantage for users of knowledge-based systems to have a debugging solution that works out of the box for different logical languages and with different logical reasoners (general applicability, cf. black-box property in Sec. 5.1). The reasons for this are that (i) ontologies are formulated in a myriad of different (Description) logics [84] with the aim to achieve the required expressivity for each ontology domain of interest at the least cost for inference, and (ii) highly specialized reasoners exist for different logics (cf., e.g., [85,86]), and being able to flexibly switch to the most efficient reasoner for a particular debugging problem can bring significant performance improvements [87].
For these reasons, we use • real-world knowledge base debugging problems (cf. Sec. 6.2) formulated over a range of different logics with hard reasoning complexities to test our approaches, and

Notes to Tab. 1:
1): Description Logic expressivity: each calligraphic letter stands for a (set of) logical constructs that are allowed in the respective language, e.g., C denotes negation ("complement") of concepts, for details see [84,93]; intuitively, the more letters, the higher the expressivity of a logic and the higher the complexity of reasoning for this logic tends to be.
2): #D/min/max denotes the number / the minimal size / the maximal size of minimal diagnoses for the DPI resulting from each input KB K. If tagged with a *, a value signifies the number or size determined within one hour using the suite of algorithms included in the OntoDebug tool [83] (for problems where the finding of all minimal diagnoses was impossible within reasonable time).

Dataset
The benchmark of inconsistent or incoherent 9 real-world ontologies we used for our experiments is given in Tab. 1. 10 Subsets of this dataset have been investigated, i.a., in [34,39,95,96,97,98,99]. As the table shows, the ontologies cover a spectrum of different problem sizes (number of axioms or components; column 2), logical expressivities (which determine the complexity of consistency checking; column 3), as well as diagnostic structures (number and size of minimal diagnoses; column 4). Note that the complexity of consistency checks (used within the FINDMINCONFLICT procedure in Alg. 2) over the logics in Tab. 1 ranges from EXPTIME-complete to 2-NEXPTIME-complete [93,100]. Hence, from the point of view of model-based diagnosis, ontology debugging problems represent a particularly challenging domain as they usually deal with harder logics than more traditional diagnosis problems (which often use propositional knowledge representation languages that are not beyond NP-complete).

Experiment Settings

6.3.1. Different Diagnosis Scenarios
To study the performance and robustness of our approaches under varying circumstances, we considered a range of different diagnosis scenarios in our experiments. A diagnosis scenario is defined by the set of inputs given to Alg. 2, i.e., by a DPI dpi, a number ld of minimal diagnoses to be computed, as well as a setting of the component fault probabilities pr. The DPIs for our tests were defined as ⟨K, ∅, ∅, ∅⟩, one for each K in Tab. 1. That is, the task was to find a minimal set of axioms (faulty components) responsible for the inconsistency or incoherence of K, without any background knowledge or measurements initially given (cf. Example 8). For the parameter ld we used the values {2, 6, 10, 20}. The fault probability pr(ax) of each axiom (component) ax ∈ K was either chosen uniformly at random from (0, 1) (maxProb), or specified in such a way (cf. Remark 2 in Example 7) that the diagnosis search returns minimum-cardinality diagnoses first (minCard). As a Description Logic reasoner, we adopted Pellet [101].

9 A knowledge base K is called incoherent iff it entails that some predicate p must always be false; formally: K ⊨ ∀X ¬p(X), where p is a predicate with arity k and X a tuple of k variables. With regard to ontologies, incoherence means that some class (unary predicate) must not have any instances; if such a class does have an instance, the ontology becomes inconsistent (cf., e.g., [75]).
10 The benchmark problems can be downloaded from http://isbi.aau.at/ontodebug/evaluation.
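The two probability settings can be illustrated with a minimal sketch. It assumes the product formula commonly used in model-based diagnosis for the probability of a diagnosis, and it assumes that minCard is realized by one constant fault probability below 0.5 for every axiom; neither detail is confirmed by this section (Remark 2 in Example 7 is not reproduced here). All function names are hypothetical.

```python
import random

def diagnosis_probability(diagnosis, fault_prob):
    """Assumed formula: axioms in the diagnosis are faulty, all others are correct."""
    p = 1.0
    for axiom, pr in fault_prob.items():
        p *= pr if axiom in diagnosis else (1.0 - pr)
    return p

def max_prob_setting(axioms, seed=0):
    """maxProb: one fault probability per axiom, drawn uniformly from (0, 1)."""
    rng = random.Random(seed)
    setting = {}
    for axiom in axioms:
        pr = rng.random()
        while pr == 0.0:  # random() samples [0, 1); redraw to stay in the open interval
            pr = rng.random()
        setting[axiom] = pr
    return setting

def min_card_setting(axioms, constant=0.3):
    """minCard (assumption): equal probabilities below 0.5 favor smaller diagnoses."""
    return {axiom: constant for axiom in axioms}

axioms = ["ax1", "ax2", "ax3", "ax4"]
uniform = min_card_setting(axioms)
single = diagnosis_probability({"ax1"}, uniform)
double = diagnosis_probability({"ax2", "ax3"}, uniform)
```

Under these assumptions, each additional axiom in a diagnosis multiplies its probability by c / (1 - c) < 1, so a best-first search by probability enumerates diagnoses in order of increasing cardinality.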

Goal to Find Actual Diagnosis
To simulate diagnosis circumstances that are as realistic as possible, where the actual diagnosis (i.e., the de-facto faulty axioms) is of interest and needs to be isolated from a set of initial minimal diagnoses (cf. column 4 of Tab. 1), we ran five sequential diagnosis [5,34] sessions with each of RBF-HS and HS-Tree, for each diagnosis scenario defined in Sec. 6.3.1. Roughly, the idea is to use a diagnosis search as a routine that is called multiple times during an iterative information acquisition process with the goal of finding the actual fault with certainty.
Specifically, a sequential diagnosis session can be conceived of as having the following two alternating phases that are iterated until a single minimal diagnosis remains: • diagnosis search, and • measurement conduction. The former involves the determination of the ld most preferred minimal diagnoses D according to pr for a given DPI. The latter subsumes the selection of an optimal system measurement based on D (to rule out as many spurious diagnoses as possible), as well as the incorporation of the new system knowledge resulting from the measurement outcome into the DPI.
Measurement selection requires a measurement selection procedure [102,103,104] which gets a set of minimal diagnoses D as input, and outputs one system measurement such that (i) this measurement is optimal wrt. some measurement quality criterion, and (ii) any outcome for this measurement eliminates at least one spurious diagnosis in D.
As measurement quality criteria we adopted split-in-half (SPL) [34], which suggests a measurement with the highest worst-case number of spurious diagnoses in D eliminated, and entropy (ENT) [5], which selects a measurement with highest information gain. These two quality criteria appear to be the commonly adopted ones in model-based diagnosis, cf., e.g., [20,34,43,104,105,106,107,108,109].
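The difference between the two criteria can be made concrete with a small sketch. It assumes that every diagnosis in D predicts either a positive or a negative outcome for the candidate measurement (no undecided diagnoses), and it scores ENT by the entropy of the answer distribution; the function name and the data layout are hypothetical.

```python
import math

def score_query(diagnosis_probs, predicts_positive):
    """Return (SPL score, ENT score) of one candidate measurement.

    diagnosis_probs: dict mapping a diagnosis id to its probability
    predicts_positive: dict mapping a diagnosis id to True/False
    """
    total = sum(diagnosis_probs.values())
    n_pos = sum(1 for d in diagnosis_probs if predicts_positive[d])
    n_neg = len(diagnosis_probs) - n_pos
    # SPL: a positive answer eliminates the n_neg diagnoses predicting "negative"
    # and vice versa, so the worst case eliminates min(n_pos, n_neg) diagnoses.
    spl = min(n_pos, n_neg)
    # ENT (simplified): expected information of the answer, given the
    # normalized probability mass of the diagnoses predicting "positive".
    p_pos = sum(p for d, p in diagnosis_probs.items() if predicts_positive[d]) / total
    ent = 0.0
    for p in (p_pos, 1.0 - p_pos):
        if p > 0.0:
            ent -= p * math.log2(p)
    return spl, ent

probs = {"D1": 0.4, "D2": 0.3, "D3": 0.2, "D4": 0.1}
query_a = score_query(probs, {"D1": True, "D2": False, "D3": False, "D4": False})
query_b = score_query(probs, {"D1": True, "D2": True, "D3": False, "D4": False})
```

In this toy setting SPL prefers the second query (it splits the four diagnoses two against two), whereas ENT prefers the first one (its answer probabilities 0.4 / 0.6 are closer to even than 0.7 / 0.3).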
In our experiments, a measurement was defined as a true-false question to an oracle [9,34,95,110], e.g., for a biological knowledge base one such query could be Q := Bird ∃hasCapability.Flying ("is every bird capable of flying?"). As a measurement selection procedure for computing optimal queries wrt. the criteria SPL and ENT we adopted the algorithm suggested in [111,112]. Given a positive (negative) answer for a query Q, the DPI is updated by assigning Q to the positive (negative) measurements (cf. Sec. 2.1.1). The resulting new DPI is then used in the next iteration of the sequential diagnosis session. That is, a new set D of the ld most preferred diagnoses is determined for this updated DPI, an optimal measurement is calculated for D, and so on. Once there is only a single minimal diagnosis for a current DPI, the session stops and outputs the remaining diagnosis.
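The alternation of diagnosis search and measurement conduction can be sketched as follows. This is a strongly simplified illustration, not the procedure used in the experiments: all minimal diagnoses are assumed to be known up front, the diagnosis search is replaced by taking the ld most probable remaining diagnoses, a query merely asks whether one component is faulty, and the query is chosen by split-in-half. The function name and data layout are hypothetical.

```python
def sequential_session(diagnoses, actual, ld):
    """Isolate `actual` among `diagnoses` (dict: frozenset of components -> probability)."""
    remaining = sorted(diagnoses, key=diagnoses.get, reverse=True)
    queries = 0
    while len(remaining) > 1:
        # Phase 1 (diagnosis search, simplified): the ld most preferred diagnoses.
        leading = remaining[:ld]
        # Phase 2 (measurement conduction): pick the component whose membership
        # question eliminates the most leading diagnoses in the worst case.
        components = sorted(set().union(*leading))

        def worst_case_eliminated(component):
            inside = sum(component in d for d in leading)
            return min(inside, len(leading) - inside)

        query = max(components, key=worst_case_eliminated)
        # The oracle answers such that the preset actual diagnosis is never ruled out.
        answer = query in actual
        # Incorporate the answer: keep only diagnoses consistent with it.
        remaining = [d for d in remaining if (query in d) == answer]
        queries += 1
    return remaining[0], queries

toy = {
    frozenset("a"): 0.4,
    frozenset("bc"): 0.3,
    frozenset("bd"): 0.2,
    frozenset("cd"): 0.1,
}
result, asked = sequential_session(toy, frozenset("bd"), ld=2)
```

Because any two distinct diagnoses differ in at least one component, each query in this sketch eliminates at least one leading diagnosis, so the loop terminates with the preset actual diagnosis.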
To solve a different problem in each of the five executed sequential diagnosis sessions per diagnosis scenario, we predefined a different randomly chosen actual diagnosis as the target solution per session. 11 This preset actual diagnosis was also used to determine measurement outcomes (i.e., to answer the generated questions) in that each question was automatically answered in such a way that the actual diagnosis was not ruled out.
Advantages of concentrating on sequential diagnosis runs in our evaluations (instead of single-run diagnosis search executions 12 ) are:
• Sequential diagnosis is one of the main applications of diagnosis searches.
• The potential impact (cf. [82]) of different measurement quality criteria on algorithms' performances can be assessed.
• Without the information acquisition through sequential diagnosis it is in many cases practically infeasible to find the actual fault (cf. the large numbers of diagnoses in the fourth column of Tab. 1).
• Multiple diagnosis searches, each for a different (updated) DPI, are executed during one sequential session and flow into the experiment results, which provides more representative evidence of the algorithms' robustness and real performance.

11 As no correct solutions are known for the knowledge-based benchmark problems in Tab. 1, any diagnosis might in principle be the right solution. Which diagnosis is the actual one depends on the desiderata of the stakeholders of a knowledge base and might differ depending on the particular modeled domain. E.g., when a knowledge base is about an academic domain, a professor might need to hold at least two courses per semester if some university X is modeled, while professors with fewer than two courses might legitimately exist if another university Y is described. This motivates the random selection of the target solution in our experiments, which is also common practice in the field (cf., e.g., [20,34,43]).
12 Appendix F provides results for single-run tests we have executed in addition to the presented sequential diagnosis experiments. The results for both scenarios are highly consistent.

Settings in a Nutshell
We ran five sequential diagnosis sessions, each searching for a randomly specified minimal diagnosis, for each algorithm among RBF-HS and HS-Tree, for each measurement quality criterion among ENT and SPL, for each DPI from Tab. 1, for each probability setting among maxProb and minCard, and for each number of diagnoses ld ∈ {2, 6, 10, 20} to be computed (in each iteration of the session, i.e., at each call of a diagnosis search algorithm).
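The resulting experiment grid is a plain cross product of the listed factors, which the following sketch enumerates. The DPI names passed in are placeholders taken from ontology names mentioned in the text; the full list in Tab. 1 is not reproduced here, and the function name is hypothetical.

```python
from itertools import product

ALGORITHMS = ("RBF-HS", "HS-Tree")
QUALITY_CRITERIA = ("ENT", "SPL")
PROBABILITY_SETTINGS = ("maxProb", "minCard")
LD_VALUES = (2, 6, 10, 20)
SESSIONS_PER_SCENARIO = 5

def experiment_grid(dpis):
    """One tuple per executed sequential diagnosis session."""
    return list(product(
        dpis,
        PROBABILITY_SETTINGS,
        LD_VALUES,
        QUALITY_CRITERIA,
        ALGORITHMS,
        range(1, SESSIONS_PER_SCENARIO + 1),
    ))

# Placeholder subset of DPI names; Tab. 1 contains more ontologies.
runs = experiment_grid(["O", "K"])
```

Each DPI thus contributes 2 · 4 · 2 · 2 · 5 = 160 sessions to the main experiments, before the additional scalability tests.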

Scalability Tests
In order to evaluate the scalability of the tested algorithms, we conducted an additional scalability experiment. To this end, we ran tests with ld := 100 and otherwise the same settings as described above on all DPIs from our dataset for which at least 100 minimal diagnoses exist (cf. last column of Tab. 1). Note that the parameter ld has a major influence on the hardness of the diagnosis computation task, since, given a set of minimal diagnoses, merely deciding whether an additional minimal diagnosis not in this set exists is already NP-complete (even if logical consistency checking is in P) [26]. 13

We first explain how to read the figures, and then discuss the experiment results.

Presentation of the Results
The results for the minCard experiments are shown in Fig. 4 (measurement quality criterion SPL), Fig. 5 (measurement quality criterion ENT), Fig. 7 (scalability tests), and Fig. 8 (analysis of the hardest cases). Each figure compares the runtime and memory consumption we measured for RBF-HS and HS-Tree, averaged over the five performed sessions (note the logarithmic scale). More specifically, the figures depict the factor of less memory consumed by RBF-HS (blue bars), as well as the factor of more time needed by RBF-HS (orange bars), in relation to HS-Tree. That is, blue bars tending upwards (downwards) mean a better (worse) memory behavior of RBF-HS, whereas upwards (downwards) orange bars signify worse (better) runtime of RBF-HS. For instance, a blue bar of height 10 means that HS-Tree required 10 times as much memory as RBF-HS did in the same experiment, and a downwards orange bar representing the value 0.5 indicates that RBF-HS finished the diagnosis search task in half of HS-Tree's runtime. Regarding the absolute runtime and memory expenditure (not displayed in the figures), we measured a min / avg / max runtime of 0.02 / 25 / 2176 sec (ENT) and 0.03 / 29 / 4806 sec (SPL), as well as a min / avg / max space consumption of 7 / 10K / 2M tree nodes (ENT) and 7 / 7K / 1.2M tree nodes (SPL).
Fig. 6 illustrates the impact of using HBF-HS in those cases where we observed significant (≥ 20 %) time overheads of RBF-HS (orange bars) versus HS-Tree (red horizontal line). The runtimes of HBF-HS as compared against HS-Tree are indicated by the blue bars, which are shaded in those cases where HBF-HS exhibited an even lower runtime than HS-Tree. Moreover, the gray circles along the black line plot the percentage time savings achieved by HBF-HS compared to RBF-HS.
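The two plotted quantities can be written down explicitly, as in the following minimal sketch. The function name and the dictionary keys are hypothetical; memory is counted in tree nodes, as in the text.

```python
def relative_factors(hs_tree, rbf_hs):
    """Return (memory factor, time factor) of RBF-HS in relation to HS-Tree.

    memory factor > 1: RBF-HS needed less memory (blue bar points upwards)
    time factor   > 1: RBF-HS needed more time   (orange bar points upwards)
    """
    memory_factor = hs_tree["nodes"] / rbf_hs["nodes"]
    time_factor = rbf_hs["seconds"] / hs_tree["seconds"]
    return memory_factor, time_factor

# Made-up measurements matching the two reading examples from the text.
factors = relative_factors(
    hs_tree={"nodes": 1000, "seconds": 10.0},
    rbf_hs={"nodes": 100, "seconds": 5.0},
)
```

Note that the two factors are deliberately oriented in opposite directions: the memory factor divides HS-Tree's value by RBF-HS's, while the time factor divides RBF-HS's value by HS-Tree's.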

Discussion of the Results
We next summarize the main insights from our experiment results. As a preliminary point, please note that all the algorithms under test have to compute the same predefined target solution in each test run, and that all of them have the same features soundness, completeness and best-firstness. Hence, observed savings of one algorithm versus another neither arise at the cost of losing any theoretical guarantees, nor due to computing a different output.
(1) Favorable space-time tradeoff: Whenever the diagnosis problem was non-trivial to solve, i.e., required a runtime of more than one second (which was the case in 94 % of the tested cases), RBF-HS traded space favorably for time. In other words, compared to HS-Tree, the factor of memory saved by RBF-HS was higher than the factor of incurred time overhead in all interesting cases (blue bar is higher than orange one; cf. Figs. 4 and 5).
(2) Substantial space savings: Space savings of RBF-HS ranged from significant to tremendous (cf. Figs. 4 and 5), and often reached factors larger than 10 (in 45 % of the cases) and up to 50 (ENT) and 57 (SPL). In other words, HS-Tree required up to 57 times as much memory for the same tasks as RBF-HS did. On average, the factor of memory saved amounted to 14.1 for ENT and to 13.8 for SPL, i.e., RBF-HS required an average of less than 8 % of the memory HS-Tree consumed. Note that in five scenarios (involving the ontologies K and C), HS-Tree required slightly less memory than RBF-HS, which however does not carry weight because the hitting set trees were very small (diagnoses of cardinality one, cf. Tab. 1) in these runs.

[Figure: bar chart with the axes "factor space saved" and "factor more time needed"]

The observation that RBF-HS can also outperform HS-Tree in terms of runtime may appear surprising at first sight, since RBF-HS relies on forgetting and re-exploring, whereas HS-Tree keeps all relevant information in memory. However, studies comparing classic (non-hitting-set) best-first searches have likewise observed that linear-space approaches can outperform exponential-space ones in terms of runtime [113]. One reason for this is that, at the processing of each node, the management (node insertion and removal) of an exponential-sized priority queue of open nodes requires time linear in the current tree depth. Hence, when the queue management time of HS-Tree outweighs the time for redundant node regenerations expended by RBF-HS, the latter will outperform the former.
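The linear dependence on the tree depth can be made explicit with a short calculation. It assumes a heap-like priority queue with logarithmic insertion and removal cost and a tree with branching factor at most b; both are illustrative assumptions, as the text does not fix a queue implementation.

```latex
% Q: queue of open nodes, d: current tree depth, b: maximal branching factor (assumed)
|Q| \le b^{d}
\quad\Longrightarrow\quad
T_{\mathrm{insert/remove}} \in O(\log |Q|) = O(\log b^{d}) = O(d \log b)
```

For a fixed branching factor, the cost per queue operation thus grows linearly with d, although the queue itself grows exponentially.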
(4) Whenever it takes RBF-HS long, use HBF-HS: In those cases where RBF-HS manifested a significant (20 % or higher) time overhead versus HS-Tree, the use of HBF-HS (with a mere allowance of 400 nodes in memory before the switch from HS-Tree to RBF-HS is triggered) could almost always reduce the runtime to times comparable with those of HS-Tree (cf. Fig. 6). The median time overhead of HBF-HS compared to HS-Tree was about 0 % for ENT and 3 % for SPL, where HBF-HS led to even better times than HS-Tree in 44 % (ENT) and 42 % (SPL) of the cases. At the same time, remarkably, the memory consumption of HBF-HS never exceeded 416 nodes, whereas HS-Tree required memory for up to more than half a million nodes, which amounts to a deterioration factor of over 1000 compared to HBF-HS. The time savings over RBF-HS achieved by HBF-HS were substantial in many cases, reaching up to 70 % for ENT and maxima of over 90 % for SPL (see the gray circles in Fig. 6). For instance, in the scenario ⟨SPL, 20⟩ for ontology O, the runtime overhead factor of 10.4 versus HS-Tree (the worst value for RBF-HS measured in all experiments, cf. Fig. 4) could be reduced to a factor of 0.98 by means of HBF-HS.
This suggests that, whenever RBF-HS gets caught in redundant re-explorations of subtrees and thus requires notably more time than HS-Tree, the allowance of a relatively short run of HS-Tree (until it has generated 400 nodes) before switching to RBF-HS can already yield a runtime comparable to HS-Tree. One reason for this phenomenon is that RBF-HS can save a significant number of re-explorations through the information gained by the initial breadth-first exploration of the top of the search tree. A potential second reason might be the above-mentioned high expense of managing an increasingly large queue of open nodes required by HS-Tree, as opposed to a set of open nodes of smaller and almost fixed size in the case of HBF-HS.
(5) HBF-HS allows one to almost "cap" the used memory: The number of nodes in memory additionally consumed by HBF-HS after the switch (at 400 nodes) to RBF-HS was less than 2 % on average, and never more than 4 %, compared to the number of nodes in memory at the time the switch was executed. Similar and only slightly higher values could be observed for HBF-HS performing the switch at 200 (3 % exceedance on average) and 100 (7 %) generated nodes. This suggests that the consumed amount of memory can practically be more or less arbitrarily limited by the definition of a suitable switch condition (which sets a memory limit that is not much lower than the, very low, memory requirement of standalone RBF-HS). These findings are theoretically supported by Theorem 4.
(6) Performance independent of number of computed diagnoses and measurement quality criteria: The relative performance of RBF-HS versus HS-Tree appears to be largely independent of the number ld of computed minimal diagnoses as well as of the used measurement quality criterion (cf. Figs. 4 and 5).
(7) Performance improves for harder diagnosis problems: The gain of using RBF-HS instead of HS-Tree grows with the hardness of the considered diagnosis problem. This tendency can be clearly seen in Figs. 4 and 5, where the ontologies on the x-axis are sorted in ascending order of the memory reduction achieved by RBF-HS, for each value of ld. Note that roughly the same group of (more difficult / easy to solve) diagnosis problems ranks high / low for all values of ld.
(8) Performance dependent on diagnosis preference criterion: The discussion of the results so far concentrated on the consistently good results attained by RBF-HS for the minCard setting. In the case of the maxProb setting, we see a rather different picture, where time was more or less traded one-to-one for space, i.e., k orders of magnitude savings in space against HS-Tree required approximately k orders of magnitude more runtime of RBF-HS (blue and orange bars roughly equal). The reason for this performance degradation in the case of maxProb is a known property of Korf's RBFS algorithm to perform relatively poorly when original f-values (in our case: probabilities) of nodes vary only slightly [114] (cf. Appendix B.1). As a result, RBF-HS suffers from too many "mind shifts" and spends most of the time doing backtracking and re-exploration steps while making very little progress in the search tree.
However, as in the case of minCard, when we allowed for the utilization of a small amount of more memory than RBF-HS used, this problem was remedied to a great extent. In fact, adopting HBF-HS with a switch at 400 generated nodes mostly led to a runtime comparable to HS-Tree's, and in 43 % of the scenarios to an even lower one. Only in a single scenario, i.e., ⟨O, SPL, 20⟩, HBF-HS (with a switch at 400 nodes) still required substantially more time than HS-Tree did. Obviously, this exact combination represented a particularly demanding case for HBF-HS and RBF-HS (cf. Bullet (4)).
As additional tests revealed, the answer to this problem is the employment of HBF-HS equipped with a relative switch criterion (instead of an absolute one). Concretely, we allowed HS-Tree to consume 60 % of the available memory before handing over to RBF-HS. Runtimes comparable to those of HS-Tree could be achieved in this way (while making use of only marginally more than 60 % of the disposable memory, cf. Bullet (5)).
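The two kinds of switch criteria discussed above can be contrasted in a minimal sketch. The function names, parameters, and the use of "greater than or equal" are illustrative assumptions; only the values 400 nodes and 60 % are taken from the text.

```python
def absolute_switch(nodes_in_memory, max_nodes=400):
    """Absolute criterion: hand over to RBF-HS once a fixed node budget is reached."""
    return nodes_in_memory >= max_nodes

def relative_switch(memory_used, memory_available, max_fraction=0.6):
    """Relative criterion: hand over once a fraction of the available memory is used."""
    return memory_used >= max_fraction * memory_available

def nodes_until_switch(switch, node_counts):
    """Return the first node count at which `switch` fires, or None if it never does."""
    for count in node_counts:
        if switch(count):
            return count
    return None

# HS-Tree phase simulated by a growing node count; the switch fires at 400 nodes.
fired_at = nodes_until_switch(absolute_switch, range(1, 1000))
```

The absolute criterion fixes the memory budget independently of the machine, whereas the relative one adapts the length of the HS-Tree phase to the memory that is actually available.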
(9) Scalability tests: The observations discussed so far have brought to light that DPIs with thousands of components (axioms) and diagnoses (cf. columns 2 and 4 in Tab. 1) could be handled well by RBF-HS in our tests (Figs. 4 and 5), and even led to a better relative performance in comparison to HS-Tree than problems with fewer components and possible faults. In our scalability experiments (cf. Sec. 6.3.4), we additionally tested the algorithm performance when a large number of diagnoses is computed. The results for the minCard setting are presented in Fig. 7. It shows that enormous space savings (in all cases) come with
• reasonable runtime overheads (9 cases), which were always lower than a factor of 1.65 except for the ontology cce with factors 3.18 (SPL) and 2.09 (ENT),
• roughly equal runtimes (2 cases), and
• even runtime savings (5 cases) ranging between 7 % and 22 %.
Space savings achieved by RBF-HS ranged from 84 % (case O, SPL) to more than 99.9 % (case cce, SPL+ENT) and exceeded 95 % in all but a single case. Even the combination of the measurement quality criterion SPL and ontology O, which proved to be a particularly unfavorable case as regards runtime in the normal experiments (cf. Bullets (4) and (8)), turned out to be unproblematic in the scalability tests. This shows that RBF-HS scales very well when minimum-cardinality diagnoses are of interest. Note that for three of the five runs for the case cce with measurement quality criterion ENT, we observed that HS-Tree ran out of memory (after a runtime of three minutes or less), whereas RBF-HS could successfully solve these problems in an average time of two and a half minutes while requiring only a negligible amount of memory for no more than 102 tree nodes.
For the maxProb setting, the insight was that RBF-HS, in general, does not scale to large numbers of computed diagnoses like ld = 100, as it required up to several hours of computation time per executed sequential session. HS-Tree as well as HBF-HS (with a relative switch criterion of 60 % consumption of the available memory, cf. Bullet (8)), on the other hand, could finish the same tasks within a few minutes. The conclusion is that, for the computation of most probable diagnoses, HBF-HS with a relative switch criterion should be used rather than RBF-HS.
(10) Results for the hardest cases: For the purpose of clarity of Figs. 4 and 5, we excluded the results for the two DPIs ccc and cce. These two DPIs result from the integration (alignment [115]) of two ontologies describing a common domain (in this case: a conference management system) in a different way. As a consequence of the automated alignment process, a multitude of independent issues in terms of (minimal) conflicts emerge at once in the resulting ontology. This leads to large sizes of minimal diagnoses (cf. Tab. 1, column 4), which causes a high depth and thus enormous size of the hitting set tree. The runtime and memory measurements for these hard cases are shown in Fig. 8. We observe gigantic space savings of up to nearly four orders of magnitude achieved by RBF-HS, while the runtime still remained comparable with HS-Tree in most cases; in 38 % of the cases RBF-HS's runtime was even better. For instance, for the case ⟨ccc, ENT, 2⟩, we observed that HS-Tree required more than 800 times the memory used by RBF-HS, while RBF-HS also exhibited a 3 % lower runtime. Even more impressively, RBF-HS reduced the memory consumption by a factor of more than 4200 while at the same time decreasing the computation time by 15 % in the case ⟨ccc, SPL, 6⟩. Moreover, in the case ⟨ccc, SPL, 20⟩, we registered substantial time savings of 65 % coupled with a 99.9 % memory reduction achieved by RBF-HS. Finally, note that in one run for the ⟨cce, SPL, 20⟩ setting, HS-Tree ran out of memory after running 37 minutes, while RBF-HS solved the same problem in less than 11 minutes and required memory for merely 125 tree nodes. Again, as discussed above, the use of HBF-HS allows one to level any significant time overheads of RBF-HS while consuming a limited amount of memory.

Conclusions and Future Work
In this work, we introduced two new diagnostic search techniques, RBF-HS and HBF-HS, which borrow ideas from Korf's seminal RBFS algorithm [58]. The unique characteristic of RBF-HS is that it requires only linear space for the computation of an arbitrary fixed finite number of minimal diagnoses (fault explanations) while preserving the desired features soundness (only actual fault explanations are computed), completeness (all fault explanations can be computed), and the best-first property (fault explanations are computed in order based on a given preference criterion). HBF-HS is a hybrid strategy that aims at leveraging synergies between Reiter's HS-Tree [4] and RBF-HS in a way that problems can be solved in reasonable time without depleting the required memory. Both suggested algorithms are generally applicable to any diagnosis problem according to Reiter's theory of model-based diagnosis [4]; in particular, they are independent of the (monotonic) knowledge representation language used to describe the diagnosed system and of the adopted inference engine.
In comprehensive experiments on a corpus of real-world knowledge-based diagnosis problems of various size, diagnostic structure and reasoning complexities beyond NP-complete, we compared our approaches against HS-Tree, a widely used diagnosis computation algorithm with the same properties (soundness, completeness, best-firstness, general applicability) as the proposed methods. The results testify that RBF-HS, when computing minimum-cardinality diagnoses, scales to large numbers of computed leading diagnoses and achieves a significant memory reduction up to several orders of magnitude for all non-easy problem instances while in addition reducing also the runtime by up to 90 % in more than a third of the cases. When used to determine the most probable diagnoses, RBF-HS trades space for time more or less one-to-one compared to HS-Tree. Moreover, for both minimum-cardinality and most probable diagnoses, whenever the runtime of RBF-HS was significantly higher than that of HS-Tree, the use of HBF-HS could level this overhead while still reasonably limiting the used memory. Overall, this demonstrates that the suggested techniques allow for memory-aware model-based diagnosis, which can contribute, e.g., to the successful diagnosis of memory-restricted devices or memory-intensive problem instances.
Since our approaches are not restricted to diagnosis problems, but applicable to best-first hitting set computation in general, and since a multitude of real-world problems can be formulated as hitting set problems, our methods have the potential to impact research and application domains beyond the frontiers of model-based diagnosis.
Future work topics include the integration of RBF-HS and HBF-HS into the ontology debugging plug-in OntoDebug 14 [37,83] for Protégé 15 [116], closer investigations of applications of RBF-HS discussed in Sec. 3.6, as well as further research on hitting set variants of other heuristic search approaches.
Node Cost Inheritance. 16 Next, the F-value of each of the newly generated child nodes n_i is set (lines 21–25). Note that this is necessary at each node expansion since a (child) node's F-value exists only as long as the node is in memory; it is no longer stored after a node is discarded through a backtracking step of the algorithm. Intuitively, the ideal F-value would be: (a) the original f-value for child nodes never explored before, for which there cannot be a "learned" F-value yet, (b) the last known F-value for child nodes already explored before.
Basically, there are two possibilities for how RBF-HS may specify the F-value of a child node n_i: either the F-value of the parent n is inherited by the child node, or n_i's (original) f-value is used. In fact, the algorithm first checks whether n has already been explored before, which is true if f(n) > F(n) (line 22).
In case f(n) > F(n), the child nodes can be partitioned into those that have been explored before, and those that have not. For the latter class, we have F(n) ≥ f(n_i), which implies that each non-explored child node keeps its original f-cost (min in line 23). For the former class, it indeed holds that F(n) < f(n_i), which is why all already-explored nodes inherit the F-value of the parent n (min in line 23). Note that the child nodes' last known F-value (before they were discarded) might have been lower than the inherited F(n) because only one F-value is remembered by the algorithm when a subtree is forgotten; however, F(n) is at least to some extent lower than f(n_i), which implies that at least some "fraction" of n_i's already learned backed-up cost is restored by the inheritance.
Alternatively, given f(n) = F(n) (note that f(n) ≥ F(n) for all nodes n is an invariant throughout RBF-HS'), n can, but does not need to, have been explored already. If n has not yet been explored, then clearly none of its child nodes n_i can have been explored either, which is why it is reasonable to set the F-value of all children to their f-value (line 25). Otherwise, i.e., if n has already been explored before, then the latest backed-up value F(n) (which was necessarily less than f(n)) must have been forgotten in the course of backtracking steps (which is possible, e.g., if one of n's siblings had a greater F-value than n at the point where RBF-HS' backtracked after exploring n's parent node). Now, since the f-value of each node is greater than the f-value of any of its successors (anti-monotonicity of f, cf. Bullet (VI) on page 8), it must hold that (a) F(n) = f(n) > f(n_i) for all child nodes n_i of n, and (b) any solution in a subtree rooted at some n_i will have cost lower than or equal to f(n_i). Since the "learned" F-value for any node should not be a worse estimate of the cost of a solution in the respective subtree than the original estimation given by the node's f-value, it does not make sense to set the F-value of any child node n_i to the value F(n) (> f(n_i)). Hence, it is most plausible also in this case to set the F-value of all children to their original f-value (line 25).
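The case distinction described above boils down to a few lines, sketched here for the setting in which higher values are better (probabilities). The function name is hypothetical; the referenced line numbers are those cited in the text.

```python
def child_F_value(f_parent, F_parent, f_child):
    """F-value assigned to a freshly regenerated child node.

    f_parent, F_parent: original and backed-up value of the expanded node n
    f_child: original f-value of the child node n_i
    """
    if f_parent > F_parent:
        # n has been explored before (line 22): inherit F(n), but never
        # exceed the child's own estimate (the min of line 23).
        return min(F_parent, f_child)
    # f(n) = F(n): no learned value is available, use the original f-value (line 25).
    return f_child

inherited = child_F_value(f_parent=0.6, F_parent=0.5, f_child=0.55)
kept = child_F_value(f_parent=0.6, F_parent=0.5, f_child=0.4)
fresh = child_F_value(f_parent=0.6, F_parent=0.6, f_child=0.55)
```

In the first call the child must have been explored before, since F(n) < f(n_i), and therefore inherits 0.5; in the second call F(n) ≥ f(n_i), so the child keeps its original value 0.4.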
Child Node Preparation. Once all nodes in Child_Nodes have been assigned their F -value, Child_Nodes is prepared for node exploration (while-loop, line 31) in the following way: First, if there is only a single node in Child_Nodes, then a second "dummy" node is added. The reason for this is that lines 30 and 35 require a second node to be present in Child_Nodes. In order not to compromise the correctness of RBF-HS, the F -value of this dummy node has to be set to the worst possible value −∞ (cf. argumentation for Node Assignment above). Second, the nodes in Child_Nodes are sorted in descending order of F -value, such that exactly the nodes with the highest and second-highest F -value are extracted from Child_Nodes in lines 29 and 34, respectively.
Recursive Child Node Exploration. Now that the child nodes have been generated, their F-costs have been set, and the list Child_Nodes has been prepared for processing, the final block of RBF-HS' performs the best-first exploration of the nodes in Child_Nodes by means of the algorithm's while-loop. Throughout the iteration of the loop, the variables n_1 and n_2 always hold the best and second-best node, respectively, among Child_Nodes, according to their (backed-up) F-value. This is guaranteed by lines 33, 34, and 35, where INSERTSORTEDBYF inserts a node into a list such that the sorting of the list according to F is preserved. The while-loop is iterated by always exploring the best node n_1 through a recursive call of RBF-HS' (line 32) as long as the current n_1's F-value is not worse than bound. The latter stores the maximal F-value over all child nodes of all ancestors of n_1 (see the max which determines the bound at each recursive downward step in line 32). This value at the same time corresponds to the maximal F-value of any alternative node in the entire hitting set tree, which in turn is greater than or equal to the f-value (i.e., the probability pr) of any existing solution other than n_1 (see the proof of Theorem 2 for a precise argumentation why these things hold). Hence, the use of bound as a ruler of backtracking actions guarantees that the most probable (remaining) solution is always found first (next). At the point where all nodes in Child_Nodes have an F-value lower than bound, the while-loop is exited and the currently best F-value among the nodes in Child_Nodes is returned, i.e., propagated upward to their parent node n. Note that, in the course of the recursive explorations of the subtrees rooted at nodes in Child_Nodes throughout the iteration of the while-loop, solutions might be located and added to D.
Termination. Whenever D is extended, a check is run which tests whether the list of solutions D has already reached the stipulated size ld (line 17). If so, the RBF-HS' procedure terminates (line 18). Otherwise, i.e., if fewer than ld minimal diagnoses exist for the DPI at hand, RBF-HS' terminates once all nodes in the hitting set tree have been explored and assigned the backed-up value −∞, which causes all recursive while-loops to stop (condition in line 31). In any case, RBF-HS finally returns D (line 10).
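To make the interplay of the blocks described in this appendix more tangible, the following Python sketch mimics the overall control flow on a toy problem. It is not the authors' Alg. 2: the names are hypothetical, a fixed list of known minimal conflicts stands in for FINDMINCONFLICT, a product of per-component weights in (0, 1) stands in for pr, and an exception replaces the exit in line 18.

```python
import math

class _Done(Exception):
    """Leaves all recursion levels once ld diagnoses have been found."""

def rbf_hs_sketch(conflicts, weight, ld):
    conflicts = [frozenset(c) for c in conflicts]
    found = []  # solution list D, filled in best-first order

    def f(node):
        # Assumed stand-in for pr: strictly antimonotonic for weights in (0, 1).
        return math.prod(weight[c] for c in sorted(node))

    def label(node):
        if any(d <= node for d in found):   # non-minimal or duplicate diagnosis
            return "closed"
        for c in conflicts:                 # stand-in for FINDMINCONFLICT
            if not (c & node):
                return c                    # a conflict not yet hit by node
        return "valid"                      # node hits all conflicts

    def explore(node, F_node, bound):
        lab = label(node)
        if lab == "closed":
            return -math.inf
        if lab == "valid":
            found.append(node)
            if len(found) >= ld:
                raise _Done
            return -math.inf
        children = []
        for comp in sorted(lab):            # one child per conflict element
            child = node | {comp}
            f_child = f(child)
            F_child = min(F_node, f_child) if f(node) > F_node else f_child
            children.append([F_child, child])
        if len(children) == 1:
            children.append([-math.inf, None])   # dummy second node
        children.sort(key=lambda e: e[0], reverse=True)
        # Explore the best child while it is at least as good as bound.
        while children[0][0] >= bound and children[0][0] > -math.inf:
            best, second = children[0], children[1]
            best[0] = explore(best[1], best[0], max(bound, second[0]))
            children.sort(key=lambda e: e[0], reverse=True)
        return children[0][0]               # backed-up value for the parent

    try:
        explore(frozenset(), f(frozenset()), -math.inf)
    except _Done:
        pass
    return found
```

With the conflicts {1, 2} and {2, 3} and the weights 0.3, 0.2, 0.4 for the components 1, 2, 3, the sketch returns {2} before {1, 3}, matching their weights 0.2 and 0.12.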

Appendix B. RBF-HS: Derivation of Time and Space Complexity
In this appendix, we provide the argumentation that proves Theorem 1 in Sec. 3.4.

Appendix B.1. Time Complexity
We can distinguish between two sources of time complexity inherent in RBF-HS: (t1) logical consistency checking, and (t2) tree construction and management.
As to (t1), both the hardness and the number of performed consistency checks are of relevance. First, the hardness of consistency checks executed by RBF-HS depends on the knowledge representation language adopted to model the diagnosed system and thus cannot be generally assessed. It might range from polynomial in the case of Horn logic over NP-complete for propositional system descriptions to even much harder, such as (2)NEXPTIME-complete for some Description Logics [100] (cf. our evaluation dataset in Sec. 6). Note, despite these somewhat discouraging theoretical complexities, experience with real-world diagnosis cases has shown that practical runtimes for consistency checks are often reasonable, even for interactive scenarios and very expressive logics [20,34,38,39,95,117].
Regarding the number of consistency checks, in contrast, we are able to derive the upper bound O(|K|(|minC| + |ld|)), where minC denotes the set of all minimal conflicts for the DPI dealt with. To see why this holds, observe that
• the only place where RBF-HS issues consistency checks is in line 44 (FINDMINCONFLICT),
• each FINDMINCONFLICT call either yields a minimal conflict (line 49) or a minimal diagnosis (line 46),
• RBF-HS terminates once the desired ld minimal diagnoses have been found,
• each minimal conflict is actually computed only once (but it might be reused multiple times by means of the stored list of conflicts C), and
• one call of FINDMINCONFLICT requires O(|K|) consistency checks in the worst case [54] if a minimal conflict C is returned, and only a single check if a minimal diagnosis is found (i.e., 'no conflict' is output).
Hence, no more than |minC| + |ld| calls of FINDMINCONFLICT, each issuing no more than |K| consistency checks, can be made throughout the execution of RBF-HS.
Factor (t2) is somewhat harder to estimate, as one and the same node might be explored multiple times (cf., e.g., node {2, 4}, which is processed three times in Example 8). Essentially, two main aspects affect this factor: (i) the larger the number of different f-values among all nodes is, and (ii) the more widely the promising nodes are spread over the search tree, the more backtracking steps and node re-explorations RBF-HS will perform [114]. In the worst case, each node has a different f-value and, when sorting all nodes according to their f-value, any two neighbors in this sorting are in different subtrees of the root node. In such a scenario, O(n) node explorations have to be executed per newly expanded node, where n is the number of all nodes in the complete hitting set tree (as constructed by HS-Tree). The reason for this is that each node expansion requires forgetting the entire last explored subtree of the root and expanding another one until the newly expanded node is reached. Since n nodes will be explored overall (as many as HS-Tree explores), we have a resulting complexity of O(n^2) (cf. the analogous argumentation in [114] for RBFS). However, this scenario is only possible when RBF-HS is used to compute diagnoses in the order of decreasing probability (most probable first).
If diagnoses are to be generated in the order of increasing cardinality (minimum cardinality first), in contrast, we can deduce from the findings of [33] that RBF-HS explores O(n) nodes, i.e., for sufficiently large problem size, no more than a constant factor times as many nodes as HS-Tree does. Intuitively, the plausibility of this can be verified by considering (i) and (ii) above. As to (i), we have only d different node costs (i.e., cardinalities), where d is the size of the minimal diagnosis with maximal cardinality. Regarding (ii), it is straightforward to see that the node explored next after any node n will be the sibling of n's closest ancestor which has not been processed in the current iteration. Thus, each next-best node will be "close" to the current node, and a minimum number of backtracking steps will have to be performed to reach the next-best node from the current one.
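The counting argument behind the bound on consistency checks in factor (t1) can be written out as a small helper. This is merely a sketch; the function and parameter names are hypothetical, and the result is the worst-case count implied by the argument, not a measured value.

```python
def max_consistency_checks(kb_size, num_min_conflicts, ld):
    """Worst-case number of consistency checks as per the (t1) argument."""
    # Every FINDMINCONFLICT call yields either a new minimal conflict or a
    # minimal diagnosis, so at most |minC| + ld calls can occur.
    calls = num_min_conflicts + ld
    # Each call issues at most |K| consistency checks.
    return kb_size * calls
```

For instance, for a hypothetical knowledge base with |K| = 100, |minC| = 10 and ld = 5, at most 100 * 15 = 1500 checks are issued.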

Appendix B.2. Space Complexity
First, the space complexity of Korf's original RBFS algorithm, that acts as a basis for RBF-HS, is linear [33], i.e., in O(bd) where b is the maximal number of successor states of any state (a.k.a. branching factor) and d the maximal length of any path in the search space. Second, no amendments to the recursive (depth-first) nature of RBFS have been made while deriving RBF-HS (cf. Sec. 3.1.3). Third, RBF-HS stores computed minimal conflicts and minimal diagnoses, information RBFS does not need. In RBF-HS, recorded conflicts allow for a more efficient labeling of nodes (reuse instead of recalculation), whereas the storage of diagnoses is essential for the algorithm's correctness and moreover trivially necessary as diagnoses constitute exactly the solutions which should finally be returned.
Hence, the space complexity of RBF-HS is affected by three factors: (s1) |D| (number of stored minimal diagnoses), (s2) |C| (number of stored minimal conflicts), and (s3) the space required to store the search tree.
Factor (s1) is bounded by the fixed input argument ld, which is arbitrarily preset by the user of RBF-HS, and thus in O(1). Factor (s2) is bounded by |minC|, where minC is the set of all minimal conflicts for the considered DPI. Analogously to RBFS, factor (s3) is bounded by |C_max| * |minC|, where C_max is the minimal conflict for the DPI with maximal cardinality. The explanation for this is that
• no node can have more than |C_max| child nodes (reason: exactly k successors result from a node-labeling conflict of size k, cf. EXPAND function in Alg. 2; no other ways of successor generation exist in RBF-HS, cf. Alg. 2),
• no node (set of edge labels along tree path) can include more than |minC| elements (reason: any node including |minC| elements must hit all minimal conflicts and thus must be a diagnosis; diagnoses are labeled valid or closed and never further expanded by RBF-HS), and
• at any tree depth, only a single node can be expanded at one particular point in time (reason: depth-first recursion, line 32).
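For a sense of scale, consider a hypothetical DPI (the numbers are chosen for illustration and are not taken from the evaluation) whose largest minimal conflict has 6 elements and which has 10 minimal conflicts in total. The bound on factor (s3) then evaluates to

```latex
\[
  \underbrace{|C_{\max}|}_{b} \cdot \underbrace{|\mathit{minC}|}_{d}
  \;=\; 6 \cdot 10 \;=\; 60 ,
\]
```

i.e., on the order of 60 nodes are held in memory for the tree at any point in time, irrespective of how many nodes the complete hitting set tree comprises.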
All in all, given finite ld, we thus have a space complexity of O(|C_max| * |minC|), which can be interpreted as branching factor (b) times maximal depth (d), just as for RBFS. Experience in the diagnosis field suggests that generally the number of minimal conflicts does not depend on or grow with the size of the diagnosed system. There are small systems with a high number of minimal conflicts, just as there are huge systems with negligible numbers of minimal conflicts. So, from an empirical perspective, it appears to be justified in many cases to interpret |minC| to be in O(1), i.e., to be independent of the size of the given DPI. This assumption implies that RBF-HS requires space linear in the size of the DPI ⟨K, B, P, N⟩, because clearly |C_max| ≤ |K| due to C_max ⊆ K (cf. Sec. 2.1.4). Note that, if both b and d are assumed to be not in O(1) (i.e., are dependent on the problem size), then also the original RBFS algorithm loses its linear space bounds.
Lemma 3. If a node n corresponding to a minimal diagnosis D is processed for the first time by RBF-HS, then n will be (directly) added to the solution list $\mathbf{D}$ in line 16. (Equivalently: After any call of RBF-HS' which processes a node n corresponding to D returns, D is an element of $\mathbf{D}$.)
Proof. Assume that, for the first time throughout the execution of RBF-HS, a node n equal to D is processed, where D is a minimal diagnosis. Initially, in line 12, a label L is computed for n. Within the LABEL function, the first thing executed is the non-minimality check in lines 38-40, where a node n_i is sought in $\mathbf{D}$ which is a subset of n. Since (1) only diagnoses can be in $\mathbf{D}$ as per Lemma 1, (2) n = D is a minimal diagnosis, and (3) it is the first time that a node equal to D is processed, there cannot be any subset n_i of n in $\mathbf{D}$. Hence, line 41 is reached. Due to the Hitting Set Property (cf. Sec. 2.1) and the fact that n is a (minimal) diagnosis, there cannot be any (minimal) conflict C such that C ∩ n = ∅. Consequently, line 44 is reached. The FINDMINCONFLICT call in line 44 will return 'no conflict' due to the Duality Property and because n is a diagnosis. As a result, LABEL will return in line 46, which means that n will be added to $\mathbf{D}$ in line 16.
(The equivalent statement of the lemma holds since no element once added to $\mathbf{D}$ can ever be removed from it, for the simple reason that there is no statement in RBF-HS that modifies $\mathbf{D}$ except for the one that adds elements to $\mathbf{D}$ in line 16.)
Lemma 4. For any call RBF-HS'(n, F(n), bound), a value X < F(n) is returned (unless the RBF-HS' procedure is exited in line 18 before a return takes place).
Proof. Assume an execution of some call of RBF-HS'(n, F(n), bound) throughout which no exit of the RBF-HS' procedure takes place in line 18. Observe that there are three spots where RBF-HS' might return, i.e., lines 14, 19, and 36. For the returns in lines 14 and 19, −∞ is returned. However, F(n) > −∞ must hold. To prove this, let us consider the two places where the RBF-HS' call can have been issued, i.e., lines 9 and 32. In the former case, F(n) is equal to f(∅), which can only attain values in (0, 1) (cf. Sec. 2.1). In the latter case, n is equal to a child node n_1 of some node and F(n) = F(n_1) > −∞ due to the while-condition in line 31. Therefore, the statement of the lemma holds for the returns in lines 14 and 19.
For the return in line 36, we first point out that, for any call RBF-HS'(n, F(n), bound), F(n) ≥ bound must hold. To see this, consider again lines 9 and 32, where RBF-HS' can be invoked. In the former case, bound = −∞ and F(n) > bound follows from the argumentation in the previous paragraph. In the latter case, as explained above, n is equal to a child node n_1 of some node. Through the while-condition, we thus know that F(n) = F(n_1) is larger than or equal to the old value of the bound. Moreover, we know by the sorting of Child_Nodes and the fact that n_1 is the node in Child_Nodes with the largest F-value (due to lines 28, 29, 33 and 34), that F(n) = F(n_1) ≥ F(n_2) for the node n_2 with second-largest F-value in Child_Nodes (cf. lines 30 and 35). Since bound is defined as the maximum among the old value of bound and F(n_2), F(n) ≥ bound must be true.
Finally, note that a return in line 36, which is our current assumption, can only take place if the condition of the while-loop is violated. This implies that the returned value (F(n_1)) is either equal to −∞ or strictly less than bound. As F(n) must be greater than −∞, as demonstrated in the first paragraph of this proof, and F(n) ≥ bound, as shown in the previous paragraph, we deduce that the statement of the lemma also holds for the return in line 36.
Lemma 5. Throughout the entire execution of RBF-HS and for any node n, the following invariant holds: F (n) ≤ f (n).
Proof. First, note that each node's f -value remains constant throughout the entire execution of RBF-HS. We now prove that (1) when set, each node's F -value is smaller than or equal to its f -value, and that (2) as long as a node (and thus its F -value) remains in memory, its F -value can never increase.
Proof of (1): Here, we consider the root node and all other nodes separately. First, the root node's F-value is set to its f-value in line 9, i.e., $F(n) = f(n)$ holds. Second, each other node's F-value is set in line 23 or 25 after it
that, for any two minimal diagnoses $D'$, $D''$ with $f(D') < f(D'')$, some node equal to $D''$ is processed prior to all nodes equal to $D'$.
To this end, let $ld = \infty$ (the algorithm does not terminate before all minimal diagnoses have been found) and assume the opposite, i.e., some node corresponding to $D'$ is processed earlier than all nodes equal to $D''$. Take the (first ever) call RBF-HS'$(n, F(n), \mathit{bound})$ with $n = D'$ (i.e., the first call that processes $D'$). Then we have that $F(n) \geq \mathit{bound}$ (while-condition) and $\mathit{bound} = \max\{F(n^{1}_{2bst}), F(n^{2}_{2bst}), \dots, F(n^{k}_{2bst})\}$ with $k = |D'| - 1$, where $n^{r}_{2bst}$ denotes the best alternative node (according to F-value) at tree depth $r$. (Note that, at any time during its execution, RBF-HS' involves only one expanded node at each tree level; amongst the generated nodes at one level $r$, the best one is expanded and the second best one is precisely $n^{r}_{2bst}$. To see that $\mathit{bound}$ is equal to the maximum of the stated set of best alternative nodes, observe that $\mathit{bound} = -\infty$ at the very first call of RBF-HS' in line 9, and for each node that is expanded, the new bound is the maximum of the current bound and the current best alternative node, cf. line 32.) Now, let $n^{*}$ be the deepest common ancestor node of $D'$ and $D''$ in the tree, i.e., $n^{*} = D' \cap D''$. Since both $D'$ and $D''$ are minimal diagnoses, $n^{*} \subset D'$ and $n^{*} \subset D''$. Moreover, let $n^{*}_{r,D''}$ denote the $r$-th successor node of $n^{*}$ along a path to a node equal to $D''$. E.g., $n^{*}_{1,D''}$ describes the child node of $n^{*}$ along the path to $D''$; note that $n^{*}_{r,D''} = D''$ for $r = |D''| - |n^{*}|$ and that $n^{*}_{r,D''}$ is a node at tree depth $|n^{*}| + r$. For $s = |n^{*}| + 1$, we know from above ($F(n) \geq \mathit{bound}$) that $F(n) \geq F(n^{s}_{2bst})$ and, since $n^{s}_{2bst}$ is the best alternative node at level $s$, that $F(n^{s}_{2bst}) \geq F(n^{*}_{1,D''})$. Furthermore, by Lemma 5, $f(n) \geq F(n)$ must hold. Overall, since $n = D'$, we so far have $f(D') \geq F(n^{*}_{1,D''})$. If $|D''| - |n^{*}| = 1$, i.e., $n^{*}_{1,D''} = D''$, then (*) $F(n^{*}_{1,D''}) = f(n^{*}_{1,D''}) = f(D'')$ must be true. The reason for this is Lemma 7 and that no node corresponding to $D''$ can have been processed yet, as this would be a contradiction to our assumption that we are considering the first call that processes a node equal to $D'$ and that this one is processed earlier than any node equal to $D''$. Thus, we have deduced that $f(D') \geq f(D'')$, which gives a contradiction to our assumption.
Finally, assume (b). From Lemma 7, we know that $n^{*}_{1,D''}$ must already have been processed. In addition, since $D''$ is a minimal diagnosis and $n^{*}_{1,D''} \subset D''$, we have that $n^{*}_{1,D''}$ can never be labeled valid or closed when it is processed, due to Lemma 6. Therefore, and because $ld = \infty$, every (and, in particular, the last) call of RBF-HS' that processed $n^{*}_{1,D''}$ must have returned in line 36. From this, we infer that $F(n^{*}_{1,D''}) = \max_{n \in \mathit{Child\_Nodes}} F(n)$, where Child_Nodes refers to the child nodes of $n^{*}_{1,D''}$. Since $n^{*}_{2,D''} \subseteq D''$ is one node among Child_Nodes, we obtain that $F(n^{*}_{1,D''}) \geq F(n^{*}_{2,D''})$. If $|D''| - |n^{*}| = 2$, i.e., $n^{*}_{2,D''} = D''$, then the same argumentation as in (*) above can be applied to show that $f(D') \geq f(D'')$, a contradiction.

Appendix C.2.4. Soundness
We have to prove that every node that is added to the solution list $\mathbf{D}$ is a minimal diagnosis. To this end, assume that some $D' \in \mathbf{D}$ is not a minimal diagnosis. That is, $D'$ is (a) not a diagnosis or (b) a diagnosis, but not minimal. Suppose (a). Here we immediately get a contradiction to Lemma 1. Now, suppose (b). That is, $D'$ is a non-minimal diagnosis, or, in other words, there is a minimal diagnosis $D'' \subset D'$. By the fact that f is strictly antimonotonic, $f(D'') > f(D')$ must hold. Further, $D'$ must have been added to $\mathbf{D}$ in line 16 as node n because this is the only place in RBF-HS where $\mathbf{D}$ is extended. Thus, the LABEL function must have been executed for n, in particular lines 38-40. However, no return can have taken place in line 40 due to the fact that n was assigned the label valid, which implies that line 46 must have been reached. As a consequence, the test $n \supseteq n_i$ in line 39 must have been negative for all $n_i \in \mathbf{D}$. Hence, no node in $\mathbf{D}$ is a subset of $n = D'$, which means that, in particular, $D'' \notin \mathbf{D}$ at the time $D'$ is processed. Now, since $D''$ is a minimal diagnosis and has a higher f-value than $D'$, we obtain a contradiction to the completeness and best-first properties shown above. This completes the soundness proof.

Appendix D. HBF-HS: Proof of Correctness
We next prove the following theorem, which is stated in Sec. 4.4:
Theorem 5 (Correctness of HBF-HS). Let FINDMINCONFLICT be a sound and complete method for conflict computation, i.e., given a DPI, it outputs a minimal conflict for this DPI if a minimal conflict exists, and 'no conflict' otherwise. Further, let stop_HS be any predicate depending on the execution state of HS-Tree. Then, HBF-HS is sound, complete and best-first, i.e., it computes all and only minimal diagnoses in descending order of probability as per the strictly antimonotonic probability measure pr.
Proof. First, HS-TREE is executed (line 2). There are three possible reasons for the termination of HS-TREE: (i) the queue of open nodes maintained by HS-TREE is empty, (ii) ld minimal diagnoses have been computed, or (iii) stop_HS is true. In case (i), we have that Q_HS = [ ], which implies (due to line 3) that HBF-HS will return D_HS, the set of all minimal diagnoses computed by HS-TREE. By [9, Prop. 4.15], D_HS is the set of all minimal diagnoses for the given DPI, and, by [9, Corollary 4.7], the diagnoses in D_HS are sorted in descending order of their probability as per pr. Hence, the theorem holds if case (i) applies. In case (ii), we have that |D_HS| ≥ ld, which is why HBF-HS will return D_HS (due to line 3). By [9, Corollary 4.7], D_HS comprises the ld most probable minimal diagnoses for the given DPI as per pr, which are sorted by pr in descending order. Consequently, the theorem is true for case (ii). If case (iii) holds (and case (i), already discussed, does not hold), it must be true that Q_HS is not empty. The reasons for this are that the node queue of HS-TREE comprises one node (the root node ∅) at the time HS-TREE starts executing (cf. Sec. 2.2.3), that HS-TREE terminates as soon as the node queue becomes empty, and that case (i) above does not hold by assumption. Therefore, the condition Q_HS = [ ] in line 3 is false. Hence, the other condition in line 3, |D_HS| ≥ ld, determines whether line 4 will be executed. If this condition is true, then case (ii) applies and the theorem holds, as argued above. Otherwise, the algorithm continues its execution after line 4, which means that RBF-HS' is called given the virtual root node n_0 created in line 5. Note that the same bound := −∞ is used for the root node as in the original RBF-HS algorithm (cf. line 9 in Alg. 2), and that the f-value of the root, set as f(n_0) := 0 in this case, is irrelevant to the correctness of RBF-HS (cf. Appendix C).
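The case distinction just described can be sketched as follows. This is a hedged illustration, not the authors' HBF-HS listing: `hs_tree` and `rbf_hs_prime` are hypothetical callables standing in for HS-TREE (run until it stops for one of the three reasons) and for RBF-HS' started from a virtual root, and their signatures are assumptions.

```python
import math

def hbf_hs_sketch(hs_tree, rbf_hs_prime, ld):
    # Phase 1: run HS-TREE until its queue is empty, ld diagnoses are
    # found, or the stop predicate fires (all hidden inside hs_tree).
    D_hs, C_hs, Q_hs = hs_tree()

    # Cases (i) and (ii): nothing left to explore, or enough diagnoses.
    if not Q_hs or len(D_hs) >= ld:
        return list(D_hs)

    # Case (iii): hand over to the linear-space phase. Open nodes are
    # treated as sets; set-equal duplicates are dropped, order is kept.
    child_nodes_root = list(dict.fromkeys(frozenset(n) for n in Q_hs))

    # RBF-HS' starts from the virtual root with the diagnoses and
    # conflicts found so far and the same initial bound of -infinity.
    return rbf_hs_prime(child_nodes_root, list(D_hs), list(C_hs), -math.inf)
```

Passing the two phases as callables keeps the sketch independent of any concrete HS-TREE or RBF-HS' implementation.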
The only differences to the original RBF-HS algorithm are that RBF-HS' within HBF-HS starts with an initially given (1) (possibly empty) collection D := D_HS of minimal diagnoses (line 7), (2) (possibly empty) collection C := C_HS of minimal conflicts (line 7), and (3) (non-empty) collection Child_Nodes_root of child nodes of the root n_0 (lines 11 and 15-16). We next argue why these differences do not affect the correctness of RBF-HS'.
Ad (1): First, D is correct in that it comprises the |D| most probable minimal diagnoses as per pr due to [9,Corollary 4.7]. Second, a non-empty collection D given from the outset does not harm the correctness of RBF-HS'. To see this, observe that, beside allowing the check if ld diagnoses have been computed and the algorithm can stop (line 17 in Alg. 2), the only function of D in RBF-HS' is the detection and closing of redundantly computed or nonminimal diagnoses (lines 38-40 in Alg. 2). More specifically, no node can be wrongly closed since each element of D is a minimal diagnosis; and, every node that must be closed because it is a redundant minimal diagnosis or a non-minimal diagnosis will actually be closed due to the strict antimonotonicity of pr (which implies that minimal diagnoses must be found before non-minimal ones).
Ad (2): First, C is correct in that it contains only minimal conflicts, since HS-TREE uses FINDMINCONFLICT for conflict computation which is sound and complete by assumption. Second, a non-empty set C given from the beginning does not affect the correctness of RBF-HS'. To realize this, note that the only function of C in RBF-HS' is the avoidance of redundant conflict computations (lines 41-43 in Alg. 2).
Ad (3): By line 6, Child_Nodes_root corresponds exactly to the open nodes Q_HS returned by HS-TREE when it was stopped due to stop_HS, after all duplicate nodes have been deleted from Q_HS. Due to the completeness of HS-TREE ([9, Prop. 4.15, Lemma 4.18]), Q_HS must, for each minimal diagnosis D not already in D_HS, include some node that is a subset of D. Hence, all remaining minimal diagnoses not in D_HS can be constructed by extending the nodes in Q_HS. Note also that neither the soundness nor the completeness of RBF-HS' is harmed by the deletion of duplicates (i.e., set-equal nodes) from Q_HS because RBF-HS' treats nodes as sets. Consequently, starting with Child_Nodes_root as the initial nodes, RBF-HS' can find all minimal diagnoses not in D_HS. Finally, observe that the correctness proofs of RBF-HS (cf. Appendix C) do not make any assumptions about the size |n| of the nodes n in Child_Nodes. Hence,
Table 1. The ontologies along the x-axis are sorted from low to high space savings achieved by RBF-HS (blue bars).