Boosting Optimal Symbolic Planning: Operator-Potential Heuristics

Heuristic search guides the exploration of states via heuristic functions h estimating remaining cost. Symbolic search instead replaces the exploration of individual states with that of state sets, compactly represented using binary decision diagrams (BDDs). In cost-optimal planning, heuristic explicit search performs best overall, but symbolic search performs best in many individual domains, so both approaches together constitute the state of the art. Yet combinations of the two have so far not been an unqualified success, because (i) h must be applicable to sets of states rather than individual ones, and (ii) the different state partitioning induced by h may be detrimental for BDD size. Many competitive heuristic functions in planning do not qualify for (i), and it has been shown that even extremely informed heuristics can deteriorate search performance due to (ii). Here we show how to achieve (i) for a state-of-the-art family of heuristic functions, namely potential heuristics. These assign a fixed potential value to each state-variable/value pair, ensuring by LP constraints that the sum over these values, for any state, yields an admissible and consistent heuristic function. Our key observation is that we can express potential heuristics through fixed potential values for operators instead, capturing the change of heuristic value induced by each operator. These reformulated heuristics satisfy (i) because we can express the heuristic value change as part of the BDD transition relation in symbolic search steps. We run exhaustive experiments on IPC benchmarks, evaluating several different instantiations of potential heuristics in forward, backward, and bi-directional symbolic search. Our operator-potential heuristics turn out to be highly beneficial; in particular, they hardly ever suffer from (ii).
Our best configurations soundly beat previous optimal symbolic planning algorithms, bringing them on par with the state of the art in optimal heuristic explicit search planning in overall performance.


Introduction
Classical planning deals with problems of finding a sequence of operators (or actions) leading from an initial state to one of the goal states in a fully observable deterministic environment. In this paper, we are concerned with two families of methods designed to solve such problems: heuristic explicit search and symbolic search. Heuristic explicit search guides the exploration of states using heuristic functions that estimate remaining cost. A* search (Hart et al., 1968) guarantees cost-optimality: it returns a solution whose summed-up operator cost is minimal if the heuristic is admissible. The design of admissible heuristic functions has been intensively investigated in planning (e.g., Haslum & Geffner, 2000; Edelkamp, 2001; Helmert & Domshlak, 2009; Helmert et al., 2014; Pommerening et al., 2014; Seipp & Helmert, 2018), and planning algorithms based on these techniques are state of the art for many benchmark domains in cost-optimal planning.
In contrast to heuristic explicit search, symbolic search replaces the exploration of individual states with that of state sets, compactly represented using binary decision diagrams (BDDs) (Bryant, 1986). The primary operations needed for search can be implemented at the level of BDDs, in time polynomial in the size of the BDDs. This greatly improves exhaustive (blind) search, as it allows representing and manipulating large state-space fractions efficiently (Burch et al., 1992). Thus, symbolic search is very effective whenever large portions of the state space need to be traversed (Speck et al., 2020b; Speck & Katz, 2021). In cost-optimal planning, algorithms of this kind (Edelkamp & Helmert, 1999; Edelkamp & Kissmann, 2009, 2011; Torralba et al., 2017) slightly lag behind heuristic explicit search in terms of overall performance across benchmark domains, but are highly complementary and beat heuristic explicit search in a range of benchmark domains. In short, both approaches together constitute the state of the art in cost-optimal planning.
Yet, this combination has not been an unqualified success. For a heuristic function h to be usable in heuristic symbolic search, (i) h must be applicable to sets of states rather than individual ones, as evaluating the heuristic on each state individually would defeat the purpose of symbolic search. Furthermore, as heuristic symbolic search requires distinguishing states with different heuristic values, (ii) the partitioning of states into BDD-represented sets is different when using h, which may be detrimental for BDD size. Condition (i) has been achieved for some strong heuristics in planning, in particular for pattern databases (PDBs) (Kissmann & Edelkamp, 2011; Torralba et al., 2018). But it remains elusive for many other competitive heuristic functions. Regarding (ii), it has been shown that even extremely informed heuristics can exponentially deteriorate search performance (Speck et al., 2020a), increasing BDD size to the extent of massively outweighing the reduction in search space size.
Due to all this, symbolic bi-directional blind search, without heuristics, is at this time considered the dominant symbolic search approach, and the use of heuristic search in this context has lost traction.
Here we challenge this trend by showing that potential heuristics (Pommerening et al., 2015), denoted in what follows by h^P, yield fresh synergy between heuristic and symbolic search. We focus on the simplest kind of potential heuristics (i.e., those of dimension one), which assign a fixed potential value P(f) to each fact f (i.e., each state-variable/value pair) in a given planning task, and obtain the heuristic value h^P(s) of a state s as the sum h^P(s) = ∑_{f∈s} P(f) of potential values of the facts f true in s. It is ensured via linear program (LP) constraints over the fact potentials that h^P is an admissible and consistent heuristic function.
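To illustrate this definition, evaluating a one-dimensional potential heuristic on a state is just a sum over the state's facts. The following is a minimal sketch; the potential values below are hypothetical toy numbers, not the result of solving the LP:

```python
# Facts are (variable, value) pairs; a state assigns a value to every variable.
# Hypothetical potentials for a two-variable toy task (illustrative only).
potentials = {
    ("V1", "a"): 2.0, ("V1", "b"): 0.0,
    ("V2", "x"): 1.5, ("V2", "y"): -0.5,
}

def h_pot(state):
    """h^P(s) = sum of P(f) over all facts f true in s."""
    return sum(potentials[(var, val)] for var, val in state.items())

s = {"V1": "a", "V2": "y"}
print(h_pot(s))  # 2.0 + (-0.5) = 1.5
```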
This family of heuristic functions does not per se satisfy condition (i). Here we show how to reformulate them in a way that addresses this problem. Our key observation is that we can express potential heuristics through a fixed operator potential value Q(o) for each of the task's operators o instead, capturing the change of heuristic value h^P(s′) − h^P(s) for any state transition s → s′ induced by o. We show that, under a mild assumption on the planning task structure (discussed below), h^P(s) for a state s reached via an operator sequence ⟨o_1, …, o_k⟩ is equal to the value of h^P in the initial state plus the sum Q(o_1) + ⋯ + Q(o_k) of operator potentials. This heuristic function satisfies (i) in the sense that we can express the heuristic value change as part of the BDD transition relation (TR) in symbolic search steps. Specifically, this reformulated potential heuristic fits into the symbolic heuristic search algorithm GHSETA* (Jensen et al., 2008), which partitions TRs by both their costs and the change of heuristic values they induce.
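The operator-potential idea can be sketched in a few lines. For an operator whose precondition and effect mention the same variables, Q(o) is the difference between the summed potentials of the effect facts and of the precondition facts, and summing Q along an applicable path recovers the heuristic value. The task and potential values below are a hypothetical toy example:

```python
# Illustrative potentials for a single variable V with values a, b, g.
P = {("V", "a"): 3.0, ("V", "b"): 1.0, ("V", "g"): 0.0}

def op_potential(pre, eff):
    """Q(o): summed potentials of effect facts minus precondition facts."""
    return sum(P[f] for f in eff) - sum(P[f] for f in pre)

o1 = ({("V", "a")}, {("V", "b")})   # moves V from a to b
o2 = ({("V", "b")}, {("V", "g")})   # moves V from b to g

h_init = P[("V", "a")]              # h^P of the initial state {V: a}
h_after = h_init + op_potential(*o1) + op_potential(*o2)
print(h_after)  # 3.0 - 2.0 - 1.0 = 0.0, i.e., h^P of the resulting state {V: g}
```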
The assumption required for the above is that every state variable V affected by the effect of an operator o is constrained by o's precondition. This is true in many standard planning benchmarks, but is of course not true in general. For input tasks that do not satisfy this assumption, our reformulation can still be applied and still yields admissible heuristics, but these are path-dependent and inconsistent, necessitating node reopening in search. Therefore, they do not tend to work well in practice (Fišer et al., 2022a,b). For this reason, and because this setting unnecessarily complicates the theory, we discuss them only in Appendix A. We present two better remedies. First, disambiguations (Alcázar et al., 2013; Fišer et al., 2020) allow us to weaken the assumption, and also yield much stronger potential heuristics. Second, one can use task transformations to transition normal form (Pommerening & Helmert, 2015), where necessary, to achieve the assumption.
Another technical difficulty is that the operator potentials are real (floating-point) numbers, which can lead to rounding and precision issues. Naïvely rounding these values may lead to inconsistent heuristics. We show that this can instead be dealt with by extending the potential-heuristic LP to a mixed-integer linear program (MIP) that forces the operator potentials to be integers.
Putting the above pieces together, we obtain a new heuristic function for forward symbolic heuristic search: the forward search direction is enforced, as computing h^P(s) requires knowing the operator sequence ⟨o_1, …, o_k⟩ leading to s. This is at odds with backward search and bi-directional search, which are traditional key strengths of symbolic search. However, as it turns out, our approach applies to such searches as well, through a different reformulation where ⟨o_{k+1}, …, o_n⟩ are the operators on the path from the search state s to the goal node, and h^P(s) equals the heuristic value of the reached goal state minus the sum of Q(o_i) over o_{k+1}, …, o_n. That is, summing operator potentials over sequences of operators works in both the forward and backward directions.
This equality for the backward direction requires not only the mild assumption discussed above, but additionally the strong assumption that there is a single unique goal state. One can, again, apply our approach anyway to obtain path-dependent and inconsistent heuristics, but this does not always pay off in practice. What turns out to be effective instead is to partition the goal states by their heuristic values at the beginning of symbolic backward search. The operator potentials then work analogously to forward search. This in turn extends to symbolic bi-directional search, where we can choose any combination of operator-potential or blind heuristics for each search direction. We run exhaustive experiments on IPC benchmarks, evaluating several different instantiations of potential heuristics in forward, backward, and bi-directional symbolic search, and comparing these configurations to the state of the art in cost-optimal planning. Our operator-potential heuristics turn out to be highly beneficial. They hardly ever suffer from the risk (ii) of possibly increased BDD sizes. The key observations are:
• Our combination of symbolic search with potential heuristics vastly outperforms each of its components, showing that this combination is (much) more than the sum of its parts.
• Our best configurations soundly beat previous optimal symbolic planning algorithms, establishing a new state-of-the-art for this method family.
• Our best configurations furthermore bring symbolic search on par with the state of the art in optimal heuristic explicit search planning in overall performance, while maintaining the high level of complementarity. Thus we improve the state of the art in cost-optimal planning overall.
This paper is a combination and extension of two of our previous publications (Fišer et al., 2022a,b). In Fišer et al. (2022a), we introduced operator-potential heuristics and showed how to efficiently combine them with forward symbolic search. In Fišer et al. (2022b), we addressed the application of operator-potential heuristics in backward and bi-directional symbolic search, resulting in a path-dependent, inconsistent, but admissible variant of operator-potential heuristics for the backward direction. In this paper, we unify the formulations of operator-potential heuristics from those two previous publications, and we present a coherent description of operator-potential heuristics and their integration into forward, backward, and bi-directional symbolic search. Moreover, we extend these findings by showing how to turn the path-dependent (inconsistent) variant of operator-potential heuristics for the backward search into heuristics that are consistent, which leads to a significant improvement of the backward symbolic search. Lastly, we present a comprehensive and detailed experimental analysis of virtually all aspects of operator-potential heuristics and their integration into symbolic search. The paper is organized as follows. We next give the necessary background on the planning framework and notation, potential heuristics, and symbolic search (Section 2). We then introduce our reformulated operator-potential heuristics, analyzing possible designs for forward and backward search (Section 3). We show that these heuristics can easily be used in symbolic search (Section 4). We give a detailed empirical evaluation (Section 5) before concluding the paper (Section 6).
For ease of reading, we limit our analysis in Section 3 to the case where all effect variables are constrained by the precondition; Appendix A discusses operator-potential heuristics not making that assumption.

Background
We consider the finite domain representation (FDR) of planning tasks (Bäckström & Nebel, 1995). An FDR planning task Π is specified by a tuple Π = ⟨V, O, I, G⟩. V is a finite set of variables; each variable V ∈ V has a finite domain dom(V). A fact ⟨V, v⟩ is a pair of a variable V ∈ V and one of its values v ∈ dom(V). The set of all facts is denoted by F = {⟨V, v⟩ | V ∈ V, v ∈ dom(V)}, and the set of facts of variable V is denoted by F_V = {⟨V, v⟩ | v ∈ dom(V)}. A partial state p is a variable assignment over some variables vars(p) ⊆ V. We write p[V] for the value assigned to the variable V ∈ vars(p) in the partial state p. A state s is a partial state with vars(s) = V, and S denotes the set of all states. O is a finite set of operators; every operator o ∈ O has a precondition pre(o), a prevail condition prv(o), and an effect eff(o), all of which are partial states, as well as a cost cost(o) ≥ 0. An operator o is applicable in a state s if pre(o) ∪ prv(o) ⊆ s; the resulting state o⟦s⟧ agrees with eff(o) on vars(eff(o)) and with s on all other variables. I ∈ S is the initial state, and G is a partial state called the goal; a state s is a goal state if G ⊆ s.
We also identify p with the set of facts contained in p, i.e., p = {⟨V, p[V]⟩ | V ∈ vars(p)}. For k, n ∈ N, [k, n] denotes the set {k, k + 1, …, n}, and [k, n] is defined as an empty set for k > n. Moreover, [n] denotes a shorthand for [1, n]. A sum over an empty set is considered to be zero.

A sequence of operators π = ⟨o_1, …, o_n⟩ is applicable in a state s_0 if there are states s_1, …, s_n such that, for every i ∈ [n], o_i is applicable in s_{i−1} and s_i = o_i⟦s_{i−1}⟧. The resulting state of this application is π⟦s_0⟧ = s_n, and cost(π) = ∑_{i∈[n]} cost(o_i) denotes the cost of this sequence of operators. We also consider an empty sequence of operators π = ⟨⟩, which is applicable in every state s with π⟦s⟧ = s.
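The application semantics above can be mirrored in a few lines of code, with states as dicts and operators as (precondition, effect, cost) triples. This is a sketch under the normalized-task assumption used later (all names and values below are hypothetical):

```python
def apply_op(s, pre, eff):
    """Apply one operator: check its precondition, then overwrite effect values."""
    assert all(s[V] == v for V, v in pre.items()), "operator not applicable"
    t = dict(s)
    t.update(eff)
    return t

def apply_seq(s, ops):
    """Compute pi[[s]] and cost(pi) for a sequence of (pre, eff, cost) triples."""
    total = 0
    for pre, eff, cost in ops:
        s = apply_op(s, pre, eff)
        total += cost
    return s, total

plan = [({"A": 0}, {"A": 1}, 2), ({"A": 1}, {"A": 2}, 3)]
state, cost = apply_seq({"A": 0}, plan)
print(state, cost)  # {'A': 2} 5
```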
A sequence of operators π = ⟨o_1, …, o_n⟩ is called an s-t-path if there exist states s and t such that π is applicable in s and π⟦s⟧ = t. A sequence of operators π is called an s-plan if it is applicable in the state s and π⟦s⟧ is a goal state, and an I-plan is called simply a plan. An s-t-path (s-plan, plan) π is called optimal if its cost is minimal among all s-t-paths (s-plans, plans). A state s is forward reachable if there exists an I-s-path; otherwise we say it is forward unreachable. A state s is backward reachable if there exists an s-plan; otherwise we say it is backward unreachable.
An operator o is forward (backward) reachable iff it is applicable in some forward (backward) reachable state. The set of all forward reachable states is denoted by S_fw, the set of all I-s-paths for all s ∈ S_fw is denoted by E_fw, the set of all backward reachable states is denoted by S_bw, and the set of all s-plans for all s ∈ S_bw is denoted by E_bw. A state s ∈ S_fw \ S_bw that is forward reachable but not backward reachable is called a forward dead-end (i.e., forward dead-ends s are states that are reachable from the initial state, but there does not exist any s-plan), and a state s ∈ S_bw \ S_fw that is backward reachable but not forward reachable is called a backward dead-end (i.e., backward dead-ends s are states for which there exists an s-plan, but which are not reachable from the initial state).
A forward heuristic h_fw : S_fw → R ∪ {∞} estimates the cost of optimal s-plans for all forward reachable states s ∈ S_fw. The optimal forward heuristic h⋆_fw maps each forward reachable state s to the cost of an optimal s-plan, or to ∞ if no s-plan exists. A forward heuristic h_fw is forward admissible if h_fw(s) ≤ h⋆_fw(s) for every s ∈ S_fw, it is goal-aware if h_fw(s) ≤ 0 for every forward reachable goal state s, and it is forward consistent if h_fw(s) ≤ cost(o) + h_fw(o⟦s⟧) for every s ∈ S_fw and every operator o applicable in s with o⟦s⟧ ∈ S_fw. Analogously, a backward heuristic h_bw : S_bw → R ∪ {∞} estimates the cost of optimal I-s-paths for all backward reachable states s ∈ S_bw; the optimal backward heuristic h⋆_bw as well as backward admissibility, init-awareness, and backward consistency are defined analogously, with the roles of the initial state and the goal states swapped. Note that we allow negative heuristic values, as is usual in works on potential heuristics, because it allows finding more informed potential heuristics (e.g., Pommerening et al., 2015), and we can treat negative estimates as zeros during the search. Admissibility and consistency are usually defined for all states, whereas here we define them for forward and backward reachable states only. Clearly, if a forward (backward) heuristic is goal-aware (init-aware) and forward (backward) consistent, then it is also forward (backward) admissible. Sometimes we omit the adjective forward or backward when it is clear from the context. In particular, admissibility and consistency of a forward heuristic will always mean forward admissibility and forward consistency, respectively, and admissibility and consistency of a backward heuristic will always mean backward admissibility and backward consistency, respectively.
We also consider heuristic functions over all states, h : S → R ∪ {∞}. Nevertheless, admissibility and consistency are used only for forward and backward heuristics, goal-awareness is used only for forward heuristics, and init-awareness only for backward heuristics.
In the context of heuristic search, the h-value of a state node s refers to the heuristic value of s, the g-value to the cost of the sequence of operators leading to s, and the f-value is the sum of the g-value and the maximum of the h-value and zero (since we allow negative h-values).
We define heuristics as state-dependent, meaning they are functions mapping states to numbers. We also deal with path-dependent heuristics that map sequences of operators to numbers, i.e., a path-dependent heuristic can return different numerical values for the same state depending on the sequence of operators that leads to it. The exact definition of path-dependent heuristics is provided in Appendix A, as we deal with them formally there.
A set of facts M ⊆ F is a mutex if M ⊈ s for every forward reachable state s ∈ S_fw. We will leverage prior work on so-called disambiguation (Alcázar et al., 2013; Fišer et al., 2020). Given a variable V ∈ V and a partial state p, a set of facts F ⊆ F_V is called a disambiguation of V for p if for every forward reachable state s ∈ S_fw such that p ⊆ s it holds that F ∩ s ≠ ∅ (i.e., ⟨V, s[V]⟩ ∈ F).
Disambiguation allows us to infer which facts cannot be part of any forward reachable state extending a given partial state. For example, suppose we have three variables V_a, V_b, and V_c, each having two facts: F_{V_a} = {a_1, a_2}, F_{V_b} = {b_1, b_2}, and F_{V_c} = {c_1, c_2}. Moreover, suppose there is no forward reachable state containing a_1 and b_1 at the same time, or b_2 and c_2 at the same time, i.e., {a_1, b_1} and {b_2, c_2} are mutexes. Now, given a partial state p = {b_1, c_1}, we can infer from the aforementioned mutexes that there is no forward reachable state extending p containing a_1, because every such state already contains b_1. Therefore, the set {a_2} is a disambiguation of V_a for p. If we consider the variable V_b that is already defined in p, then we get that the set {b_1} is a disambiguation of V_b for p, because we can safely say that any forward reachable state extending p must contain b_1. As another example, consider a partial state p′ = {a_1, c_2}. In this case, the empty set ∅ is a disambiguation of V_b for p′, because we can infer from the mutexes that neither b_1 nor b_2 can be part of any forward reachable state extending p′. Therefore, we can conclude that p′ itself is a mutex, as there is no forward reachable state (i.e., variable assignment over all variables) containing p′. In other words, disambiguation tells us which facts can potentially appear in forward reachable states extending a given partial state. Note that disambiguations are allowed to overapproximate these sets, which is necessary because we are usually not able to find a complete set of mutexes: there can be exponentially many of them, and it is as hard as planning to prove that a given set of facts is a mutex (Fišer & Komenda, 2018).
Clearly, every F_V is a disambiguation of V for all possible partial states, and if ⟨V, v⟩ ∈ p then {⟨V, v⟩} is a disambiguation of V for p. Moreover, if the disambiguation of V for p is an empty set (for any V), then all states extending p are unreachable. Therefore, we can use empty disambiguations to determine unsolvability of planning tasks (if G extends p), or to prune unreachable operators (if a precondition or prevail condition of the operator extends p). So, from now on we will consider only non-empty disambiguations, and we will assume that, for every partial state p and variable V ∈ vars(p), the disambiguation of V for p is exactly {⟨V, p[V]⟩}. Fišer et al. (2020) showed how to use mutexes to find disambiguations, so here we will assume the disambiguations are already inferred. Given an operator o ∈ O, D_o(V) denotes a disambiguation of V for pre(o) ∪ prv(o), and D_G(V) denotes a disambiguation of V for the goal G. Note that, as per our assumption above, we have D_o(V) = {⟨V, v⟩} for every ⟨V, v⟩ ∈ pre(o) ∪ prv(o), and D_G(V) = {⟨V, v⟩} for every ⟨V, v⟩ ∈ G.
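The mutex-based computation of disambiguations can be sketched as follows, reproducing the worked V_a/V_b/V_c example above (facts are (variable, value) tuples; this is a simple one-step filter, not the full fixpoint procedure of Fišer et al. (2020)):

```python
def disambiguate(V, dom, p, mutexes):
    """One-step mutex-based disambiguation of variable V for partial state p.
    p is a set of facts; mutexes is a list of fact sets known to be mutex."""
    assigned = {f for f in p if f[0] == V}
    if assigned:                       # V already defined in p
        return assigned
    result = set()
    for v in dom[V]:
        f = (V, v)
        # drop v if {f, q} is a known mutex for some fact q of p
        if not any({f, q} in mutexes for q in p):
            result.add(f)
    return result

dom = {"Va": [1, 2], "Vb": [1, 2], "Vc": [1, 2]}
mutexes = [{("Va", 1), ("Vb", 1)}, {("Vb", 2), ("Vc", 2)}]

print(disambiguate("Va", dom, {("Vb", 1), ("Vc", 1)}, mutexes))  # {('Va', 2)}
print(disambiguate("Vb", dom, {("Va", 1), ("Vc", 2)}, mutexes))  # set(): p' is a mutex
```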
A planning task Π is in Transition Normal Form (TNF) if (i) vars(pre(o)) = vars(eff(o)) for every o ∈ O and (ii) the goal is a fully defined state. Here, we are interested only in the first condition, so we say that the planning task Π is normalized if vars(pre(o)) = vars(eff(o)) for every o ∈ O. From now on, we assume the given planning task is normalized. This simplifies the presentation and proofs, but we discuss the general case in Appendix A.
Every planning task can be normalized in polynomial time by introducing new auxiliary zero-cost operators, which grow the representation only polynomially (Pommerening & Helmert, 2015), and the transformation can be further improved with disambiguations (Fišer et al., 2020). Unfortunately, this transformation turns out to be detrimental to symbolic search, as we show in Section 5.2.
However, we can also use a more straightforward "multiplication" method that, for every operator o ∈ O and each of its affected variables not appearing in its precondition, V ∈ vars(eff(o)) \ vars(pre(o)), enumerates all possible values of V and creates the corresponding operators. This method can be improved with disambiguations: we do not need to enumerate all values of V, but can consider only the disambiguation D_o(V). It turns out that, despite the worst-case exponential increase in task size, it very rarely happens in our benchmarks that a task cannot be transformed with this method, and it has good synergy with symbolic search.
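The multiplication method for a single operator can be sketched as follows; operators are (precondition, effect, cost) triples over dicts, and `disamb[V]` stands in for the disambiguation D_o(V) (all values below are hypothetical):

```python
import itertools

def multiply_out(pre, eff, cost, disamb):
    """Normalize one operator by enumerating, for each effect variable missing
    from the precondition, the values in its disambiguation D_o(V) (a sketch)."""
    missing = sorted(V for V in eff if V not in pre)
    ops = []
    for vals in itertools.product(*(sorted(disamb[V]) for V in missing)):
        new_pre = dict(pre)
        new_pre.update(dict(zip(missing, vals)))
        ops.append((new_pre, dict(eff), cost))
    return ops

# Operator setting B without reading it; hypothetical D_o(B) = {0, 1}.
ops = multiply_out({"A": 0}, {"A": 1, "B": 1}, 1, {"B": {0, 1}})
for pre, eff, c in ops:
    assert set(pre) == set(eff)   # each copy is normalized: vars(pre) == vars(eff)
print(len(ops))  # 2
```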

Background on Potential Heuristics
Potential heuristics (Pommerening et al., 2015, 2017) are defined as weighted sums over a set of simple state features that correspond to conjunctions of facts. The dimension of a feature is the number of facts in the corresponding conjunction. We consider here the simplest variant, one-dimensional potential heuristics (also sometimes called atomic potential heuristics), where all features are single facts. Such a heuristic assigns a numerical value to each fact, and the heuristic value for a state s is then simply the sum of the potentials of all facts in s.
Definition 1. Let Π denote a planning task with facts F. A potential function is a function P : F → R. A potential heuristic for P maps each state s ∈ S to the sum of potentials of facts in s, i.e.,

h^P(s) = ∑_{f∈s} P(f). (1)

Moreover, we use h^P_fw to denote h^P restricted to forward reachable states, i.e., h^P_fw(s) = h^P(s) for every forward reachable state s ∈ S_fw. Now we can state sufficient conditions for the potential heuristic to be forward consistent, goal-aware, and forward admissible, which we will need later on. We use the formulation based on disambiguations previously introduced by Fišer et al. (2020), adapted to our notation and to the assumption that we have a normalized planning task. In contrast to the prior formulation (Fišer et al., 2020, Theorem 7), we simplify the condition ensuring forward consistency (Equation (3) below), because we assume a normalized planning task where vars(pre(o)) = vars(eff(o)) for every o ∈ O, i.e., for every affected variable V ∈ vars(eff(o)) we know exactly the value of V in the precondition of o (and thus also in the state where o is applicable).
Theorem 2. Let Π = ⟨V, O, I, G⟩ denote a normalized planning task with facts F, and let P denote a potential function. If

∑_{V∈V} max_{f∈D_G(V)} P(f) ≤ 0, (2)

and for every operator o ∈ O it holds that

∑_{f∈pre(o)} P(f) − ∑_{f∈eff(o)} P(f) ≤ cost(o), (3)

then h^P_fw is goal-aware, forward consistent, and forward admissible.
Equation (2) ensures goal-awareness, and Equation (3) ensures forward consistency. Note that Equation (2) uses the disambiguation D_G(V) because we do not need to consider all values of every variable not appearing in the goal G, but just those that can be part of a forward reachable goal state. In practice, we can obtain potentials as a solution to a linear program (LP) with constraints corresponding to the conditions from Theorem 2 (Pommerening et al., 2015), as follows.
1. For each f ∈ F, we create a (real-valued) variable P(f).
2. To ensure goal-awareness, we use the constraint Equation (2). The maximization in Equation (2) can be transformed into a set of linear inequality constraints in a standard way: For every variable V ∈ V, we create an auxiliary real-valued variable X_V, then for every V ∈ V and f ∈ D_G(V), we add the constraint P(f) ≤ X_V, and finally we replace Equation (2) with the constraint ∑_{V∈V} X_V ≤ 0.
3. To ensure consistency, we add the constraint Equation (3) for every operator o ∈ O.
Any solution of such an LP for any objective function results in a goal-aware and forward consistent potential function. Since we can choose any objective function, we can look for potential heuristics maximizing the heuristic estimate for the initial state (Pommerening et al., 2015), we can maximize the average heuristic estimate over all (syntactic) states S (Seipp et al., 2015), use mutexes to disregard some states that are not reachable (Fišer et al., 2020), or we can even combine some of the above. For example, we can construct an LP so that we obtain a potential heuristic that maximizes h^P_fw(I) while maximizing the average estimate over all states (Fišer et al., 2020).
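Steps 1-3 above amount to plain constraint generation; the sketch below emits each constraint as a coefficient map with an upper bound (using the normalized-task forms of Equations (2) and (3), which is an assumption of this sketch). Feeding these rows to any LP solver together with an objective such as maximizing h^P(I) would yield the potentials:

```python
def potential_lp_constraints(goal_disamb, ops):
    """Emit 'row <= rhs' constraints. Rows map LP-variable names to coefficients.
    goal_disamb: {V: D_G(V)}; ops: list of (pre_facts, eff_facts, cost)."""
    cons = []
    # Goal-awareness, linearized with auxiliary variables X_V:
    for V, D in goal_disamb.items():
        for f in D:
            cons.append(({("P", f): 1.0, ("X", V): -1.0}, 0.0))  # P(f) <= X_V
    cons.append(({("X", V): 1.0 for V in goal_disamb}, 0.0))     # sum_V X_V <= 0
    # Consistency, one constraint per normalized operator:
    for pre, eff, cost in ops:
        row = {}
        for f in pre:
            row[("P", f)] = row.get(("P", f), 0.0) + 1.0
        for f in eff:
            row[("P", f)] = row.get(("P", f), 0.0) - 1.0
        cons.append((row, float(cost)))  # sum_pre P - sum_eff P <= cost(o)
    return cons

# Hypothetical one-variable task: initial value a, goal value g.
cons = potential_lp_constraints(
    {"V": {("V", "g")}},
    [({("V", "a")}, {("V", "b")}, 1.0), ({("V", "b")}, {("V", "g")}, 1.0)],
)
print(len(cons))  # 1 aux constraint + 1 sum-X constraint + 2 operator constraints = 4
```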

Background on Symbolic Search
While explicit state-space search algorithms operate on individual states, symbolic search (McMillan, 1993) works on sets of states compactly represented as Binary Decision Diagrams (BDDs) (Bryant, 1986).
BDDs are an efficient data structure to represent Boolean functions {0, 1}^n → {0, 1} in the form of a directed acyclic graph. A set of states S ⊆ S is represented as a BDD via its characteristic function S → {0, 1} assigning 1 to states that belong to S and 0 to states that do not. Note that this assumes a binary encoding of states. We use the standard representation and variable ordering used in previous works on symbolic search for classical planning (Kissmann & Edelkamp, 2011; Torralba et al., 2017). The size of a BDD B, denoted |B|, refers to the number of nodes in B. The advantage of this representation comes from the fact that BDDs can be exponentially smaller than the number of states they represent.
Once we have sets of states represented as BDDs, we can use operations on BDDs to operate on sets of states without enumerating them one by one. Operations like the union (∪), intersection (∩), and complement of sets of states correspond to the disjunction (∨), conjunction (∧), and negation (¬) of their characteristic functions, respectively. For example, if we have two BDDs B_1 and B_2, representing sets of states S_1 and S_2, the operation B_1 ∧ B_2 results in a BDD which represents S_1 ∩ S_2. These operations take time polynomial in the size of the input BDDs, O(|B_1||B_2|), which enables efficient manipulation of large sets of states.
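The correspondence between set operations and Boolean connectives can be illustrated with plain characteristic functions; this toy sketch shows only the semantics, not the BDD compression that makes the operations efficient:

```python
# States as tuples of bits; a characteristic function maps a state to True/False.
def union(f, g):     return lambda s: f(s) or g(s)    # S1 u S2  <->  f v g
def intersect(f, g): return lambda s: f(s) and g(s)   # S1 n S2  <->  f ^ g
def complement(f):   return lambda s: not f(s)        # complement <-> negation

chi1 = lambda s: s[0] == 1            # S1: states whose first bit is set
chi2 = lambda s: s[1] == 1            # S2: states whose second bit is set

both = intersect(chi1, chi2)
states = [(a, b) for a in (0, 1) for b in (0, 1)]
print([s for s in states if both(s)])  # [(1, 1)]
```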
To perform symbolic search, the operators of the planning task are represented as transition relations (TRs), also using BDDs. A TR of an operator o is a characteristic function T_o : S × S → {0, 1} that represents all pairs of states ⟨s, o⟦s⟧⟩ such that o is applicable in s. Having a TR T_o for every operator o ∈ O, we can construct a TR representing all operators with the same cost c as T_c = ⋁_{o∈O, cost(o)=c} T_o. That is, T_c represents all pairs of states ⟨s, s′⟩ such that s′ can be reached from s by applying some operator with cost c. As the size of T_c may be exponential in the number of operators with cost c, in practice it is often a good idea to use disjunctive partitioning to keep the size at bay (Jensen et al., 2008; Torralba et al., 2013, 2017).
Having a representation for sets of states as well as sets of operators, one can efficiently perform forward search by iteratively applying the image operation, starting with the BDD representing the initial state. Given a BDD S representing a set of states and a TR T_c, image(S, T_c) computes the set of successor states reachable from any state in S by applying any operator represented by T_c. By using a separate TR per operator cost c, one can easily keep track of the cost of reaching a state. If S_g represents a set of states reachable with cost g, then all states in image(S_g, T_c) are reachable with a cost of g + c. By repeatedly applying this operation, one can enumerate all states, classified into sets S_0, S_1, … according to their distance from the initial state. Whenever the representation of each S_g is compact, one can get exponential gains with respect to explicit-state search (Edelkamp & Kissmann, 2008).
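This cost-layered forward exploration can be sketched with explicit Python sets standing in for BDDs and each TR given as a set of (s, s') pairs; the transition system below is a hypothetical toy example:

```python
from heapq import heappush, heappop

def image(S, tr):
    """Successor states of any state in S under transition-relation pairs tr."""
    return {t for (s, t) in tr if s in S}

def layered_search(init, goals, trs):
    """Expand whole cost layers in order of g; trs: {cost: set of (s, s') pairs}.
    Returns the cheapest cost of reaching a goal state (a Dijkstra-style sketch)."""
    open_, closed, tie = [(0, 0, frozenset([init]))], set(), 0
    while open_:
        g, _, S = heappop(open_)
        S = S - closed            # skip already-expanded states
        if not S:
            continue
        if S & goals:
            return g
        closed |= S
        for c, tr in trs.items():
            succ = frozenset(image(S, tr)) - closed
            if succ:
                tie += 1
                heappush(open_, (g + c, tie, succ))
    return None

trs = {1: {(0, 1), (1, 2)}, 2: {(0, 2)}}
print(layered_search(0, {2}, trs))  # 2
```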
For search in the backward direction, one can start with the BDD representing all goal states and use the operation pre-image instead of image, i.e., pre-image(S, T_c) computes the set of all predecessor states S′ from which a state in S can be reached by applying an operator represented by T_c. Torralba et al. (2017) provide a comprehensive description of how to efficiently implement the image and pre-image operations.
The most prominent implementation of symbolic heuristic search in the context of automated planning is BDDA* (Edelkamp & Reffel, 1998), a variant of A* (Hart et al., 1968) using BDDs to represent sets of states. Like A*, BDDA* expands states in ascending order of their f-value. To take advantage of the symbolic representation, BDDA* represents all states with the same g- and h-value in a single BDD S_{g,h} (disjunctive partitioning of S_{g,h} can also be used). Given a set of states S_{g,h} and a TR T_c, the g-value of the resulting set of successor states image(S_{g,h}, T_c) is simply g + c. However, these successor states have to be split according to their h-value. This can usually be performed efficiently with, e.g., symbolic pattern databases (Kissmann & Edelkamp, 2011), by partitioning all states into BDDs S_h, where each S_h represents the set of all states with heuristic value h. Then a conjunction of the successor states and a set S_h gives us the subset of successor states with heuristic value h. To fully partition a set of states according to their heuristic value, we then need to compute such a conjunction for every partition S_h.
GHSETA* (Jensen et al., 2008) encodes the heuristic function as part of the transition relation, creating multiple TRs depending on the impact of the operators on the heuristic value. That is, we need a function δ_h : O → R mapping operators to numbers so that if the heuristic value for a state s is h(s) and the operator o ∈ O is applicable in s, then h(o⟦s⟧) = h(s) + δ_h(o) is the heuristic value for the successor state o⟦s⟧. Then we can partition operators into TRs not only by their costs but also by the change of the heuristic value δ_h(o) they induce, i.e., instead of having a TR T_c for every operator cost c, we have a TR T_{c,q} for every operator cost c and every possible value q = δ_h(o). With this approach, computing the g- and h-values of successor states is much more straightforward than in the previous case: image(S_{g,h}, T_{c,q}) directly results in the BDD S_{g+c,h+q} representing all successor states of S_{g,h} with g-value g + c and h-value h + q. This is a very efficient way of evaluating the heuristic within symbolic search. However, up to now, all heuristics known to be suitable for this representation were either non-informative, inadmissible, or domain-dependent. We show, in the next two sections, that potential heuristics can be adapted to this schema and smoothly integrated into the GHSETA* algorithm.
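The (cost, heuristic-change) partitioning of operators can be sketched as plain bucketing; in GHSETA* each bucket would become one BDD transition relation T_{c,q}. The operator names, costs, and δ_h values below are hypothetical:

```python
from collections import defaultdict

def partition_trs(ops, delta_h):
    """Group operators into buckets keyed by (cost c, heuristic change q)."""
    trs = defaultdict(list)
    for name, cost in ops:
        trs[(cost, delta_h[name])].append(name)
    return dict(trs)

# Hypothetical operators (name, cost) and their induced h-value changes:
ops = [("o1", 1), ("o2", 1), ("o3", 2)]
delta_h = {"o1": -1, "o2": -1, "o3": 0}
trs = partition_trs(ops, delta_h)
print(sorted(trs))     # [(1, -1), (2, 0)]

# image(S_{g,h}, T_{c,q}) lands directly in bucket S_{g+c, h+q}:
g, h = 0, 3
c, q = 1, -1
print((g + c, h + q))  # (1, 2)
```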
Algorithm 1 shows the pseudo-code of the GHSETA* algorithm in the forward direction. It takes a planning task, a heuristic estimate h_I for the initial state, and a function δ_h inducing, together with h_I, a consistent admissible forward heuristic, i.e., we assume that for every sequence of operators π = ⟨o_1, . . ., o_n⟩ applicable in I, h_I + Σ_{i∈[n]} δ_h(o_i) is a forward consistent and forward admissible heuristic estimate for the state π⟦I⟧. Lines 1 and 2 describe the partitioning of operators into TRs based on their cost and the change of heuristic value they induce via the function δ_h. The rest is a standard A* algorithm without re-opening of states (because we assume a consistent heuristic), adapted for searching over sets of states and for computing heuristic values by summing over sequences of operators rather than calling a heuristic function for every expanded state. The main distinctions to the state-space A* are the following: (a) As in standard A*, we maintain the set of closed states (lines 5 and 11). However, since we operate on sets of states instead of individual states, we represent the set of closed states as a BDD (possibly with disjunctive partitioning), and we skip closed states stored in the set closed by removing closed from all expanded and generated sets of states (lines 8 and 13).
(b) GHSETA* also maintains an open list as a priority queue ordering states by increasing f-values, but all states with the same g and h-values are merged into one BDD (line 16 and the function InsertOrUpdate). So, in each cycle, a set of states with the lowest f-value is processed at once, using the BDD S_{g,h} with minimal g-value among those with minimal f = g + max(h, 0).
(c) Given a set of states S_{g,h} (with g-value g and h-value h) and a TR T_{c,q} (with cost c and inducing a change of the h-value by q), we can easily compute the g and h-values of the successor states image(S_{g,h}, T_{c,q}) as g + c and h + q, respectively (line 13).
Algorithm 1: GHSETA* in the forward direction with a consistent heuristic.
Input: A planning task Π, a heuristic estimate h_I ≥ 0 for the initial state, and a function δ_h : O → R so that h_I and δ_h induce a consistent and admissible heuristic.
Output: An optimal plan or "unsolvable".
12: for each T_{c,q} do
13:   S_{g+c,h+q} ← image(S_{g,h}, T_{c,q}) ∧ ¬closed;

(d) Instead of terminating when a goal state is removed from the queue, we terminate when we remove a set of states containing a goal state. The plan extraction in GHSETA* (line 10) is somewhat more involved than in state-space A*, because a simple backchaining from a goal state is not possible here. Nevertheless, it is still polynomial in the length of the plan; a detailed description is provided by Torralba et al. (2017).
Finally, we adapt the GHSETA* algorithm to support negative h-values. Instead of considering the f-value of a state to be g + h, we use f = g + max(h, 0). Therefore, the f-value of the successor states is g + c + max(0, h + q) (line 15). This is not only an optimization (i.e., avoiding the expansion of any bucket where g > h*_fw(I) even if g + h < h*_fw(I)). In fact, it is also needed for the correctness of the stopping condition, as otherwise goal states with a negative heuristic value could be expanded even if they do not correspond to an optimal plan. The common solution of simply changing the heuristic function to max(h, 0) is not possible, as that cannot always be expressed as a δ_h function. However, by keeping the original (negative) h-value for the BDD representation, and making the heuristic non-negative only when computing the f-value, we get the best of both worlds: an efficient BDD representation without unnecessarily expanding any set of states with a negative heuristic value.
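The g/h bookkeeping above can be sketched as a small helper; this is a toy illustration of the arithmetic only, not of the BDD machinery:

```python
def successor_bucket(g, h, c, q):
    """Expanding a bucket S_{g,h} through a transition relation T_{c,q}
    yields the bucket S_{g+c,h+q}; the priority uses f = g + max(h, 0),
    so a negative h is kept for the BDD partitioning but never lowers
    f below g."""
    g2, h2 = g + c, h + q
    f2 = g2 + max(0, h2)
    return g2, h2, f2
```

For instance, expanding a bucket with g = 3 and h = −2 through a TR with c = 1 and q = −1 yields the bucket (4, −3) with priority f = 4, i.e., the negative h-value is preserved for partitioning but does not lower f below g.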
Note that Jensen et al. (2008) also introduce the FSETA* algorithm, where the change of heuristic values is compiled directly into the operators' costs. This is similar to encoding the heuristic as a task transformation, i.e., by changing the cost of each operator to cost(o) + δ_h(o). It is well known that running Dijkstra's algorithm on the reformulated task is equivalent to running A* on the original task (Martelli, 1977). However, it is not entirely clear how to apply these approaches in the presence of heuristics that can take negative values. We leave this question to future research.
If no heuristic is used (i.e., h = 0 for all states), performing backward search is straightforward. One can simply run Algorithm 1, starting with the BDD representing the set of all goal states instead of the initial state (line 3). Then, at each step, it uses the pre-image operation instead of image (line 13), and it terminates when a BDD containing the initial state is removed from the priority queue (line 9). In Sections 3 and 4, we explain how to extend this to backward search with any backward consistent and admissible heuristic.
The bi-directional search combines the forward and backward search by keeping separate open and closed lists for each direction and then alternating between forward and backward steps. In each iteration of the algorithm, it is decided whether to expand a set of states from the forward or the backward open list. A common criterion is to select the search direction whose next step is estimated to be easiest (e.g., by selecting the set of states whose BDD representation is smallest). In our implementation, we use the criterion of Torralba et al. (2017), which, besides the BDD size, also considers the time spent in previous iterations to estimate which direction will take less time to complete the next step.
The bi-directional search stops when both directions meet and we are able to prove that the plan combined from both directions is optimal. That is, instead of checking whether the current set of states selected for expansion contains a goal state (line 9), we check whether the intersection with the closed list from the opposite direction is non-empty. If the intersection is not empty, then any state in it is part of a plan. The algorithm keeps track of the best plan π found so far, and terminates as soon as no better plan can be found, i.e., whenever cost(π) ≤ max(min_{s∈open_f} f(s), min_{s∈open_b} f(s), min_{s∈open_f} g(s) + min_{s∈open_b} g(s)), where open_f and open_b are the open lists of the forward and backward search, respectively, and f(s) and g(s) denote the f and g-values of a state s. This guarantees that the bi-directional search terminates with an optimal plan, as long as an admissible heuristic is used in both directions, even when different heuristics are used in each direction. Note also that each direction can use a different partitioning of operators into TRs.

Operator-Potential Heuristics
Potential heuristics map facts to numerical values. Here, we show that instead of mapping facts to numerical values, we can map each operator o to a numerical value, called an operator-potential, corresponding to the change of the heuristic value over a transition induced by o. More precisely, we show how to transform a potential function P : F → R into an operator-potential function Q : O → R so that for every state s and each operator o applicable in s it holds that h^P(o⟦s⟧) = h^P(s) + Q(o). In other words, we define Q in such a way that Q(o) is exactly equal to the change of the heuristic value of the corresponding potential heuristic over a transition between states induced by the operator o.
Recall that we assume vars(pre(o)) = vars(eff(o)) for every operator o ∈ O (the general case is discussed in Appendix A). As pointed out by Seipp et al. (2016) in the context of proving the limitations of one-dimensional potential heuristics, this means that we know exactly how each operator o changes the state s on which it is applied, i.e., for every fact ⟨V, v⟩ ∈ eff(o) we know exactly what the value s[V] is, because ⟨V, s[V]⟩ ∈ pre(o). (Note that the same is not true for higher-dimensional potential heuristics, so operator-potential functions are defined for potential heuristics of dimension one only.) This allows us to define the operator-potential function as Q(o) = Σ_{f∈eff(o)} P(f) − Σ_{f∈pre(o)} P(f) for every operator o ∈ O.
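Under this assumption, the computation of Q can be sketched as follows; the facts and potential values are hypothetical toy data:

```python
# Sketch: under vars(pre(o)) = vars(eff(o)), the operator potential is the
# sum of potentials of the effect facts minus that of the precondition facts.
P = {("V", "a"): 2.0, ("V", "b"): 1.0, ("W", "x"): 0.5, ("W", "y"): 0.0}

def operator_potential(pre, eff, P):
    """Q(o) = sum_{f in eff(o)} P(f) - sum_{f in pre(o)} P(f)."""
    return sum(P[f] for f in eff) - sum(P[f] for f in pre)

pre = {("V", "a"), ("W", "x")}        # o rewrites V: a -> b and W: x -> y
eff = {("V", "b"), ("W", "y")}
q = operator_potential(pre, eff, P)

# Q(o) equals the change of the potential heuristic over o's transition:
s = {("V", "a"), ("W", "x")}
succ = {("V", "b"), ("W", "y")}
hP = lambda state: sum(P[f] for f in state)
assert hP(succ) == hP(s) + q
```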
In the following Proposition 4, we show that Q(o) is exactly equal to the change of the heuristic value of the potential heuristic from a state to its successor. Note that Proposition 4 holds for any state s, in particular for every forward reachable as well as every backward reachable state.

Proposition 4. Let s ∈ S denote a state, and let o ∈ O denote an operator applicable in s. Then h^P(o⟦s⟧) = h^P(s) + Q(o).

Proof. Let t = s \ pre(o). Since we assume vars(pre(o)) = vars(eff(o)), it follows that t = o⟦s⟧ \ eff(o). Therefore, we have that h^P(o⟦s⟧) = Σ_{f∈t} P(f) + Σ_{f∈eff(o)} P(f) = Σ_{f∈t} P(f) + Σ_{f∈pre(o)} P(f) + Q(o) = h^P(s) + Q(o).

Next, we show that the property from Proposition 4 extends over sequences of operators. That is, for any two states s, s′ ∈ S and any sequence of operators π = ⟨o_1, . . ., o_n⟩ leading from s to s′ (i.e., π is applicable in s and π⟦s⟧ = s′), the sum over operator potentials of the operators from π, Σ_{i∈[n]} Q(o_i), is exactly equal to the change of the heuristic value of the potential heuristic from s to s′, i.e., h^P(s′) = h^P(s) + Σ_{i∈[n]} Q(o_i). Note that this property holds for any sequence of operators π between s and s′. In other words, for a fixed pair of states s, s′ ∈ S and any two sequences of operators π = ⟨o_1, . . ., o_n⟩ and π′ = ⟨q_1, . . ., q_m⟩ both leading from s to s′, it holds that h^P(s) + Σ_{i∈[n]} Q(o_i) = h^P(s′) = h^P(s) + Σ_{i∈[m]} Q(q_i). Therefore, for any such π and π′, the sums over operator potentials are exactly the same, i.e., Σ_{i∈[n]} Q(o_i) = Σ_{i∈[m]} Q(q_i). So, summing operator potentials over sequences of operators preserves state-dependency as long as the planning task is normalized. We will use this property later when we define state-dependent operator-potential heuristics in the forward and backward direction.
Proposition 5. Let s ∈ S denote a state, and let π = ⟨o_1, . . ., o_n⟩ denote a sequence of operators applicable in s. Then h^P(π⟦s⟧) = h^P(s) + Σ_{i∈[n]} Q(o_i).

Proof. (By induction.) The claim clearly holds for the empty sequence π. Now, assume the claim holds for some sequence of operators π′ = ⟨o_1, . . ., o_{k−1}⟩ such that π′ is applicable in s and k ≤ n; we prove that it also holds for the sequence of operators ⟨o_1, . . ., o_k⟩. From Proposition 4 we have that h^P(⟨o_1, . . ., o_k⟩⟦s⟧) = h^P(π′⟦s⟧) + Q(o_k), and from the induction hypothesis we have that h^P(π′⟦s⟧) = h^P(s) + Σ_{i∈[k−1]} Q(o_i). Therefore h^P(⟨o_1, . . ., o_k⟩⟦s⟧) = h^P(s) + Σ_{i∈[k]} Q(o_i), which concludes the proof.

Now that we have shown how to define operator-potential functions and proved their fundamental properties in relation to the corresponding potential heuristics, we move on to the introduction of a new family of operator-potential heuristics in the forward and backward direction. In Section 3.1, we show how to construct operator-potential forward heuristics that are goal-aware and forward consistent, and thus also forward admissible. In Section 3.2, we focus on operator-potential backward heuristics. We show that the same approach used for the operator-potential forward heuristics can also be used in the backward direction. Although it leads to backward admissible estimates, it can also result in path-dependent heuristics. So, we show how to remedy this issue and obtain operator-potential backward heuristics that are state-dependent, init-aware, backward consistent, and backward admissible.

Forward Direction
Under the assumption that vars(pre(o)) = vars(eff(o)) for every operator and with Proposition 5 in place, the construction of a forward consistent operator-potential heuristic h^Q_fw is straightforward. Given a potential function P and its corresponding operator-potential function Q, we start by setting the heuristic value for the initial state h^Q_fw(I) to h^P_fw(I) = Σ_{f∈I} P(f), and it then follows from Proposition 5 that adding the sum of operator-potentials over any sequence of operators π = ⟨o_1, . . ., o_n⟩ applicable in the initial state results in the heuristic value h^P_fw(π⟦I⟧). That is, this construction exactly preserves the heuristic values of the potential heuristic h^P_fw along with its properties such as consistency, goal-awareness, and admissibility.
Definition 6. Let Q denote an operator-potential function for P. An operator-potential forward heuristic h^Q_fw : S_fw → R is defined as
h^Q_fw(s) = Σ_{f∈I} P(f) + Σ_{i∈[n]} Q(o_i)  (5)
for every sequence of operators π = ⟨o_1, . . ., o_n⟩ such that π⟦I⟧ = s.
Now we need to show that h^Q_fw is well-defined, i.e., that Equation (5) indeed expresses a function mapping forward-reachable states to numbers. In other words, we need to show that h^Q_fw(s) is the same for every sequence of operators π leading from the initial state to s. This follows directly from Proposition 5, which also shows that h^Q_fw is exactly equal to h^P_fw and therefore has exactly the same properties as h^P_fw.

Theorem 7. h^Q_fw is well-defined, h^Q_fw(s) = h^P_fw(s) for every forward-reachable state s ∈ S_fw, and h^Q_fw is forward admissible (goal-aware, forward consistent) if h^P_fw is forward admissible (goal-aware, forward consistent).

Proof. Let s ∈ S_fw denote a forward reachable state, and let π = ⟨o_1, . . ., o_n⟩ denote a sequence of operators such that π is applicable in I and π⟦I⟧ = s. From Proposition 5 it follows that Σ_{f∈I} P(f) + Σ_{i∈[n]} Q(o_i) = Σ_{f∈s} P(f), and from the definitions of h^Q_fw and h^P_fw it further follows that h^Q_fw(s) = h^P_fw(s). Therefore h^Q_fw is well-defined, h^Q_fw(s) = h^P_fw(s) for every s ∈ S_fw, and if h^P_fw is forward admissible (goal-aware, forward consistent), then so is h^Q_fw.

Note that h^Q_fw is a state-dependent heuristic even though it is computed from I-s-paths, because every I-s-path results in exactly the same value of h^Q_fw(s). Moreover, note that h^Q_fw can be used in an incremental way: for every forward reachable state s ∈ S_fw and an operator o ∈ O applicable in s, we have h^Q_fw(o⟦s⟧) = h^Q_fw(s) + Q(o). In other words, h^Q_fw can be used in search so that we assign the heuristic value h^Q_fw(I) = Σ_{f∈I} P(f) to the initial state, and then whenever we expand a state s with an operator o, we compute the heuristic value for the resulting state simply by adding Q(o) to the heuristic value previously stored for s. This property of h^Q_fw will be particularly useful in the context of symbolic search.
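The incremental use of h^Q_fw can be sketched as follows; the toy task and potentials are hypothetical, and the final assertion checks Proposition 5 on the reached state:

```python
# Sketch: start from h(I) = sum of potentials over the initial state's
# facts, then add Q(o) per applied operator; the result must equal the
# potential heuristic of the reached state.
P = {("V", "a"): 1.0, ("V", "b"): 3.0, ("V", "c"): 0.0}
ops = {  # name -> (pre, eff) with vars(pre) == vars(eff)
    "o1": ({("V", "a")}, {("V", "b")}),
    "o2": ({("V", "b")}, {("V", "c")}),
}

def Q(o):
    pre, eff = ops[o]
    return sum(P[f] for f in eff) - sum(P[f] for f in pre)

def apply(state, o):
    pre, eff = ops[o]
    assert pre <= state            # o must be applicable in state
    return (state - pre) | eff

I = {("V", "a")}
h = sum(P[f] for f in I)           # h^Q_fw(I) = h^P(I)
s = I
for o in ["o1", "o2"]:
    s = apply(s, o)
    h += Q(o)                      # incremental update with Q(o)
assert h == sum(P[f] for f in s)   # equals h^P of the reached state
```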
As we show later, a frictionless application of operator-potentials in symbolic search requires partitioning operators by their Q(o) values, i.e., we need to group together operators that induce the same change of operator-potential heuristic values. Therefore, we need to compare Q(o) values for equality. However, P is typically inferred using a linear program, which yields Q(o) values represented as floating-point numbers. This can significantly reduce the efficiency of the partitioning, as each partition may consist of a single operator even when Q(o) values differ only slightly. Moreover, the strength of symbolic search lies in its ability to aggregate states with the same heuristic and g-values into a BDD. Therefore, having floating-point heuristic values is an even larger problem.
Assuming operator costs are integers, both issues can be resolved if all Q(o) values and h^P_fw(I) are integers. It is easy to see that h^P_fw(I) can be safely rounded up to the nearest integer, because if the costs of operators are integers, then the costs of plans must also be integer-valued. It may also seem that rounding operator potentials down to the nearest integer would resolve the issue, as the sums over the rounded operator potentials would still result in admissible estimates. However, rounding Q(o) values down may result in path-dependent estimates.
Consider the planning task depicted in Figure 1. The operator-potential heuristic h^Q_fw is clearly forward consistent, goal-aware, and forward admissible. Let Q′ denote the function mapping each operator o to Q(o) rounded down, i.e., Q′(o) = ⌊Q(o)⌋. Evaluating the state s_2 with Q′, we get two different values depending on the path used to reach s_2. Moreover, note that path-dependency may also result in inconsistency: consider the states s_1 and s_2, the operator sequence ⟨o_1⟩ used to reach s_1, and ⟨o_3, o_4⟩ used to reach s_2. In this case, the estimate with Q′ for s_1 would be 1, but the estimate for s_2 would be 0, as would be the cost of o_2.
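A small numeric illustration (with hypothetical Q(o) values, not those of Figure 1) of why rounding down breaks state-dependency:

```python
import math

# Two hypothetical operator sequences reaching the same state: their exact
# Q-sums agree (state-dependency), but their floored sums do not.
path_a = [0.5, 0.5]   # hypothetical Q values along one path
path_b = [1.0]        # hypothetical Q value along another path

exact_a, exact_b = sum(path_a), sum(path_b)
floored_a = sum(math.floor(q) for q in path_a)
floored_b = sum(math.floor(q) for q in path_b)

assert exact_a == exact_b        # the exact heuristic is state-dependent
assert floored_a != floored_b    # the rounded one becomes path-dependent
```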
We resolve this issue by restricting the potential functions to always yield integer-valued operator-potentials. Note that this approach still allows us to round h^P_fw(I) up to the nearest integer, as doing so clearly preserves forward consistency, goal-awareness, and forward admissibility of the resulting heuristics.
To obtain integer operator-potentials, we propose to use the following mixed-integer linear program (MIP):
1. For every fact f ∈ F, we create the real-valued variable P(f).
2. For every operator o ∈ O, we create the integer-valued variable Q(o).
3. To ensure goal-awareness and forward consistency of the resulting potential function P, we add the constraint Equation (2), and, for every o ∈ O, we add the constraint Equation (3).
4. For every operator o ∈ O, we add the constraint Equation (4). This ensures that Q(o) will indeed be an operator-potential as per Definition 3, and since Q(o) is an integer-valued variable, the resulting operator-potential will also be integer.
Clearly, any solution to this MIP yields an operator-potential function according to Equation (4) with integer Q(o) values, and the corresponding h^Q_fw is forward consistent, goal-aware, and forward admissible. So, we can use any optimization criterion previously proposed for potential heuristics (Pommerening et al., 2015; Seipp et al., 2015; Fišer et al., 2020). The disadvantage of using a MIP is that it is harder to solve than an LP: MIP is NP-hard in general, whereas LP can be solved in polynomial time. However, this rarely seems to be a bottleneck in practice, as we show in the experimental evaluation in Section 5.1.
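The conditions such a MIP solution must satisfy can be sketched as a checker; this is not the paper's exact encoding (in particular, the goal-awareness constraint is simplified to a single sum over hypothetical goal facts):

```python
# Sketch: verify that a candidate solution (P, Q) satisfies goal-awareness,
# consistency (-Q(o) <= cost(o)), the link Q(o) = sum_eff P - sum_pre P,
# and integrality of Q(o). All data below is hypothetical toy input.
def satisfies_mip(P, Q, ops, goal_facts, eps=1e-9):
    if sum(P[f] for f in goal_facts) > eps:       # goal-awareness
        return False
    for name, (pre, eff, cost) in ops.items():
        q = Q[name]
        if -q > cost + eps:                       # consistency
            return False
        link = sum(P[f] for f in eff) - sum(P[f] for f in pre)
        if abs(q - link) > eps:                   # link between P and Q
            return False
        if abs(q - round(q)) > eps:               # integrality of Q(o)
            return False
    return True

P = {("V", "a"): 1.0, ("V", "g"): 0.0}
ops = {"o": ({("V", "a")}, {("V", "g")}, 1)}
ok = satisfies_mip(P, {"o": -1}, ops, goal_facts={("V", "g")})
```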

Backward Direction
Interestingly, under certain conditions, the very same operator-potential function and Equation (5) can be used to obtain backward admissible estimates in the backward direction as well. To be more precise, given a potential function P such that the conditions from Theorem 2 hold and the operator-potential function Q for P, for every s-plan π = ⟨o_1, . . ., o_n⟩, the estimate
h^Q_fw(I) + Σ_{i∈[n]} Q(o_i)  (6)
is a lower bound on the cost of every I-s-path. That is, Equation (6) is a backward admissible heuristic estimate for the backward search even when P and Q are computed in the same way as for the forward direction, which we prove in the following Proposition 8.
Proposition 8. Let P denote a potential function such that Equation (2) and Equation (3) hold, let Q denote an operator-potential function for P, let s ∈ S_fw ∩ S_bw denote a state that is both forward and backward reachable, let π = ⟨o_1, . . ., o_n⟩ denote an s-plan, and let π′ = ⟨o′_1, . . ., o′_m⟩ denote a sequence of operators applicable in I such that π′⟦I⟧ = s. Then h^Q_fw(I) + Σ_{i∈[n]} Q(o_i) ≤ cost(π′).

Proof. Note that h^Q_fw(I) = Σ_{f∈I} P(f) and that ⟨o′_1, . . ., o′_m, o_1, . . ., o_n⟩ is a plan. Let g = π⟦s⟧. From Equation (2) it follows that h^P_fw(g) ≤ 0, and from Theorem 7 it follows that h^Q_fw(g) = h^P_fw(g) ≤ 0. Therefore we have that h^Q_fw(I) + Σ_{i∈[m]} Q(o′_i) + Σ_{i∈[n]} Q(o_i) = h^Q_fw(g) ≤ 0, and thus h^Q_fw(I) + Σ_{i∈[n]} Q(o_i) ≤ −Σ_{i∈[m]} Q(o′_i). Finally, from Definition 3 and Equation (3) we have −Q(o) ≤ cost(o) for every o ∈ O, therefore −Σ_{i∈[m]} Q(o′_i) ≤ cost(π′), which concludes the proof.

Now, it may seem that we are ready to formulate the backward variant of operator-potential heuristics. Unfortunately, the aforementioned Equation (6) can result in different values depending on the given s-plan π, i.e., such heuristic estimates are path-dependent. Consider the planning task depicted in Figure 2, and the state "yB" backward-reachable from both goal states "xC" and "yC". Equation (6) evaluates to different values for "yC" with the s-plan ⟨o_2⟩ and for "xC" with the corresponding s-plan. Both are backward admissible estimates for the backward search, as the cost of the remaining operator o_1 is 1, but the estimates are path-dependent. This behavior is caused by the fact that we allow negative heuristic estimates, possibly resulting in different heuristic values of different goal states. For example, the h^Q_fw-value for "xC" is −1, whereas the h^Q_fw-value for "yC" is zero. This observation suggests that we can fix this issue by incorporating the heuristic values of goal states into the computation. And indeed, it turns out that if Equation (3) holds for the potential function P, then subtracting the sum of potentials over the goal state's facts from Equation (6) resolves the issue. So, we define the goal-corrected operator-potential backward heuristic accordingly, and then we prove that it is well-defined (i.e., state-dependent), init-aware, backward consistent, and therefore also backward admissible.
Definition 9. Let Q denote an operator-potential function for P such that Equation (3) holds for P. A goal-corrected operator-potential backward heuristic h^Q_bw : S_bw → R is defined as
h^Q_bw(s) = Σ_{f∈I} P(f) + Σ_{i∈[n]} Q(o_i) − Σ_{f∈π⟦s⟧} P(f)  (7)
for every backward reachable state s ∈ S_bw and every s-plan π = ⟨o_1, . . ., o_n⟩ ∈ E_bw.
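Definition 9 can be sketched on a hypothetical explicit-state toy task; the assertion checks that two different s-plans yield the same goal-corrected estimate:

```python
# Sketch: h^Q_bw(s) = h^P(I) + sum of Q along an s-plan - potentials of the
# reached goal state. All facts, potentials, and operators are hypothetical.
P = {("V", "B"): 1.0, ("V", "C"): 0.0, ("W", "x"): -1.0, ("W", "y"): 0.0}
I = {("V", "B"), ("W", "y")}
ops = {  # name -> (pre, eff, cost), with vars(pre) == vars(eff)
    "oA": ({("V", "B")}, {("V", "C")}, 1),
    "oB": ({("V", "B"), ("W", "x")}, {("V", "C"), ("W", "y")}, 1),
}
Q = {n: sum(P[f] for f in eff) - sum(P[f] for f in pre)
     for n, (pre, eff, _) in ops.items()}

def apply_plan(state, plan):
    for n in plan:
        pre, eff, _ = ops[n]
        assert pre <= state
        state = (state - pre) | eff
    return state

def h_bw(s, s_plan):
    """Goal-corrected backward estimate for state s via the given s-plan
    (assumed to end in a goal state extending the goal condition V=C)."""
    goal = apply_plan(s, s_plan)
    return (sum(P[f] for f in I)
            + sum(Q[n] for n in s_plan)
            - sum(P[f] for f in goal))

s = {("V", "B"), ("W", "x")}               # both ["oA"] and ["oB"] are s-plans
assert h_bw(s, ["oA"]) == h_bw(s, ["oB"])  # state-dependent despite two plans
```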
We start by showing, in the following Lemma 10, that h^Q_bw from Definition 9 is state-dependent, i.e., for every backward reachable state s ∈ S_bw, h^Q_bw(s) evaluates to the same value for all s-plans. This is a consequence of Proposition 5, because it shows that, for a given (backward reachable) state s ∈ S_bw, the sum Σ_{i∈[n]} Q(o_i) − Σ_{f∈s_g} P(f) is the same for every s-plan.

Lemma 10. Let Q denote an operator-potential function for P, let s ∈ S_bw denote a backward reachable state, and let π = ⟨o_1, . . ., o_n⟩ and π′ = ⟨o′_1, . . ., o′_m⟩ denote two s-plans such that π⟦s⟧ = s_g and π′⟦s⟧ = s′_g. Then Σ_{f∈I} P(f) + Σ_{i∈[n]} Q(o_i) − Σ_{f∈s_g} P(f) = Σ_{f∈I} P(f) + Σ_{i∈[m]} Q(o′_i) − Σ_{f∈s′_g} P(f).

Proof. Since Σ_{f∈I} P(f) appears on both sides of the equation, we need to prove that Σ_{i∈[n]} Q(o_i) − Σ_{f∈s_g} P(f) = Σ_{i∈[m]} Q(o′_i) − Σ_{f∈s′_g} P(f). Since π⟦s⟧ = s_g, it follows from Proposition 5 that Σ_{f∈s} P(f) = Σ_{f∈s_g} P(f) − Σ_{i∈[n]} Q(o_i), and similarly for π′ and s′_g we have that Σ_{f∈s} P(f) = Σ_{f∈s′_g} P(f) − Σ_{i∈[m]} Q(o′_i), which concludes the proof.
Next, we show that h^Q_bw is init-aware.

Lemma 11. Let Q denote an operator-potential function for P, let π = ⟨o_1, . . ., o_n⟩ denote a plan, and let s_g = π⟦I⟧. Then Σ_{f∈I} P(f) + Σ_{i∈[n]} Q(o_i) − Σ_{f∈s_g} P(f) = 0.

Proof. It follows directly from Proposition 5, because s_g = π⟦I⟧ and therefore Σ_{f∈I} P(f) + Σ_{i∈[n]} Q(o_i) = Σ_{f∈s_g} P(f).

In the following Lemma 12, we show that h^Q_bw is backward consistent under the assumption that Equation (3) holds for P. This follows from the state-dependency of h^Q_bw and the fact that if Equation (3) holds, then −Q(o) ≤ cost(o), i.e., transitioning over an operator o cannot decrease the heuristic estimate by more than cost(o).
Lemma 12. Let Q denote an operator-potential function for P, let s, s′ ∈ S_bw and o ∈ O denote two backward reachable states and an operator such that o⟦s⟧ = s′, let π = ⟨q_1, . . ., q_n⟩ ∈ E_bw denote an s-plan, let π′ = ⟨q′_1, . . ., q′_m⟩ ∈ E_bw denote an s′-plan, and let s_g = π⟦s⟧ and s′_g = π′⟦s′⟧. If Equation (3) holds for P, then h^Q_bw(s′) ≤ cost(o) + h^Q_bw(s).

Proof. Let ρ = ⟨o, q′_1, . . ., q′_m⟩ (see the illustration in Figure 3). From Lemma 10 and the fact that ρ ∈ E_bw, ρ is applicable in s, and ρ⟦s⟧ = s′_g, it follows that h^Q_bw(s) = Σ_{f∈I} P(f) + Q(o) + Σ_{i∈[m]} Q(q′_i) − Σ_{f∈s′_g} P(f) = h^Q_bw(s′) + Q(o). Therefore it is enough to prove that −Q(o) ≤ cost(o), which follows directly from Definition 3 and Equation (3).

Now we are ready to prove that goal-corrected operator-potential backward heuristics are well-defined (i.e., state-dependent), init-aware, backward consistent, and therefore also backward admissible.
Algorithm 2: Partitioning of the goal states such that all states within each partition have the same h^P-value.
Input: A set of variables V = {V_1, . . ., V_n}, a disambiguation map D_G for the goal G, and a potential function P.
Output: Partitioning P^G of the goal states by their h^P-values.
1: P^G ← {⟨0, S⟩} where S is the set of all states;

Theorem 13. h^Q_bw is well-defined, init-aware, backward consistent, and backward admissible.

Proof. It follows from Lemma 10 that h^Q_bw is well-defined, because given any backward reachable state s, the value of h^Q_bw(s) is the same for all s-plans. Init-awareness follows directly from Lemma 11, backward consistency follows directly from Lemma 12, and backward admissibility follows from init-awareness and backward consistency.

Partitioning of Goal States into BDDs for Backward Search
Backward search starts in the goal states and proceeds towards the initial state. So, given a state s reached during the backward search, backward heuristics estimate the cost of the optimal I-s-path. Incorporating heuristic values of goal states into Equation (7) allows us to define a backward heuristic that is state-dependent, backward consistent, and backward admissible, and at the same time we can associate each operator with the change of the heuristic value it induces. However, this also comes at a price. Goal conditions of planning tasks are partial states, so they can define an exponential number of goal states. It may seem we need to enumerate all (forward reachable) goal states in order to compute heuristic values for them, which would in general be infeasible. Fortunately, in symbolic search, sets of states are represented as BDDs, whose size can be exponentially smaller than the number of states they represent. So, in order to use h^Q_bw in the context of symbolic search, we do not need to enumerate all forward reachable goal states, but rather partition those goal states into multiple BDDs so that each BDD represents all forward reachable goal states with the same h^P-value (or an overapproximation thereof). Algorithm 2 does exactly that.
The main idea behind Algorithm 2 is as follows. Given a fact f, let S_f = {s ∈ S | f ∈ s} denote the set of all states containing f. First, it is easy to see that, given a variable V ∈ V and its value v ∈ dom(V), for every state s ∈ S_{⟨V,v⟩} it holds that s[V] = v and therefore P(⟨V, s[V]⟩) = P(⟨V, v⟩). So, given a set of distinct variables V_1, . . ., V_n ∈ V and their respective values v_1 ∈ dom(V_1), . . ., v_n ∈ dom(V_n), for every state s ∈ S_{⟨V_1,v_1⟩} ∩ · · · ∩ S_{⟨V_n,v_n⟩} it holds that Σ_{i∈[n]} P(⟨V_i, s[V_i]⟩) = Σ_{i∈[n]} P(⟨V_i, v_i⟩). In other words, starting from sets of states S_{f_1}, . . ., S_{f_n}, where each f_i is from a different variable V_i, we can construct a set of more specific states by taking the intersection of the sets S_{f_i} while keeping track of the sum of potentials over the variables V_i.
Second, given a variable V and two values v, v′ ∈ dom(V), for every state s ∈ S_{⟨V,v⟩} ∪ S_{⟨V,v′⟩} it holds that s[V] = v or s[V] = v′. So, if P(⟨V, v⟩) = P(⟨V, v′⟩), then also P(⟨V, s[V]⟩) = P(⟨V, v⟩) for every state s ∈ S_{⟨V,v⟩} ∪ S_{⟨V,v′⟩}. And we can generalize this idea to a set of distinct variables V_1, . . ., V_n ∈ V and their respective sets of values with equal potentials, so that the sum Σ_{i∈[n]} P(⟨V_i, s[V_i]⟩) is also the same for every such state s.

Algorithm 2 puts these two ideas together. It iterates over all variables one by one (outer cycle, lines 2 to 7). For each variable V_i, it considers only the values of V_i that can be part of some (forward reachable) goal state (line 4), and partitions all states having those values by their potential values (lines 3 to 6). Finally, it merges the partitioning over the variable V_i into the partitioning over the variables V_1, . . ., V_{i−1} obtained in the previous step, while keeping track of the sum of potentials over the variables V_1, . . ., V_i.
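The two ideas can be sketched with explicit Python sets standing in for BDDs; the variables, disambiguation map, and potentials are hypothetical toy data, and the per-variable merge of Algorithm 2 is folded into dictionary keys:

```python
# Sketch: partition candidate goal states by their h^P-value, processing
# one variable at a time, keeping only the values a goal state may take
# (the map D_G), and tracking the running sum of potentials; equal sums
# merge automatically via the dictionary keys.
def partition_goal_states(variables, D_G, P):
    parts = {0.0: {frozenset()}}     # h-value -> set of partial states
    for V in variables:
        new = {}
        for h, states in parts.items():
            for v in D_G[V]:         # values V may take in a goal state
                h2 = h + P[(V, v)]
                bucket = new.setdefault(h2, set())
                for s in states:
                    bucket.add(s | {(V, v)})
        parts = new
    return parts

P = {("V", "C"): 0.0, ("W", "x"): -1.0, ("W", "y"): 0.0}
D_G = {"V": ["C"], "W": ["x", "y"]}  # hypothetical goal G = {V=C}
parts = partition_goal_states(["V", "W"], D_G, P)
# Two partitions: h^P = -1 for {V=C, W=x} and h^P = 0 for {V=C, W=y}.
```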
Since only the facts that can be part of a goal state are considered, the resulting partitioning P^G is a partitioning of goal states only (Theorem 14 (A1)). Since the disambiguation map D_G overapproximates forward reachable states by definition, P^G contains all forward reachable goal states (Theorem 14 (A2)). Since the function Merge is always called only for a partitioning M over the variables V_1, . . ., V_{i−1} and a partitioning M′ over the variable V_i, taking the intersections on line 18 must result in a partitioning over the variables V_1, . . ., V_i, eventually terminating with a partitioning over all variables (Theorem 14 (A3)). And finally, since, on line 18, h is the sum of potentials over the variables V_1, . . ., V_{i−1} and h′ is the potential over the variable V_i, h + h′ is the sum of potentials over the variables V_1, . . ., V_i, which eventually results in the sum over all variables, i.e., the value of the potential heuristic (Theorem 14 (A4)).
Theorem 14. Let V = {V_1, . . ., V_n}, D_G, and P denote the inputs of Algorithm 2, and let P^G = {⟨h_1, P^G_1⟩, . . ., ⟨h_m, P^G_m⟩} denote the output of Algorithm 2. Then
(A1) for every s ∈ ∪_{j∈[m]} P^G_j it holds that G ⊆ s, i.e., P^G contains only goal states and nothing else; and
(A2) for every forward-reachable goal state s_g it holds that s_g ∈ ∪_{j∈[m]} P^G_j, i.e., P^G contains all forward-reachable goal states; and
(A3) for every j, k ∈ [m] such that j ≠ k it holds that P^G_j ∩ P^G_k = ∅ and h_j ≠ h_k, i.e., P^G is, indeed, a partitioning; and
(A4) for every j ∈ [m] and every s ∈ P^G_j it holds that h_j = h^P(s), i.e., Algorithm 2 partitions goal states based on their h^P-values.
Proof. Given a set of variables X ⊆ V and a partial state s, let s|_X denote the restriction of s to X, i.e., s|_X = {⟨V, v⟩ | ⟨V, v⟩ ∈ s, V ∈ X}; and given a set of partial states S, let S|_X = {s|_X | s ∈ S}.
We start with four invariants that hold in every cycle i of the outer loop (lines 2-7) with respect to the construction of the set M (constructed on lines 3-6):
(I1) For every ⟨h, B⟩ ∈ M and every s ∈ B it holds that ⟨V_i, s[V_i]⟩ ∈ D_G(V_i).
(I2) For every ⟨h, B⟩ ∈ M and every s ∈ B it holds that h = P(⟨V_i, s[V_i]⟩).
(I3) For every ⟨h, B⟩ ∈ M it holds that B|_{V\{V_i}} = S|_{V\{V_i}}, i.e., B restricted to all variables excluding V_i is the set of all syntactic partial states over V \ {V_i}. This follows from the fact that every B on line 5 is constructed so that B|_{V\{V_i}} = S|_{V\{V_i}}, and therefore every union X of such sets also satisfies X|_{V\{V_i}} = S|_{V\{V_i}}.
(I4) For any two distinct ⟨h, B⟩, ⟨h′, B′⟩ ∈ M it holds that B ∩ B′ = ∅ and h ≠ h′; since InsertOrUpdate maintains the set M so that there are no two elements with the same h-value, states with equal potential values end up in the same element of M.

Since P^G is initialized with the set of all (syntactic) states S and, in the function Merge, the sets of states are constructed only using intersections, it follows from (I3) that in every cycle i of the outer loop, the function Merge is called on line 7 with the argument P^G such that for every ⟨h, B⟩ ∈ P^G it holds that B|_{V_i,...,V_n} = S|_{V_i,...,V_n}. Therefore, the function Merge returns P^G such that for every ⟨h, B⟩ ∈ P^G it holds that B|_{V_{i+1},...,V_n} = S|_{V_{i+1},...,V_n}, and furthermore from (I1) it follows that for every s ∈ B and every V ∈ {V_1, . . ., V_i}, it holds that ⟨V, s[V]⟩ ∈ D_G(V). Therefore, at the end of the algorithm, for every ⟨h, B⟩ ∈ P^G, every s ∈ B, and every V ∈ V, it holds that ⟨V, s[V]⟩ ∈ D_G(V). Therefore, it follows from the definition of D_G that (A1) holds, because D_G(V) = {⟨V, v⟩} for every ⟨V, v⟩ ∈ G, and also (A2) holds, because we considered all possible facts that can appear in any forward reachable goal state.
From (I4) and the fact that Merge uses only intersections to construct sets of states, it follows that P^G_j ∩ P^G_k = ∅ for every j, k ∈ [m] s.t. j ≠ k. And since InsertOrUpdate makes sure that the output set does not contain two elements with the same h-value, it follows that (A3) holds.
Finally, from (I2) and the fact that Merge in cycle i sums h and h′ such that h is a sum of potentials over the variables V_1, . . ., V_{i−1} and h′ is the potential over the variable V_i, it follows that (A4) holds.
Note that all sets of states can be represented as BDDs, and union (∪) and intersection (∩) between sets of states can be computed as a disjunction (∨) and conjunction (∧) between BDDs.
Also note that Algorithm 2 can easily be used for generating a symbolic pattern database equivalent to the potential heuristic: running Algorithm 2 with an empty G as input produces a partitioning of all states (i.e., all states extending ∅) by their h^P-values. Such a symbolic pattern database can be used directly in the variant of BDDA* introduced by Kissmann & Edelkamp (2011) (discussed in Section 2.2). However, there are two main reasons not to use such symbolic pattern databases. First, the computation of the partitioning can easily be infeasible in practice, because there is no guarantee that the resulting BDDs will concisely represent the underlying sets of states (in the worst case, the size of a BDD is linear in the number of states it represents, so it can grow exponentially). This guarantee does not exist even for the partitioning of goal states, and we focus on this aspect in our experimental evaluation in Section 5.4. Second, and more importantly, symbolic pattern databases generated with Algorithm 2 cannot be more informative than the corresponding potential heuristic, which, in turn, is equivalent to the operator-potential heuristic h^Q_fw (and to h^Q_bw on forward reachable states). As we discuss in the next section, h^Q_fw and h^Q_bw can be applied in symbolic search by a straightforward integration of the underlying operator-potential function Q into the GHSETA* search, which makes the computation of heuristic values using symbolic pattern databases much more expensive than using GHSETA* with h^Q_fw or h^Q_bw instead.

Symbolic Search with Operator-Potential Heuristics
The integration of the operator-potential forward heuristic in the forward GHSETA * is straightforward.
The operator-potential forward heuristic h^Q_fw is defined as h^Q_fw(s) = Σ_{f∈I} P(f) + Σ_{i∈[n]} Q(o_i) for any sequence of operators π = ⟨o_1, . . ., o_n⟩ with π⟦I⟧ = s (Equation (5)), and therefore we have that h^Q_fw(o⟦s⟧) = h^Q_fw(s) + Q(o) for every state s and operator o applicable in s. Moreover, we have shown in Theorem 7 that h^Q_fw is forward consistent and forward admissible (assuming the underlying h^P_fw is forward consistent and forward admissible). Therefore, we can integrate h^Q_fw into the GHSETA* algorithm described in Algorithm 1 by simply setting h_I to h^P(I) and using Q as the δ_h function. That is, we set the heuristic value of the initial state to h^P(I), and partition operators by their costs and Q(o) values.
For the backward direction, we also use Q as the δ h function, but on top of that we need to partition the set of goal states using Algorithm 2. That is, we cannot start with the BDD representing the set of all goal states (and initialize h I to h P (I)), because this could result in a path-dependent inconsistent heuristic. What we need to do instead is to generate the partitioning of the (forward reachable) goal states by their h P -values and initialize the open list accordingly. Let P G = {P G h1 , . . ., P G hn } denote all partitions returned by Algorithm 2, where h 1 , . . ., h n are all distinct h P -values goal states can have, i.e., for every P G hi ∈ P G and every state s g ∈ P G hi it holds that h P (s g ) = h i . Moreover, let h I i = h P (I) − h i for every i ∈ [n]. Then we need to initialize the open list (lines 3 and 4 in Algorithm 1) as {⟨max(0, h I i ), S 0,h I i ⟩ | P G hi ∈ P G , S 0,h I i = P G hi }, i.e., we insert every partition P G hi into the open list with the g-value set to zero, h-value set to h I i = h P (I) − h i , and f -value set to max(0, h I i ). This is all that is needed, because it ensures that any sequence of operators ⟨o 1 , . . ., o m ⟩ applied on any state from any S 0,h I i results in the h-value h I i + Σ m j=1 Q(o j ), which is exactly what we need in order to obtain a goal-corrected operator-potential backward heuristic according to Definition 9 that is backward consistent and backward admissible (Theorem 13).
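In code, the open-list initialization for the backward direction could look as follows (a hypothetical explicit-state sketch; in the actual implementation each partition is a BDD returned by Algorithm 2):

```python
def init_backward_open_list(goal_partitions, h_pot_init):
    """goal_partitions: dict mapping each distinct goal h^P-value h_i to
    the set of goal states with that value (the output of Algorithm 2);
    h_pot_init: h^P(I). Each partition enters the open list with
    g = 0, h = h^P(I) - h_i, and f = max(0, h)."""
    entries = []
    for h_i, states in goal_partitions.items():
        h = h_pot_init - h_i
        entries.append((max(0, h), 0, h, states))
    entries.sort(key=lambda e: e[0])  # order by f-value
    return entries
```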
The bi-directional GHSETA * combines the aforementioned approaches and therefore we can use different operator-potential heuristics in each direction or choose to use blind symbolic search in one direction and an operator-potential heuristic in the other.

Experimental Evaluation
The proposed heuristics and the GHSETA * algorithm were implemented in C as a part of the cpddl planning library. 3 The inference of potential and operator-potential functions was implemented using the CPLEX LP/MIP solver v22.1.0. For the manipulation of BDDs, we used the CUDD library v3.0.0.
The translation from PDDL to FDR uses the inference of lifted mutex groups proposed by Fišer (2020), which are subsequently used for the creation of FDR variables. Operators and facts are pruned with the h 2 heuristic in the forward and backward direction (Alcázar & Torralba, 2015), and we used mutex pairs from the forward h 2 heuristic for disambiguation.
Performing operations on BDDs can sometimes be very time-consuming, which significantly reduces the performance of the symbolic search. Therefore, we follow the approach used in previous implementations of symbolic planners by applying various time limits on BDD operations to mitigate their negative effect whenever we can. We use a time limit of 30 seconds for applying mutexes on the BDDs representing goal states. When the time limit is reached in the forward or backward GHSETA * , the search is simply performed without mutexes applied on the goal BDDs. In the case of bi-directional GHSETA * , when the time limit is reached, the search in the backward direction is disabled, because it is a strong indication that computing successor states in the backward direction will be very slow or will require a large amount of memory. We also apply a 10-second time limit on merging transition relation BDDs, i.e., for each cost and operator-potential value we try to build a single BDD representing all operators in that partition, but if we fail to do so within the time limit, we use disjunctive partitioning (Jensen et al., 2008; Torralba et al., 2017). In the case of bi-directional GHSETA * , we also turn off the backward search once a step in the backward direction takes longer than three minutes (i.e., one tenth of the overall time limit as we describe below). This helps the symbolic search to proceed without getting stuck in a fruitless attempt to compute a set of states that cannot be efficiently represented as a BDD. We do not apply the same time limit in the forward direction, because computing successors in the backward direction is usually more time-consuming than in the forward direction (i.e., if it takes long in the forward direction, it will probably take even longer in the backward direction).
The experiments were conducted on a cluster of computing nodes with Intel Xeon Scalable Gold 6146 processors. The time and memory limits were set to 30 minutes and 8 GB, respectively. We used all planning domains from the optimal track of the International Planning Competitions (IPCs) from 1998 to 2018, excluding the ones containing conditional effects after translation and those that could not be grounded and pruned with h 2 within the time and memory limits. We merged, for each domain, all benchmark suites across different IPCs, eliminating duplicate instances, resulting in a total of 1 648 planning tasks across 48 domains. 4

Potential and operator-potential functions were inferred with the following optimization criteria:

• I: maximize the heuristic value of the initial state (Pommerening et al., 2015), i.e., we set the optimization criterion of the LP/MIP to maximize Σ f ∈I P(f ).
• A + I: maximize the heuristic value for the average (syntactic) state while enforcing the maximum heuristic value for the initial state (Seipp et al., 2015; Fišer et al., 2020). We first compute I to obtain the maximal heuristic value h I for the initial state. Then we extend the LP/MIP with the additional constraint Σ f ∈I P(f ) ≥ h I , and we maximize the sum Σ ⟨V,v⟩∈F P(⟨V, v⟩)/|dom(V )|.
• S 1k + I: maximize the average heuristic value for 1 000 states sampled using random walks, while enforcing the maximum heuristic value for the initial state (Seipp et al., 2015; Fišer et al., 2020). We enforce the maximum heuristic value for the initial state as in A + I. Then we sample 1 000 states S by random walks starting from the initial state, with binomially distributed walk lengths centered around double the maximum h-value of the initial state. Finally, we set the optimization criterion to the maximization of the average heuristic value over the states S, i.e., we maximize (1/|S|) Σ s∈S Σ f ∈s P(f ).
• M 2 + I: maximize the average heuristic value for all reachable states approximated with mutexes while enforcing the maximum heuristic value for the initial state (Fišer et al., 2020). The maximum h-value for the initial state is enforced as in A + I and S 1k + I. The optimization criterion is based on estimating, for each fact f ∈ F, the number of forward reachable states containing f . The details are described by Fišer et al. (2020).

We denote each search variant by the optimization criterion decorated with an arrow indicating the search direction, e.g., the bi-directional search using h Q fw optimized for A + I in the forward direction, and h Q bw optimized for I in the backward direction is denoted by −→ A + I-← − I . For the blind bi-directional symbolic search, we use the shorthand ← → b . For symbolic search with operator-potential heuristics (h Q fw and h Q bw ), we transformed planning tasks so that vars(pre(o)) = vars(eff(o)) for every operator o by the "multiplication" method described in Section 2, using the h 2 mutexes for disambiguation. We also compare to the variant where tasks are transformed to TNF using the (polynomial) method proposed by Pommerening & Helmert (2015) improved with disambiguations (Fišer et al., 2020). We show, however, that this method is almost always detrimental to the performance. Note that the transformed planning tasks are not only used for the computation of potential functions, but must also be used for the symbolic search, because the inferred operator-potentials correspond to the operators of the transformed planning task, not the original task. The time spent on the transformation of planning tasks is always counted as a part of the running time.
We do not show a detailed comparison to the implementation of blind symbolic search competing in IPC 2011 and 2014 (smb), because our implementation has overall better performance. The overall number of solved tasks is 943 by , respectively. The cgm planner is compared only on subsets of domains because of its limited support of PDDL features like conditional effects, inequality preconditions, or quantifiers (we had to exclude the domains caldera, cavediving, GED, maintenance, movie, mprime, snake, spider, termes, and trucks).

Operator-Potential Functions via Mixed-Integer Linear Programs
Potential functions (for state-space search) are typically inferred using linear programs (LPs). Nevertheless, a smooth integration of operator-potential heuristics in symbolic search requires integer-valued operator-potentials, which in turn requires solving mixed-integer linear programs (MIPs).
Although there always exists an operator-potential heuristic (e.g., assigning zero to all operators), solving a MIP instead of an LP may result in a less informative heuristic, because the MIP used for operator-potentials is more restricted than the corresponding LP for (fact) potentials. To get a sense of how much using integer-valued operator-potentials costs in terms of a loss of informativeness, we focus on the potential heuristic optimized for the initial state (I), which gives us the highest possible estimate for the initial state. Comparing how its values for initial states change if the MIP is used instead of the LP shows that they almost never change. We found that we get a smaller heuristic value in only 17 tasks from four domains (two tasks in nomystery, three in pegsol, eight in pipesworld-notankage, and four in pipesworld-tankage), and the heuristic values always differ only by one. So, using the MIP instead of the LP almost never leads to a loss of informativeness. However, solving a MIP, which is NP-complete, is typically much more time and memory demanding than solving an LP, which can be done in polynomial time.
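For illustration, an integer operator-potential can also be obtained from a real-valued LP solution by rounding the per-operator change of the potential heuristic downwards; lower values keep the heuristic admissible at the price of a (usually tiny) loss of informativeness. A sketch with hypothetical names (the experiments instead solve the MIP directly):

```python
import math

def operator_potential(pre, eff, P):
    """Q(o) = sum_{f in eff(o)} P(f) - sum_{f in pre(o)} P(f), i.e., the
    change of the potential heuristic induced by applying o (assumes
    vars(pre(o)) = vars(eff(o))). Rounding down is safe: lower
    operator-potentials can only make the heuristic weaker, never
    inadmissible."""
    q = sum(P[f] for f in eff) - sum(P[f] for f in pre)
    return math.floor(q)
```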
Figure 4 shows per-task comparisons of the runtime of the LP and MIP solvers for different variants of (operator-)potential heuristics. Note that I requires solving one LP (or MIP), whereas A + I, M 2 + I, and S 1k + I require solving two LPs (MIPs): the first one for getting the maximal heuristic value for the initial state, which is then used in the second one as an additional constraint. Although solving the MIP is indeed almost always slower (sometimes by more than two orders of magnitude), the runtime is not a significant limiting factor in most tasks. This can be observed in Figure 5a, depicting the cumulative number of tasks with successfully inferred operator-potentials on the y-axis versus the ratio between the runtime of the MIP and LP variants on the x-axis, i.e., the point (x, y) corresponds to y tasks where the ratio between the runtime of MIP over LP is x or less. For all tested optimization criteria of potential heuristics, the slowdown is well below a factor of ten for most of the tasks, and the median slowdown is 1.4 for I, 2.6 for A + I, 2.3 for S 1k + I, and 1.6 for M 2 + I. The absolute runtime of the MIP variants is also low for most tasks, and under ten seconds for almost all tasks. A runtime higher than ten seconds occurred in only 25, 74, 127, and 130 tasks for I, A + I, S 1k + I, and M 2 + I, respectively. In contrast to the LP variant, the MIP could not be solved within the time or memory limit in only one task from the caldera domain, four from pipesworld-tankage, and two from spider for I and A + I, and additionally in three more tasks from airport and four more from pipesworld-notankage for S 1k + I and M 2 + I. Overall, using the MIP instead of the LP is rarely a bottleneck, primarily because it is computed only once before the search starts.

Normalization of Planning Tasks
The path-dependent and (forward and backward) consistent heuristics h Q fw and h Q bw require that vars(pre(o)) = vars(eff(o)) for every operator o ∈ O. In the set of benchmarks we use here, this is the case in 485 out of 1 648 tasks. The remaining tasks have to be transformed into this form. As already described in Section 2, this can be done either by the polynomial method described by Pommerening & Helmert (2015) and Fišer et al. (2020), denoted by poly, or by the (more brute-force) "multiplication" method (mult). The disadvantage of the poly method is that it can introduce many zero-cost operators, and the disadvantage of the mult method is that it can incur an exponential blow-up of the number of operators. Nevertheless, Figure 6 shows that the number of operators is rarely significantly increased by mult in our dataset. In fact, only one task (from the caldera domain) could not be transformed by mult due to the memory limit, and the number of operators grew more than two-fold in only two domains: in maintenance, the number of operators was between 2.3 and 2.7 times higher with mult, and in agricola, it increased 7- to 16-fold.
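A minimal sketch of the mult transformation (hypothetical code): every operator is multiplied over all values of the effect variables missing from its precondition, and variables mentioned only in the precondition are copied into the effect, so that vars(pre(o)) = vars(eff(o)) holds afterwards. Disambiguation with h 2 mutexes, which prunes copies whose extended precondition is unreachable, is omitted here.

```python
from itertools import product

def mult_normalize(pre, eff, doms):
    """pre, eff: dicts mapping FDR variables to values; doms: dict
    mapping each variable to its domain. Returns the list of operator
    copies (new_pre, new_eff) with vars(new_pre) = vars(new_eff)."""
    missing = sorted(v for v in eff if v not in pre)
    copies = []
    for vals in product(*(doms[v] for v in missing)):
        new_pre = dict(pre)
        new_pre.update(zip(missing, vals))
        # variables mentioned only in the precondition keep their value
        new_eff = dict(eff)
        for v in pre:
            new_eff.setdefault(v, pre[v])
        copies.append((new_pre, new_eff))
    return copies
```

The exponential blow-up mentioned above comes from the Cartesian product over the missing variables' domains.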
The runtime also does not seem to be an issue. The transformation takes more than one second in only 28 tasks for mult, in contrast to 34 tasks for poly. The maximum runtime of mult is 6.7 seconds, in contrast to 9 seconds for poly, and the median ratio between the runtimes of mult and poly is one. Therefore, the two methods are about equally fast.
So, neither of the methods seems to be detrimental in terms of the size of the resulting planning task or the runtime overhead they incur. However, it turns out that the mult method has a much better synergy with the symbolic search with operator-potential heuristics than poly, as can be observed in Table 1. For example, there is only a single task in the whole dataset where using poly is beneficial over using mult in the forward search: the one task from the caldera domain mentioned above, where mult could not successfully transform the planning task within the memory limit. We think the clear superiority of mult is caused by the auxiliary zero-cost operators created by poly, because a single operator in the task created with mult may correspond to a sequence of operators in the task created with poly. Thus the change of the heuristic value induced by such an operator is dissolved into multiple operators for poly. For these reasons, in all of the following experiments, we consider the transformation method mult only.

Table 2: "Domain Dominance": the row x and column y shows the number of domains where the method x solved more tasks than the method y. "Task Dominance": the row x and column y shows the number of tasks solved by x but not by y. "tot": the overall number of solved tasks (coverage). The number in the cell (x, y) is in bold if it is higher than the number in (y, x). The most interesting numbers are highlighted with a grey background: we highlight comparisons between GHSETA * and A * with the same (operator-)potential heuristics, and comparisons between the forward blind symbolic search ( − → b ) and the forward GHSETA * with h Q fw .

Forward Search
As we discussed in Section 5.1, operator-potential heuristics tend to retain the informativeness of the corresponding potential heuristics. So, the next question is whether the information provided by operator-potential heuristics increases the efficiency of the symbolic search. Table 2 compares all variants of GHSETA * , state-space search A * with the same potential heuristics, and the blind forward and bi-directional symbolic search. GHSETA * variants are clearly superior to their A * counterparts in the overall numbers, be it the overall number of solved tasks, the number of domains in which GHSETA * dominates A * , or the number of tasks solved by GHSETA * but not A * . However, we can still observe some complementarity between the methods (in particular, for the potentials optimized for the initial state (I)).
The more detailed per-domain comparison in Table 3 indicates that A * with a potential heuristic solves more tasks than GHSETA * with the corresponding operator-potential heuristic mostly in domains where − → b performs worse than A * with potential heuristics. Nevertheless, GHSETA * performs at least as well as (and usually better than) the corresponding A * with potential heuristics in an overwhelming majority of domains. The comparison to − → b in Table 2 shows that enhancing symbolic search with operator-potential heuristics greatly increases the overall number of solved tasks, and is rarely detrimental. Table 3 shows a great synergy between the methods across the whole benchmark set. This suggests that the partitioning of operators by their operator-potentials induces a compact representation of sets of states using BDDs, which can also be observed in Figure 7.
Figure 7a shows that the size of BDDs, measured as the number of BDD nodes, consistently decreases when the operator-potential heuristic is used; as expected, this leads to a speedup per expanded BDD (Figure 7b), as most operations on BDDs are polynomial in the number of BDD nodes. Figure 7c shows that the number of expanded BDDs (sets of states) increases, which, again, is expected, because the sets of states during the search are partitioned not only by g-values but also by h-values. Nevertheless, the overall search effort is reduced, which can be observed in Figure 7d, comparing the number of BDD nodes over all expanded BDDs, and in Figure 7e, showing the overall runtime in seconds. As the plots show, the total number of BDD nodes across all BDDs involved in the search is rarely significantly increased when using operator-potential heuristics. This shows that operator-potential heuristics can overcome the limitation of other "more informed" heuristics that do not induce a good BDD representation of sets of states during the search (Speck et al., 2020a). In terms of runtime, this translates into speedups of up to several orders of magnitude, while being detrimental in very few cases.

Table 3: Per-domain comparison of the number of solved tasks for the forward GHSETA * with h Q fw , A * with potential heuristics, and forward and bi-directional blind symbolic search. The row "others" sums over domains with exactly the same number of solved tasks by all compared methods. "+" indicates that GHSETA * solved every task that was solved by A * with the same potential heuristic; " " indicates that GHSETA * solved every task solved by − → b ; and "⊕" indicates the combination of both occurring at the same time. Finally, we highlight in blue (⊕) the cases where GHSETA * has strictly more coverage than either of the two methods it combines.
Overall, it seems that GHSETA * with operator-potential heuristics tends to get the best of both the symbolic search and the heuristic search with potential heuristics. Furthermore, one can observe that the combination of symbolic search and operator-potential heuristics is often better than the sum of its parts, i.e., in many domains the combination solves every task solved by either of the two techniques it combines, and it achieves a strictly higher coverage than the best of them.

Backward Search
Goal-corrected operator-potential backward heuristics require a partitioning of goal states by their heuristic values. Algorithm 2 from Section 3.3 can provide such a partitioning as a set of BDDs, but it can have exponential runtime and it can generate an exponential number of partitions in the worst case.
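An explicit-state analogue of this partitioning (for intuition only; Algorithm 2 builds one BDD per h P -value without ever enumerating states individually) simply groups the goal states by their potential value:

```python
from collections import defaultdict

def partition_goals_by_potential(goal_states, P):
    """Group goal states by h^P(s) = sum of P over the facts of s.
    Returns a dict mapping each distinct h^P-value to its partition,
    mirroring the set of BDDs produced by Algorithm 2."""
    parts = defaultdict(set)
    for s in goal_states:
        parts[sum(P[f] for f in s)].add(s)
    return dict(parts)
```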
Table 4 shows the number of tasks per domain where Algorithm 2 did not finish the partitioning of goal states within the time limit (the memory limit was never an issue). The partitioning is not possible in only a relatively small number of tasks from the IPC domains, and it is highly dependent on the domain and the operator-potential heuristic. Moreover, only three tasks where the partitioning failed could be solved by some variant of the backward symbolic search.

Table 4: The number of tasks in which the partitioning of goal states using Algorithm 2 failed. The row "others" sums over domains where there is no difference between the compared methods.

Table 5: "Domain Dominance": the row x and column y shows the number of domains where the method x solved more tasks than the method y. "Task Dominance": the row x and column y shows the number of tasks solved by x but not by y. "tot": the overall number of solved tasks (coverage).

In addition, the partitioning failed in one task from the tpp domain which was solved by ← − I . Overall, the partitioning of goal states fails (with very few exceptions) only in tasks that cannot be solved by any variant of backward symbolic search. The median runtime of Algorithm 2 for all variants is about 1 millisecond, and the averages are 8.3, 3.5, 2.6 and 5 seconds for ← − I , ←− A + I, ← −− − S 1k + I and ←−− M 2 + I, respectively. Therefore, the partitioning of goal states does not seem to be a limiting factor in practice.
To see how many partitions we get for the different operator-potential heuristics, i.e., how many different heuristic values goal states have, we plot cumulative graphs in Figure 8a showing the number of tasks (on the y-axis) having at least the number of goal BDD partitions given on the x-axis. The median number of partitions is 5 for I, and 1 for A + I, S 1k + I, and M 2 + I. The averages are 42.1, 3.8, 17.8 and 3.8 for I, A + I, S 1k + I and M 2 + I, respectively. The number of tasks with 10 (100) or fewer partitions is 1 007 (1 438) for I, 1 503 (1 580) for A + I, 1 311 (1 546) for S 1k + I, and 1 469 (1 564) for M 2 + I. So, in a majority of tasks the partitioning results in a very low number of partitions, it rarely happens that the number of partitions exceeds 100, and the optimization for the initial state (I) tends to generate more partitions than the other methods. The number of partitions is also highly domain-dependent, and larger tasks tend to have more goal partitions than smaller tasks from the same domain.
Figure 8b shows the size of the representation of goal states as a cumulative graph similar to Figure 8a, i.e., it shows the number of tasks (y-axis) where the sum of the number of BDD nodes over all goal BDDs is at least the number given on the x-axis. Note that the set of goal states in every task is the same for all variants of operator-potential heuristics, but the partitioning of goal states may differ. The graph shows, unsurprisingly, that a higher number of goal BDDs results in a less concise representation of the underlying states. Nevertheless, the difference between the sizes of the representations for different operator-potential heuristics seems to be less pronounced than the difference between the numbers of partitions.

Table 5 compares GHSETA * with different variants of h Q bw and the blind backward symbolic search in terms of the number of domains where one method solved more tasks than the other ("Domain Dominance"), and the number of tasks solved by one method but not the other ("Task Dominance"). On the one hand, we can observe that using the operator-potential heuristics I, A + I, and M 2 + I instead of blind search is beneficial in more domains than it is detrimental, ← − I solves more tasks overall than ← − b , and the overall coverage of ← − b is almost the same as that of ←− A + I and ←−− M 2 + I. On the other hand, all methods using h Q bw seem to be complementary to the blind search and to each other in terms of both domain and task dominance. Nevertheless, ← − I stands out in this comparison as it not only solves more tasks than any other method, but is also superior to all other methods in the number of dominated domains and tasks. Moreover, we were not able to ascertain any correlation between the number of goal partitions and the number of solved tasks.
Similarly to the forward search, we can also observe in this case that the average size of an expanded BDD is often decreased (Figure 9a), which leads to a speedup per expanded BDD (Figure 9b). The number of expanded BDDs is also increased (Figure 9c), as expected, because of the partitioning of states by both g- and h-values. Where the backward search significantly differs from its forward counterpart is in the comparison of the size of the BDD representation of sets of states (Figure 9d) and the overall runtime of the backward search (Figure 9e), which show complementarity of the blind backward search and the backward search with operator-potential heuristics rather than a clear-cut improvement thanks to more informed search, as was the case for the forward symbolic search (cf. Figures 7d and 7e).

Table 6: Overall number of tasks solved by different combinations of forward and backward symbolic search. A value in the row x and the column y is the overall number of solved tasks where x was used for the forward direction and y for the backward direction of the symbolic bi-directional search. ∅ means that no forward or backward search was used. oracle refers to selecting the best variant for each task for the respective search direction. The highest coverage of all non-oracle variants is in bold.

Table 7: "Domain Dominance": the row x and column y shows the number of domains where the method x solved more tasks than the method y. "Task Dominance": the row x and column y shows the number of tasks solved by x but not by y. "tot": the overall number of solved tasks (coverage).

Bi-directional Search
Symbolic bi-directional search allows us to combine any variants of the operator-potential heuristics in the forward and backward directions. Table 6 shows the comparison of the overall number of solved tasks for all variants of forward and backward GHSETA * with operator-potential heuristics and blind symbolic search. The baseline of blind bi-directional symbolic search ( ← → b ), which was the state-of-the-art variant of symbolic search until now, solved 1 055 tasks, but all variants of GHSETA * using an operator-potential heuristic other than I in the forward direction surpass this result, even the forward-only GHSETA * . The table also shows that using operator-potential heuristics in the forward direction has a much bigger impact on the overall number of solved tasks than operator-potential heuristics in the backward direction. This is in line with our previous findings regarding forward-only and backward-only GHSETA * .
Table 7 provides even more insight as it compares selected bi-directional variants with operator-potential heuristics with the best-performing variants of the forward-only and backward-only GHSETA * with operator-potential heuristics and the blind variants of symbolic search in all directions. For the bi-directional GHSETA * , we selected the best-performing variant −→ A + I-← − b and the two best-performing variants among those that do not use the blind backward search. The bi-directional variants with operator-potential heuristics are clearly superior to the forward-only and backward-only GHSETA * and the blind variants in the overall number of solved tasks, in the number of domains where one variant solves more tasks than the other, and in the number of tasks solved by one variant but not the other. The only exception is −→ A + I, which dominates in more domains than the other way around (although it is still worse in the overall numbers). The top row of Figure 10 shows trends similar to the forward-only case (cf. Figure 7). This is not surprising, as the difference between the methods is exactly replacing blind forward search with −→ A + I, which was shown to greatly improve performance in the unidirectional search case. The bottom row of Figure 10 comparing I shows that replacing b with I in the backward direction of the bi-directional search has a much less profound effect than in the backward-only search (cf. Figure 9). Nevertheless, we can still observe that using operator-potential heuristics in the backward direction (instead of blind search) results in more expanded BDDs (sets of states), because I induces a more fine-grained partitioning of sets of states (Figure 10c, bottom), and we can see that the size of the expanded BDDs tends to be smaller with I (Figure 10a, bottom). However, as we already noted, −→ A + I seems to be more complementary with ← − b than with ← − I , which results in a better overall performance of −→ A + I-← − b .

Table 8: "Domain Dominance": the row x and column y shows the number of domains where the method x solved more tasks than the method y. "Task Dominance": the row x and column y shows the number of tasks solved by x but not by y. "tot": the overall number of solved tasks (coverage). For cgm, we considered only the subset of domains supported by the planner, i.e., both the row and column for cgm disregard the domains caldera, cavediving, GED, maintenance, movie, mprime, snake, spider, termes, and trucks.

Comparison to State-of-the-Art
What remains is the comparison to other state-of-the-art planning methods. From the newly proposed methods, we consider the best variants of GHSETA * in all directions, and the bi-directional GHSETA * combining the best forward and backward variant ( −→ A + I-← − I ). We compare those to heuristic state-based planners (lmc, ms, comp2, and scrp), the blind bi-directional symbolic search ( ← → b ), and the symbolic search with pattern databases (cgm). Table 8 compares the methods by counting the number of domains where one method solved more tasks than the other, and the number of tasks solved by one method but not the other.
Our best variants perform better than any other compared method in the overall number of solved tasks. Since cgm performs worse than the blind bi-directional symbolic search, it is not surprising that it performs much worse than our best methods. Moreover, there is not much complementarity between cgm and our best methods, i.e., there are only a few domains or tasks where cgm performs better. A similar picture can be observed with the heuristic planners lmc and ms: they perform significantly worse than our best methods and there is not much complementarity.
scrp and comp2 both solve fewer tasks than our best methods overall, but they also seem to be complementary to our approach. In particular, the number of domains where scrp performs better than our methods is higher than the other way around. In fact, scrp is the best-performing planner (among the ones compared here) in 28 domains, whereas −→ A + I-← − b is the best-performing planner in 22 domains ( −→ A + I-← − I in 14 domains, and −→ A + I in 15 domains). Nevertheless, the overall numbers are in favor of the bi-directional GHSETA * with operator-potential heuristics, and the difference seems to be spread over a large number of structurally different domains.
In terms of runtime, Figure 11 shows a comparison of our best approach, −→ A + I-← − b , against several explicit-state search planners. The comparison with scrp and comp2 is not very insightful as they use an expensive preprocessing phase to compute the heuristic, with a fixed time limit of 900 and 300 seconds, respectively. Due to this, −→ A + I-← − b is faster in the majority of instances. The comparison with lmc and ms shows that, despite the huge advantage in coverage of −→ A + I-← − b , there are still a number of instances solved faster by these planners. This is partially due to the preprocessing phase of −→ A + I-← − b , which requires some time to compute the potential heuristics and initialize the data structures to perform symbolic search. But overall, −→ A + I-← − b is still up to several orders of magnitude faster on many instances.

Conclusion
Heuristic state space search and symbolic search are complementary enhancements to the same basic algorithm-state space search-through the use of heuristic search guidance functions h, and of compact state-set representations, respectively. It is natural to combine both approaches, yet that combination has not been an unqualified success. One key reason for this is that, in symbolic search, h must be (efficiently) applicable to sets of states rather than individual ones. Here we show that potential heuristics can be reformulated in a manner that allows doing just that. The resulting methods empirically do not tend to suffer from the second key problem (detrimental state partitionings). They soundly beat the previous state of the art in symbolic search for optimal planning; they are on par with, as well as highly complementary to, the state of the art in optimal heuristic search planning.
This result boosts our ability to plan optimally, and it re-emphasizes the role of symbolic search, in particular heuristic symbolic search, as part of the state of the art, suggesting that further research effort may be well placed in this area which recently received scant attention.
A specific question opened up by our research is whether the key to our method, the transformation of heuristic values into a sum of heuristic-value changes per operator, may be applicable to other kinds of heuristic functions as well. For example, for abstraction heuristics this is certainly not true per se, as the change in heuristic value (abstract goal distance) is in general highly dependent on the state. But perhaps abstractions can be designed so as to avoid that phenomenon. Similarly, it may be possible to adjust the design of other admissible estimators, like landmark heuristics, to this end.
Beyond this, there are a number of issues that our work sheds light upon and that would be worth exploring more broadly. Our experimental analysis shows that potential heuristics defined in previous work obtain great performance in forward search, but only a mild improvement in backward search. This suggests investigating what kinds of potential heuristics can do better in each direction, defining new optimization criteria for potential heuristics, and/or investigating whether higher-dimensional potential heuristics (Pommerening et al., 2017) can further improve performance. More generally, a key issue is to improve our understanding of what makes a heuristic good in symbolic search. Operator-potential heuristics offer a clear positive example, in contrast to previous analysis (Speck et al., 2020a). A promising avenue of research is to characterize what kinds of heuristics induce good state partitionings for sets of states represented as BDDs, and how the choice of representation (e.g., using EVMDDs instead (Lai et al., 1996; Ciardo & Siminiceanu, 2002; Speck et al., 2018)) affects the usefulness of different heuristic functions. Another promising line of research is how to best make use of heuristics in symbolic bi-directional search. In recent years, there have been significant advances in explicit-state bi-directional heuristic search (Holte et al., 2017; Shaham et al., 2019; Shperberg et al., 2020; Alcázar et al., 2020; Alcázar, 2021), which could shed light on how to make the most out of heuristics in the symbolic search case too.

Typically, we would like to use the largest value satisfying the inequality, as that leads to the most informative heuristic. However, it is still safe to use lower values if that is convenient for some reason. For example, this allows us to compute potential functions with floating-point operator potentials and round them down to the nearest integer. Also note that if the input planning task is normalized, then Q̄ coincides with Q. Next, we show that Q̄(o) (constructed from the potential function P) is a lower bound on the change of the heuristic value of h^P_fw induced by the operator o.
Proposition 17. Let Q̄ denote a general operator-potential function for P, let s ∈ S_fw denote a forward reachable state, and let o ∈ O denote an operator applicable in s. Then Σ_{f∈s} P(f) + Q̄(o) ≤ Σ_{f∈o⟦s⟧} P(f).
Proof. We prove the case where Q̄(o) = Σ_{f∈eff(o)} P(f) − Σ_{V∈vars(eff(o))} max_{f∈D_o(V)} P(f); the claim clearly holds for all lower values of Q̄(o). Let x denote the restriction of s to vars(eff(o)). It is easy to see that vars(x) = vars(eff(o)), and from the definition of disambiguation it follows that for every V ∈ vars(x) it holds that x[V] ∈ D_o(V). Therefore, for every V ∈ vars(x), we have that P(x[V]) ≤ max_{f∈D_o(V)} P(f), which concludes the proof, as Σ_{f∈o⟦s⟧} P(f) − Σ_{f∈s} P(f) = Σ_{f∈eff(o)} P(f) − Σ_{f∈x} P(f).
Unlike Q in the case of normalized FDR tasks, Q̄ for non-normalized tasks does not necessarily capture the change of the heuristic value exactly, even if we take the maximal value satisfying the inequality in Equation (A.2). Consider the simple non-normalized planning task in Figure A.1, the states "aX" and "bX", and the operator o_1. The h^P_fw-value of "aX" is zero and the h^P_fw-value of "bX" is one, i.e., the change of the heuristic value is one, but Q̄(o_1) = 0 (as Equation (A.2) needs to account also for the self-loop resulting from applying o_1 in the state "bX").
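The computation of Q̄(o) from a potential function can be sketched in a few lines. This is a minimal sketch under our own representation assumptions (none of these names come from the paper's implementation): facts are (variable, value) pairs, `pot` maps facts to their potentials P(f), `eff` is eff(o) as a dict, and `disamb` maps each affected variable V to the disambiguation set D_o(V).

```python
# Hypothetical sketch of Q̄(o) = Σ_{f∈eff(o)} P(f) − Σ_{V∈vars(eff(o))} max_{f∈D_o(V)} P(f).
def general_operator_potential(pot, eff, disamb):
    """pot: dict fact -> potential P(f); eff: dict var -> value (the effect);
    disamb: dict var -> set of values D_o(V)."""
    # Potentials gained by the facts the operator makes true.
    gain = sum(pot[(var, val)] for var, val in eff.items())
    # For each affected variable, subtract the largest potential the operator
    # could possibly "consume" in any state it applies to -- this makes Q̄(o)
    # a lower bound on the actual change of h^P_fw.
    loss = sum(max(pot[(var, val)] for val in disamb[var]) for var in eff)
    return gain - loss
```

With potentials consistent with the Figure A.1 example (e.g., P(⟨V,a⟩) = 0 and P(⟨V,b⟩) = 1, which is one assignment matching the stated h^P_fw-values), an operator setting V to b whose disambiguation set contains both a and b obtains Q̄ = 0, matching the self-loop discussion above.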

Forward Direction
Path-dependent operator-potential forward heuristics are defined analogously to their state-dependent counterparts (Definition 6), except that here we use Q̄ instead of Q.
Definition 18. Let Q̄ denote a general operator-potential function for P. A path-dependent operator-potential forward heuristic h̃^Q_fw : E_fw → R ∪ {∞} for Q̄ is defined as

h̃^Q_fw(π) = Σ_{f∈I} P(f) + Σ_{i∈[n]} Q̄(o_i)    (A.4)

for any sequence of operators π = ⟨o_1, . . ., o_n⟩ applicable in I.
Observe that h̃^Q_fw can, indeed, be path-dependent for non-normalized planning tasks. Consider the example planning task in Figure A.1 again. The goal state "bY" can be reached from the initial state "aX" by two different paths, π = ⟨o_1, o_2⟩ and π′ = ⟨o_2, o_3⟩, and we obtain two different heuristic values for π and π′. Namely, h̃^Q_fw(π) = h^P_fw(I) + Q̄(o_1) + Q̄(o_2) = −1 and h̃^Q_fw(π′) = h^P_fw(I) + Q̄(o_2) + Q̄(o_3) = 0. Next, we show that if the underlying potential heuristic h^P_fw is forward admissible, then the path-dependent operator-potential forward heuristic h̃^Q_fw is also forward admissible. This follows simply from the fact that Q̄ provides lower bounds on the change of the heuristic value of h^P_fw induced by each operator. So, if we start from the admissible estimate for the initial state, then adding these lower bounds results in an admissible estimate for all forward reachable states.
Theorem 19. If h^P_fw is forward admissible, then h̃^Q_fw is forward admissible.

Proof. We need to prove that for any π = ⟨o_1, . . ., o_n⟩ ∈ E_fw and s = π⟦I⟧ it holds that Σ_{f∈I} P(f) + Σ_{i∈[n]} Q̄(o_i) ≤ Σ_{f′∈s} P(f′), which we prove by induction over n.
The base case n = 0 holds trivially, as then s = I. For the inductive step, assume that Σ_{f∈I} P(f) + Σ_{i∈[k]} Q̄(o_i) ≤ Σ_{f∈s′} P(f) holds for the state s′ reached by ⟨o_1, . . ., o_k⟩. From Proposition 17, it follows that Σ_{f∈s′} P(f) + Q̄(o_{k+1}) ≤ Σ_{f∈o_{k+1}⟦s′⟧} P(f). Therefore, we have Σ_{f∈I} P(f) + Σ_{i∈[k+1]} Q̄(o_i) ≤ Σ_{f∈o_{k+1}⟦s′⟧} P(f), which concludes the proof.
This allows us to use operator-potential heuristics also for non-normalized planning tasks. But, as we noted before, it almost always pays off to normalize planning tasks and use h^Q_fw (which is forward consistent) instead of using the path-dependent variant h̃^Q_fw with the original planning task.
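For illustration, evaluating h̃^Q_fw along a path requires only the initial-state potentials and the precomputed Q̄-values. This is a sketch with our own function and argument names, not code from the paper:

```python
# Sketch of h̃^Q_fw(π) = Σ_{f∈I} P(f) + Σ_{i∈[n]} Q̄(o_i) along a path π.
def path_dependent_h_fw(pot, init_state, path, qbar):
    """pot: dict fact -> P(f); init_state: dict var -> value;
    path: list of operator ids; qbar: dict operator id -> Q̄(o)."""
    # Start from the potential heuristic value of the initial state.
    h = sum(pot[(var, val)] for var, val in init_state.items())
    # Each Q̄(o) lower-bounds the change of h^P_fw induced by o.
    for op in path:
        h += qbar[op]
    return h
```

With the Q̄-values implied by the Figure A.1 discussion (Q̄(o_1) = 0, and hence Q̄(o_2) = −1 and Q̄(o_3) = 1 by the stated heuristic values), the two paths π and π′ indeed evaluate to −1 and 0.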

Backward Direction
Using Q̄ instead of Q to compute path-dependent operator-potential backward heuristics allows us to use them also for non-normalized planning tasks and without goal-splitting. Note that in the case of path-dependent heuristics we also require Equation (A.1) to hold for the potential heuristic P, which is necessary to ensure backward admissibility of the heuristics.
Definition 20. Let Q̄ denote a general operator-potential function for P such that Equation (2) and Equation (A.1) hold. A path-dependent operator-potential backward heuristic h̃^Q_bw : E_bw → R ∪ {∞} for Q̄ is defined as

h̃^Q_bw(π) = Σ_{f∈I} P(f) + Σ_{i∈[n]} Q̄(o_i)

for every sequence of operators π = ⟨o_1, . . ., o_n⟩ ∈ E_bw, i.e., for every s-plan π.

Symbolic Search with Path-Dependent Heuristics

Algorithm 1 describes the GHSETA* algorithm under the assumption of consistent heuristics. Path-dependent heuristics, however, may introduce inconsistencies, leading to states being expanded with sub-optimal g-values. This, in turn, may lead to states being incorrectly pruned in lines 8 or 13, and the algorithm could return a sub-optimal plan. The algorithm can easily be adapted to support inconsistent (and path-dependent) heuristics by re-expanding any state that is reached again with a lower g-value. This requires replacing the closed set (which is used in Algorithm 1 to hold all expanded states) with multiple subsets closed_g, each of which contains the set of all states expanded with the corresponding g-value. Accordingly, in line 11, states are inserted into closed_g, and in lines 8 and 13 we loop over all g′ ≤ g to remove any state in closed_g′ and closed_{g′+c}, respectively.
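The closed-list adaptation just described can be sketched as follows. This is our own minimal sketch (class and method names are ours), with plain Python sets standing in for the BDD-represented state sets of the actual algorithm:

```python
from collections import defaultdict

# Sketch: with path-dependent (possibly inconsistent) heuristics, a state must
# be re-expanded when reached with a lower g-value, so the single closed set
# is split into one set per g-value.
class ClosedByG:
    def __init__(self):
        self.closed = defaultdict(set)  # g-value -> states expanded with it

    def insert(self, state, g):
        """Replaces the single closed-set insertion (line 11)."""
        self.closed[g].add(state)

    def prune(self, states, g):
        """Keep only states not already expanded with some g' <= g
        (the loop over closed_g' used for the pruning in lines 8 and 13)."""
        for g2, expanded in self.closed.items():
            if g2 <= g:
                states = states - expanded
        return states
```

A state closed only with a higher g-value survives pruning and is therefore re-expanded, which is exactly what restores optimality under inconsistent heuristics.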

Results
We evaluate symbolic search with state-dependent (h^Q_fw/h^Q_bw) and path-dependent (h̃^Q_fw/h̃^Q_bw) operator-potential heuristics. The state-dependent variant is the one we analyzed in detail in Section 5: it requires normalizing planning tasks and, in the case of backward symbolic search, partitioning goal states. For path-dependent heuristics, we avoid normalizing the task and partitioning goal states, but, in exchange, we need to re-expand states within the GHSETA* algorithm as explained above.

Table A.1 summarizes the comparison between the two variants in forward search. There are at most three tasks where using h̃^Q_fw is beneficial for coverage, whereas h^Q_fw performs better in 37 to 44 tasks. Table A.2 shows results for backward search, comparing the path-dependent operator-potential backward heuristics on the original planning tasks (h̃^Q_bw-orig), on the tasks normalized with the "multiplication" method but without goal splitting (h̃^Q_bw-norm), and the operator-potential backward heuristics on normalized tasks with goal-splitting (h^Q_bw). Here, we include the results for the path-dependent heuristic on the normalized task (but without goal-splitting) to better understand the effect that task normalization and goal-splitting have on the overall performance. As before, we compare the number of domains where one method solved more tasks than the other, the number of tasks solved by one method but not the other, and the overall number of solved tasks.

As in the previous case, using the path-dependent variant of operator-potential backward heuristics is rarely beneficial over using the state-dependent variant with goal-splitting, h^Q_bw. Normalizing the task is actually an advantage, as it improves the informativeness of the operator-potential heuristics (the max expression in Equation (A.2) is essentially an admissible approximation).
Finally, as with h^Q_fw and h^Q_bw, h̃^Q_fw and h̃^Q_bw can be combined into bi-directional symbolic search, but as in the previous cases, it rarely pays off to use the path-dependent variant of the heuristics. Table A.3 summarizes the comparison of selected variants. Overall, we observe that using the consistent state-dependent operator-potential heuristics is mostly beneficial compared to the same heuristics without normalizing the task.
evaluates to exactly the same value for all s-plans π = ⟨o_1, . . ., o_n⟩.

Lemma 10. Let Q denote an operator-potential function for P, let s ∈ S_bw denote a backward reachable state, let s_g ⊇ G and s′_g ⊇ G denote two goal states, and let π = ⟨o_1, . . ., o_n⟩ ∈ E_bw and π′
This follows from the fact that B is built from the union of sets of states {s ∈ S | f ∈ s} where f ∈ D_G(V_i).

(I2) For every ⟨h, B⟩ ∈ M and every s ∈ B it holds that h = P(⟨V_i, s[V_i]⟩). This holds because InsertOrUpdate inserts a set of states B = {s ∈ S | f ∈ s} with h = P(f) only if the value h is not yet in M, and it replaces ⟨h, B⟩ with ⟨h, B ∪ B′⟩, where B′ = {s ∈ S | f′ ∈ s}, only if P(f′) = h.
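The behavior of InsertOrUpdate described in (I2) can be sketched as follows. The representation is our assumption (M as a dict from heuristic values to state sets, plain sets standing in for BDDs), not the paper's implementation:

```python
# Sketch of InsertOrUpdate maintaining invariant (I2): M maps each heuristic
# value h to the set of states B, and two state sets are merged only when
# they share the same potential value h = P(f).
def insert_or_update(M, h, B):
    """M: dict h-value -> set of states; B: set of states whose facts have
    potential value h."""
    if h in M:
        M[h] = M[h] | B   # replace <h, B_old> with <h, B_old ∪ B>
    else:
        M[h] = set(B)     # insert a fresh entry for the new h-value
    return M
```

Every state thus ends up in exactly one entry of M, keyed by its potential value, which is what (I2) asserts.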
Fišer et al. (2020, Section 5.1) as the optimization criteria opt^M_k, which we use for k = 2. The blind symbolic search is denoted by b, the forward symbolic search by −→•, and the backward symbolic search by ←−•: for example, the blind forward search is denoted by −→b, the backward search with h^Q_bw optimized for A+I is denoted by ←−A+I, and the bi-directional search with h^Q_fw optimized for

Figure 4: Per-task comparison of the time in seconds needed for solving LP formulations of potential functions I, A + I, M 2 + I, and S1k + I (on horizontal axis), and the corresponding MIP formulations of operator-potential functions (on vertical axis).

Figure 5: Cumulative graphs comparing solving MIP and LP variants of (operator-)potential heuristics. Only tasks where both MIP and LP were solved within time and memory limits are considered.

Figure 5b depicts the runtime in absolute numbers as a cumulative graph of the number of tasks over the runtime of the MIP variant. It shows that the operator-potential function can be found within one second. ←−I failed to compute the partitioning of goal states in one task in airport, which was solved by ←−A+I, ←−S1k+I, ←−M2+I, and ←−b, and in one task in tetris, which was solved by ←−A+I,

Figure 8: Cumulative graphs comparing the number of goal BDDs and the number of nodes they consist of.

Figure 11: Per-task comparison of the runtime (in seconds) of the best variant of GHSETA * against the explicit state search methods scrp and comp2.

Figure A.1: Example planning task Π = ⟨V, O, I, G⟩ showing the path dependency of operator-potential heuristics for non-normalized FDR tasks. Q̄ is computed using Equation (A.3).
is a partial state called the goal, and a state s is a goal state iff G ⊆ s. S denotes the set of all states. Let p, t be partial states. We say that t extends p if p ⊆ t. O is a finite set of operators; each operator o ∈ O has a precondition pre(o), a prevail condition prv(o), and an effect eff(o), which are partial states over V, and a cost cost(o) ∈ R≥0. For every operator o ∈ O it holds that vars(pre(o)) ⊆ vars(eff(o)), vars(pre(o)) ∩ vars(prv(o)) = ∅, and vars(prv(o)) ∩ vars(eff(o)) = ∅, i.e., preconditions are defined only over affected variables, preconditions and prevail conditions are defined over disjoint sets of variables, and prevail conditions cannot be defined over any affected variable. We also assume that pre(o)[V] ≠ eff(o)[V] for every V ∈ vars(pre(o)) ∩ vars(eff(o)).
An operator o is applicable in a state s iff prv(o) ∪ pre(o) ⊆ s. The resulting state of applying an applicable operator o in a state s is the state o⟦s⟧ that agrees with eff(o) on all affected variables and with s on all other variables.

A forward heuristic h_fw : S_fw → R ∪ {∞} estimates the cost of optimal s-plans. The optimal forward heuristic h⋆_fw(s) maps each forward reachable state s to the cost of the optimal s-plan, or to ∞ if s is a forward dead-end state. A forward heuristic h_fw is called (a) forward admissible if h_fw(s) ≤ h⋆_fw(s) for every forward reachable state s ∈ S_fw; (b) goal-aware if h_fw(s) ≤ 0 for every forward reachable goal state s; and (c) forward consistent if h_fw(s) ≤ h_fw(o⟦s⟧) + cost(o) for all forward reachable states s ∈ S_fw and operators o ∈ O applicable in s.

A backward heuristic h_bw : S_bw → R ∪ {∞} estimates the cost of optimal I-s-paths. The optimal backward heuristic h⋆_bw(s) maps each backward reachable state s to the cost of the optimal I-s-path, or to ∞ if s is a backward dead-end. A backward heuristic h_bw is called (a) backward admissible if h_bw(s) ≤ h⋆_bw(s) for every backward reachable state s ∈ S_bw; (b) init-aware if h_bw(I) ≤ 0; and (c) backward consistent if h_bw(o⟦s⟧) ≤ h_bw(s) + cost(o) for all backward reachable states s ∈ S_bw and operators o ∈ O such that o is applicable in s and o⟦s⟧ is backward reachable.
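The FDR semantics above can be sketched in a few lines. The representation is our assumption (states and partial states as dicts from variables to values; operators as dicts with 'pre', 'prv', and 'eff' entries), not the paper's implementation:

```python
# Sketch of operator applicability and successor generation in FDR.
def applicable(op, state):
    """o is applicable in s iff prv(o) ∪ pre(o) ⊆ s."""
    cond = {**op['pre'], **op['prv']}
    return all(state.get(v) == val for v, val in cond.items())

def apply_op(op, state):
    """o⟦s⟧ agrees with eff(o) on the affected variables and with s elsewhere."""
    assert applicable(op, state)
    succ = dict(state)
    succ.update(op['eff'])
    return succ
```

Note that the prevail condition only constrains applicability; unlike preconditions, it ranges over variables the operator does not affect.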

Table 1: Comparison of the poly and mult methods in terms of the number of domains where one method solved more tasks than the other, the number of tasks solved by one method but not the other, and the overall number of solved tasks.
Table A.1: Comparison of the path-dependent operator-potential forward heuristics (h̃^Q_fw) on the original planning task, and the (consistent) operator-potential forward heuristics (h^Q_fw) on the normalized tasks. Whenever the original planning task is already normalized, h̃^Q_fw is used as a consistent heuristic, i.e., it equals h^Q_fw. We compare the number of domains where one method solved more tasks than the other, the number of tasks solved by one method but not the other, and the overall number of solved tasks. (Columns: I, A+I, S1k+I, M2+I.)

Theorem 21. h̃^Q_bw is backward admissible.

Proof. Any estimate for a backward dead-end state is admissible. Let π = ⟨o_1, ..., o_n⟩ denote a plan, and let m ∈ [n]. It is enough to show that Σ_{f∈I} P(f) + Σ_{i∈[m+1,n]} Q̄(o_i) ≤ Σ_{j∈[m]} cost(o_j). We show this for Q̄(o) = Σ_{f∈eff(o)} P(f) − Σ_{V∈vars(eff(o))} max_{f∈D_o(V)} P(f), and it is clear that the inequality holds for lower values of Q̄(o). Since Equation (2) and Equation (A.1) hold, it follows from Theorem 15 that h^P_fw is forward consistent, goal-aware, and forward admissible, and therefore h̃^Q_fw is forward admissible (Theorem 19). Therefore, we have that h̃^Q_fw(π⟦I⟧) =

Table A.2: (Columns: I, A+I, S1k+I, M2+I.)

Table A.3: Same as Table A.1 and Table A.2, but for bi-directional symbolic search. Note that in the case of h^Q_fw-h̃^Q_bw-norm, we use an inconsistent h̃^Q_bw, but a forward consistent h^Q_fw, as the tasks are already normalized.