Action Graphs for Performing Goal Recognition Design on Human-Inhabited Environments †

Goal recognition is an important component of many context-aware and smart environment services; however, a person’s goal often cannot be determined until their plan nears completion. Therefore, by modifying the state of the environment, our work aims to reduce the number of observations required to recognise a human’s goal. These modifications result in either: Actions in the available plans being replaced with more distinctive actions; or removing the possibility of performing some actions, so humans are forced to take an alternative (more distinctive) plan. In our solution, a symbolic representation of actions and the world state is transformed into an Action Graph, which is then traversed to discover the non-distinctive plan prefixes. These prefixes are processed to determine which actions should be replaced or removed. For action replacement, we developed an exhaustive approach and an approach that shrinks the plans then reduces the non-distinctive plan prefixes, namely Shrink–Reduce. Exhaustive is guaranteed to find the minimal distinctiveness but is more computationally expensive than Shrink–Reduce. These approaches are compared using a test domain with varying amounts of goals, variables and values, and a realistic kitchen domain. Our action removal method is shown to increase the distinctiveness of various grid-based navigation problems, with a width/height ranging from 4 to 16 and between 2 and 14 randomly selected goals, by an average of 3.27 actions in an average time of 4.69 s, whereas a state-of-the-art approach often breaches a 10 min time limit.


Introduction
Through the deployment of numerous IoT sensors, smart environments can attempt to recognise the goal of a human from the actions they perform, and thus become more context-aware. Despite recent advances in Goal Recognition (GR) techniques [1,2], a person's goal often cannot be determined until their plan nears completion. This is because the plans to reach different goals can initially be identical. Nonetheless, recognising a human's goal promptly is important in many situations. In environments where security is essential, such as airports [3], improved goal distinctiveness can allow security personnel to intercept a person sooner. In a kitchen environment [4,5], by minimising the number of observations required to recognise a human's goal, a robot can provide earlier assistance [6]. In this grid-based navigation example, a human can move horizontally and vertically. There are multiple optimal plans to each of the goals, but for readability only a single plan to each goal is shown (indicated with arrows). On the left figure, the plans with the longest non-distinctive prefix are displayed. Reproduced from prior work [11].
GRD is a more complex problem than task planning. In task planning the aim is (normally) to find an optimal plan to reach a goal, whereas in GRD there are multiple goals defined and all optimal plans containing non-distinctive prefixes must be found before the environment can be redesigned. Current approaches to GRD [12][13][14][15] usually focus on removing the ability to perform actions or placing order constraints on the actions. To our knowledge, this paper is the first to propose state changes that cause actions in the plans to be replaced, which could result in the length of a plan as well as its non-distinctive prefix changing. As the lengths of the plans can change, we propose a new metric, complementary to WCD, to measure the distinctiveness of an environment.
Our novel approach to GRD transforms a problem defined in Planning Domain Definition Language (PDDL), a popular domain-independent language to model the behaviour of deterministic agents, into an Action Graph. Non-distinctive plan prefixes are extracted from the graph, and processed to determine which actions should be removed or replaced to increase the goals' distinctiveness. Two methods for selecting which actions to replace, namely exhaustive and a less computationally expensive method Shrink-Reduce, are introduced and compared to one another. An overview of our approach is provided in Figure 2. Conceptual overview of our novel approach to goal recognition design (GRD). An Action Graph is created from a GRD problem defined in PDDL. This Action Graph is traversed to identify non-distinctive prefixes. In the third step, modifications are performed to reduce the non-distinctive prefixes, leading to a redesigned environment with an increased goal distinctiveness.
In our previous work [11], we described a new method for generating an Action Graph for navigation problems. In the current paper, we introduce a new distinctiveness metric, expand the graph creation method to domains with fewer constraints on the order of actions, provide a new algorithm to find the non-distinctive prefixes, and develop a method to change the state of the environment so that actions are replaced. Our approach is not applicable to domains in which cycles exist within the goals' plans, as Action Graphs are acyclic and no action is repeated within its structure. Moreover, the world is assumed to be fully observable; however, after redesigning the environment fewer actions tend to require observing. The algorithms are evaluated on a grid-based navigation domain, and on a kitchen domain developed by Ramírez and Geffner [7] from the work of Wu et al. [16]. While a domain from [7] is used, a comparison to their approach is not provided as their approach performs GR and not GRD.
Section 2 provides an overview of related work. Section 3 states the formal definition of GRD problems and briefly mentions the GR methods that inspired our approach. The metrics to measure the distinctiveness of an environment are presented in Section 4. Section 5 describes the structure and construction of an Action Graph. The algorithm to find all the non-distinctive plan prefixes in an Action Graph is introduced in Section 6. Section 7 describes how the actions to replace are selected. How actions are removed to increase the distinctiveness is described in Section 8. Finally, we present experimental results in Section 9, by measuring the computational efficiency of our Shrink-Reduce action replacement heuristic and comparing our action removal method to a state-of-the-art approach [10].

Related Work
The term goal recognition design was coined by Keren et al. [10]. In their approach the WCD of a problem is calculated by transforming a GRD problem into multiple planning problems containing pairs of goals. An optimal plan, with the longest possible non-distinctive prefix, to each pair of goals is searched for. The longest non-distinctive plan prefix, across all joint plans, is the WCD. To reduce WCD an increasing number of actions are removed until either the WCD is 0, or the search space has been exhausted, in which case the best environment design discovered so far is returned. Their pruned-reduce method improves the computational efficiency by reducing the number of action combinations whose removal requires testing. We compare our solution to their pruned-reduce algorithm, and show that for navigation domains we have greatly reduced the time required to solve goal recognition design problems. Further to removing actions, their approach has been extended to action conditioning, in which a partial ordering is enforced [15]. We do not investigate enforcing action ordering, as our focus is on human-inhabited environments, and it is difficult to force action order constraints on humans; but further to their work, we explore domains in which modifications to the state cause actions within a plan to be replaced and that result in the plan's length changing. Their approach has also been extended for non-optimal agents [17], and to determine where to place sensors within the environment [18]. In this paper we assume the agent/human is optimal and do not investigate sensor placement.
Wayllace et al. [12,14] investigate goal recognition design involving stochastic action outcomes with Markov Decision Processes (MDPs). To calculate WCD, a MDP is created for each goal, the states that are common to pairs of goals are discovered and the Bellman equation is used to calculate the cost of reaching a state. To reduce the WCD their algorithm removes a set of actions, checks that the optimal cost to reach a goal has not been affected, and calculates the WCD to find out if it has been reduced. Their approach creates a MDP multiple times for each of the goals, which results in large computational costs. In our approach a single model (Action Graph) is created, which contains the actions to reach all goals, moreover it is only created once.
In [19], a new metric is introduced, namely expected-case distinctiveness (ECD), which is applicable to stochastic domains and solves a shortcoming of WCD. WCD only incorporates knowledge of the least distinctive pair of goals, rather than all goals' distinctiveness. To solve this, ECD is calculated recursively, starting from the actions applicable to the initial state and ending with the actions furthest from that state (i.e., that result in a goal state). The resulting ECD is the sum of all weighted plan lengths, with the weights based on prior probabilities of an agent choosing a certain goal. We also introduce a new metric which addresses the mentioned shortcoming of WCD, but for a deterministic setting. Moreover, our metric also accounts for the length of both the plan and non-distinctive prefix being altered during the environment design process.
Son et al. [13] propose an approach based on Answer Set Programming (ASP), as an alternative to PDDL, to reduce the computational cost. Their results only show a maximum of two actions being removed, which greatly limits how much WCD can be reduced. Our approach takes PDDL as input, as we build on the work from [20], which performed well (i.e., computational time and accuracy) on goal recognition problems.
Planning libraries were provided as input to the goal recognition design method by Mirsky et al. [21]. Further, they introduce plan recognition design, in which the aim is to make the plans, rather than goals, distinctive. They present a brute-force method, which gradually removes a higher number of rules from the planning library, and a constraint-based search method. Their constraint-based search attempts to remove different combinations of rules within the non-distinctive plans starting from the rules contained within the least distinctive plans. Like Mirsky et al. [21] and Keren et al. [15], we developed an exhaustive strategy and less computationally expensive method but, rather than preventing actions, removing rules or constraining the order actions must be performed in, our approach changes the state of the environment in such a way that the actions within plans are replaced by others (with the same effects), which could affect the length of the plans as well as the non-distinctive prefixes.

Background
To clarify the similarities and differences between symbolic task planning, goal recognition and goal recognition design problems, their formal definitions are provided below. The formal notation will be stated during the description of our GRD method. As our work is closely related to GR, this section also introduces the GR approaches that inspired our solution to GRD.

Formal Definition
Task planners (e.g., Fast Downward [22]) search for a sequence of actions (i.e., task plan) that changes the current world state into a desired goal state. Formally, a planning problem P can be defined as P = (F, I, A, G), where F is a set of fluents, I ⊆ F is the initial state, G ⊆ F is a goal state, and A is a set of actions along with their preconditions a pre ⊆ F and effects (a add ⊆ F, a del ⊆ F) [7,10,23,24].
Actions add and delete fluents from states, and action a is applicable to the state s if a pre ⊆ s. In this paper, effects are denoted a e f f , fluents are interpreted as variables with values, and thus actions change the values of variables, e.g., change the value of a variable that represents a human's location, or change variables that indicate which items have been taken from cupboards from false to true.
GR is often viewed as the inverse of planning, i.e., T = (F, I, A, O, G) where G is the set of all possible goals and O is a sequence of observations [7,25]. Similarly, a GRD problem can be defined as D = (F, I, A, G) [10]. The aim of our GRD method is to find the world model (I) that maximises the distinctiveness of the goals G, by reducing the length of the non-distinctive plan prefixes. This results in fewer observations (|O|) being required to determine which of the goals (G) a human is intending to reach.

Goal Recognition
Activity recognition [16], plan recognition [26] and goal recognition [7,8] can all be referred to as intention recognition; nevertheless, their aims slightly differ. Activity recognition labels sensor data with which activity (or action) a human is currently performing, e.g., switching on a kettle. While several previous works have used the terms plan and goal recognition interchangeably, we consider them to be distinct. A plan recogniser attempts to discover the sequence of actions (or possibly hierarchy of actions) a human is performing, including their possible future actions. GR methods aim to label a sequence of discrete observations (e.g., actions), with which (high-level) goal they belong to [27]. For instance, when provided with a sequence of move actions GR methods will attempt to select (from a predefined list) which location the agent is intending to reach. For the kitchen domain by Ramírez and Geffner [7], the sequence of observed actions includes taking different items and performing activities (e.g., making toast), and the returned classification indicates if a person is making breakfast, dinner or a packed lunch.
Methods for GR can be broadly categorised as data-driven and knowledge-driven methods [4,28]. Data-driven approaches train a recognition model from a large dataset [4,[29][30][31]. The main disadvantages of this method are that often a large amount of labelled training data is required, and the produced models often only work on data similar to the training set [32,33]. Knowledge-driven approaches to GR rely on a logical description of the actions agents can perform. They can be further divided into approaches that search through a library of predefined plans (also known as "recognition as parsing" [26,27,34]) and approaches that solve a symbolic recognition problem, i.e., "recognition as planning" [2,7,27]. In our work, a graph structure is generated from a problem definition in PDDL; this graph structure is similar to those used by some recognition as parsing methods.
Recognition as parsing tends to be fast and allows multiple concurrent plans to be detected [34][35][36][37][38], but is often considered to be less flexible [7,25] because a planning library containing all actions and their orderings must be developed a priori. A planning library is usually formulated in a hierarchical structure, which includes abstract actions along with how they are decomposed to concrete (observable) actions. The link between parsing text and hierarchical plan recognition structures was suggested by [39], and since then, many plan recognition as parsing algorithms have been developed, e.g., [34,37,40].
In [41] a Temporal AND-OR tree was constructed from a library of plans to determine which objects a human will navigate to. Our method of representing a (human) agent's actions in an Action Graph is inspired by this AND-OR tree; however, the construction of our graph is considerably different. Moreover, their AND-OR tree only contains ORDERED-AND nodes, whereas Action Graphs include both ORDERED and UNORDERED-AND nodes as often actions do not need to be performed in a fixed order.
Recognition as planning is a more recently proposed approach, in which languages normally associated with task planning, such as PDDL, define the actions agents can perform (along with their preconditions and effects) and the world state. This enables a single set of action definitions to be written for task planning, GR and GRD. Moreover, whereas in recognition as parsing usually only  actions are considered, planning-based approaches allow for the inclusion of state knowledge, such as  what objects are found within the environment and their locations. Early works, such as [7], were computationally intensive as a planner was called twice for every goal, to find the difference in the cost of the plan to reach the goal with and without taking the observations into consideration. Later advances in GR as planning algorithms greatly improved the computational efficiency [2,8,9]. Plan graphs were proposed in [9] to prevent a planner from being called multiple times. A plan graph, which contains actions and propositions labelled as either true, false or unknown, is built from a planning problem and updated based on the observations. Rather than calling a planner, the graph is traversed to calculate the cost of reaching the goals. More recently, the work presented in [2,8] significantly reduced the recognition time by finding landmarks, i.e., states that must always be passed for a particular goal to be achieved.

Distinctiveness Metric
The WCD metric proposed in [10] only provides knowledge about the longest non-distinctive prefix for a set of goals G, rather than considering the overall distinctiveness of the goals and the structures of the plans. In this section, several examples are provided to illustrate the shortcomings of the WCD metric and additional metrics are proposed.
Whilst redesigning the environment, the distinctiveness of some goals (G ⊂ G) can be increased without affecting the WCD. We, therefore, propose finding the longest non-distinctive prefix for each goal and calculating the average length of these, namely the average distinctiveness (ACD). An example is provided to demonstrate the advantage of calculating ACD over WCD. Suppose there are three goals, all requiring a different item to be taken from the same cupboard. The plans are shown in Figure 3a, from which it is clear that the WCD of these goals is 1. After redesigning the environment, by moving item3 from cupboard1, G3 becomes fully distinctive (Figure 3b). The environment has arguably been made more distinctive but the WCD does not reflect this, i.e., it is still 1. The ACD of the initial environment is also 1, i.e., (1 + 1 + 1)/3, but after item3 has been moved, the ACD is reduced to 0.67.  A second drawback of WCD (and also of ACD) is that these metrics only capture the lengths of the non-distinctive plan prefixes and not the structure of the plans. In other words, WCD lacks knowledge of the dependencies between actions, and thus the possible plan permutations. The more dependants a non-distinctive action has, the less distinctive that action is. If an action has one distinct dependant, then only one change is required to make it fully distinctive. When the state of the environment is redesigned, both the WCD and the ACD metrics could be reduced without the goals becoming necessarily more distinctive, or vice-versa. For this reason, we introduce modified versions of WCD and ACD, i.e., WCD dep and ACD dep .
To calculate these, if a non-distinctive action is required to fulfil multiple actions' precondition(s), that action is counted multiple times. More precisely, each action a in the longest non-distinctive prefix is counted C times, with C equalling the number of actions (including the goal itself) for which a is a dependency. Dependencies are defined in this paper as actions that set one or more of the dependant's preconditions, e.g., action 1 is said to be dependent on action 2 if action 2 fulfils one (or more) of action 1's preconditions, i.e., a1 pre ∩ a2 e f f = ∅. If multiple options exist the longest list of dependencies is selected.
The calculation for ACD dep is defined by Equation (1), in which |p(G1, G2)| is the number of actions counted, as described above, in the longest prefix common to both G1 and G2. In the calculation of WCD dep , the averaging operation is replaced by finding the maximum, see Equation (2). Note, the operator p is non-commutative: |p(G1, G2)| is not necessarily equal to |p(G2, G1)|. Algorithmic details on how p can be discovered are provided in Section 6. For clarification several examples in which WCD = WCD dep , and the reduction differs, are provided next.
In the example shown in Figure 4a, each of the 2 goals requires taking item1, item2 and item3. These items are all in different cupboards that must be opened before the item can be taken. Therefore, the WCD for this environment is 6. The WCD dep is 7, as both these goals require a second (unique) item to be taken from cupboard3, so opening cupboard3 is counted twice. If during the design process item2 is moved into cupboard1 the WCD is reduced to 5 but the WCD dep is not reduced (Figure 4b). When items are moved into a single cupboard the plans, as well as non-distinctive prefixes, become shorter. Therefore, a GRD approach that aims to reduce WCD could simply move all items into the same cupboard. A better solution would be to put the items unique to a goal in a cupboard not required by any another goal since , depending on which plan permutation is selected, this enables the goal to be recognised after a single observation. Note, making plans shorter may be desired by the human; however, it is not the primary aim of our work.  In the same scenario, it is also possible that WCD dep is reduced when WCD is not. Figure 4c shows the example after such a change occurs. If the items unique to both plans are moved into different cupboards (which are also not contained within the non-distinctive prefix), WCD dep is reduced but WCD remains the same. This reduction reflects the fact that, due to the different plan permutations, the human's first action could now be distinctive (i.e., if they open cupboard4 their goal must be G4).

Action Graphs for Modelling GRD Problems
Goal recognition design is performed by building an Action Graph, traversing the graph to find the non-distinctive prefixes, then either removing or replacing actions to reduce the length of these non-distinctive prefixes. This section provides details on the structural features of Action Graphs and on how such an Action Graph is derived from a problem specified in PDDL.

Structural Features
Action Graphs model the order constraints, i.e., dependencies, between actions. An Action Graph contains OR, AND and leaf nodes. Leaf nodes are also referred to as action nodes, as each one is associated with an action. AND nodes are split into two categories: ORDERED-AND, which denotes that all children are performed in order, and UNORDERED-AND, for which all children are performed in any order. For OR nodes one child is performed. The root node is always an OR node. Throughout the paper, unless otherwise stated, the term parent(s) always refers to the direct parent(s) of a node, the same goes for child/children.
The term graph is stated, rather than tree, because action and ORDERED-AND nodes can have multiple parent nodes. Action Graphs are acyclic, i.e., do not contain any cycles. A sub-section of an example Action Graph is depicted in Figure 5.

Action Graph Creation
An Action Graph is generated from a GRD problem D = (F, I, A, G) by performing a Breadth First Search (BFS) backwards from each goal G ∈ G to the initial state I. This section first describes the construction of an Action Graph for unconstrained problems, i.e., problems in which all plan variations (including those longer than the optimal) are required to calculate the distinctiveness metrics. Subsequently, the optimisations made to the construction algorithm, that allow problems with constraints (i.e., navigation problems) to be handled, are detailed. Such problems have strict constraints on the order of actions and only the optimal plans are inserted into the Action Graph.
The unconstrained and constrained Action Graph creation algorithms are demonstrated in the videos provided as Supplementary Materials.

Unconstrained Action Graph Creation
In many domains, e.g., a kitchen domain, all plans (including sub-optimal plans) should be included, to calculate the distinctiveness. For example, the optimal plan for making breakfast includes making a cup of tea but there also exist longer plans, that for instance include making a cup of tea with milk or making a cup of coffee instead of tea. All these plan variations are required to calculate how distinctive making breakfast is from the alternative goals. These differences should not be inserted into the lists of hypothesis goals (G), e.g., (and (made_breakfast) (made_tea)), (and (made_breakfast) (made_coffee)), etc., for a number of reasons. First, there could be numerous different plans (e.g., in a large factory or hospital domain) to reach a goal; thus, it would be time consuming (and unnecessary) to add every plan variation as a separate goal in the list of hypothesis goals G. Second, it may be more important to determine the high-level goal of a human (e.g., are they making breakfast or dinner). GRD should make these more distinctive rather than, for instance, increasing the distinctiveness between making breakfast with tea and making breakfast with coffee. Third, in some cases this would cause it to be impossible to reduce the WCD, e.g., the WCD would always be the length of the plan to make breakfast with tea as all its actions are within the plan to make breakfast including tea with milk.
Performing BFS from a goal to the initial state enables an Action Graph to be built from the root downwards. In the resulting graph, (which has a tree-like structure) the actions closest to the root result in a goal atom being met and those at the furthest points from the root are applicable in the initial state (I). Figure 6 shows an example of the steps executed to insert the plans for a single goal, i.e., (lunch_packed), into the graph.
An Action Graph is initialised with a OR node as the root, and each goal G ∈ G is processed in turn. Actions which result in a goal atom being reached (a ∈ A g with A g = {a | a e f f ⊆ G}) are found and inserted into the graph along with their direct dependencies, as shown in Figure 6a. These goal actions are pushed onto a BFS queue to initialise it. For each action (a) in the queue our algorithm iterates over its dependencies, to insert the dependencies of these dependencies (e.g., Figure 6b,c into the Action Graph. During this iteration, the action's (a) dependencies are pushed onto the queue. Actions and dependencies are processed in this manner because it is simpler to remove the action when none of its dependencies' dependencies can be inserted. Dependencies are actions; therefore, in the subsequent text we refer to a dependency's dependencies as an action's dependencies.
To insert an action's dependencies into an Action Graph multiple nodes are created, including an ORDERED-AND node and an action node for each dependency. If the action is a goal action, its action node is created and the ORDERED-AND node is appended to the root node's children (e.g., Figure 6a). Otherwise, the action itself is already in the graph, and the action's current parents become the ORDERED-AND node's parents (e.g., Figure 6b). The ORDERED-AND node's children are set to the dependencies followed by the action node itself. These dependencies can be a single action, a list of multiple actions or a set of alternatives (as actions can have equivalent effects, or their precondition could contain an or statement). If an action has a single dependency, a single action node is created and prepended directly to the ORDERED-AND node's children; e.g., in Figure 6b take(lunch_bag cupboard2) has a single dependency. When an action depends on multiple actions, an UNORDERED-AND node is prepended to the ORDERED-AND node's children. The UNORDERED-AND node's children are set to the dependencies; for example, make-cheese-sandwich() in Figure 6b. For a set of alternatives, an OR node is inserted to indicate the different choices that can be made. In this case, the OR node is prepended to the ORDERED-AND node's children, and the different options (i.e., actions or UNORDERED-AND nodes) become the OR node's children (e.g., Figure 6a).
(a) The Action Graph creation starts by inserting the actions that result in the goal being reached (i.e., A g = {a | a e f f ⊂ G}), and their dependencies. The single goal action, pack-lunch() is inserted into the BFS queue.
(b) pack-lunch() is popped from queue and the algorithm iterates over its dependencies, i.e., make-peanutbutter-sandwich(), make-cheese-sandwich() and take(lunch_bag cupboard2). These actions' dependencies are inserted into the Action Graph, and these three actions are appended to the queue. When take(lunch_bag cupboard2) is popped from the queue, as its single dependency is applicable to the initial state, the graph is not expanded and no actions are pushed onto the queue.
(c) The next item in the queue, make-cheese-sandwich(), is processed in the same way, i.e., its dependencies are iterated over to insert their dependencies. Figure 6. Example of the steps taken to create an Action Graph, when the first goal is (lunch_packed), which can be achieved by retrieving a lunch bag, and making either a cheese or peanut butter sandwich. Actions with a dashed border are currently in the BFS queue and the bold actions indicate the action which has just been popped from the queue. The algorithm continues until the queue is empty; only the first steps are provided to prevent the graph becoming unreadable.
When an action that has no dependencies is reached, i.e., is applicable to the initial state (A I = {a | a pre ⊆ I}), there is no need to further expand that branch of the graph (i.e., the action is not pushed on to the BFS queue). Moreover, if an action was previously processed, e.g., when adding the plans for the preceding goals, there is no need to re-insert its dependencies nor to insert the action into the queue. Branches for which the initial state cannot be reached are removed. This process finishes when the BFS queue is empty.
After all actions required to reach the first goal have been inserted, the same process is repeated for the subsequent goal. The order in which the goals are processed does not affect the resulting Action Graph. To insert all plan variations, no boundary on how sub-optimal a plan can be is set. For actions (and states) that can expand abundantly, such as in navigation domains, the constrained approach described in the next section (Section 5.2.2) should be executed.
UNORDERED-AND nodes are also inserted when a goal containing an and statement, which results in more than one action being required to reach that goal, is detected. When this is detected an ORDERED-AND node is created and its children are set to a UNORDERED-AND followed by a placeholder/dummy action. The UNORDERED-AND node's children are set to the required actions (or their parent ORDERED-AND nodes if they have dependencies). This placeholder becomes the action required to reach the goal state, i.e., a e f f = G where a is the placeholder. These placeholders simplify the graph traversal algorithms required to find and reduce the non-distinctive prefixes; they are ignored when calculating the ACD dep and WCD dep metrics.

Action Graph Creation for Navigation Problems
For navigation problems the graph creation process can be optimised by only finding the optimal plans. This section describes how a limit is set during the Action Graph creation. The modified version of the algorithm is provided in Appendix A (Algorithm A1). For this domain, only a single permutation of a plan exists. Figure 7 shows an example of the steps executed to insert all the optimal plans for a single goal in a 3 by 4 grid-based navigation problem, as depicted in Figure 8.
Similar to before, all actions to reach a goal are inserted into the Action Graph by performing a BFS starting from actions whose effect (a e f f ) result in the goal state (G), i.e., A g = {a | a e f f = G}. When an action that has no dependencies, i.e., is applicable to the initial state (A I = {a | a pre ⊆ I}), is reached the length of the shortest plan is known; as this is the number of steps taken from the goal to reach that action (Figure 7d). Any actions not within an optimal plan, i.e., whose dependencies are further from the goal than the current limit, are removed from the graph; if there are no actions with equivalent effects, actions that are dependent on the action being removed are also removed.
Actions are also removed when linking it to its dependencies would cause a cycle (and thus sub-optimal plans) to occur. If all the dependencies of an action have been inserted into the graph while processing the current goal, then connecting it to its dependencies would create a cycle within the graph. Therefore, the action is removed. For example, the move(0_0 0_1) action in Figure 7b.
When the BFS queue is empty, all optimal plans to reach the first goal are contained within the Action Graph; an example is shown in Figure 7e. When inserting the actions that lead to the subsequent goals, the algorithm also checks if an action was inserted when processing a previous goal. If so, there is no need to (re-)insert the action's dependencies (nor insert the action into the queue). The limit on the number of actions to reach the goal is set to the current action's distance from the goal plus its distance from the initial state, which is discovered by traversing the graph (depth-first).
(a) The Action Graph creation starts by inserting the actions that result in the goal being reached (i.e., A g = {a | a e f f = G}), and their dependencies. The two goal actions, move(0_1, 0_0) and move(1_0, 0_0) are inserted into the BFS queue. If any of the goal actions are applicable to the initial state, a threshold on the number of allowed steps away from the goal action is set to 1.
(b) move(0_1, 0_0) is popped from queue and the algorithm iterates over its dependencies, i.e., move(0_0, 0_1), move(0_2, 0_1) and move(1_1, 0_1). These actions also all have dependencies. All the dependencies of move(0_0, 0_1) have already been inserted into the graph, and connecting it to its dependencies would result in its dependencies being further from the goal (than their current placement), so the action (move(0_0, 0_1) is removed. For the remaining actions, i.e., move(0_2, 0_1) and move(1_1, 0_1), their dependencies are successfully inserted into the Action Graph, and these two actions are appended to the queue.
(c) The next item in the queue, move(1_0, 0_0), is processed in the same way. move(1_1, 1_0)'s dependencies already exist in the Action Graph, therefore it can be linked to these without the need to create any action nodes.
(d) As a dependency of move(0_2, 0_1) is applicable to the initial state, the threshold on the number of actions required to reach the goal is set to 2. Adding the dependencies for move(0_1, 0_2) would result in the threshold being breached, therefore this action is removed. No further actions are inserted into the queue.
(e) The items in the queue continue to be processed as described in the previous steps until the queue is empty. This figure shows the resulting Action Graph, after a single goal has been inserted. Example of the steps taken to create an Action Graph, when the initial state is (human-at 1_2) and the first goal is (human-at 0_0). The layout of the environment is shown in Figure 8. Actions which result in the goal state being reached are highlighted in green; yellow boxes with a thick border indicate actions applicable to the initial state; red indicates actions that are being removed; actions with a dashed border are currently in the BFS queue and the bold actions indicate the action which has just been popped from the queue. Reproduced from our prior work [11].

Find All Non-Distinctive Plan Prefixes
Once an Action Graph has been created, the non-distinctive plan prefixes are discovered, and the ACD dep and WCD dep are calculated. These prefixes are then iterated over to increase the distinctiveness of the goals. In our approach finding non-distinctive plan prefixes involves two steps: (1) labelling which nodes in the graph belong to which goals and (2) for each of the goals traversing the graph to find the actions that also belong to another goal's plan (as shown in Algorithm 1). This section describes these two steps.

Label Which Nodes Belong to Which Goals
Each goal action A g = {a | a e f f = G ∈ G} that has dependencies, has an ORDERED-AND node as its only parent. All children of this ORDERED-AND node, including non-direct children, belong to the same goal as the goal action. Therefore, by performing a depth first traversal starting from these ORDERED-AND nodes, all nodes are labelled with which goals they belong to. This step is performed to make traversing the graph to extract the non-distinctive prefixes easier and more efficient. These labels will also be utilised by the algorithms for increasing the distinctiveness of goals.

Extract Non-Distinctive Prefixes
To extract the non-distinctive prefixes, goal actions are iterated over (lines 5-8 Algorithm 1) and a depth first traversal starting from each of their parent ORDERED-AND nodes is performed, to find the actions (and their dependencies) that are also in another goal's (G2's) plan. During this depth first traversal what steps are performed depends on the node's type (e.g., action, OR etc.) and whether that node belongs to the second goal (G2).

•
If the current node is an action node and it belongs to G2 (line 14), then it is simply appended to the list of non-distinctive actions (i.e., the prefix). Actions that do not belong to G2 are ignored.

•
When the node is of type OR (line 16), each of its children are iterated over to find the one containing the most actions belonging to G2. Only one of its branches are required within a plan, and so to calculate the metrics only the longest non-distinctive prefix is required. The non-distinctive actions not within the branch containing the most actions belonging to G2, are inserted into the list of all non-distinctive prefixes, as these actions should also be processed by algorithms that attempt to minimise the ACD dep /WCD dep .

•
If an ORDERED-AND node that belongs to G2 is encountered (line 19), then all the nodes' children (including non-direct children) also belong to G2. Therefore, a depth first traversal is performed to insert all the actions that are descendants of the ORDERED-AND node into the list of non-distinctive actions. As the left branch of an ORDERED-AND node must be performed before the right branch, actions are inserted in the order they must be performed. During this traversal, OR nodes are handled using the method mentioned above.

•
In all other cases, the algorithm recurses over all the node's children (lines [22][23][24]. The longest non-distinctive prefix discovered for each pair of goals (lines 5-8) is passed to Equations (1) and (2) (from Section 4) to calculate ACD dep and WCD dep .

Performing Action Replacement to Reduce ACD
Modifications applied to the world state I can cause actions in a plan to be replaced by more distinctive actions, and thus reduce ACD dep . For example (as shown in Figure 3), if the state of the environment is modified by moving item3 from cupboard1 to cupboard2, in all plans opening cupboard1 then taking item3 is replaced by opening cupboard2 then taking item3. In our approach, part of the Action Graph is replaced by another sub-graph. This section describes how the list of actions and their replacements is formulated, followed by two methods for selecting which actions to replace, i.e., exhaustive and our less computationally expensive method Shrink-Reduce.

Defining Modifications
This section describes how the possible modifications are defined; the process that creates the replacement sub-graphs; and how these modifications are applied to the Action Graph by the exhaustive and Shrink-Reduce algorithms. The modifications applied during the environment design process are generated from additional PDDL action definitions. For instance, the move-itemstate-modification(?item ?container1 ?container2) action definition, along with its preconditions and effects, is filled in with all combinations of items and containers that are stated in a PDDL defined problem. The modifications (actions) not applicable to the initial state are filtered out. An example definition is provided in Appendix B.
Our action replacement algorithms start by creating a map, that maps action nodes to their replacement graphs. For each modification, a list of actions affected by the modification is extracted. An action is affected by a modification if its preconditions (a pre ) contain a variable of which the value is different in the modification's effects (m e f f ). Subsequently, a replacement sub-graph is generated by first applying a modification to the initial state I, then creating a (separate) Action Graph for a goal set to an affected action's effects (a e f f ). The resulting map contains a list of actions along with their replacement sub-graphs and corresponding modifications. An example of this is visualised in Figure 9.
Changes are made to the graph by swapping the action node(s) (or, if the action has dependencies, its parent ORDERED-AND node) with the appropriate sub-graph(s). If a replacement is selected and the corresponding modification affects other actions, they will also be replaced with their equivalent replacements. Actions are not repeated within the Action Graph; thus, nodes' parents and children are altered if an action in the replacement already exists (rather than creating a new action node). Figure 9. Example of an action currently in the Action Graph, i.e., take(item1 cupboard1), which can be affected by 2 state modification actions. Thus, it can be replaced, by swapping its parent ORDERED-AND node with one of its replacement sub-graphs.

Exhaustive
The exhaustive approach iteratively applies each modification to the Action Graph, then pairs of modifications, then triplets, etc. After a set of action nodes in the Action Graph has been replaced with their corresponding sub-graphs, the ACD dep is re-calculated. The set of modifications resulting in the lowest ACD dep is returned. These modifications can be applied to I to produce the resulting environment design. While this algorithm is guaranteed to find the best possible solution, its computational time can become exceedingly high, especially for problems with large state spaces. Therefore, a maximum size for the set of modifications (N) is provided as a parameter. The pseudo-code for this process is provided in Appendix C.

Shrink-Reduce
Our Shrink-Reduce heuristic is a two step process: (1) Shrink all plans and (2) reduce the length of the non-distinctive prefixes. The pseudo-code can be found in Algorithms 2 and 3. A video showing the Shrink-Reduce process, for an example problem, has been provided as Supplementary Materials. In this section their details are described in turn, then an example is provided. This method has been developed as it is less computationally expensive than an exhaustive search; however, it is not guaranteed to find the solution with the lowest ACD dep . Moreover, our Shrink-Reduce method was designed for problems containing plans with multiple permutations.

Shrink Plans
Shrinking the plans makes it possible to increase the distinctiveness of the environment by performing a single replacement, as afterwards there will exist unused actions that can replace parts of the non-distinctive prefixes. In other words, if there are x variables (e.g., items) that can have y different values (e.g., locations), after changing all variables (that can change value) to 1 value, there are some (e.g., |y| − 1) unused values that can be utilised to increase the distinctiveness. Without this step it is more difficult to reduce the non-distinctive prefixes, as a single modification often does not affect ACD dep . The process described in this section is performed on each goal G ∈ G in turn.
All actions in the plans to reach a goal are extracted from the graph (Algorithm 2 line 3) by performing a depth-first search starting from the goal action's (a e f f = G) parent (ORDERED-AND node).
Actions inserted when processing a preceding goal are discovered (i.e., A r from line 4), to prevent increasing the length of a previously shortened plan. Subsequently, all the actions in the plans that can be replaced are iterated over (line 5). If replacing an action results in the plan being shortened (line 7), the replacement is applied to the Action Graph (line 8) and all actions in the replacement's sub-graph are appended to A r (line 9); otherwise the original action (and its dependencies including indirect dependencies) are appended to A r (line 14). This check is performed by comparing the number of elements in the set containing A r plus the original action and its dependencies (including dependencies' dependencies), with the number of elements in the set containing A r plus the replacement's actions. During this check, repeated actions are ignored. Performing this comparison with A r , rather than the full plan, increases the amount the plans are shrunk (e.g., if item1 has already been moved to cupboard2, all items should be moved to cupboard2; rather than another cupboard that appears elsewhere in the plan). This process is likely to increase the ACD dep ; however, in the next step it will be reduced.

Algorithm 2 Shrink plans.
> Inputs: list of goals, the Action Graph and map of action node to their possible replacements (e.g., Figure 9) > Output: Action Graph containing the shrank plans. for G ∈ G do 3: allPlans = graph.get_all_plans(G)

Reduce Non-Distinctive Prefixes
The second step attempts to reduce the length of the longest non-distinctive prefix for each pair of goals (e.g., p(G1, G2) and p(G2, G1)). For each prefix, this step iterates over the replaceable actions (line 5 of Algorithm 3); a replaceable action appears anywhere in the remainder of the plan (that the prefix belongs to), provided that at least one of its dependencies is within the prefix and the action itself is not in the prefix. An action is replaced by each of its possible replacements in turn (line 7), until the prefix has been shortened and the ACD dep reduced (lines 11-13), or all its replacements have been processed. If a replacement does not shorten the prefix and reduce ACD dep the alteration is undone (line 15). During this process the currently applied modifications are tracked and, once the non-distinctive prefixes have been processed, they are returned (line 21). This list of modifications (i.e., actions) can be applied to the initial state I to get the resulting environment design.

Example of Performing Shrink-Reduce
To clarify why this two step process is performed an example is provided, and depicted in Figure 10. In this example, there are 2 goals (G1 and G2) and to reach these goals different items must be taken from cupboards. The state of the environment can be altered by changing the locations of items. Prior to redesigning the environment, all available cupboards must be opened to reach either goal ( Figure 10a). Therefore, it is impossible to alter the ACD dep (i.e., the length of the non-distinctive prefixes) with only a single modification. After shrinking the plans, i.e., moving all items to single cupboard (Figure 10b), a single modification can be applied to reduce ACD dep ; for instance, as shown in Figure 10c, item5 can be moved to cupboard2. The resulting plans and non-distinctive prefixes, after all actions in both non-distinctive prefixes (i.e., p(G1, G2) and p(G2, G1)) have been reduced, are depicted in Figure 10d.

Performing Action Removal to Reduce ACD
As described in our previous work [11], by removing the possibility of performing an action and, consequently, modifying the state of the environment, the goals G can be made more distinctive (i.e., the ACD dep reduced). This type of state modification is applied to a navigation domain, as it is the only human-inhabited environment, we can think of, in which it is feasible to remove actions (e.g., by placing obstacles). As mentioned above, for such a domain only the optimal plans are contained within the Action Graph; thus, during the action removal process the cost of the optimal is not increased. Rather than simply exhaustively removing parts of the graph, a much less computationally expensive algorithm has been developed. This section first provides an overview of how the non-distinctive prefixes are processed, before providing the details. In Figures 11-13, taken from our previous paper [11], some simple examples are provided to illustrate how our approach works. An example is also provided in a video supplied as Supplementary Materials. Our approach is applicable to environments of any size with any number of goals.
A video showing the Shrink-Reduce process, for an example problem, has been provided as Supplementary Materials. The unconstrained and constrained Action Graph creation algorithms are demonstrated in the videos provided as Supplementary Materials.
(a) Example of an environment with a single non-distinctive plan prefix containing 1 action, for which each goal has an alternative action.
(b) As each goal has an alternative, the non-distinctive action can be removed, resulting in an ACD of 0.
(c) The action graph for the example environments. The action which is removed is shown highlighted (white text with red background).
(d) The Action Graph after the ACD has been reduced. Figure 11. Example GRD problem, in which both goals have an alternative to the plan(s) containing the non-distinctive prefix. In all example environments a longest non-distinctive plan is indicated with red arrows, and the alternative action(s) by a green dashed arrow.
(a) Both goals have an alternative to the 1st action in the non-distinctive prefix but the alternative, move(1_2 0_2), also belongs to both goals; so is only taken into consideration as a last resort. G1 has an alternative to the 2nd action, whereas G2 does not. So, the next action in G1's plan, containing the prefix, is removed.
(b) After removing an action, move (1_2 1_2) is still a non-distinctive action; thus, is evaluated on the next iteration. This time G2 is the only goal with an alternative action.
(c) The environment after all non-distinctive plan prefixes have been evaluated.
(e) The Action Graph after the action removal process has completed. (a) Initial environment. Only G2 has an alternative action, for any action in the non-distinctive plan prefix, so the next action in G2's plan which contains the non-distinctive prefix is removed.
(b) After the 1st action has been removed the second to last action in the original non-distinctive plan prefix is still a non-distinctive action; thus, is evaluated in the next iteration.
(c) After the 2nd action has been removed the longest non-distinctive prefix belongs to G1 and G2. Neither G1 nor G2 have a unique alternative action. So, the next actions(s) in 1 of the goal's plans are removed. In this case, as G1 has no further actions, G2's next actions are removed.
(d) Two actions have been removed, since there were two possible next actions to reach G1 (discovered by detecting an OR node in the Action Graph).
(e) G2 is now distinctive, but there still exists a non-distinctive plan prefix for G1 and G3. The list of non-distinctive plan prefixes is sorted, most costly first, so that the worst is processed first. In turn each prefix is taken from the list and its actions iterated over to discover if: (1) all goals an action belongs to have a (unique) alternative action; (2) a sub-set of the goals the non-distinctive prefix belongs to have an alternative; or (3) all the non-distinctive prefix's goals have an action with an non-unique alternative. These points are expanded on below. The actions are iterated over in order because if an action at the start of the prefix can be removed, the remainder of the prefix is no-longer valid. If the algorithm were to start from the last action, in a plan containing the non-distinctive prefix, it would be more difficult to reduce the distinctiveness (i.e., more actions would require removing).
A goal has an alternative action if an action (or if the action has dependencies its single ORDERED-AND parent) has an OR node as a direct parent and another of the OR node's children belong to that goal. If so, an action can be removed without causing the goal to become unreachable. Moreover, the alternative cannot belong to any of the other goals the (non-distinctive) action belongs to as removing this action would have no effect on the goals' distinctiveness. If all the goals with a plan containing the non-distinctive action have an alternative, the action is removed. An example is provided in Figure 11.
After checking all actions in a non-distinctive prefix, if only a subset of the goals have alternative actions, the action(s) directly after the non-distinctive prefix in their plans is removed. As a result, those goals can only be reached by their alternative (possibly more distinctive) plan(s). An example is shown in Figure 12. Otherwise, when all the non-distinctive prefix's goals have an action (in the non-distinctive prefix) with an non-unique alternative, i.e., the alternative(s) belongs to more than one of the action's goals, the next (distinctive) action(s) for one of the goals is removed (see Figure 13c,d). This prevents the non-distinctive prefix being valid for that goal, thus making it more distinctive. Our action removal method always checks that the action removed will not interfere with any of the other goals, which do not have an alternative, to prevent them from becoming unreachable.
Once an action has been removed, which nodes belong to which goal is re-evaluated (see Section 6); and the actions in the non-distinctive prefix, prior to any removed action, are checked to see if they should be inserted into the list of non-distinctive plan prefixes (e.g., Figure 12a,b). Actions, that are not the last action in the prefix, should be re-processed as it may be possible to further reduce the length of the non-distinctive prefix (e.g., Figure 13a-c). The last action in the prefix will not be processed again, if it is still a non-distinctive action then the ACD will not be reduced to 0. An example of an environment in which the ACD cannot be reduced to 0, and the steps our algorithm performs, is shown in Figure 13.

Experiments
Through experiments we aim to demonstrate the scalability and performance of our goal recognition design approach. First, Shrink-Reduce is compared to the exhaustive search on problems generated from a scalability testing domain we developed. Second, the results of running Shrink-Reduce and exhaustive on a kitchen domain are presented. Finally, our action removal method is compared to the pruned-reduce method by Keren et al. [10], on grid-based navigation problems of various sizes.

Action Replacement Scalability Experiments
The purpose of experiments in this section is to demonstrate the performance of our Shrink-Reduce method, and compare this to the exhaustive approach. The results show the computation time, ACD dep reduction, WCD dep reduction and number of states modifications required to get from the provided to the designed world model. As the plans' lengths can be altered during this process, the average change in length is also mentioned. The experiment setup is described, before presenting and discussing the results.

Setup
For these experiments a test domain was developed, that contains definitions for the following actions: open(?container), take(?item ?container) and multiple goal actions, e.g., goal1(). Each goal action was generated by randomly selecting between 3 to 10 items that require taking before the goal action can be performed, i.e., its preconditions are set to a list of (taken ?item) fluents, and its effects were set to a single fluent e.g., (goal1_reached). These actions definitions are representative of definitions suitable for many application domains, e.g., a kitchen, factory or hospital. The location of each item was also selected randomly. The location of these items can be modified during the environment design process, for example, if open(cupboard1) and take(item1 cupboard1) are in a plan they can be replaced by open(cupboard2) and take(item1 cupboard2).
Three datasets were created from this test domain, (1) with a differing number of variables (i.e., items) whose value can modified, (2) an increasing number of values the variables can be set to (i.e., cupboards the items can be in), and (3) a varying number of goals. When one of these amounts is changed the others are fixed, i.e., there are 15 variables, 5 values, and 5 goals. As the datasets contain an element of randomness, for each variation 5 problems were created and the average result is presented. Experiments were ran on a server with 11 GB of RAM and a Intel Xeon CPU 2.27 Ghz processor. All graph's error bars show the minimum and maximum result.
The exhaustive approach was ran for varying values for the maximum number of changes that can be applied (N), i.e., 1, 2, 3 and 4. In the results these are named exhaustive1, exhaustive2, exhaustive3 and exhaustive4 respectively. At N = 5, the exhaustive search can take 2 days to complete a single (relatively large) problem. Therefore, for the scalability experiments N was set to a maximum of 4. For the experiment presented in the subsequent section (Section 9.2), as a domain with fewer goals and possible modifications was used, no limit was set for N.
The ACD dep and WCD dep reductions are presented in performance profile graphs, as suggested by Dolan and Moré [42]. This enables the results to be presented in a more readable format, and all datasets can be grouped into a single result, to prevent a small number of problems dominating the discussion. For run-times, graphs showing the individual data-points are provided, so the exponential increase in run-time when the problem size is increased can be clearly seen. To produce the performance profile of an approach (S ∈ S) the ratio between its result (D P,S ) and the best solution for a problem (P ∈ P) is calculated, as shown in Equation (3). Equation (4) calculates the percentage of problems an approach solved when the ratio is less than a given threshold, τ. When the ACD dep /WCD dep reduction is 0 (D P,S = 0), the ratio is set to infinity (as the approach failed to provide a solution). As approaches can produce the same reduction, the sum of all approaches' performance at τ = 1 does not necessarily equal 0.

Results and Discussion
The run-time and number of required state changes, for an increasing number of variables, values and goals are shown in Figures 14-16, respectively; the ACD dep and WCD dep reduction comparisons are shown in Figure 17 (separate results for each dataset are provided in Appendix E.1). A similar trend was observed in each of these configurations. The computational times for the different approaches, followed by the increase in goal distinctiveness, are discussed.  As the size of the problem (i.e., number goals, values or variables) was increased, there was an exponential increase in the average time taken by each of the approaches. The outliers of this trend (e.g., in Figure 14a at 18 variables) are due to the nature of generating data with randomness, e.g., if a variable is not within a plan to any of the goals, the actions associated with that variable are not inserted into the graph, and thus not processed when increasing the distinctiveness of goals. Our Shrink-Reduce method finished in less time than exhaustive, except for exhaustive1 (i.e., N = 1). Having said that, exhaustive1 was inferior at increasing the distinctiveness of the goals.
For exhaustive, higher values of N allow ACD dep (Figure 17a) and WCD dep (Figure 17b) to be reduced to lower values. If exhaustive performs all possible combinations of state changes, it is guaranteed to find the best environment design. Nevertheless, our Shrink-Reduce approach often made the goals more distinctive than exhaustive4, but it also performed a large number of state changes (Figures 14c, 15c and 16c) and often (slightly) increased the average plan length (Appendix E.1). As stated earlier, and reiterated by these results, ACD dep is sometimes reduced without any reduction in WCD dep ; moreover, it is possible that WCD dep is increased when ACD dep is decreased (e.g., exhaustive1 Figure A3b). At τ ≥ 2, the percentage of problems solved by exhaustive4 is higher than Shrink-Reduce; showing that on the problems Shrink-Reduce did not have the highest ACD dep reduction it is further from the best approach, than exhaustive4 is when it did not produce the best result.    With some alternative GRD methods, such as [10], it would be more difficult to calculate ACD dep and WCD dep . In [10], a task planner is called to find a plan to reach a pair of goals, this joint plan contains the longest non-distinctive prefix. Our distinctiveness metrics could be calculated by searching joint plans for actions that have dependencies (i.e., should be counted multiple times); however, this would be computationally expensive and, as multiple prefixes could be of equal size, multiple joint plans (for a single pair of goals) may require processing. Methods that create graph or tree structures, such as MDPs [12] or planning libraries [21] could employ a similar method to Action Graphs. Wayllace et al. [12] perform stochastic GRD with MDPs, that each contain a single terminal (goal) state; these MDPs can be traversed to find the non-distinctive prefixes, include those actions that should be counted multiple times. In future work, we intend to investigate stochastic GRD in more detail.

Action Replacement Applied to a Kitchen Domain
The kitchen domain contains the actions and world model required for a human to make several meals, i.e., breakfast, a packed lunch and dinner. Ramírez and Geffner [7] created the original version of the kitchen domain for their work on goal recognition, based on the work by Wu et al. [16]. Moreover, similar problems have been utilised by several related works, e.g., the cooking problem by Yordanova et al. [4]. This experiment is included to demonstrate our approach applied to a realistic domain containing more action definitions (than our scalability testing domain). Rather than just a goal action preceded by a list of take item actions, a more complex graph (plan) structure was produced. Further, fewer goals, variables and values were contained within the kitchen goal recognition design problem, therefore running exhaustive to find the best possible solution was feasible.

Setup
We extended the original kitchen domain by Ramírez and Geffner [7] with action definitions and the state knowledge required to take items from cupboards, a fridge and a drawer; the newly defined actions are open(?container), close(?container) and take(?item ?container). The initial state of the environment is set so that all food items (that don't need to be stored in the fridge) are in cupboard1, equipment (e.g., cup and plate) is in cupboard2 and all utensils (e.g., spoon and knife) are in the drawer. In this dataset there are 3 hypothesis goals: G ={(made_breakfast), (lunch_packed) and (made_dinner)}. These goals require multiple items to be taken and other actions to be performed, e.g., part of the plan to make breakfast involves taking bread and making toast. Each goal can be achievable through multiple plans, e.g., for (lunch_packed) to be reached a person must always perform the take(lunch_bag) action but has the option of either performing the make-peanut-butter-sandwich() or make-cheese-sandwich() action. Information on the plans producible from this domain are provided in Appendix D.
Like the previous experiment, the location of items is modified during the environment design process. As it is likely that a human would not want items within the fridge or drawer to be changed (e.g., milk taken out the fridge, or cereal put in the drawer), PDDL not statements have been inserted into the state modification action to prevent this. The PDDL definition for this action is provided in Appendix B.

Results and Discussion
For the kitchen domain, the initial WCD dep = 14 and ACD dep = 11.00. Both our Shrink-Reduce and exhaustive approaches successfully increase the distinctiveness of the goals. Exhaustive (although slower) made the goals more distinctive than Shrink-Reduce. Shrink-Reduce reduced WCD dep to 12 and ACD dep to 10.00, whereas exhaustive reduced WCD dep to 11 and ACD dep to 9.33. This illustrates that, for environments with fewer state changes, the exhaustive algorithm should be executed.
Both approaches reduced the WCD (distinctiveness metric by Keren et al. [10]) by 1 action. WCD dep was lowered more than WCD as the number of repeated open actions (that are dependencies for take actions) was also reduced (as explained in Section 4). This difference indicates that depending on which plan permutation is selected by a human, their goal can now be determined sooner. As demonstrated by these results, for problems with multiple plan permutations, WCD dep can provide more insight into the distinctiveness of an environment than just providing WCD.
To produce the resulting environment designs of either approach, 3 state modifications need to be applied to the initial state (I). The list of required modifications are provided in Table 1. Both approaches proposed moving bread from cupboard1 to cupboard2, this results in cupboard1 only needing to be opened when making breakfast, and not when making dinner or a pack lunch. Therefore, if a human opens cupboard1 their goal is known. Moving the water-jug and/or cup to cupboard1 decreases ACD dep as these items are only required to make breakfast. Shrink-Reduce moved the bowl instead the cup into cupboard1. This increased the distinctiveness between making dinner (which requires the bowl) and making a pack lunch (which does not require the bowl). After moving the bowl, changing the cup's location would not have altered the distinctiveness (as cupboard1 requires opening for both making breakfast and dinner). Table 1. List of state change actions returned by the Shrink-Reduce and Exhaustive methods, that when applied to the initial state (I) result in the designed environment. The initial WCD dep = 14, ACD dep = 11.00, WCD = 7.

Shrink-Reduce:
WCD dep = 12, ACD dep = 10.00, WCD = 6 Exhaustive: WCD dep = 11, ACD dep = 9.33, WCD = 6 move-item(bread cupboard1 cupboard2) move-item(bread cupboard1 cupboard2) move-item(water-jug cupboard2 cupboard1) move-item(water-jug cupboard2 cupboard1) move-item(bowl cupboard2 cupboard1) move-item(cup cupboard2 cupboard1) During some initial tests we enable items to be moved into/from the fridge and drawer. This resulted in a lower ACD dep (than provided in Table 1) for the redesigned environment produced by both methods. Nonetheless, some changes, e.g., moving the bread into a drawer, are undesired by humans. In many scenarios there will be a trade-off between making undesired changes and increasing the goals' distinctiveness.

Action Removal Experiments
The action removal experiments, presented in this section, aim to show: (1) How well our approach scales as the number of goals and grid size of a navigation domain are increased, (2) how much the ACD and WCD is reduced, and (3) how many actions are removed. For this domain ACD = ACD dep and WCD = WCD dep , as actions are strictly ordered. Our Action Graph approach is compared to the pruned-reduce method by Keren et al. [10], by running both approaches on a dataset we generated, containing problems with randomly selected initial and goal locations. This section first describes the experiment setup, then discusses the results.

Setup
Two grid-based navigation datasets containing GRD problems were created. The first contains problems with an 8 by 8 grid and a varying number of goals (destinations). For each number of goals, 8 problems were generated by randomly selecting a start location and the goals' locations. In total the dataset contains 112 problems.
The second dataset consists of problems with differing grid sizes, for all problems both the width and the height are equal. For each grid size 8 problems, with a random start location and three random goal locations, were generated. In total this dataset contains 56 GRD problems.
Experiments were ran on our department's server with 3 GB of RAM and a Dual Core AMD Opteron 2 Ghz processor. A timeout of 10 min per GRD problem was set for all experiments. The whole process, including converting the PDDL into an Action Graph is included in the run-times for our approach. The output of pruned-reduce does not state the ACD (as this is our new metric), so to calculate this the resulting environment design is passed to our algorithm's ACD calculation. As before, all graph's error bars show the minimum and maximum result.

Results and Discussion
Experiments were ran on both datasets: with varying amounts of goals and with an increasing grid size. The corresponding results, showing the average time and number of actions removed, are provided in Figures 18 and 19; and the WCD reduction and ACD reduction comparisons are shown in Figure 20 (separate results for the datasets are provided in Appendix E.2).
(a) Average time, in seconds, per GRD problem.
(b) Average number of actions that were removed. Figure 18. Results for an increasing number of goals. The results of our Action Graph approach are indicated by blue circles, pruned reduce [10] is shown with red triangles. All times are given in seconds.
For the majority of problems, within the varying number of goals dataset, our Action Graph approach took less than 1 second to perform GRD. Whereas pruned-reduce hit the timeout for the majority of problems (Figure 18a). The execution time (i.e., minimum and maximum) varies between different problems as for some problems removing a small number of actions reduces WCD to 0, whereas for others WCD can only be partially reduced; further, the optimal plans within the different problems differ in length (longer plans take more time to find and iterate over). The same trends are observed in the results for an increasing grid size ( Figure 19).   Our approach successfully managed to reduce WCD more than pruned-reduce. Pruned-reduce attempts to remove an increasing number of actions; thus, as shown in Figure 18b, did not attempt to remove many actions before the timeout was reached. This resulted in pruned-reduce not reducing the WCD as much as our approach (Figure 20b), and there were more problems for which pruned-reduce did not manage to reduce the WCD. This was also observed in the ACD reduction (Figure 20a). The aim of pruned-reduce is to reduce WCD (not ACD); thus, it performs worse at reducing ACD than WCD.
For several problems, our approach reduced ACD but not WCD. This indicates that although the least distinctive goal could not be made more distinctive, some goals could be. Both metrics provide an important insight into the distinctiveness of an environment; thus, in future GRD experiments we propose providing both.
These experiments proved that our Action Graph approach is more scalable than a current state-of-the-art approach on grid-based navigation problems. Our approach has managed to reduce the WCD of 168 grid-based navigation environments, with various numbers of goals and grid sizes, from an average of 6.29 (ACD = 4.69) actions to 3.02 (ACD = 1.91) actions, in an average of 4.69 s per problem.

Conclusions
As the plans to reach different goals can start with the same actions, a human's goal often cannot be recognised until their plan nears completion. By redesigning an environment, our work enables the goal of a human to be recognised after fewer observations. This is achieved through transforming a PDDL defined Goal Recognition Design (GRD) problem into an Action Graph, by means of a Breadth First Search (BFS) from each of the goal states to the initial world state. The non-distinctive plan prefixes are then extracted to calculate how distinctive the goals are, i.e., the WCD dep and ACD dep . Subsequently, these prefixes are processed to determine which actions can be replaced or removed. Our Shrink-Reduce method replaces actions by first shrinking the plans, then reducing the non-distinctive prefixes. Shrink-Reduce is less computational expensive than an exhaustive approach; however, when ran on kitchen domain Shrink-Reduce only reduces the ACD dep by 1, whereas exhaustive reduces ACD dep by 1.67. Our action removal method is shown to increase the distinctiveness of various grid-based navigation problems, with a width/height ranging from 4 to 16 and between 2 to 14 randomly selected goals, by an average of 3.27 actions in an average time of 4.69 s, whereas a state-of-the-art approach, namely pruned-reduce [10], often breaches a 10 min time limit.
Action Graphs are acyclic and no actions are repeated, therefore domains in which an action has different dependencies under varying circumstances are not currently solvable by our approach. For instance, if to reach one goal a human must retrieve item1 before taking item2, but to reach another goal the human must take item3 before item2, the action to retrieve item2 has different dependencies. For many domains, e.g., the kitchen domain, this strict ordering is not required, as humans are unlikely to adhere to a fixed ordering of actions, but for other domains this may be required. In future work, we will consider enabling actions to be repeated within an Action Graph. For domains such as the barman domain [43], in which different cocktails are created, a combination of our unconstrained method (as ingredients can be added in any order) and constrained method (as grasping and leaving a shot requires an optimal plan) will be required.
In future work we intend to apply our Action Graph approach to other, closely related, research domains. For instance, GRD with stochastic actions [14,44], plan recognition design [21], goal recognition [8] and task planning (e.g., plan legibility [45,46]). Future experiments will hopefully demonstrate how Action Graphs can help all agents in collaborative smart environments to become more aware of each other's intentions.
Further to contextual-awareness in human-inhabited environments, our work is applicable to numerous application areas. In human computer interaction scenarios, detecting a network intruder's intentions [21,26,47] or offering a user assistance [48,49] can also benefit from recognising the user's goal sooner. In video game development [50] often the world is designed so the player's goal can be recognised, thus enabling the non-playable characters to assist or thwart them; moreover these characters' plans can also be modelled as an Action Graph. Action Graphs could allow robots to learn from humans. For instance, the graph could be built from observations or their structure adjusted to match the order humans perform actions. We hope this paper will inspire researchers in these domains, to incorporate our Action Graph approach into their work.
Supplementary Materials: The following are available at http://www.mdpi.com/1424-8220/19/12/2741/s1. Video S1: Solving GRD Problems with Shrink-Reduce, Video S2: GRD for Navigation Domains. Algorithm A1 BFS for generating an Action Graph containing optimal plans > Inputs: list of goals, list of actions and the initial state > Output: the Action Graph 1: function GENERATE_ACTION_GRAPH(G, A, I) Inputs: list of goals, list of actions and the initial state 2: for each G ∈ G do 3: limit = −1 4: for a ∈ {a | a e f f = G} do 5: graph.insert_action_and_deps(a, {a | a pre = a e f f }) Insert action and its dependencies 6: if a pre ⊂ I then 7: limit = 1 8: else 9: q.push_back(a) Initialise the BFS queue, but only if a was not already in the graph 10: end if 11: end for 12: while q = ∅ do 13: a = q.pop() 14: for d ∈ {a | a pre = a e f f } do 15: if d pre ⊆ I or any {d | d pre = d e f f } have been inserted when processing previous goal(s) then 16: limit = max distance from goal 17: else if distance(G, d) = limit or all {d | d pre = d e f f } inserted when processing G then 18: graph.remove(d) Remove action and dependant actions whose branches will not reach I

19:
else 20: graph.insert_action_and_deps(d, {d | d pre = d e f f }) 21: q.push_back(d) 22: end if 23: end for 24: if a pre not met by actions within the graph then failed to insert an action to reach ≥ 1 of a pre 25: graph.remove(a) 26: end if 27: end while 28: end for 29: end function

Appendix B. Example Action Definition for Action Replacement
The modifications to the state of the environment, that can be made during the action replacement process, are defined as PDDL actions. In the example provided in Listing A1, an object (e.g., cup) is moved from one container (e.g., a cupboard) to another. Objects cannot be removed from the fridge or drawer (i.e., (not (= ?s fridge)), (not (= ?s drawer))), and no object can be moved into the fridge or drawer (i.e., (not (= ?g fridge)), (not (= ?g drawer))).
Listing A1. Example PDDL actions for action replacement.

Appendix C. Exhaustive Algorithm for Reducing ACD
The exhaustive action replacement algorithm iteratively applies each modification to the Action Graph, then pairs of modifications, then triples, etc. The pseudo-code is shown in Algorithm A2.
Algorithm A2 Reduce ACD dep : Exhaustive > Inputs: maximum number of modification, action graph and list of actions to modify the initial state > Output: list of modifications (actions) to produce the resulting environment design graph.undo_all_modi f ications(mods) 13: while get_next_valid_combination(stateModi f ications, n, mods) 14: end for 15: return bestMods 16: end function

Appendix D. Plans Generated from the Kitchen Domain
During the experiments, our action replacement algorithms were ran on a kitchen domain, to increase the distinctiveness of the plans for making breakfast, a packed-lunch and dinner. In this Appendix we provide further details on the plans producible from this domain. This domain was originally developed by Ramírez and Geffner [25]; we expanded it to include container objects, an open container action definition and a take item from container action definition. A single permutation of the longest plan to each of the goals within the kitchen domain are provided, in Table A1. There also exits plans to: • make-dinner() with either just make-salad or make-cheese-sandwich • make-breakfast() with make-coffee() instead of make-tea(). In addition, the option of making tea without milk and sugar or with just sugar. • pack-lunch() with make-cheese-sandwich() instead of make-peanut=butter-sandwich()

Appendix E. Detailed Results Graphs
In the experiments section the performance profiles for the reduction in ACD dep and WCD dep are shown. In this Appendix graphs showing the average reduction for each dataset are provided independently. These provide further detail on the performs of the different approaches. The graphs for the action replacement experiments are shown, followed by the action removal graphs. All error bars show the minimum and maximum result. open(fridge) take(salad-tosser cupboard2) take(kettle) take(peanut-butter fridge) take(bowl cupboard2) take(cloth) take(bread cupboard1) take(plate cupboard2) take(tea-bag cupboard1) take(knife drawer) take(dressing fridge) take(sugar cupboard1) take(plate cupboard2) take(cheese fridge) take(cereal cupboard1) take(lunch-bag cupboard2) take(bread cupboard1) take(bread cupboard1) make-peanut-butter-sandwich() make-salad() take(water-jug cupboard2) pack-lunch() make-cheese-sandwich() take(cup cupboard2) make-dinner take(bowl cupboard2) take(knife drawer) take(spoon drawer) take(butter fridge) take(milk fridge) make-cereal() make-toast() boil-water() make-tea() use(toaster) make-toast() make-buttered-toast() make-breakfast() (a) Average ACD dep reduction.
(c) Average difference in plan length.