Improving Strategic Decisions in Sequential Games by Exploiting Positional Similarity

We study the strategic similarity of game positions in two-player extensive games of perfect information by looking at the structure of their local game trees, with the aim of improving the performance of game-playing agents in detecting forcing continuations. We present a range of measures over the induced game trees and compare them against benchmark problems in chess, observing a promising level of accuracy in matching up trap states. Our results can be applied to chess-like interactions where forcing moves play a role, such as those arising in bargaining and negotiation.


Introduction
In complex sequential interactions, such as those arising in bargaining and negotiation, boundedly rational participants are often not in a position to fully calculate the consequences of their own decisions and need to make a judgment call on which move to make next. These interactions are often compared to chess because of the importance of forward thinking, opponent modelling and prediction, surprise moves and deceptive concessions (for an interesting take on the connections between chess and negotiation, see [1]). However, the right way to analyse them is far from obvious. Game theory offers perhaps the most natural toolbox to do so, but it is important to strike a good balance between the simplicity of the solution and its applicability.
In his book Modelling Bounded Rationality [2], Ariel Rubinstein says: "At the beginning of the twentieth century, Zermelo proved a proposition which can be interpreted: 'chess is a trivial game'." [2] (p. 130).
This sentence ironically revisits the seminal result of Zermelo on the determinacy of two-player zero-sum games (for a modern and neat technical exposition see [3]; for a comprehensive account of its historical relevance, including the original proof of the result, see [4]) which rarely scales up to games that are played in practice, such as chess. In a way, Rubinstein notes, there is a game theory and a game practice, and the often idealised toy models from the former abstract away from many important features in the latter, which we should not forget about. Some games have a solution in theory, but this solution can be very hard to find in practical play.
Rubinstein continues: "But [in games like chess] this calculation requires going through a huge number of steps, something no human being can accomplish. Modeling games with limited foresight remains a great challenge [and the frameworks studied thus far] fall short of capturing the spirit of limited-foresight reasoning." [2] (p. 131).
We compare our measures against benchmark problems in chess, with a number of surprising and promising findings. Notably, our structural similarity measure was able to match trap states to their child trap states with 85% accuracy without using domain-specific knowledge. In addition, we introduce a move-matching algorithm, which accurately pairs moves with similar strategic value from different positions. Our results are of immediate relevance to MCTS adaptations that detect and avoid trap states in gameplay.
Other Related Work. Graph comparison is one of the most fundamental problems of theoretical computer science, with the complexity of graph isomorphism having been an open problem for quite some time [11]. For tree structures, possibly the most commonly used metric is the edit distance [12]: based on the number of edits (node insertions, deletions, and substitutions) necessary to transform one tree into another, this metric works well for trees of similar size with many shared nodes and edges. However, it tends to be less suitable when comparing multiple trees of different sizes, as two large trees sharing some proportion of their nodes can appear further apart than two completely distinct smaller trees. An alternative measure is the alignment distance [13], an adaptation of the edit distance based on the notion of sliding one tree into another and counting the number of edits needed to transform both trees into the combined one. The alignment distance has a lower computational complexity than the edit distance, but it is technically not a metric and suffers from similar problems when comparing trees of different sizes.
In game playing, the presence of forcing continuations is identified as a key problem faced by AI engines, with more acute implications for chess-like games [10]. Surprisingly, though, the theory of similarity metrics to aid strategic decisions in game playing is not well developed.
Similarity measures have instead been used in other areas of AI, as in the case of Siamese neural networks for one-shot learning [14]. In this case, two symmetric convolutional neural networks were trained on same-different pairs and then shown a test instance, as well as one example from each possible classification. The output of the twin networks was then compared using a similarity measure. Here, a cross-entropy objective function was used to determine similarity, but this required the networks to be symmetric and weight tied. New similarity measures based on the structural similarity of networks could remove these requirements, but have not yet been investigated.
Paper Structure. The section "Positional Similarities" introduces our formal setup to compare game trees through a number of similarity measures. The section "Detecting Structural Similarities" uses these as the basis of a dynamic algorithm to detect structural similarities among subtrees. In the "Performance" Section, we compare these against known chess positions. We conclude by discussing potential applications and research directions.

Positional Similarities
Let $G$ be a two-player finite extensive-form game of perfect information, in which the two players, White and Black, alternate moves, with White starting the game. Formally, $G$ consists of a set of histories $(x_0, x_1, \ldots, x_K)$ such that $x_0$ is the starting board position and each $x_{k+1}$ (with $k < K$) can be reached from $x_k$ with a single legal move by White whenever $k$ is even, and by Black otherwise (as in, e.g., [3]).
We are interested in comparing trees that result from players exploring game continuations from a given board position onwards. In MCTS, for example, these are the game trees generated by the expansion step (see, e.g., [15]). Let $T_1, T_2$ denote the roots of two such trees. We now present three natural measures, of increasing complexity, to establish how similar such trees are: the similarity of continuations, the similarity of sequences, and the tree-edit similarity. All these measures are model-free, in the sense that they can be used in all situations that can be described as two-player finite extensive games of perfect information. We analyse their formal interrelation in this section and use them as the basis of our dynamic algorithm in the next one.

Similarity of Continuations
Our first measure, which we call similarity of continuations, is calculated from the sets $M_1^d, M_2^d$ of 1-ply atomic moves on the paths from starting positions $T_1, T_2$ to their descendants up to depth $d$. The similarity is the size of the intersection of these two sets divided by the size of their union:
$$P_{cont}(T_1, T_2) = \frac{|M_1^d \cap M_2^d|}{|M_1^d \cup M_2^d|}.$$
Collecting the move sets requires a traversal of both trees to depth $d$, which takes time $O(b^d)$, where $b$ is the breadth of the game tree. At depth 1, the similarity of continuations simply calculates the proportion of children that two nodes share. When extended to a deeper search, the measure becomes less fine-grained, since a move that occurs at different depths in the two trees still counts as shared, and multiple occurrences of the same move are counted only once.
As an example, consider the trees $T_1, T_2$ in Figure 1 and their depth-2 continuation sets $M_1^2, M_2^2$, from which $P_{cont}(T_1, T_2)$ follows directly.
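As an illustration, the similarity of continuations can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: local game trees are represented as nested dictionaries mapping each move label to the subtree it leads to, and the names `moves_to_depth` and `sim_continuations` are hypothetical. The trees in the usage lines are invented and are not those of Figure 1.

```python
from typing import Dict, Set

# Assumed representation: a node maps each move label to the subtree it reaches.
Tree = Dict[str, "Tree"]


def moves_to_depth(tree: Tree, d: int) -> Set[str]:
    """Collect every move label on an edge within depth d of the root (the set M^d)."""
    if d == 0:
        return set()
    moves: Set[str] = set()
    for move, child in tree.items():
        moves.add(move)
        moves |= moves_to_depth(child, d - 1)
    return moves


def sim_continuations(t1: Tree, t2: Tree, d: int) -> float:
    """Intersection over union of the depth-d move sets of two local trees."""
    m1, m2 = moves_to_depth(t1, d), moves_to_depth(t2, d)
    union = m1 | m2
    if not union:  # two childless nodes: the ratio would be 0/0
        raise ValueError("undefined for two terminal nodes")
    return len(m1 & m2) / len(union)


# Hypothetical trees: depth-1 move sets {a, b} and {a, c}; move x appears at depth 2 in both.
t1 = {"a": {"x": {}}, "b": {}}
t2 = {"a": {}, "c": {"x": {}}}
print(sim_continuations(t1, t2, 1))  # 0.3333333333333333
print(sim_continuations(t1, t2, 2))  # 0.5
```

Because the move sets ignore depth and multiplicity, the move x counts as shared at depth 2 even though it follows different first moves in the two trees, which illustrates the loss of granularity described above.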

Similarity of Sequences
Our second similarity measure, which we call similarity of sequences, uses longer sequences of moves rather than single plies. To ease computation, we require each possible move sequence of length $d$ from tree root $T_1$ to first be rewritten, according to a predetermined move ordering, as a simplified sequence $S$. Formally, two sequences are simplified into the same sequence if and only if they are equal modulo a permutation of their moves. These simplified sequences are then stored in a structure $\hat{T}_1$, which we call the simplified tree of $T_1$. As different move permutations can create the same simplified sequence, we also store the multiplicity $k$ of each $S$ in $\hat{T}_1$, where $k$ is the number of ways $S$ can be reached from the root node. The similarity of sequences is then the ratio of the intersection to the union of the simplified trees.
Let $k_{1i}$ be the multiplicity of simplified sequence $S_i$ in $\hat{T}_1$, $k_{2j}$ the multiplicity of $S_j$ in $\hat{T}_2$, and $n = \max(|\hat{T}_1|, |\hat{T}_2|)$ the number of nodes in the larger of the two simplified trees. The similarity of sequences $P_{seq}(T_1, T_2)$ is then expressed in terms of these multiplicities and $n$. For the example trees in Figure 1 at depth 2, using an alphabetical ordering, each simplified tree can be written as a list of simplified sequences, with a superscript giving the multiplicity of each sequence, and the similarity of sequences follows from these lists.
The tree simplification can be performed in one depth-first pass of each tree, taking time $O(b^d)$. Calculating the proportion of shared sequences takes $O(|\hat{T}_1| + |\hat{T}_2|)$, which is $O(b^d)$ in the worst case: the same asymptotic complexity as the similarity of continuations. Note, however, that the tree-simplification step gives the similarity-of-sequences calculation a larger constant factor. This is the price of greater accuracy at depth $d > 1$, as less information is lost when calculating from sequences rather than continuations.
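The simplification step can be sketched as follows, with each simplified tree stored as a `collections.Counter` that maps a sorted move tuple to its multiplicity $k$. The exact normalisation of the similarity of sequences (which involves $n$) is not reproduced here; as a clearly labelled stand-in, the sketch uses the multiset intersection-over-union ratio $\sum_S \min(k_1(S), k_2(S)) / \sum_S \max(k_1(S), k_2(S))$. The nested-dictionary trees and function names are assumptions, and sequences that end at a terminal node before depth $d$ are kept as shorter sequences.

```python
from collections import Counter
from typing import Dict, List, Tuple

Tree = Dict[str, "Tree"]  # assumed nested-dict game tree


def sequences(tree: Tree, d: int, prefix: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Enumerate move sequences from the root, stopping at depth d or at a leaf."""
    if d == 0 or not tree:
        return [prefix] if prefix else []
    out: List[Tuple[str, ...]] = []
    for move, child in tree.items():
        out.extend(sequences(child, d - 1, prefix + (move,)))
    return out


def simplified_tree(tree: Tree, d: int) -> Counter:
    """Map each simplified (alphabetically sorted) sequence to its multiplicity k."""
    return Counter(tuple(sorted(seq)) for seq in sequences(tree, d))


def sim_sequences(t1: Tree, t2: Tree, d: int) -> float:
    """Stand-in similarity: multiset intersection over multiset union."""
    s1, s2 = simplified_tree(t1, d), simplified_tree(t2, d)
    inter = sum((s1 & s2).values())  # Counter '&' keeps the minimum multiplicity
    union = sum((s1 | s2).values())  # Counter '|' keeps the maximum multiplicity
    if union == 0:
        raise ValueError("undefined for two terminal nodes")
    return inter / union


t1 = {"a": {"b": {}}, "b": {"a": {}}}  # a-b and b-a simplify to the same sequence
t2 = {"a": {"b": {}}, "c": {"a": {}}}
print(simplified_tree(t1, 2))  # Counter({('a', 'b'): 2})
print(sim_sequences(t1, t2, 2))  # 0.3333333333333333
```

Keeping multiplicities means that $T_1$, which reaches the same simplified sequence in two ways, is not treated as identical to a tree that reaches it only once.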
Relation to Kernels. The similarity of sequences is closely related to the Tanimoto similarity measure, or kernel [16,17], which expresses the ratio of the intersection to the union of two sets through inner products. The Tanimoto kernel has been used successfully in bioinformatics to calculate the similarity of molecule fingerprints, where the feature map of a molecule is built by counting paths through its structure, so that the paths shared by different molecules can be compared [16]. The methods used in this area can be carried over to extensive form games of perfect information, as a board position can be viewed as a fingerprint representing the game that has gone before it: the game tree and the feature map can both be traversed and have their matching paths counted. Using a suffix tree data structure [18,19], the Tanimoto kernel can be computed in time $O(d(n_1 m_1 + n_2 m_2))$ for depth $d$, where $n_i$ and $m_i$ are the numbers of nodes and edges in tree $T_i$. The similarity of sequences is also comparable to the random walk kernel [20], a measure of similarity between two graphs found by counting the number of random paths they share. The main difference is that the similarity of sequences has limited depth and is normalised.
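A naive version of the Tanimoto kernel on path-count features can be sketched as below. It is an illustration under our own assumptions, not the suffix-tree method cited above: every downward move path of length 1 to $d$ that starts at any node of the local tree is counted (in the spirit of path-based molecular fingerprints), and the kernel $\langle x, y\rangle / (\langle x, x\rangle + \langle y, y\rangle - \langle x, y\rangle)$ is evaluated on the resulting sparse count vectors. Function names and the tree representation are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple

Tree = Dict[str, "Tree"]  # assumed nested-dict game tree


def path_features(tree: Tree, d: int) -> Counter:
    """Count downward move paths of length 1..d starting at every node (no suffix tree)."""
    feats: Counter = Counter()

    def paths_from(node: Tree, depth: int, prefix: Tuple[str, ...]) -> None:
        if depth == d:
            return
        for move, child in node.items():
            path = prefix + (move,)
            feats[path] += 1
            paths_from(child, depth + 1, path)

    def visit(node: Tree) -> None:
        paths_from(node, 0, ())
        for child in node.values():
            visit(child)

    visit(tree)
    return feats


def tanimoto(x: Counter, y: Counter) -> float:
    """Tanimoto kernel <x,y> / (<x,x> + <y,y> - <x,y>) on sparse count vectors."""
    xy = sum(v * y[k] for k, v in x.items())
    xx = sum(v * v for v in x.values())
    yy = sum(v * v for v in y.values())
    denom = xx + yy - xy
    if denom == 0:  # both feature vectors are empty
        raise ValueError("undefined for two empty feature vectors")
    return xy / denom


f1 = path_features({"a": {"b": {}}, "b": {}}, 2)  # paths: a, a-b, and b twice
f2 = path_features({"b": {}}, 2)                  # paths: b once
print(tanimoto(f1, f2))  # 2 / (6 + 1 - 2) = 0.4
```

Unlike the depth-limited similarity of sequences, these counts include paths starting below the root, which is what gives the feature vectors their multiplicities.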

Tree-Edit Similarity
It may be the case that $T_1$ is very similar to $T_2$ but differs in a few very shallow moves. In this case, the similarity of sequences would not detect the similarity, since every sequence that passes through a differing shallow move is counted as distinct. We therefore propose a modified version of the tree-edit distance [21], which traditionally counts the minimal cost of the operations needed to turn one tree into another. The tree-edit similarity, used to compare subtrees, is normalised, and one minus it acts as a metric on the tree-edit space. The normalised tree-edit distance [21] takes values in the range [0, 1] and is therefore suitable as a similarity measure when subtracted from one. It is defined in terms of $e(T_1, T_2)$, the tree-edit distance between $T_1$ and $T_2$, and $\alpha$, the weight of edit operations. Since there is no need to weigh edit operations differently, we take $\alpha = 1$ for all operations; then, as shown by Li and Zhang [21], the normalised distance is a metric. Since similarity and distance are complementary on [0, 1], we define the tree-edit similarity of $T_1$ and $T_2$ as one minus their normalised tree-edit distance. It can be calculated in this way on the example trees in Figure 1.
This measure is the most fine-grained of the three detailed so far. Since calculating the tree-edit distance on unordered trees is known to be NP-hard [22], we must again order the nodes in a preprocessing step with complexity $O(b^d)$, as above. Once the trees are ordered, the time complexity reduces to $O(b^{2d} d^2)$ when $d < b$, and $O(b^{2d+2})$ when $d \geq b$ [12]. The improvements made by the tree-edit similarity over the two previous measures must therefore be weighed against its added complexity.
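The tree-edit similarity can be sketched on small ordered trees as follows. Two assumptions should be stated plainly. First, the unit-cost edit distance is computed with the classic recursive forest decomposition (delete, insert, or match the rightmost roots) plus memoisation, which is simple but far slower than the algorithms cited above [12]. Second, since the normalisation formula is not reproduced in the text, the sketch assumes the form $2e / (\alpha(|T_1| + |T_2|) + e)$ used for the normalised Levenshtein distance metric, with $\alpha = 1$; the paper's exact formula may differ. Trees are tuples `(label, children)` whose children are already sorted by the predetermined move ordering, and all names are hypothetical.

```python
from functools import lru_cache
from typing import Tuple

# Assumed ordered-tree representation: (label, children), children pre-sorted.
Node = Tuple[str, tuple]


def size(node: Node) -> int:
    return 1 + sum(size(c) for c in node[1])


@lru_cache(maxsize=None)
def forest_dist(f: tuple, g: tuple) -> int:
    """Unit-cost edit distance between ordered forests f and g (classic recursion)."""
    if not f and not g:
        return 0
    if not g:  # only deletions remain
        v = f[-1]
        return forest_dist(f[:-1] + v[1], g) + 1
    if not f:  # only insertions remain
        w = g[-1]
        return forest_dist(f, g[:-1] + w[1]) + 1
    v, w = f[-1], g[-1]  # rightmost roots
    return min(
        forest_dist(f[:-1] + v[1], g) + 1,  # delete v, promoting its children
        forest_dist(f, g[:-1] + w[1]) + 1,  # insert w
        forest_dist(f[:-1], g[:-1])         # match v with w:
        + forest_dist(v[1], w[1])           # their subforests,
        + int(v[0] != w[0]),                # plus a relabel if labels differ
    )


def tree_edit_similarity(t1: Node, t2: Node, alpha: float = 1.0) -> float:
    e = forest_dist((t1,), (t2,))
    # Assumed normalisation: 2e / (alpha * (|T1| + |T2|) + e), which lies in [0, 1].
    d_norm = 2 * e / (alpha * (size(t1) + size(t2)) + e)
    return 1 - d_norm


a = ("root", (("a", ()),))
b = ("root", (("b", ()),))
print(forest_dist((a,), (b,)))     # 1 (relabel a as b)
print(tree_edit_similarity(a, b))  # 0.6
```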

Comparing Terminal States
It may sometimes be necessary to find the similarity of two terminal states. In terms of the game tree for a zero-sum game, two terminal nodes should have a similarity of one if they give the same reward to the agent (win-win, draw-draw, loss-loss), and zero if the rewards are different. Since terminal nodes have no children, the ratios used by the measures above reduce to 0/0 and are undefined, so this case must be handled separately.
The normalised difference between the rewards of the two terminal nodes is found by taking the absolute difference between the reward $R_1$ of one node and the reward $R_2$ of the other, and dividing it by the size of the range $[S_0, S_1]$ of possible reward values. This gives a value between 0 and 1, where 1 represents rewards at opposite ends of the range and 0 represents equal rewards. Subtracting it from 1 then gives a similarity measure:
$$1 - \frac{|R_2 - R_1|}{S_1 - S_0}.$$
This can be used in endgame cases to avoid division-by-zero errors when calculating the other similarity measures.
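The terminal-state similarity amounts to a one-line calculation; a minimal sketch follows. The absolute value keeps the result in $[0, 1]$ whichever reward is larger, and the reward scale of $-1$ (loss), $0$ (draw), $1$ (win) in the usage lines is an assumption for illustration.

```python
def terminal_similarity(r1: float, r2: float, s0: float, s1: float) -> float:
    """Similarity 1 - |R2 - R1| / (S1 - S0) of two terminal nodes with rewards in [S0, S1]."""
    if s1 <= s0:
        raise ValueError("the reward range must satisfy S0 < S1")
    return 1 - abs(r2 - r1) / (s1 - s0)


# Assumed reward scale: loss = -1, draw = 0, win = 1.
print(terminal_similarity(1, 1, -1, 1))   # 1.0 (win vs win)
print(terminal_similarity(1, -1, -1, 1))  # 0.0 (win vs loss)
print(terminal_similarity(1, 0, -1, 1))   # 0.5 (win vs draw)
```

Under this normalisation, a win compared with a draw scores 0.5, between the two extreme cases.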

Relationship between Measures
At depth 1, the similarity of sequences and the similarity of continuations are equivalent, as each child move only appears once per tree. At depth 2, the similarity of sequences has greater variation, as can be seen from the following chess-inspired instance.
Example 1 (Chess trees). Let $T_1, T_2$ be nodes of a chess game tree with constant branching factor $b$, where $T_1$ and $T_2$ differ only in the placement of two pieces, and compare the two measures on them at depth two. Now consider positions $T_3, T_4$, which also differ only in the placement of two pieces, except that in $T_3$ the opponent has chosen a forcing move leading to checkmate at depth 2, while in $T_4$ the opponent has chosen otherwise. Then $T_4$ extends past depth two, but $T_3$ is truncated and only contains depth-1 moves, all of which are shared with $T_4$; again, the two measures can be compared at depth two. From these calculations, $P_{cont}(T_1, T_2) < P_{seq}(T_1, T_2)$, and thus the similarity of sequences has greater variation than the similarity of continuations. The tree-edit similarity varies even more than the similarity of sequences, as further calculations on the same examples show.
Modulo the trade-off between simplicity and complexity, the above similarity measures can be used to analyse any game trees with consistent move labelling. This would be especially useful for games with less dynamic trees, that is, those without capturing or blocking moves that change the structure of the game tree between plies. For games such as Go, where a single piece can exert power over a whole area, these measures provide useful tools for analysis, which could be further explored by accounting for symmetries and abstractions of the board.

Detecting Structural Similarities
We may find ourselves comparing positions that do not share many continuations, e.g., positions that are far away from one another in a game tree. In this case, we can extend the previous approach to check recursively for subtree similarity.

Structural Similarity Measure
Our final similarity measure, which we call the structural similarity measure, compares the graphical structure of two game trees without comparing their atomic moves directly. The measure is based on calculating the similarity of each starting position $T_1$ to each of its child nodes $T_{1i}$, using any of the three previously defined measures, and comparing this list of similarities to the list of similarities of another starting position $T_2$ to its children $T_{2j}$. The measure uses an assignment algorithm (see Algorithm 1) to pair each child node of $T_1$ with a child node of $T_2$ so as to minimise the sum of the differences between the paired nodes' similarities to their respective parents. If one node has more children than the other, each unpaired child adds one to this sum. The sum is then divided by the larger number of children and subtracted from one to give the structural similarity of the two subtrees, where 1 means identical and 0 means completely distinct. Let $c_1, c_2$ be the number of child nodes of $T_1, T_2$ respectively, and let $\sigma$ range over the pairings of the children of the node with fewer children to distinct children of the other node. Then, for a selected similarity measure $P$, the structural similarity measure can be expressed as
$$P_{struct}(T_1, T_2) = 1 - \frac{1}{\max(c_1, c_2)} \left( \min_{\sigma} \sum_{(i, j) \in \sigma} \left| P(T_1, T_{1i}) - P(T_2, T_{2j}) \right| + |c_1 - c_2| \right).$$

Algorithm 1 Structural Similarity Detection Protocol
    Input: nodes T1, T2; base similarity measure P
    sim1 ← empty list; sim2 ← empty list
    for child T1i of T1 do
        append P(T1, T1i) to sim1
    for child T2j of T2 do
        append P(T2, T2j) to sim2
    max ← larger of |sim1|, |sim2|
    pad smaller of sim1, sim2 with 1s
    for i, j from 1 to max do
        distances[i][j] ← |sim1[i] − sim2[j]|
    matches ← MATCH(distances)    (minimum-cost assignment; matches[k] is the distance of the k-th pair)
    total ← 0
    for k from 1 to max do
        total ← total + matches[k]
    return 1 − total / max

The structural similarity measure based on the similarity of continuations at depth 1 can be calculated on the trees in Figure 1 by first computing the similarity of each branch to its root. There are two minimum-distance matchings, each with a total distance of 0.683, and the structural similarity follows by dividing this total by the larger number of children and subtracting the result from one. While the structural similarity measure may calculate more accurate similarities between positions, this comes at a cost, as each calculation requires the similarity of every child node to its parent. When the similarity of sequences or continuations at depth 1 is used as the base measure, it takes on average $O(b^2)$ time to calculate the similarities of all children to their parent. Assigning children in pairs using the Hungarian algorithm takes $O(b^3)$ operations, so the structural similarity algorithm runs in time $O(b^3)$. To improve the complexity, the measure could be approximated by randomly sampling child nodes when calculating structural similarity, which warrants further investigation.
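The structural similarity can be sketched in Python as follows, taking as input the lists of child-to-parent similarities produced by any base measure $P$. The sketch follows the prose description (each unpaired child adds one) rather than the padding step of Algorithm 1, and it solves the assignment problem by brute force over permutations, which is only practical for small branching factors; for realistic trees, a Hungarian-algorithm solver such as `scipy.optimize.linear_sum_assignment` would replace the loop. The function name is hypothetical.

```python
from itertools import permutations
from typing import Sequence


def structural_similarity(sim1: Sequence[float], sim2: Sequence[float]) -> float:
    """Structural similarity from child-to-parent similarities.

    sim1[i] = P(T1, T1i) and sim2[j] = P(T2, T2j) for a chosen base measure P.
    Children are paired to minimise the total |sim1[i] - sim2[j]|; each unpaired
    child adds 1, and the total is divided by the larger number of children.
    """
    small, large = sorted((list(sim1), list(sim2)), key=len)
    c_max = len(large)
    if c_max == 0:
        raise ValueError("both nodes are terminal; use the terminal-state similarity")
    best = float("inf")
    # Brute-force assignment: every injective map from the smaller list into the larger.
    for chosen in permutations(range(c_max), len(small)):
        cost = sum(abs(s - large[j]) for s, j in zip(small, chosen))
        best = min(best, cost)
    total = best + (c_max - len(small))  # unpaired children each add one
    return 1 - total / c_max


print(structural_similarity([0.5, 0.2], [0.2, 0.5]))  # 1.0: same structure, different order
print(structural_similarity([0.5], [0.5, 0.3]))       # 0.5: one perfect pair, one unpaired child
```

Because only the similarity values are compared, the move labels of the two positions never enter the calculation, which is what makes the measure independent of labelling.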

Strategic Similarity of Tic Tac Toe Positions
To investigate whether the structural similarity measure agrees with an intuitive notion of similarity of board positions based on their strategic advantage, we manually calculated the structural similarity measure on a small section of the Tic Tac Toe game tree, using the similarity of continuations as the base measure. Figure 2 shows the game tree for a small segment of a Tic Tac Toe game, and Table 1 contains the results of the similarity analysis, where the branches of the game tree are labelled by their original board position O and the square in which the next move is made. The measure correctly identifies the rotational symmetry between branches OB1 and OB3, and gives a value of zero for all comparisons of OC2 with distinct subtrees, as OC2 is terminal and so shares no structural similarity with any of the other depth-1 nodes. This is very promising, as it shows that the measure behaves well on a solved game, which gives more confidence in using it in a heuristic setting.
Move Matching. As the structural similarity measure pairs moves that are comparably similar to their parent states, it can be used to pair moves from different board positions that may have similar strategic value. For example, suppose one position is known to have a killer move two plies ahead, leading to a win for the opponent, and this position is highly similar to a new position. Then the depth-2 matches can be inspected to identify the move that is most frequently matched to the killer move in the known position; this move is likely to be a killer move from the new position. We evaluate the effectiveness of this approach in the Performance section.
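The frequency count behind move matching can be sketched as follows. The sketch assumes, hypothetically, that each structural-similarity comparison (for instance, each depth-2 subtree comparison) returns its optimal assignment as a list of `(known_move, new_move)` pairs; the move labels in the usage lines are placeholders rather than real chess moves, and the function names are illustrative.

```python
from collections import Counter
from typing import List, Optional, Tuple

Match = Tuple[str, str]  # (move in the known position, move in the new position)


def top_matches(matchings: List[List[Match]], k: int = 5) -> List[Tuple[Match, int]]:
    """The k most frequently matched move pairs across many assignments."""
    counts: Counter = Counter()
    for matching in matchings:
        counts.update(matching)
    return counts.most_common(k)


def likely_counterpart(matchings: List[List[Match]], killer: str) -> Optional[str]:
    """Move in the new position most often matched to a known killer move."""
    counts = Counter(new for matching in matchings for old, new in matching if old == killer)
    return counts.most_common(1)[0][0] if counts else None


matchings = [
    [("K", "x"), ("m", "y")],
    [("K", "x"), ("m", "z")],
    [("K", "y"), ("m", "x")],
]
print(likely_counterpart(matchings, "K"))  # x (matched to K twice, y once)
print(top_matches(matchings, 1))           # [(('K', 'x'), 2)]
```

Restricting attention to the most frequent pairings, as in the evaluation of the top five matches, filters out pairings that arise only from a single, possibly coincidental assignment.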
Generalisability. The structural similarity measure generalises to the analysis of any two local trees with self-consistent move labellings, as it can be calculated independently of such labels. This means, e.g., that the structure of a local Go tree can be compared to that of a local chess tree, or that changes in the structure of a game tree can be tracked as the game progresses.
Calculating how dynamic a game is, in terms of the variability of the connection density of the graph, can be very useful in indicating which gameplay heuristics to use. For example, to use the All-Moves-As-First (AMAF) heuristic, which initially updates sibling nodes with the same estimated value for each move played, an agent first assumes that a move from one node is likely to affect the game in a similar way to the same move played from a sibling node. This may be likely to work on less dynamic games, but could be less reliable for highly dynamic games, where the effect of a move on the state of the game is less consistent. Conversely, pruning may be most helpful for highly dynamic games, as these games offer a stark contrast between reward values for different branches, which is not necessarily the case for less dynamic games. These hypotheses are supported by studies of successful AMAF use in less dynamic games such as Go [23], Phantom Go [24], Havannah [25], and Morpion Solitaire [26], successful pruning in the dynamic game of Amazons [27] and less successful pruning in Havannah [25].

Performance
We tested how effective the first three similarity measures were at detecting nearby trap states in chess, using the similarity of continuations at depth $d = 1$, the similarity of sequences at $d = 2$ and the tree-edit similarity at $d = 2$. We chose a sample of four distinct trap states, each leading to checkmate within 2 to 4 plies, as shown in Figure 3. For each trap state we used the full set of 1000-1500 board positions lying two plies away from it, and recorded for each new position whether the trap was maintained. The measures were calculated on each of these board positions, as was a cross-correlation measure used as a control, calculated by counting the squares on which the piece placement differed and dividing by 64. The similarity of sequences was adapted for chess by including captures in the simplified sequences. This adaptation can be generalised to any game with irreversible moves by recording the irreversible moves of each sequence alongside its standard moves.
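The control measure can be sketched as follows, under the assumption (ours, for illustration) that a board is encoded as a 64-character string with one character per square, `.` for an empty square and a piece letter otherwise. As described, the value counts differing squares, so larger values indicate less similar placements.

```python
def placement_difference(board1: str, board2: str) -> float:
    """Fraction of the 64 squares whose contents differ between two boards."""
    if len(board1) != 64 or len(board2) != 64:
        raise ValueError("expected two 64-square boards")
    return sum(a != b for a, b in zip(board1, board2)) / 64


empty = "." * 64
one_rook = "r" + "." * 63
print(placement_difference(empty, one_rook))  # 0.015625 (1 of 64 squares differs)
```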
An effective measure should rate trap states as highly similar to the original position with high frequency, so we fixed a threshold value ρ and calculated, for each measure, the proportion of trap and non-trap states with similarity higher than ρ. For each trap state and each of our similarity measures, when ρ was set to the average similarity value, around 70% of all children that were also trap states had above-average similarity to the original position, and consistently over 50% of non-trap children had below-average similarity. This was not the case for the cross-correlation, where up to 87% of trap states had below-average similarity, and 72% of non-trap states had above-average similarity. These results can be seen in Table 2.
In general, there was no significant difference between the proportions of false positives (non-traps with above-average similarity) and false negatives (traps with below-average similarity) given by the similarity of sequences, the similarity of continuations and the tree-edit similarity. However, the added time complexity of the similarity of sequences and the tree-edit similarity at depth 2 was significant. Thus, perhaps surprisingly, the similarity of continuations is in effect a better heuristic similarity measure than the similarity of sequences for evaluating similarities of closely related board positions.
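The threshold test can be sketched as below, with ρ defaulting to the mean similarity as in the reported experiment. The function name and the toy scores in the usage lines are hypothetical. The sketch returns the share of trap states above ρ and the share of non-trap states below it, so, ignoring ties at ρ, one minus each value gives the false-negative and false-positive rates respectively.

```python
from statistics import mean
from typing import List, Optional, Tuple


def threshold_rates(scores: List[float], is_trap: List[bool],
                    rho: Optional[float] = None) -> Tuple[float, float]:
    """Share of traps with similarity above rho, and of non-traps below rho."""
    if rho is None:
        rho = mean(scores)  # the average similarity, as in the experiment
    traps = [s for s, t in zip(scores, is_trap) if t]
    others = [s for s, t in zip(scores, is_trap) if not t]
    if not traps or not others:
        raise ValueError("need at least one trap and one non-trap position")
    trap_above = sum(s > rho for s in traps) / len(traps)
    other_below = sum(s < rho for s in others) / len(others)
    return trap_above, other_below


# Toy similarities of four child positions to a trap position (two traps, two non-traps).
print(threshold_rates([0.9, 0.1, 0.6, 0.4], [True, True, False, False]))  # (0.5, 0.5)
```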
Finally, because of its higher complexity, we tested the structural similarity measure on five smaller samples of 40 randomly selected child positions from the first two trap positions. Using this measure, an average of 85% of child trap states had above-average structural similarity to the original position. The high complexity of this measure makes it time-intensive to compute, but the results show that it is effective at picking out potential trap states from a small sample of positions.
Move Matching. The move-matching algorithm was also tested on various chess positions to detect moves with similar strategic impact. Frequent matchings were assumed to be a more reliable indicator of moves with a similar effect on gameplay, so only the top five most frequently matched pairings were assessed.
We tested the matching algorithm on three different samples, each with six pairs of board positions, all shown in Table 3. Firstly, we used the algorithm on all traps from the trap-detection sample. For all but one of the pairings (Légal and Budapest traps), each of the five most frequent matches comprised two decisive or two non-decisive moves. In all but one pairing (Caro-Kann and Kieninger traps), the most frequent match consisted of two checkmating moves. The second sample was based on the Légal and Budapest Gambit traps. We compared each trap with a sample of three child positions: one position containing the original trap but with two pawns placed differently; one position where the bishop that had threatened the queen had been captured; and one position selected as the best continuation by the Stockfish chess engine. In all but one pairing, all of the top five matches comprised two decisive or two non-decisive moves, and in every pairing the most frequent match consisted of two decisive moves. The third sample was a selection of positions from the 2016 World Championship match between Magnus Carlsen and Sergey Karjakin, taken after 10, 20, 30, and 40 plies. On average, four of the top five matches for each pairing comprised two decisive or two non-decisive moves. In three pairings, the most frequent match consisted of two checks, and in one it consisted of two equally unimpactful king moves. This sample provided less reliable pairings than the previous two, possibly because its positions had a more varied strategic impact than those of the other samples.
These results show that the move-matching algorithm is fairly well suited to finding similarly decisive moves from different board positions, and thus is useful for detecting possible trap states and sacrificial moves from the game-tree structure without evaluating board positions.

AMAF/RAVE Adaptation
Previous work [28] has shown that MCTS improves markedly when using adaptations such as All-Moves-As-First (AMAF), Rapid Action Value Estimation (RAVE), and Permutation-AMAF. Such adaptations update multiple areas of the game tree at once, where one move is available from many positions (as in AMAF) or where one board position is a permutation of another, on the assumption that the equivalent move from each of these positions has the same strategic impact on gameplay. We envisage using a similarity measure to choose which equivalent positions to update, as this may lead to more effective trap detection than that of MCTS or its AMAF adaptations. We suggest adding a similarity measure to two MCTS adaptations: the killer heuristic, where decisive moves are evaluated first, and killer RAVE, which only applies RAVE to decisive moves [29]. When combined with these similarity-based adaptations, MCTS may detect a trap ahead more quickly.
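One way the proposed similarity-based adaptation could look is sketched below: a standard AMAF backup, in which each node on the simulated path is credited with the later moves of the same player, with an extra gate that only credits a later move if the position it was played from is at least ρ-similar to the node being updated. The `Node` class, the string states and the `similarity` callback are hypothetical placeholders for any of the measures above, and handling of the reward sign for the two players is omitted for brevity.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class Node:
    """Minimal MCTS node holding AMAF statistics per move."""

    def __init__(self, state: str) -> None:
        self.state = state
        # move -> [total reward, visit count]
        self.amaf: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0])


def gated_amaf_backup(path: List[Tuple[Node, str]], reward: float,
                      similarity: Callable[[str, str], float], rho: float) -> None:
    """AMAF backup where a later move is credited only from sufficiently similar positions.

    path lists (node, move played from that node) from the root of the simulation.
    """
    for i, (node, _) in enumerate(path):
        # Moves by the same player as at `node` occur at positions i, i+2, i+4, ...
        for later_node, move in path[i::2]:
            if similarity(node.state, later_node.state) >= rho:
                stats = node.amaf[move]
                stats[0] += reward
                stats[1] += 1


def same_family(a: str, b: str) -> float:
    """Toy similarity (hypothetical): positions match if their labels share a first letter."""
    return 1.0 if a[0] == b[0] else 0.0


n0, n1, n2 = Node("A0"), Node("B1"), Node("C2")
gated_amaf_backup([(n0, "m0"), (n1, "m1"), (n2, "m2")], 1.0, same_family, 0.5)
print(sorted(n0.amaf))  # ['m0']: m2 was played from a dissimilar position
```

With a similarity function that always returns 1, the gate never blocks an update and the backup reduces to plain AMAF.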

Wider Game Strategy and Graph Applications
Many modern AI programs use deep learning to recognise tactical patterns from shapes of features in the field of play. It seems natural to use this learning strategy to group atomic moves by their tactical value, to then create an abstracted game-tree with a lower branching factor than the original tree. The structural similarity measure can then be used to detect tactical moves representing equivalent strategies, giving the agent options once it has chosen its desired strategy.
In cases where an agent is trained to predict the moves a human player would make, as was the case for AlphaGo [5], the modified AMAF/RAVE adaptation above can be used to prime the neural network and update predictions for multiple positions at once. This may lead to opportunities for faster reinforcement learning or more efficient learning from smaller data sets.

Conclusions
We presented four similarity measures for game positions in two-player, deterministic games of perfect information, based on their game trees and using no domain-specific knowledge. We tested the measures on chess and suggested their use in heuristics for MCTS-based agents, noting their application to a range of graph problems. We showed that, using our first two similarity measures, on average around 70% of the chess positions occurring two plies after a trap state that were themselves traps had above-average similarity to the original position. This figure rose to 85% using the structural similarity measure. We also showed that our move-matching algorithm consistently paired moves with similar strategic value from different starting positions. We believe this can aid MCTS agents in finding equally decisive moves within different areas of the game tree, as well as in detecting new trap states.