Comparing Search Algorithms on the Retrosynthesis Problem

In this article we try different algorithms, namely Nested Monte Carlo Search and Greedy Best First Search, on As-traZeneca’s open source retrosynthetic tool : AiZynthFinder. We compare these algorithms to AiZynthFinder’s base Monte Carlo Tree Search on a benchmark selected from the Pub-Chem database and by Bayer’s chemists. We show that both Nested Monte Carlo Search and Greedy Best First Search outperform AstraZeneca’s Monte Carlo Tree Search, with a slight advantage for Nested Monte Carlo Search while experimenting on a playout heuristic. We also show how the search algorithms are bounded by the quality of the policy network, in order to improve our results the next step is to improve the policy network.


| INTRODUCTION
Retrosynthesis is a domain of organic chemistry that consists of finding a synthetic route (a sequence of reactions) for a given molecule in order to synthesize it from a given set of available precursor molecules [1].It is an important part of organic chemistry molecule synthesis, and can be used to produce newfound drugs.What we aim for in this paper is to evaluate the strengths and weaknesses of two search algorithms by comparing them to AiZynthFinder's Monte Carlo Tree Search (MCTS) on a small benchmark consisting of curated and complex molecules, covering many reactions encountered by chemists.
The second section presents the retrosynthesis problem, the third section presents the AiZynthFinder retrosynthesis tool, the fourth section describes the search algorithms we compare, the fifth section details the benchmark used to compare the search algorithms, and the sixth section gives experimental results.

| THE RETROSYNTHESIS PROBLEM
Before diving into the details, let's broadly present the retrosynthesis problem.
given molecule that leads to available precursors.It is a decision problem.
We start from the one molecule we want to find a synthetic route for and decompose it into precursors, available in the market or not.Then we recursively decompose the precursors according to a reaction template.Each time a reaction is applied, this gives us another state of the search, composed of different molecules (as many or more).The goal is to find a sequence of reaction templates (a retrosynthetic route) that leads to a state of the search uniquely composed of molecules/precursors available on the market.
Figure 1 represents a retrosynthetic route for molecule A0 (on the right) as displayed in AiZynthFinder's UI, each green framed molecule is a precursor available in ZINC, each orange framed one is a molecule that isn't available, and each black dot is a reaction.The molecules on each column represent a state of the search space that was explored, but it does not display all states explored, only the ones in the route.AiZynthFinder uses a neural network trained on USPTO, a set containing about 18 million reaction templates from organic chemistry patents.That neural network's role is to select the best reactions given a molecule we want to synthesize, it also gives a value to each move (the prior).Due to how the program works, it's hard to do without that neural network and the priors because it would require finding another method to evaluate the reactions available for a molecule.Thus, every algorithm presented here get their possible moves/reactions from the policy neural network.In this article we use AstraZeneca's open source pretrained network.Training a network to predict more accurately the reactions for molecule retrosynthesis is another domain called "one step retrosynthesis", see Ref. [1], it is not what we aim to explore here.

| AIZYNTHFINDER
AiZynthFinder takes the SMILES: a string representation of a molecule, as an input which makes the first state (which is made of only one molecule).A state is a set of molecules, from each state the neural network proposes some reactions producing a molecule from the state from precursors.If a reaction is played the molecule is removed from the state, and the precursors are added (a reaction can also be a modification of the molecule's shape only, not removing any atom, we call these "structural moves").Retrosynthesis often uses and/or trees, here the "and" are combined into a single state as it makes the search more simple.
We use the ZINC [2] molecule database, a curated collection of commercially available (in stock) chemical compounds prepared especially for virtual screening.Any state is evaluated by AiZynthFinder's base evaluation function which is 0.95*(fraction_of_mole-cules_making_the_state_in_stock) + 0.05*(squash _ depth_of_the_reaction_tree).This function tries to maximize the fraction of molecules in stock among the ones composing the state.It decides between those that have the same proportion by adding the squashed (applying 1 -sigmoid of) depth of the search tree to favor shorter routes.
We don't modify it directly in this study but exploring different score functions could be interesting in future works.

| ALGORITHMS
Our algorithms use common primitives:

| AizynthFinder's MCTS
AiZynthFinder uses a MCTS algorithm with priors very similar to PUCT.PUCT stands for "Prior Upper Confidence bounds applied to Trees", it is a generalisation of the UCT algorithm [4] using priors for each state of the problem (the prior is the policy at the output of the neural network here), see Ref. [5] for the original version of PUCT.PUCT has been used in AlphaGo [6] and Alpha Zero [7].Just like PUCT, this MCTS algorithm explores the tree using playouts: it selects the next moves to try according to their evaluations.It plays the selected moves until it reaches a state not explored yet or until it reaches a terminal state.That state is memorized in an entry that contains the number of visits for that state, the number of visits for each child state, and the evaluations for each child state.Then the score of that newly explored state is retro-propagated to update the evaluations of the parent states.
AiZynthFinder's MCTS in Algorithm 1 differs from standard PUCT in how the bandit value, the value used in the selection phase of a MCTS, is determined.In all our experiments the c hyper-parameter is AstraZeneca's base one of 1.4.Algorithm 1 is iteratively called until it reaches the maximum number of iterations, the transposition table is conserved between iterations.

| Nested Monte Carlo Search
Monte Carlo Tree Search (MCTS) has been successfully applied to many games and problems [9].
Nested Monte Carlo Search (NMCS) [10] is an algorithm that works well for puzzles and optimization problems.It biases its playouts using lower level playouts.Online learning of playout strategies combined with NMCS has given good results on optimization problems [11].Other applications of NMCS include Single Player General Game Playing [12], Cooperative Pathfinding [13], Software testing [14], heuristic Model-Checking [15], the Pancake problem [16], Games [17] and the RNA inverse folding problem [18].
Online learning of a playout policy in the context of nested searches has been further developed for puzzles and optimization with Nested Rollout Policy Adaptation (NRPA) [19].NRPA has found new world records in Morpion Solitaire and crossword puzzles.NRPA has been applied to multiple problems: the Traveling Salesman with Time Windows problem [20,21], 3D Packing with Object Orientation [22], the physical traveling salesman problem [23], the Multiple Sequence Alignment problem [24] or Logistics [25].The principle of NRPA is to adapt the playout policy so as to learn the best sequence of moves found so far at each level.
The use of Gibbs sampling in Monte Carlo Tree Search dates back to the general game player Cadia Player and its MAST playout policy [26].
NMCS [10] recursively calls lower level NMCS on children states of the current state in order to decide which move to play next, the lowest level of NMCS being a random playout, selecting uniformly the move to execute among the possible moves.A heuristic can be added to the playout choices.
We detail the NMCS in Algorithm 2.
Here we used a heuristic to penalize the structural moves (when nothing is added or removed from the molecule, it only changes shape) because these moves often occupied most of the limited depth of search, even looping to a previous state sometimes.We use the score of the children (between 0 and 1), to which we add 1 if the largest molecule weight in the state is smaller than its parent largest molecule weight.That value is then used as the chance to select that move over the sum of every other move's values.The modified score function for the heuristic and the heuristic are described in Algorithm 3.
While we did not use softmax to harden the heuristic, nor tuned the parameters, that simple heuristic allowed us to diminish the structural moves problem, giving them less than half the chance of being selected in playouts than before, and led to better results.

| Greedy Best First Search
GBFS stands for Greedy Best-First Search.It is a simple algorithm that consists of opening (and removing) the best node from a list of nodes sorted by their scores, evaluating all its children and inserting them in the sorted list of nodes, and then repeating the operation by opening the new best node [27].The evaluation function can use playouts to make the algorithm closer to a Monte Carlo Search algorithm.Like MCTS, that algorithm can lock itself in a local minimum, but is faster (and less accurate) as it skips the playout and the associated calculations between each node discovery.Both lack forced progress in depth of NMCS.We describe GBFS in Algorithm 4.
The function insert(open-states, score, new-state) inserts new-state in the sorted list open-states given the score value.
We modified the evaluation function similarly to the NMCS playout function: if the children's biggest molecule is smaller than the parent's then add 1 to the score.That modification allows to avoid structural moves, multiplying states with high scores, but also prevents structural moves that are sometimes necessary to the resolution of a molecule, finding an alternative solution would greatly help this algorithm.

| BENCHMARK
To compare our algorithms to AstraZeneca's MCTS, we use a small subset containing 40 SMILES representation of drugs from the PubChem database (see Table 3) selected by Ref. [28] (molecules ID starting with C) and 20 SMILES representation of molecules selected by Bayer's chemists [29] (molecules ID starting with A).The Bayer's chemists molecules contains molecules ranging from easy ones to hard ones.The 40 molecules selected randomly from PubChem by the authors of the benchmark were obtained in such a way to cover small to large molecules [28].The original goal of this benchmark was to test the prediction of difficulty of retrosynthesis of these molecules, thus these 40 molecules feature some of the hardest to synthesize according to some chemists.
Generally, one step retrosynthesis uses USPTO-50 K [30] as a benchmark.However here we are not trying to benchmark the reaction propositions of the neural network used here, but how our algorithm solves the retrosynthesis problem.Thus our benchmark provides a few advantages: proposing harder molecules to synthesize, reducing the benchmark size allowing us to focus more on each molecule, and giving a fixed benchmark to compare with others.
You can find the SMILES representation of this benchmark in Table 3 of the appendix.These experiments were made on a 3.50GHz intel core i5-6600 K on Windows 10 with 32 Gb of RAM.We used the same common parameters for every algorithm: * Max step for substrates (how many reactions we can make from a substrate to the target molecule): 15 * Policy cutoff cumulative: 0.995 * Policy cutoff number (maximum number of possible moves returned on a molecule): 50 * Filter cutoff: 0.05

| AiZynthFinder's MCTS
To compare our algorithms, we ran AiZynthFinder MCTS at least 2 times on each of the 60 molecules of the benchmark with the base settings, C = 1.4 for the exploration/exploitation constant.Running it a few times only is not problematic because the MCTS results were observed to be very stable on the few molecules we ran it multiple times on.Our goal is not to measure the exact solving times as they heavily depend on the implementation, the language, and the hardware, but to see how many molecules of the benchmark a given algorithm can find a synthetic route for.The times specified are here only to give an idea of the differences in performance between the algorithms.
First, we ran the MCTS (Table 1) with a timeout of 2 minutes and a maximum number of iterations of 100 ("MCTS 2 min 100it").The molecules identified with a C in Table 3 were either solved instantly (< 200 ms) or not solved in 2 minutes.On the contrary, the molecules selected by Bayer's chemists (starting with "A") took generally more time to be solved if they were.In addition, a bigger proportion of molecules from A were solved than from C, which can be explained by their sizes and the presence of distinctive atoms (like fluor).We then launched the MCTS with a time limit of 20 minutes, solving a few more molecules, and finally, we launched a MCTS of 2 h on some of the remaining ones: C2, C35, C37, C38.Only C37 (*) got solved in 5236.093seconds, it was not included in Figure 2.
As you can see on Table 1, increasing the MCTS search time doesn't help much, the molecules are either solved instantaneously (< 200 ms) or very quickly.The instantaneous solving is due to the neural network (onestep retrosynthesis) which immediately proposes the right solution in 1 or 2 reactions for small molecules.This means that the quality of the search algorithm doesn't matter on these molecules, and is solved almost equally as fast with the MCTS, the NMCS, the GBFS, and even a NMCS without playout.These molecules are not useful to our research so we remove them from the set for further experiments.

| Other algorithms
Every molecule solved by the MCTS was also solved by NMCS and all but one by GBFS.Even if they may be slower than MCTS for easy states (the GBFS has to instance 50 children per opened state even if it uses only one or two).Thus, we are going to focus on molecules not solved by AizynthFinder base MCTS and those that took at least 2 minutes to solve.
For the NMCS, we first perform a level 1 NMCS using only the 5 best moves from each state, instead of the 50 best given by the Policy cutoff number.If that NMCS (usually shorter than 1 minute) fails we perform a much longer level 1 NMCS using all the 50 moves.If even that fails, we launch the level 2 NMCS.The level 2 NMCS is very slow but is able to solve molecules unsolved by both MCTS and GBFS.
The GBFS was only launched once on each molecule because the algorithm is deterministic, it was launched with a time limit of 20 minutes, and molecules not solved by then are considered unsolved.Given enough time, the GBFS explores the entire tree.
First, we can notice in Figure 2 that NMCS and GBFS outperform Astrazeneca's MCTS in the long run, but as the y-axis starts at 15, they slightly underperform on molecules solved by 1 or 2 steps, in less than 1 s.These molecules take about 0.6 s with GBFS and NMCS compared to about 0.1 s for MCTS.This is because they don't open first the most promising node according to the neural network.It is observed that GBFS is better than both NMCS and MCTS for search times between 2 and 480 seconds.It stops improving after 120 seconds while the others continue to improve.NMCS finds retrosynthesis routes slower but can find more of them.MCTS (or any algorithm opening the reaction greedily according to the neural network) is better for very short experiments (< 1 s), GBFS is better for medium length experiments (< 60s) and NMCS appears to be better for longer experiments and more complex molecules.F I G U R E 2 Distribution of the numbers of molecules solved with times in seconds.
On Table 2 we focus our attention on the molecules that took more than 2 minutes to solve or were not solved by at least one of the algorithms: C19, C20, C22, C37, A0, A3, A5, A6, A8, A12, A13 and A15 The results on A8 (*) were obtained by opening the 200 best nodes given by the neural network and not the 50 best, in addition to only searching up to a depth of 5 instead of 15.NMCS and MCTS were unable to solve the molecule with the same parameters.The reactions required to solve A8 were not present in the 50 best proposals from the neural network, but in the 200 best.On the other hand, a top 5 level 1 NMCS was enough to solve 30 of the 60 molecules, meaning the NN was very accurate in these cases.It emphasizes how much results depend on the accuracy of the NN.Again, the one we used was trained by AiZynthFinder's team on the public USPTO reaction dataset, which does not feature many reactions present in licensed datasets such as Reaxys or Pistachio [31].
Our GBFS was unable to solve C20, despite being solved by the MCTS and the NMCS, we think it was because our search heuristic favors the non structural moves when a structural move is required here to cut the carbon cycle.Overall, molecules with long carbon cycles posed problems to be solved to all the algorithms and C20 was the smallest and most simple of them from the benchmark.It is what AstraZeneca is focusing on in the latest updates for AiZynthFinder (we used an earlier version from 2022 to conduct these experiments).
Like with MCTS, running the other algorithms longer did not yield much improvement.This is because the neural network does not always propose the best reactions.Obtaining a better network, possibly trained on a more complete dataset could improve our results greatly.
To put our results in perspective, out of the 60 molecules of our benchmark we managed to solve 35 molecules with GBFS, and 36 with NMCS [29], while managed to solve 38 molecules with a MCTS and 41 with a DFPN (Depth-First Proof Number search, Aizynth-Finder's DFPN does not yield such results).We hope we will be able to try with a more complete dataset and the according NN in the future.Bigger molecules would still be a challenge given all the reactions and subtrees they offer, but we think it could help with the few unsolved small molecules: C4, C6, C7, C10, C12, C35, C38, and A10.

| CONCLUSION
While MCTS solves 31 molecules out of 60 from this benchmark, GBFS solves 35 in a reasonable time and NMCS solves 36.We showed that GBFS and NMCS could provide satisfying performance improvements, especially since GBFS and NMCS are much simpler and don't use the neural network as a search policy beyond the reaction proposition, unlike MCTS.We believe that a more accurate neural network trained on a bigger dataset, and a more complete template set would improve the performances.

| FUTURE WORKS
Retrosynthesis is a vast topic, and much remains to be done, we only scratched the surface of AiZynthFinder here.It would be interesting to experiment on more algorithms, including the canonical PUCT and apply the prior to the algorithms we used here, use other score functions or train and use another neural network.This research would require a lot of time, and we can only encourage other computer scientists to try their algorithms and score functions on AiZynthFinder.

| APPENDIX
The appendix can be found in Table 3 below.

ACKNOWLEDGMENTS
This work was supported in part by the French government under the management of Agence Nationale de la Recherche as part of the "Investissements d'avenir" program, reference ANR19-P3IA-0001 (PRAIRIE 3IA Institute).

AiZynthFinder [ 3 ]
is a retrosynthesis tool made by As-traZeneca's research and development.It has the advantage of being open source, understandable, and well described.
T A B L E 1 Comparison of algorithms.
T A B L E 2