Synthetically accessible de novo design using reaction vectors: Application to PARP1 inhibitors

De novo design has been a hotly pursued topic for many years. Most recent developments have involved the use of deep learning methods for generative molecular design. Despite increasing levels of algorithmic sophistication, the design of molecules that are synthetically accessible remains a major challenge. Reaction-based de novo design takes a conceptually simpler approach and aims to address synthesisability directly by mimicking synthetic chemistry and driving structural transformations by known reactions that are applied in a stepwise manner. However, the use of a small number of hand-coded transformations restricts the chemical space that can be accessed and there are few examples in the literature where molecules and their synthetic routes have been designed and executed successfully. Here we describe the application of reaction-based de novo design to the design of synthetically accessible and biologically active compounds as proof-of-concept of our reaction vector-based software. Reaction vectors are derived automatically from known reactions and allow access to a wide region of synthetically accessible chemical space. The design was aimed at producing molecules that are active against PARP1 and which have improved brain penetration properties compared to existing PARP1 inhibitors. We synthesised a selection of the designed molecules according to the provided synthetic routes and tested them experimentally. The results demonstrate that reaction vectors can be applied to the design of novel molecules of biological relevance that are also synthetically accessible.


INTRODUCTION
De novo design techniques were first proposed around 30 years ago as a way of accelerating the drug discovery process, and many different approaches have been developed over time. Key issues in de novo design are exploring the enormous search space of drug-like chemical entities effectively while ensuring that the designed compounds are biologically relevant and synthetically accessible [1]. Early approaches were agnostic of synthesis and, as a consequence, their application was limited [2][3][4]. They were later replaced by rule-based approaches, whereby modifications made to starting structures were based on a small number of hand-coded transformation rules [5,6]. While these approaches lead to compounds that are more likely to be synthesisable, the use of pre-defined rules limits the extent of the chemical space that can be explored. Recently, a number of deep generative methods have been developed for de novo design, and while these provide data-driven approaches to the search for novel compounds, they typically do not account for synthesis explicitly [5][6][7][8][9][10][11][12][13][14][15][16]. A more sophisticated approach was also described recently, in which a generative deep learning model was coupled with a rule-based filter that was used to select compounds compatible with an automated synthesis platform. This method was successful in linking de novo design and compound synthesis into an automated workflow; however, it was limited to just 17 synthetic rules and to one-step syntheses, both of which limit the ability of the method to explore diverse areas of chemical space [17].
We have developed a data-driven approach to the design of novel, synthetically accessible molecules, which we refer to as reaction vector-based de novo design [18,19]. Reaction vectors are derived automatically from databases of reactions, so the available transformations are not limited to a predefined set of rules but are driven by a user-defined database of reactions. The core of our approach to de novo design is a structure generation module, which takes reaction vectors and a database of reagents and applies them to input molecules to generate novel products for which synthetic routes can be provided based on literature precedents. The structure generation module can support different de novo design strategies, such as "iterative forward synthesis", which starts with a key fragment to which different fragments are added in each iteration. However, this strategy rapidly leads to a combinatorial explosion of possible molecules. We have recently described the integration of the structure generator into an "inside-out" approach to de novo design in a tool called RENATE (REtrosynthetic desigN using reAcTion vEctors) [20]. The starting point is one or more known compounds of interest, each of which is fragmented using retrosynthetic rules. A search is then made for similar reagents for each of the resulting fragments, and these are combined in silico by the structure generator using reaction vectors and external reagents. The approach is similar in concept to Flux and COLIBREE but with a number of important differences [21,22]. The most significant difference is that the forward construction is driven by known reactions and available reagents, so that synthetic routes are provided for the novel compounds. RENATE was previously validated retrospectively by showing that it could reproduce known drugs and propose meaningful synthetic pathways for them [20]. Here we demonstrate the prospective application of RENATE to the de novo design, synthesis and experimental validation of molecules that meet multiple objectives. As a proof-of-concept, the study focuses on the ADP-ribosyltransferase PARP1 [23], a nuclear enzyme that is critically involved in the repair of DNA single-strand breaks. PARP1 is primarily identified as a target in oncology, yet recent studies have also suggested it as a potential target against ageing and neurodegenerative diseases such as Alzheimer's disease and Parkinson's disease [24][25][26]. However, the brain availability of PARP1 inhibitors is often limited by their low lipophilicity, which reduces their ability to cross the blood-brain barrier (BBB), and by their affinity for P-glycoprotein (Pgp) or breast cancer resistance protein (BCRP), which are efflux transporters expressed at the apical membrane of the BBB.
We used a set of known PARP1 inhibitors as reference ligands for the design. These were fragmented, and new molecules were generated in silico using reaction vectors derived from the US Pharmaceutical Patents Database and reagents from Enamine [27]. The building blocks/reagents were selected using pharmacophore fingerprint similarity to the fragments from the known inhibitors, and the top-scoring molecules were selected at each step using a series of machine learning models designed to predict: PARP1 binding; low substrate affinity for Pgp and BCRP, respectively; and good BBB penetration. Following an assessment of the output molecules for the presence of reactive and undesirable groups, the top-scoring products were docked against PARP1 and a subset was synthesised based on the synthetic routes proposed by RENATE. Following the synthesis, the compounds were tested against PARP1 in an enzymatic assay. The apparent BBB permeability of two selected compounds was finally measured in a Parallel Artificial Membrane Permeability Assay (PAMPA).

PARP1 AS A TARGET
PARP1 binds to damaged DNA and promotes the recruitment of repair enzymes through the generation of poly-ADP-ribose, which PARP1 attaches to itself and other proteins. PARP1 modifies proteins via the covalent addition of ADP-ribose and elongates it sequentially to form poly-ADP-ribose, using NAD+ as the ADP-ribose donor, as shown in Figure 1 [28,29]. PARP1 also plays fundamental roles in other biological processes such as cell proliferation, differentiation, and apoptosis [30]. Upregulation of PARP1 is observed in cancers with BRCA gene defects and has been shown to enhance the resistance of cancers to DNA-damaging therapies, making its inhibition an attractive field of study in pharmaceutical research [31]. The upregulation of PARP1 also leads to a drastic reduction of NAD+ levels, affecting ATP production and cell functions, which can lead to the development of other conditions such as diabetes, neurodegenerative diseases, and viral infections [32,33]. Most small-molecule PARP1 inhibitors are designed as NAD+ mimetics that reproduce the interactions between nicotinamide and the enzyme, thereby blocking the substrate binding site of PARP1.
Beyond the role of PARP1 in oncology, emerging research has indicated its potential as a therapeutic target against Alzheimer's disease and Parkinson's disease, in which PARP1 is upregulated, leading to neuroinflammation, mitochondrial dysfunction, and autophagy dysregulation [24,25]. Hence, the inhibition of PARP1 has also been suggested to maintain brain function and contribute to life span extension [26]. However, the effectiveness of the currently approved PARP1 inhibitors has been shown to be significantly reduced by their poor brain permeability, which makes them unsuitable as neurotherapeutics [39][40][41][42]. Therefore, the development of PARP1 inhibitors with suitable potency and pharmacokinetic (PK) profiles, including high brain permeability and low affinity towards efflux transporters, is strongly desired to extend the application of this class of inhibitors to brain cancers and neurological diseases [35][36][37][43,44].
Figure 1: DNA repair mechanism mediated by PARP1. A DNA single-strand break activates PARP1 via non-covalent recognition. The activated PARP1 recruits a series of enzymes responsible for DNA repair via the covalent addition of poly-ADP-ribose. The recruited enzymes repair the damaged DNA strand.
The selected crystal structures show three interacting residues that are conserved across the ligands: Gly863 and Ser904, which are responsible for the formation of three hydrogen bonds, and Tyr907, which produces π-π stacking interactions with the electron-dense areas of the inhibitors. These residues are also responsible for the interaction with the natural substrate NAD+ [45][46][47][48]. The superimposition of all five protein structures produced an excellent structural alignment of the residues in the catalytic domain, with an average root-mean-square deviation (RMSD) of 0.543 Å, suggesting that the crystal structures are suitable for cross-docking. A rendering of the selected compounds aligned within the PARP1 catalytic domain is reported in Figure 4, which shows that the ligands adopt similar binding modes inside the receptor.

Figure: The structures of PARP1 inhibitors from the selected crystal complex structures. These compounds show similar interactions within the receptor; the atoms involved in hydrogen bonding as donors and acceptors are coloured in blue and red, respectively, and substructures involved in π-π stacking interactions are shown in bold.
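The average RMSD quoted above summarises how closely the catalytic-domain residues of the five structures coincide after superimposition. A minimal Python sketch of that calculation follows, assuming the structures have already been superimposed and that the residue atoms are matched one-to-one in the same order. The text does not say whether the average is taken over all pairs of structures or against a single reference, so averaging over all pairs is an assumption here.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coords = Sequence[Tuple[float, float, float]]

def rmsd(coords_a: Coords, coords_b: Coords) -> float:
    """RMSD (Å) between two matched, already superimposed coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((a - b) ** 2
                  for atom_a, atom_b in zip(coords_a, coords_b)
                  for a, b in zip(atom_a, atom_b))
    return math.sqrt(squared / len(coords_a))

def mean_pairwise_rmsd(structures: List[Coords]) -> float:
    """Average RMSD over every pair of structures (assumed averaging scheme)."""
    pairs = list(combinations(structures, 2))
    if not pairs:
        raise ValueError("need at least two structures")
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)
```

Averaging over all pairs treats every structure symmetrically; comparing each structure against one chosen reference would be the obvious alternative.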

De novo design and compound selection
The five PARP1 inhibitors were processed to yield their key fragments, which in turn were used to search for new building blocks that were combined by RENATE to design new products. The software was configured using a data set of commercial reagents and a reaction vector database, which are described in the Experimental Section. The parameters used in RENATE and the results from the fragmentation are reported in the Supporting Information (Sections S3 and S4). The design was driven by scoring the generated compounds on their predicted PARP1 pIC50, Pgp and BCRP substrate inactivity, and BBB+ character. The number of compounds generated from each experiment and their enumerated stereoisomers are reported in Table 1. Note that RENATE produces flat structures (without assigned stereochemistry), and these must be stereo-enumerated to be docked correctly.

Figure 4: The selected compounds interacting within the binding pocket (pink) of the PARP1 catalytic domain (light purple). The compounds show similar orientation and binding in the pocket.
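Because RENATE emits flat structures, each product has to be expanded into its stereoisomers before docking, which is why Table 1 lists both flat and enumerated counts. The sketch below only illustrates the combinatorics of that expansion. It assumes each unassigned centre is an independent tetrahedral centre with two configurations (so n centres give 2^n isomers) and ignores meso forms and double-bond stereochemistry. `enumerate_stereo` and the atom labels are hypothetical; in practice a toolkit routine such as RDKit's `EnumerateStereoisomers` would operate on real molecules.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

def enumerate_stereo(flat_id: str, centres: Sequence[str]) -> List[Tuple[str, Dict[str, str]]]:
    """Hypothetical helper: one (id, {centre: 'R' or 'S'}) record per assignment.
    Assumes every centre is an independent tetrahedral centre (no meso forms)."""
    records = []
    for combo in product("RS", repeat=len(centres)):
        records.append((flat_id, dict(zip(centres, combo))))
    return records

# A flat product with two unassigned centres expands into 2**2 = 4 docking inputs.
isomers = enumerate_stereo("Row745", ["C1", "C2"])
```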
The fact that Rucaparib and Talazoparib did not reach the maximum number of final products allowed by the design parameters can be explained by the complexity of their chemical motifs which have fewer chances to match the reaction centres in the reaction vector database.These results are in agreement with the principles of reaction vector-based design: the structural characteristics of the starting materials and reagents affect the number of applicable reaction vectors, which in turn affects the number of products [19].
The final products were then docked against PARP1, with ten poses generated per compound. The docked compounds were inspected manually to identify promising candidates. This was done by sorting compounds on their scores and prioritising candidates based on the quality of their interactions with the key residues and the number of consistent poses showing valid interactions. Specifically, compounds were selected that showed consistent poses that were qualitatively similar to those of their reference ligands and that exhibited valid hydrogen-bond interactions with Gly863 and Ser904 (i.e., a bond distance between 2.5 and 3.0 Å and an angle recognised by GOLD as compatible with a hydrogen bond). These manual selection criteria were defined because it was difficult to discriminate algorithmically between such candidates and other top-scoring compounds, which, although they generally presented physically meaningful poses and interactions within the binding site, did not possess characteristics similar to their reference ligands. A total of 20 compounds was selected for synthesis. They are reported in the Supporting Information (Section S5) with their predicted activities, pose consistencies, and the averages and standard deviations of the binding scores across poses. The selected compounds were predicted to be active in the micromolar range, classified as non-substrates of Pgp and BCRP, and predicted to have BBB+ character. In addition, they were all considered to be novel because they were absent from the catalogues of two known suppliers (https://emolecules.com and https://molport.com) and no data were available on them in the PubChem and ChEMBL databases (October 2022).
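The numeric part of the selection criteria above (a donor-acceptor distance of 2.5-3.0 Å to both Gly863 and Ser904, counted over the ten poses per compound) can be expressed as a simple filter. This is a hypothetical sketch, not the procedure used in the study: it assumes the distances have already been extracted from the docking output, treats the hydrogen-bond angle check as already done by GOLD, assumes a higher docking fitness is better, and uses an arbitrary `min_consistent` threshold. The qualitative comparison with the reference ligand remained a manual step.

```python
from typing import Dict, List, Optional, Sequence, Tuple

HBOND_MIN, HBOND_MAX = 2.5, 3.0          # Å, distance window stated in the text
KEY_RESIDUES = ("Gly863", "Ser904")      # residues that must be hydrogen-bonded

Pose = Dict[str, Optional[float]]        # residue -> donor-acceptor distance (Å)

def pose_has_key_hbonds(pose: Pose) -> bool:
    """True if every key residue has a distance inside the window.
    The hydrogen-bond angle is assumed to have been validated by GOLD already."""
    for residue in KEY_RESIDUES:
        d = pose.get(residue)
        if d is None or not HBOND_MIN <= d <= HBOND_MAX:
            return False
    return True

def count_consistent_poses(poses: Sequence[Pose]) -> int:
    return sum(1 for p in poses if pose_has_key_hbonds(p))

def shortlist(compounds: Dict[str, Tuple[float, List[Pose]]], min_consistent: int) -> List[str]:
    """Keep compounds with at least `min_consistent` valid poses, best fitness first
    (assumes a higher docking fitness is better)."""
    kept = [(name, fitness) for name, (fitness, poses) in compounds.items()
            if count_consistent_poses(poses) >= min_consistent]
    return [name for name, _ in sorted(kept, key=lambda item: item[1], reverse=True)]
```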
Figure 5 shows some examples of candidates and reference ligands in the binding pocket, which, despite their low 2D pairwise similarity, show good overlap and similar interactions with the key residues. In particular, Olaparib and Row514 (Tanimoto similarity 0.27 using 1024-bit Morgan fingerprints with radius 2) both have a bicyclic scaffold, but they differ significantly in terms of functionalities and connections. Niraparib and Row847 (similarity 0.22) have different functionalisation around the central core but adopt similar orientations. Rucaparib and Row312 (similarity 0.40) are also diverse, since Rucaparib has a three-ring motif. A similar outcome is shown for PJ34 and Row86 (similarity 0.24): PJ34 has three fused rings, whereas Row86 was designed with a two-ring scaffold connected to an additional aromatic ring, a motif that was later found in the patent literature [49]. These results show that, in both the Rucaparib and PJ34 cases, RENATE performed scaffold hopping, leading to the design of compounds with potential affinity towards PARP1 (i.e., by incorporating motifs present in annotated compounds into newly designed structures). Note that the data used to train the PARP1 activity model did not contain any compounds with the motifs proposed for Row847. This result suggests that RENATE can be used to propose novel scaffolds.
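The pairwise similarities quoted above are Tanimoto coefficients between binary fingerprints: the number of bits set in both fingerprints divided by the number set in either. A minimal sketch on fingerprints represented as sets of on-bit indices follows. Generating the 1024-bit, radius-2 Morgan bits themselves (for example with RDKit's `AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1024)`, whose output can also be compared directly with `DataStructs.TanimotoSimilarity`) is outside the sketch.

```python
from typing import Iterable

def tanimoto(bits_a: Iterable[int], bits_b: Iterable[int]) -> float:
    """Tanimoto similarity |A ∩ B| / |A ∪ B| for sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # 0.0 for two empty fingerprints (convention)
```

Returning 0.0 for two empty fingerprints is a convention chosen here to avoid division by zero.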
The inspection of the results from the design also highlighted an important limitation of RENATE. The products generated from Talazoparib contain key fragments that are shuffled relative to those in the reference ligand. This is due to the heuristics applied by the algorithm and the use of fingerprint-based scoring, which sometimes cannot match the global shape and features of candidates with those of their references. Most of these molecules were filtered out by the scoring functions, but some of them can still form valid interactions by chance. These compounds might still be of interest, but they were not produced using a rational approach. A high number of these products was found for Talazoparib because of its connectivity and complexity, which, as previously discussed, limited the number of structures generated by the algorithm and hence reduced the chance of finding better solutions. Examples of valid and invalid candidates from Talazoparib are reported in Figure 6. Talazoparib can be seen as a three-fragment molecule (B-A-C): the main interacting scaffold (A) plus two substituents (B and C, five- and six-membered rings, respectively), which are directly connected to the scaffold. Although Row2 and Row606 are both predicted to be active by the QSAR model, they are considered valid and invalid candidates, respectively, since the first has a configuration identical to the query (B-A-C), while the second has a different configuration (A-B-C).

Table 1: The number of flat and enumerated compounds retained from each design experiment. Note that the Rucaparib and Niraparib experiments were run using two different configurations to produce more candidates. The parameters used in each experiment are reported in the Supporting Information.
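The valid/invalid distinction for the Talazoparib candidates comes down to fragment connectivity: in B-A-C the scaffold A is bonded to both substituents, whereas in A-B-C it is not. Once each candidate's fragments have been mapped onto the query labels A, B and C (the difficult step, which is assumed here and not shown), the check reduces to comparing undirected fragment graphs. The helper names below are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Edge = Tuple[str, str]

def fragment_edges(edges: Iterable[Edge]) -> Set[FrozenSet[str]]:
    """Normalise undirected fragment connections so ("A", "B") equals ("B", "A")."""
    return {frozenset(e) for e in edges}

def same_fragment_topology(query: Iterable[Edge], candidate: Iterable[Edge]) -> bool:
    """True if the candidate joins the labelled fragments exactly as the query does."""
    return fragment_edges(query) == fragment_edges(candidate)

talazoparib = [("B", "A"), ("A", "C")]  # scaffold A carries both substituents
```

For instance, a Row2-like layout `[("A", "B"), ("C", "A")]` matches the query, while a Row606-like layout `[("A", "B"), ("B", "C")]` does not.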

Compound synthesis
The synthetic routes proposed by RENATE for the selected candidates were adjusted according to three factors: reagent availability (e.g., cost of building blocks), additional steps (e.g., protection chemistry), and reaction conditions (e.g., yields, catalysts, solvents), leading to the selection of six candidates. The adjusted routes are reported in the Supporting Information (Section S6), where the additional steps are highlighted in dashed squares. The outcomes of the syntheses are reported in Figure 7. Compound purities and the numbers of proposed and actual synthetic steps are reported in Table 2. Row86 and Row745(2) (PJ34 candidates) were obtained via organolithium chemistry rather than via Grignard reagents, since the latter was considered less robust; hence, these molecules were obtained through procedures very similar to their original routes. Note that Row745(2) and Row86 were produced as racemic mixtures of diastereomers or enantiomers. Other routes where minor adjustments were made are those of Row26 and Row514 (Olaparib candidates), where some protection chemistry was introduced since, as discussed in our previous publications, the reaction vector approach does not account for the presence of reactive groups outside the reaction centre [20,50].

Figure 5: Overlap between candidates (e.g., Row514) (green) and reference ligands (e.g., Olaparib) (purple). The residue Tyr907 is hidden to ease the view of the poses. Hydrogen-bond interactions between the protein and ligands are displayed in yellow; in particular, the interactions of the ligands with the key residues Ser904 and Gly863 are displayed.
More significant modifications were introduced in the synthesis of Row847 (Niraparib candidate), which was redesigned using a precursor of the building block proposed by the algorithm. As a consequence of the use of a precursor, the synthesis also required further functionalisation (i.e., extra steps) to obtain the final compound. A similar process is described for Row528 (Rucaparib candidate), with the exception that the precursor also required a different reaction to form a C-C bond between the two aryl rings.

PARP1 activity assay
The synthesised compounds (Row26, Row86, Row514, Row528, Row745(2), Row847) were assayed, along with two intermediates (Row528 (I) and Row86 (I)), for their inhibitory activity against PARP1 (Table 3). Among the assayed compounds, the PJ34 candidates (Row86 (I), Row86, and Row745(2)) showed potencies in the submicromolar range, with the intermediate Row86 (I) emerging as the most potent inhibitor (IC50 = 395 nM). These molecules share the same phthalazinone …

Table 2: Results from the synthesis of the selected candidates, where purities are reported along with the numbers of proposed and actual synthetic steps needed to obtain the compounds. (a) Intermediate compound. (b) (1S,2S) isomer of Row745, which was selected over the other stereo-enumerated compounds from the docking. (c) These compounds were selected from the docking with a specific stereo configuration, but their synthesis produced mixtures; hence, their structures are represented as flattened.

PAMPA permeability
Based on the results from the PARP1 activity assay, two compounds were evaluated preliminarily for their BBB permeability using PAMPA: Row514 (derived from Olaparib) and one of the phthalazinone derivatives, Row745(2) (derived from PJ34). Olaparib was also included as a reference compound. The results are reported in Table 4 and show that Row745(2) has good BBB permeability, with a permeability of 2.1×10⁻⁶ cm s⁻¹. Row514 shows a lower permeability; however, this is approximately 20 times greater than that of its reference, Olaparib. This demonstrates that the design strategy was successful in identifying a micromolar compound with improved brain penetration in a single design cycle.
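The text does not state which equation was used to derive the PAMPA permeabilities. For orientation, a commonly used end-point form estimates the effective permeability from donor and acceptor concentrations as $P_e = -\frac{V_D V_A}{(V_D + V_A) A t} \ln\left(1 - \frac{C_A(t)}{C_{eq}}\right)$, with $C_{eq} = \frac{C_D(t) V_D + C_A(t) V_A}{V_D + V_A}$. The Python sketch below implements that form; the volumes, filter area, incubation time and concentrations are user-supplied assumptions, not values from this study.

```python
import math

def pampa_pe(c_donor: float, c_acceptor: float, v_donor_cm3: float,
             v_acceptor_cm3: float, area_cm2: float, time_s: float) -> float:
    """Effective permeability Pe (cm/s) from end-point concentrations.
    Implements Pe = -V_D*V_A/((V_D+V_A)*A*t) * ln(1 - C_A/C_eq); concentrations
    may be in any consistent unit because only their ratio is used."""
    total_v = v_donor_cm3 + v_acceptor_cm3
    c_eq = (c_donor * v_donor_cm3 + c_acceptor * v_acceptor_cm3) / total_v
    if c_eq <= 0:
        raise ValueError("no compound detected")
    ratio = c_acceptor / c_eq
    if not 0 <= ratio < 1:
        raise ValueError("acceptor concentration must be below equilibrium")
    factor = (v_donor_cm3 * v_acceptor_cm3) / (total_v * area_cm2 * time_s)
    return -factor * math.log(1 - ratio)
```

Computing $C_{eq}$ from the end-point concentrations assumes mass balance, i.e. negligible compound retained in the artificial membrane.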

CONCLUSIONS AND FUTURE OUTLOOK
We have described the application of reaction vectors to the design and synthesis of novel and synthetically accessible PARP1 inhibitors with improved BBB penetration compared to the reference compounds on which the designs are based. Our approach used data from known PARP1 inhibitors and their crystal structures in combination with docking, machine learning, and our reaction vector-based tool RENATE. We were able to use RENATE to design compounds that were predicted to meet the multiple objectives of the study and that are structurally diverse from their reference ligands. The software also proposed viable synthetic routes that allowed the preparation of selected compounds, which were assayed against PARP1 and showed activities in the micromolar range (IC50 values ranging from 0.4 to 19 μM). Most of the compounds share the benzamide moiety that typically characterises PARP1 inhibitors; however, compound Row847 emerged as an innovative hit among these. Although the indole scaffold imparted only weak activity to the compound (IC50 = 16 μM), its novelty provides promising insights for the development of new series of PARP1 inhibitors. We also experimentally validated the brain penetration of two compounds that were active against PARP1. The permeability measurements showed that RENATE was able to account for the optimisation of brain penetration within just one design cycle, supporting its suitability for generating valid alternatives to known compounds. We conclude by suggesting that this work constitutes the first example in the literature of a de novo design method where, as well as successfully designing novel hit compounds with desired pharmaceutical properties, the software also provided multi-step synthetic routes that led to the preparation of the compounds in the laboratory.

Computational methods
A summary of the theory and implementation of reaction vectors for de novo design is reported at the end of this section along with references that provide detailed explanations.
The scoring module implemented for the PARP1 design is described in Figure 8, and is divided into five active components applied sequentially at each step of the design, and three passive components applied at the end of the design.The active components consist of a similarity search to retrieve building blocks similar to the fragments extracted from the reference ligands, and four machine learning models to score the structures generated by RENATE.The models consist of a PARP1 activity regression model, and Pgp substrate, BCRP substrate, and BBB penetration classification models.The passive components consist of a reactive group conversion unit, substructure and property filters, and a docking model.Each component and the data used to run the experiment are discussed in the following sections.
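The split between active and passive components can be pictured as a small object that ranks products at every design step and filters them once at the end. This is a hypothetical Python sketch, not RENATE's code: the building-block similarity search (the first active component) is left out, the model outputs are simply summed although the text does not say how they are combined, and SMILES strings stand in for molecule objects.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Scorer = Callable[[str], float]   # e.g. a PARP1, Pgp, BCRP or BBB model wrapper
Check = Callable[[str], bool]     # e.g. a reactivity, substructure or docking filter

@dataclass
class ScoringModule:
    """Hypothetical layout: active scorers rank products at every design step;
    passive checks are applied once to the final products."""
    active: Dict[str, Scorer]
    passive: List[Check] = field(default_factory=list)

    def rank_step(self, products: List[str], keep: int) -> List[str]:
        # Summing the scores is an assumption made for this sketch only.
        def total(smiles: str) -> float:
            return sum(score(smiles) for score in self.active.values())
        return sorted(products, key=total, reverse=True)[:keep]

    def finalise(self, products: List[str]) -> List[str]:
        return [p for p in products if all(check(p) for check in self.passive)]
```

In a real workflow the model outputs live on different scales (a regression pIC50 and three class probabilities), so some normalisation or a multi-objective ranking would be needed before combining them.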

Reagent and reaction data and building block scoring
The set of 746,245 Enamine reagents and the USPD reaction vector database of 92,530 reaction vectors described in Ghiandoni et al. [50] were selected as the sources of reagent and reaction data for RENATE, respectively. Count-based FeatMorgan fingerprints (radius 2, 1024 bits) and the Euclidean distance were selected for the scoring of building blocks. FeatMorgan fingerprints are described in the Supporting Information (Section S1). The use of pharmacophore fingerprints aims to maximise the chance of retrieving isosteric replacements of the query fragments.
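Ranking building blocks by Euclidean distance between count fingerprints means that smaller distances indicate closer pharmacophoric matches to the query fragment. The sketch below assumes sparse count fingerprints stored as `{feature_index: count}` dictionaries. With RDKit such counts could be produced by `AllChem.GetHashedMorganFingerprint(mol, 2, nBits=1024, useFeatures=True)`, which uses feature-based atom invariants, although whether this matches the exact FeatMorgan settings described in the Supporting Information is an assumption. The function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

CountFP = Dict[int, int]  # feature index -> count (sparse count fingerprint)

def euclidean(a: CountFP, b: CountFP) -> float:
    """Euclidean distance between two sparse count vectors (missing keys count as 0)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys))

def rank_building_blocks(query: CountFP, library: Dict[str, CountFP],
                         top_n: int) -> List[Tuple[str, float]]:
    """Return the top_n reagents closest to the query fragment (smaller = more similar)."""
    distances = [(name, euclidean(query, fp)) for name, fp in library.items()]
    return sorted(distances, key=lambda item: item[1])[:top_n]
```

Unlike the Tanimoto coefficient, the Euclidean distance on counts also penalises differences in how many times a pharmacophoric feature occurs, not just whether it is present.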
Machine learning models
A PARP1 activity data set of 2,371 entries was obtained from ChEMBL 24 in January 2019 [51]. The Pgp data were obtained from a literature collection of annotated Pgp substrates/non-substrates [52]. The BCRP data were also obtained from a literature set of BCRP substrates/non-substrates [53]. The BBB data were obtained from AdmetSAR [54]. Only entries associated with defined units and activities were retained. Activities were converted into micromolar pIC50 values. Molecules were sanitised, salts and ions were stripped, and canonical SMILES were generated using RDKit [55]. For SMILES associated with multiple activities, the values were averaged.
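The unit handling and duplicate averaging described above can be sketched as follows. Assumptions: pIC50 is taken as the conventional negative base-10 logarithm of the IC50 in mol/L (under which the 1 μM activity threshold corresponds to pIC50 = 6), replicate values are averaged in log space, and canonical SMILES have already been generated (for example with RDKit's `Chem.MolToSmiles`). The unit table and function names are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}

def to_pic50(value: Optional[float], unit: str) -> Optional[float]:
    """pIC50 = -log10(IC50 in mol/L); None for missing values or undefined units."""
    if value is None or value <= 0 or unit not in UNIT_TO_MOLAR:
        return None
    return -math.log10(value * UNIT_TO_MOLAR[unit])

def aggregate(records: Iterable[Tuple[str, Optional[float], str]]) -> Dict[str, float]:
    """records: (canonical_smiles, value, unit). Averages pIC50 per SMILES."""
    groups = defaultdict(list)
    for smiles, value, unit in records:
        p = to_pic50(value, unit)
        if p is not None:            # drop entries without defined units/activities
            groups[smiles].append(p)
    return {smiles: sum(v) / len(v) for smiles, v in groups.items()}
```

For example, the 395 nM IC50 reported for Row86 (I) corresponds to a pIC50 of about 6.40.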
The standardised PARP1 (1,363 actives, 501 inactives, based on an activity threshold of 1 μM), Pgp (243 substrates, 241 non-substrates), BCRP (164 substrates, 99 non-substrates), and BBB (1,437 BBB+, 401 BBB−) data sets were described using a selection of RDKit fingerprints and descriptors, which were used to train a series of Random Forest models [55,56]. The models were evaluated by internal validation using an 80%/20% train/test split. The PARP1 models were evaluated on their R² score, mean absolute error (MAE), and mean squared error (MSE). The Pgp, BCRP, and BBB models were evaluated on recall, precision, F1-score, and Matthews correlation coefficient (MCC), weighted by class sample size and calculated using Scikit-learn [56]. The validation was repeated 15 times per model using random train-test splits. The best-performing models were optimised on their hyper-parameters via 5-fold cross-validation and a genetic algorithm to yield the models used in the design workflow. In addition, the Pgp and BCRP models were optimised on their descriptors via feature elimination. The descriptors and performance metrics of the optimised models are reported in Table 5. The other models are described in the Supporting Information (Section S1).

Figure 8: RENATE scoring module for the PARP1 inhibitors design. The active components drive the algorithm at each step of the design, while the passive components are applied at the end to refine the selection of the most promising candidates.
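For reference, the regression metrics and the binary Matthews correlation coefficient used in the model validation can be written out directly. This standard-library sketch mirrors scikit-learn's `r2_score`, `mean_absolute_error`, `mean_squared_error` and `matthews_corrcoef` for the binary case; it does not reproduce the class-size weighting applied to the classification metrics in the study.

```python
import math
from typing import Dict, Sequence

def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """R², MAE and MSE for a regression model such as the PARP1 pIC50 model."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - (mse * n) / ss_tot if ss_tot else float("nan")  # 1 - SS_res / SS_tot
    return {"R2": r2, "MAE": mae, "MSE": mse}

def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

MCC uses all four cells of the confusion matrix, which makes it more informative than accuracy on imbalanced sets such as BCRP (164 substrates, 99 non-substrates).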

Docking model
A PARP1 docking model was validated by cross-docking the reference ligands into the crystal structure of Niraparib (PDB ID: 4R6E), the structure with the highest resolution, using GOLD [57]. Water molecules were removed, and the PLP and GoldScore functions were selected for pose and interaction scoring. The software parameters are reported in the Supporting Information (Section S2). Prior to the validation, the reference ligands were prepared by sanitising them using RDKit and by calculating their protonation states at pH 7.4 using MOE [58]. In addition, their stereocentres were enumerated and their conformations were minimised using the MMFF94 force field.
The model was validated by docking the reference ligands plus their virtually generated stereoisomers: each ligand generated 10 poses, which were compared with those from the co-crystals. The superimpositions between computed and co-crystal poses of Niraparib are shown in Figure 9, while those of the other ligands are shown in Figure 10. Mean PLP.Fitness and GoldScore.Fitness values and the number of consistent poses (i.e., correct overlap with the co-crystal pose) are reported in the Supporting Information (Section S2). Note that the virtually generated stereoisomers, whose structures were almost identical to those of the actual inhibitors, always produced lower PLP.Fitness scores. This result suggests that the parametrisation selected for the docking can also discriminate effectively between stereoisomers.

Table 5: The optimised models for PARP1 activity regression and Pgp substrate, BCRP substrate, and BBB penetration classification. The models are described by their molecular descriptors and performance metrics. The implementations of the selected molecular descriptors are reported in the Supporting Information (Section S1).

Reactive group conversion and additional filters
The reactive group conversion consisted of a SMARTS filter to detect reactive compounds, followed by a reaction vector structure generation step consisting of functional transformations only, in order to convert the reactive groups into non-reactive functional groups. After the conversion, compounds were reprocessed through the filter, and those still containing reactive patterns were discarded. The filter was implemented with RDKit using the definitions proposed by Hann and colleagues [59]. Note that compounds modified by the reactive group conversion were also rescored by the models.
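The filter-convert-refilter logic of the reactive group conversion can be sketched as a single pass over the products. The predicates are hypothetical stand-ins: in a real pipeline `is_reactive` would test the SMARTS patterns (for example with RDKit's `mol.HasSubstructMatch(Chem.MolFromSmarts(pattern))`) and `convert` would apply the functional-transformation reaction vectors. The string matching in the toy usage is for illustration only and is not valid substructure matching.

```python
from typing import Callable, Iterable, List, Optional

def convert_reactive(compounds: Iterable[str],
                     is_reactive: Callable[[str], bool],
                     convert: Callable[[str], Optional[str]]) -> List[str]:
    """Filter -> convert -> re-filter; compounds still reactive afterwards are dropped."""
    kept = []
    for smiles in compounds:
        if not is_reactive(smiles):
            kept.append(smiles)
            continue
        converted = convert(smiles)          # None means no transformation applied
        if converted is not None and not is_reactive(converted):
            kept.append(converted)           # converted products would be rescored
    return kept

# Toy stand-ins (string matching, not SMARTS): acyl chlorides and isocyanates are
# flagged; only acyl chlorides have a conversion (hydrolysis to the acid).
def toy_is_reactive(smiles: str) -> bool:
    return "C(=O)Cl" in smiles or "N=C=O" in smiles

def toy_convert(smiles: str) -> Optional[str]:
    return smiles.replace("C(=O)Cl", "C(=O)O") if "C(=O)Cl" in smiles else None

result = convert_reactive(["CCO", "CC(=O)Cl", "CN=C=O"], toy_is_reactive, toy_convert)
```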
The substructure and property filters were configured to remove compounds matching any pattern from a selection of SMARTS and those violating more than one Lipinski rule. The SMARTS were implemented using RDKit from the definitions in Brenk et al. [60], Doveston et al. [61], Baell and Holloway [62], and ZINC (http://blaster.docking.org/filtering/).
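The property filter keeps compounds with at most one Rule-of-Five violation. A sketch assuming precomputed properties follows; with RDKit these would typically come from `Descriptors.MolWt`, `Crippen.MolLogP`, `Lipinski.NumHDonors` and `Lipinski.NumHAcceptors`. The thresholds are the standard Lipinski limits, and the function names are hypothetical.

```python
from typing import Mapping

def lipinski_violations(mw: float, logp: float, hbd: int, hba: int) -> int:
    """Count Rule-of-Five violations: MW > 500, logP > 5, HBD > 5, HBA > 10."""
    return sum([mw > 500, logp > 5, hbd > 5, hba > 10])

def passes_property_filter(props: Mapping[str, float], max_violations: int = 1) -> bool:
    """Keep compounds violating at most `max_violations` rules (one, as in the text)."""
    return lipinski_violations(props["mw"], props["logp"],
                               props["hbd"], props["hba"]) <= max_violations
```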

Reaction vectors and their application in de novo design
A summary of reaction vectors and the associated structure generation algorithm, which enables them to be applied in de novo design, is given below. More details are provided in Patel et al. [18], Hristozov et al. [19], Gillet et al. [63], and several works by Ghiandoni et al. [20,50,64].
Reaction vectors encode the structural changes that occur in a chemical reaction as difference vectors, as shown in Equation (1).The reaction components are described using atom pair descriptors (AP) which describe two atoms and their properties, along with a separator that indicates the length of the atom path between the two atoms.Two types of AP descriptors are used in the reaction vector, namely AP2s which describe neighbouring atoms with the separator encoding information about the bond type, and AP3s which describe a pair of atoms separated by two bonds.An AP2 is represented as shown in Equation (2).

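$$\mathrm{RV} = \sum_{j} \mathrm{AP}(P_j) - \sum_{i} \mathrm{AP}(R_i) \qquad (1)$$
Equation (1): Generic definition of a reaction vector, where $R_i$ are the reactants, $P_j$ are the products, and $\mathrm{AP}(\cdot)$ is the vector of AP2 and AP3 counts of a reaction component.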
$$\mathrm{AP2} = (X_1, h_1, p_1, r_1)\; S_{BO}\; (X_2, h_2, p_2, r_2) \qquad (2)$$
Equation (2): AP2 descriptor. In AP2s, $X_1$ and $X_2$ are the atom types; $h_1$ and $h_2$ are the numbers of non-hydrogen connections of each atom; $p_1$ and $p_2$ are the numbers of π electrons shared by the respective atoms; $r_1$ and $r_2$ are the numbers of rings each atom is part of; $S$ is the separator; and $BO$ is the connection bond order, which can be 1 (single), 2 (double), 3 (triple) or 4 (aromatic). AP3s describe the two atoms at the start and end of the path only, i.e., there is no bond information. The atom pair vectors are counts indicating the number of times each atom pair occurs in a reaction component. Reaction vectors are generated by first cleaning a reaction to ensure it is balanced, i.e., that it contains the same number of heavy atoms on each side of the reaction. Then AP2 and AP3 descriptors are calculated for each component and are summed for the reactants and the products, respectively. Finally, the reactant descriptors are subtracted from the product descriptors so that the reaction vector encodes the atom pairs that are changed in the reaction. The reaction vector consists of atom pairs with negative counts, which indicate atom pairs lost during the reaction, and atom pairs with positive counts, which indicate atom pairs gained during the reaction.
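The subtraction in Equation (1) can be sketched with `collections.Counter`, treating each reaction component as a multiset of atom-pair labels. The labels below are simplified, hypothetical strings rather than real AP2/AP3 encodings, and only a few pairs are listed for a toy acyl chloride plus amine amide coupling. Counter's built-in `-` operator discards non-positive counts, so the difference is computed explicitly to keep the negative entries that mark atom pairs lost in the reaction.

```python
from collections import Counter
from typing import Iterable

def reaction_vector(reactants: Iterable[Counter], products: Iterable[Counter]) -> Counter:
    """Product atom-pair counts minus reactant atom-pair counts (Equation 1).
    Negative entries = pairs lost, positive entries = pairs gained; zeros dropped."""
    reactant_sum, product_sum = Counter(), Counter()
    for component in reactants:
        reactant_sum.update(component)
    for component in products:
        product_sum.update(component)
    rv = Counter()
    for pair in set(reactant_sum) | set(product_sum):
        diff = product_sum[pair] - reactant_sum[pair]
        if diff:
            rv[pair] = diff
    return rv

# Toy amide coupling with hypothetical, simplified atom-pair labels (AP3s omitted).
acyl_chloride = Counter({"C(acyl)-1-Cl": 1, "C(acyl)-2-O": 1})
amine = Counter({"N-1-C(alkyl)": 1})
amide = Counter({"C(acyl)-1-N": 1, "C(acyl)-2-O": 1, "N-1-C(alkyl)": 1})
hcl = Counter()  # no heavy-atom pairs
rv = reaction_vector([acyl_chloride, amine], [amide, hcl])
```

Pairs that appear unchanged on both sides (here the carbonyl and the amine N-C bond) cancel out, leaving only the reaction centre: one C-Cl pair lost and one C-N pair gained.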
The structure generation process consists of applying a reaction vector to a new reactant (or reactants). The first step is to test for validity: a reaction vector is considered valid if the reactant contains (either wholly or partially) the negative atom pairs encoded in the reaction vector. If the match is partial, a reagent that contains the missing negative atom pairs must be identified. The negative atom pairs are then used to fragment the reactants, and the products are assembled by adding atoms according to the positive atom pairs. In the first implementation of the structure generation algorithm, described in Patel et al. [18], structure generation proceeded atom by atom in a breadth-first search with backtracking. Although this was an effective approach to generate novel molecules, it was slow in execution. A considerably faster implementation has since been developed in which fragmentation and recombination paths (or fragments) are stored with the reaction vector, as described in Ghiandoni et al. [64].
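The validity test described above can be sketched at the level of atom-pair counts. This is a simplification under stated assumptions: it only checks that the negative atom pairs are present in sufficient number, whereas the real algorithm must also map them onto a connected reaction centre before fragmenting the reactant. The function names and pair labels are hypothetical.

```python
from collections import Counter
from typing import Optional

def negative_pairs(rv: Counter) -> Counter:
    """Atom pairs removed by the reaction, expressed as positive counts."""
    return Counter({pair: -n for pair, n in rv.items() if n < 0})

def missing_negative_pairs(rv: Counter, reactant: Counter) -> Counter:
    """Negative pairs the reactant cannot supply (Counter '-' keeps shortfalls only)."""
    return negative_pairs(rv) - reactant

def is_applicable(rv: Counter, reactant: Counter, reagent: Optional[Counter] = None) -> bool:
    missing = missing_negative_pairs(rv, reactant)
    if not missing:
        return True                      # whole match: reactant covers everything
    if missing == negative_pairs(rv):
        return False                     # no overlap: reaction vector does not apply
    # Partial match: a co-reagent must provide every missing negative pair.
    return reagent is not None and not (missing - reagent)
```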

Chemistry methods
All reactions were carried out using anhydrous organic solvents under a nitrogen atmosphere at room temperature unless otherwise stated. All solvents, reagents and catalysts were obtained from commercial sources and used without further purification unless otherwise stated. All reactions were carried out in oven-dried glassware. All microwave reactions were carried out in a Biotage® Initiator+ using a maximum power of 400 W. Reactions were monitored by TLC and/or LCMS. TLC was performed on glass plates pre-coated with silica gel and visualized either under ultraviolet light (254 nm) or by dipping in potassium permanganate or phosphomolybdic acid solution and heating. Flash column chromatography was performed on a Biotage® Isolera 4 using pre-packed Biotage® SNAP KP-Sil cartridges (40-63 μm) unless otherwise noted. Preparative HPLC was performed in reverse phase on a Waters XBridge™ C18 column (30 mm × 100 mm, 5 μm) at room temperature, with an injection volume of 1500 μL and a flow rate of 40 mL min⁻¹: 10 % B for 2.00 min, then a gradient of 10-95 % B over 14.00 min, then held for 2.00 min, where A = 0.2 % ammonium hydroxide in water and B = 0.2 % ammonium hydroxide in acetonitrile. ¹H NMR spectra were recorded in chloroform-d, DMSO-d6 or methanol-d4 at 400 or 500 MHz. Chemical shifts are reported in ppm relative to the residual solvent peak. Multiplicities are reported with coupling constants (J) in hertz (Hz), given to the nearest 0.1 Hz, using the abbreviations s = singlet, d = doublet, t = triplet, q = quartet, m = multiplet, br = broad. Analytical HPLC and LCMS analyses were performed using seven methods: Method A (Kinetex Core shell C18 column

PARP1 enzymatic assay
PARP1 was expressed and purified as described in the literature [65]. Briefly, full-length human PARP1 was produced in E. coli Rosetta2 (DE3). The cells were harvested by centrifugation and resuspended in lysis buffer containing 50 mM HEPES pH 8.0, 500 mM NaCl, 10 % glycerol, 0.5 mM TCEP and 10 mM imidazole. The sample was lysed by sonication in the presence of DNase I and 3-aminobenzamide and centrifuged at 18,000 rpm for 1 h, and the supernatant was filtered. The clarified sample was loaded onto a HiTrap™ IMAC column (Cytiva) and washed with lysis buffer, then with a similar buffer containing 1 M NaCl, and finally with lysis buffer containing 25 mM imidazole, before the protein was eluted with 500 mM imidazole. The eluted sample was then diluted to reduce the NaCl concentration to 250 mM and loaded onto a HiTrap Heparin column (Cytiva). The column was washed with 50 mM HEPES, 250 mM NaCl, 10 % glycerol, 1 mM EDTA, 0.1 mM TCEP, pH 7.5, and the sample was eluted using a gradient of a similar buffer containing 1 M NaCl. Fractions were analyzed by SDS-PAGE, and those containing protein were pooled and concentrated before being aliquoted and stored at −70 °C.
IC50 values were determined using a homogeneous NAD+ consumption assay [65,66]. PARP1 (5 nM) was incubated with each compound in a half-log dilution series for 30 min in 50 mM Tris pH 8.0, 5 mM MgCl2, 0.2 mg mL⁻¹ BSA, 10 μg mL⁻¹ activated DNA and 500 nM NAD+. The reactions were performed in quadruplicate, and the resulting data were fitted to a sigmoidal dose-response curve (four parameters) using GraphPad Prism version 8.02. Each experiment was repeated three times, from which pIC50 ± SEM was calculated. Compounds with IC50 above 1 μM were measured only once.
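The data reduction described above can be sketched in Python. This is an illustrative sketch rather than the authors' analysis: the top concentration and IC50 values are hypothetical, the curve fitting itself (performed in GraphPad Prism) is not reproduced, and only the half-log spacing, the model evaluation and the conversion to pIC50 ± SEM are shown. The model follows the four-parameter "log(inhibitor) vs. response, variable slope" form documented by GraphPad.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def half_log_series(top_conc_m: float, n_points: int) -> List[float]:
    """Concentrations (M) spaced half a log unit apart: each step divides by sqrt(10)."""
    return [top_conc_m * 10 ** (-0.5 * i) for i in range(n_points)]


def four_pl(log_conc: float, bottom: float, top: float,
            log_ic50: float, hill: float) -> float:
    """Four-parameter logistic. With a negative hill slope the response falls as the
    inhibitor concentration rises, passing the midpoint at log_conc == log_ic50."""
    return bottom + (top - bottom) / (1 + 10 ** ((log_ic50 - log_conc) * hill))


def pic50_mean_sem(ic50s_molar: List[float]) -> Tuple[float, float]:
    """Mean pIC50 and its standard error over independent experiments.

    A single measurement (as for compounds with IC50 > 1 uM) has no SEM,
    so NaN is returned for the error in that case.
    """
    pic50s = [-math.log10(ic50) for ic50 in ic50s_molar]
    if len(pic50s) < 2:
        return pic50s[0], float("nan")
    return mean(pic50s), stdev(pic50s) / math.sqrt(len(pic50s))
```

For example, two hypothetical IC50 values of 10 nM and 100 nM give pIC50 = 7.5 ± 0.5. Averaging pIC50 rather than IC50 keeps the statistics on the logarithmic scale on which the dose-response curves are fitted.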

PAMPA methods
Each donor solution was prepared by diluting a DMSO solution of the corresponding compound (1 mM) with phosphate buffer (pH 7.4, 0.025 M) to a final concentration of 500 μM. For intestinal and BBB permeability, respectively, filters were coated with 10 μL of a 1 % solution of phosphatidylcholine in dodecane or with 5 μL of brain polar lipid solution (20 mg mL⁻¹ in 16 % CHCl3, 84 % dodecane), prepared from a 10 % w/v CHCl3 solution. The donor solution (150 μL) was then added to each well of the filter plate, and 300 μL of acceptor solution (50 % DMSO in phosphate buffer) was added to each well of the acceptor plate. The sandwich plate was assembled and incubated for 5 h at room temperature. After incubation, the plates were separated, samples were taken from both the donor and acceptor wells, and the compound concentration was measured by UV/LC-MS. All compounds were tested in three independent experiments. Apparent permeability (Papp) and membrane retention were calculated as described in Equations (3) and (4).
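Equations (3) and (4) are not reproduced in this section, so the following is only a hedged sketch using one widely used PAMPA formulation: an effective permeability based on the equilibrium concentration, Papp = −[V_D·V_A / ((V_D + V_A)·A·t)]·ln(1 − C_A/C_eq), and a mass-balance membrane retention. It may differ in detail from the equations used here. The default volumes and incubation time come from the protocol above; the filter area is a hypothetical placeholder that must be replaced with the value for the plate actually used.

```python
import math
from typing import Tuple


def pampa_papp_and_retention(
    c0: float,          # initial donor concentration (any unit, used consistently)
    c_donor: float,     # donor concentration after incubation
    c_acceptor: float,  # acceptor concentration after incubation
    v_donor_ml: float = 0.150,   # 150 uL donor volume (from the protocol)
    v_acceptor_ml: float = 0.300,  # 300 uL acceptor volume (from the protocol)
    area_cm2: float = 0.3,       # HYPOTHETICAL filter area; use the plate's real value
    time_s: float = 5 * 3600,    # 5 h incubation (from the protocol)
) -> Tuple[float, float]:
    """Return (Papp in cm/s, fractional membrane retention R) for one well pair."""
    # Concentration expected if the recovered compound were spread over both wells.
    c_eq = (c_donor * v_donor_ml + c_acceptor * v_acceptor_ml) / (v_donor_ml + v_acceptor_ml)
    if not 0 <= c_acceptor < c_eq:
        raise ValueError("acceptor concentration must lie below the equilibrium value")
    # mL equals cm^3, so cm^3 / (cm^2 * s) gives cm/s.
    factor = (v_donor_ml * v_acceptor_ml) / ((v_donor_ml + v_acceptor_ml) * area_cm2 * time_s)
    papp = -factor * math.log(1 - c_acceptor / c_eq)
    # Fraction of the initial donor amount recovered in neither well (held in the membrane).
    retention = 1 - (c_donor * v_donor_ml + c_acceptor * v_acceptor_ml) / (c0 * v_donor_ml)
    return papp, retention
```

With hypothetical readings of 300 μM (donor) and 50 μM (acceptor) from a 500 μM start, the retention is 0.2 and Papp is roughly 8.7 × 10⁻⁶ cm s⁻¹ for the placeholder area.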

FIGURE 3
PARP1 3D (left) and 2D (right) key residue interactions with Niraparib. Yellow and black dashed lines indicate hydrogen bonds in the 3D and 2D representations, respectively. Green solid lines show hydrophobic interactions and green dashed lines show π-π stacking interactions in the 2D diagram. The 2D diagram was obtained from the PDB.

FIGURE 6
Examples of valid and invalid candidates designed by RENATE using Talazoparib as a reference (A, B, and C fragments are coloured black, blue, and red, respectively).

FIGURE 7
Synthesis summary. The diagram shows the names of the reference ligands (e.g., Olaparib) on the left, each connected to its candidate structures (e.g., Row26). Each candidate is linked to descriptions on the right, which refer to additional or alternative chemistry (e.g., protecting-group chemistry).

FIGURE 10
Overlap between docked (green) and experimental (purple) poses of Olaparib, Rucaparib, Talazoparib, and PJ34. The ligands are shown within the binding pocket, interacting with the key residues Ser904, Gly863, and Tyr907. Olaparib showed the largest deviation between docked and experimental poses, although the part of the molecule that binds the key residues still overlapped well with the experimental pose.

TABLE 3
PARP1 inhibitory activities and standard errors of the mean. Activities were averaged over three replicates for each compound with potent inhibition. Compounds with IC50 above 1 μM were measured once.
Compound | IC50 (pIC50 ± SEM)