An Automated, Open-Source Workflow for the Generation of (3D) Fragment Libraries

The recent success of fragment-based drug discovery (FBDD) is inextricably linked to adequate library design. To guide the design of our fragment libraries, we have constructed an automated workflow in the open-source KNIME software. The workflow considers chemical diversity and novelty of the fragments, and can also take into account the three-dimensional (3D) character. This design tool can be used to create large and diverse libraries but also to select a small number of representative compounds as a focused set of unique screening compounds to enrich existing fragment libraries. To illustrate the procedures, the design and synthesis of a 10-membered focused library is reported based on the cyclopropane scaffold, which is underrepresented in our existing fragment screening library. Analysis of the focused compound set indicates significant shape diversity and a favorable overall physicochemical profile. By virtue of its modular setup, the workflow can be readily adjusted to design libraries that focus on properties other than 3D shape.

RDKit (4.0.1.v202006261025), Erlwood (v4.0.0) and Vernalis (1.28.2.v202101281353) KNIME nodes were used. Additional features were added via Python scripts running Python v3.6.12 in Anaconda v2020.11, with OpenBabel v2.8.1 and RDKit v2020.03.06. For specific details about the workflow, see the KNIME. The workflow can be downloaded from: https://hub.knime.com/tomdekker/spaces/(3D)%20Fragment%20Library%20Design%20Workflow/latest.

Figure S1. Exemplary depiction/comparison of the clustering method. (A) The virtual library from which 6a-f were selected was clustered into four clusters using k-means with only the fingerprints as input (by reducing the Tanimoto distance matrix to 20 MDS dimensions), and without input from PCA (i.e., chemical descriptors). Colors represent the different clusters. The compounds were plotted using a separate MDS in only two dimensions. Two dimensions already capture a significant share of the fingerprint information that is captured in 20 dimensions. (B) Data and clustering identical to panel A, but plotted using MDS in three dimensions. Clustering becomes more apparent. (C) Data and clustering identical to panels A-B, but here the fingerprints are reduced using t-SNE. Contrary to MDS, t-SNE is non-linear, with inter-cluster distance being less relevant and hence clusters becoming more apparent in only a few dimensions. When panels A-B and C are compared, it becomes apparent that both methods capture a similar degree of information. (D) Data identical to panels A-C, but the compounds were clustered using the combined MDS/PCA approach (i.e., chemical descriptors were included). A single cluster that was identified by the approaches in A-C is now split into two separate clusters (encircled data points), due to their different physicochemical properties captured in the PCA components. For clarity, the size of the data points corresponds to the value of the first PCA component, from which it becomes apparent that the red cluster generally possesses lower values for the first component. This component has significant covariance (coefficient of 0.4-0.5) with sLogP, MW, and HAC. The encircled compounds comprising these two clusters are depicted in panels (E) and (F).
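The Tanimoto distance matrix that feeds the MDS reduction can be illustrated with a minimal, library-free sketch. It assumes each fingerprint is represented as a set of on-bit indices; the function names and the toy fingerprints are hypothetical and are not taken from the KNIME workflow, which uses RDKit nodes for this step.

```python
from itertools import combinations


def tanimoto_distance(bits_a, bits_b):
    """Return 1 - Tanimoto similarity for two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        # Assumption: two empty fingerprints are treated as identical.
        return 0.0
    return 1.0 - len(bits_a & bits_b) / union


def distance_matrix(fingerprints):
    """Build the symmetric pairwise distance matrix (list of lists)."""
    n = len(fingerprints)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = tanimoto_distance(fingerprints[i], fingerprints[j])
        matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical toy fingerprints (on-bit indices), for illustration only.
toy_fingerprints = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
toy_matrix = distance_matrix(toy_fingerprints)
```

A matrix of this form is what a dimensionality-reduction step such as MDS would take as precomputed dissimilarities.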

Settings used in workflow
We designed our library to consist of a first subset with various amide substituents on the "northern" exit vector and a second subset with various ether substituents on the "southern" exit vector (Figure 2A). The first subset was based on amidation of acid 3 and was designed to include three amide compounds per diastereomer type. The second subset involved substitution on bromide 5 by alcohols and comprised two ether compounds per diastereomer type. The corresponding synthetic transformations and building blocks were imported into the workflow, along with our in-house reagent database and a database of commercially available reagents. The latter was filtered to only include relatively inexpensive reagents (≤200 €/g) in order to limit complexity and expenses. Following generation of the virtual combinatorial libraries, filtering was applied on the maximum number of rotatable bonds (nRot; 4), molecular weight (MW; 280 Da), hydrogen bond acceptors (HBA; 4) and donors (HBD; 4), polar surface area (TPSA; 70 Å²), cLogP (3) and the number of aromatic rings (1). Furthermore, a maximum of one or zero additional specified stereocenters was allowed in the first and second subset, respectively, allowing for the use of enantiopure reagents in the first subset only and thereby balancing complexity between subsets. Some of the physicochemical properties (i.e., nRot and TPSA) were deliberately allowed to exceed the Ro3 limits, as there has been debate whether these Ro3 limits are too strictly formulated.¹,² Using an expanded Ro3 thereby allows for a broader and more diverse set of fragments. Next, compounds were removed based on occurrence in patents, articles and commercial libraries. To bias for 3D character, diverse conformations (RMSD >0.1) were generated up to 5 kcal·mol⁻¹ above the global minimum, and compounds with an average ΣNPR lower than 1.07 (the 3D cutoff proposed by Firth et al.³) were removed.
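The property and 3D-shape filters listed above can be summarized in a short sketch. It assumes the descriptors have already been calculated and stored per compound in a dictionary; the key names, the function name and the example values are hypothetical, and the novelty filter (patents, articles, commercial libraries) is not modeled here.

```python
# Upper limits taken from the settings described in the text.
MAX_LIMITS = {
    "nRot": 4,
    "MW": 280.0,           # Da
    "HBA": 4,
    "HBD": 4,
    "TPSA": 70.0,          # square angstrom
    "cLogP": 3.0,
    "aromatic_rings": 1,
}
MIN_AVG_NPR_SUM = 1.07     # 3D cutoff on the conformer-averaged sum of NPR values


def passes_filters(compound, max_extra_stereocenters):
    """Return True if a compound (dict of precomputed descriptors) survives all filters."""
    for key, limit in MAX_LIMITS.items():
        if compound[key] > limit:
            return False
    if compound["extra_stereocenters"] > max_extra_stereocenters:
        return False
    # Compounds with an average NPR sum below the cutoff are removed.
    return compound["avg_npr_sum"] >= MIN_AVG_NPR_SUM


# Hypothetical descriptor values, for illustration only.
example = {
    "nRot": 3, "MW": 212.3, "HBA": 3, "HBD": 1, "TPSA": 49.8,
    "cLogP": 1.2, "aromatic_rings": 1,
    "extra_stereocenters": 1, "avg_npr_sum": 1.12,
}
keep_in_first_subset = passes_filters(example, max_extra_stereocenters=1)
keep_in_second_subset = passes_filters(example, max_extra_stereocenters=0)
```

Passing the stereocenter allowance as an argument mirrors the different settings for the two subsets (one additional stereocenter for the amides, none for the ethers).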
Clustering was performed per building block/exit vector combination, i.e., 6a-c, 6d-f, 7a,b and 7c,d result from four individual clusterings. Morgan fingerprints (1024 bits, radius = 2) and the outlined molecular descriptors (see main text) were calculated, and subsequently included in MDS and PCA, respectively, where MDS and PCA received equal weighting. All molecular descriptors received equal weight in the PCA. MDS dimensions and PCA components were limited to the number of dimensions that ensured a stress value of <0.05 and an explained variance of >90%, respectively, and used as input for the k-means clustering algorithm. From each cluster, the highest-ranked compound was selected for synthesis, with priority given to in-house reagents up to the 5th-ranked compound. See Figure S2 for more details.
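Two of the rules in this paragraph lend themselves to a small illustration: choosing the number of PCA components from the explained variance, and picking one compound per cluster with in-house priority. The sketch below is one possible reading, not the workflow's implementation. It assumes that a higher score means a better rank and that "priority up to the 5th-ranked compound" means the best in-house compound among the top five is preferred over the overall top-ranked one; all names and data are hypothetical.

```python
def n_components_for_variance(explained_variance_ratios, threshold=0.90):
    """Smallest number of leading components whose cumulative explained variance exceeds threshold."""
    cumulative = 0.0
    for n, ratio in enumerate(explained_variance_ratios, start=1):
        cumulative += ratio
        if cumulative > threshold:
            return n
    return len(explained_variance_ratios)


def select_per_cluster(compounds, top_n=5):
    """Pick one compound id per cluster, preferring in-house reagents within the top_n ranks."""
    by_cluster = {}
    for compound in compounds:
        by_cluster.setdefault(compound["cluster"], []).append(compound)

    selected = {}
    for cluster, members in by_cluster.items():
        # Assumption: higher score = better rank.
        ranked = sorted(members, key=lambda c: c["score"], reverse=True)
        in_house = [c for c in ranked[:top_n] if c["in_house"]]
        selected[cluster] = (in_house[0] if in_house else ranked[0])["id"]
    return selected


# Hypothetical clustered and scored compounds, for illustration only.
toy_compounds = [
    {"id": "a", "cluster": 0, "score": 0.9, "in_house": False},
    {"id": "b", "cluster": 0, "score": 0.8, "in_house": True},
    {"id": "c", "cluster": 1, "score": 0.7, "in_house": False},
    {"id": "d", "cluster": 1, "score": 0.6, "in_house": False},
]
toy_selection = select_per_cluster(toy_compounds)
toy_n_components = n_components_for_variance([0.5, 0.3, 0.15, 0.05])
```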

Secondary scoring function
The secondary scoring was defined as:

|(1.75*NumHBD + 1.25*NumHBA + 1*NumAromaticRings + 1*has_halogen) / (NumHeavyAtoms) - (SS_balance_value)|

The balancing value (SS_balance_value, default value = 0.375) can be changed to include more (higher value) or less (lower value) "interaction". Its weighting with respect to the cluster score can also be configured.

Table with the output of the different elements/steps as numbered in panel A. (C-F) Clustering and compound selection as referred to in column #9 in panel B. Clustering was performed as described above (i.e., using the combined MDS/PCA/k-means approach) and visualized using two-dimensional t-SNE reduction of the fingerprints. Colors represent the different clusters; compounds selected for synthesis (see main text) are encircled with black.
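The secondary scoring function defined in this section translates directly into a few lines of Python. The sketch below assumes the descriptor counts are supplied by the caller; the function and argument names are hypothetical, while the weights and the default balancing value are those given in the text.

```python
def secondary_score(num_hbd, num_hba, num_aromatic_rings, has_halogen,
                    num_heavy_atoms, ss_balance_value=0.375):
    """Absolute deviation of the weighted interaction density from the balancing value."""
    if num_heavy_atoms <= 0:
        raise ValueError("num_heavy_atoms must be positive")
    interaction = (1.75 * num_hbd
                   + 1.25 * num_hba
                   + 1.0 * num_aromatic_rings
                   + 1.0 * int(has_halogen))
    return abs(interaction / num_heavy_atoms - ss_balance_value)


# Hypothetical fragment: 1 donor, 2 acceptors, 1 aromatic ring, no halogen, 16 heavy atoms.
example_score = secondary_score(1, 2, 1, False, 16)
```

For this hypothetical fragment the interaction term is 5.25, which divided by 16 heavy atoms gives 0.328125, so the score is 0.046875. Because the score is an absolute deviation, the expression itself is smallest for fragments whose interaction density is closest to the balancing value.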

Experimental methods

Nephelometry
Nephelometry was performed using BMG LABTECH NEPHELOstar Plus equipment. Kaolin was used as the internal standard and compounds were added to HBSS buffer as DMSO stock solution to a final concentration of 1% DMSO and a total volume of 200 µL. Precipitation or aggregation was considered significant when average values exceeded three times the standard deviation of the blanks. Blank values were omitted if they exceeded three times the standard deviation of the 24 blanks that were measured on each 96-well plate. All compounds were tested in triplicate and wells of suspected outliers were visually inspected before omitting any outliers. Data were processed in Excel v16.16.27 for macOS and graphs were made in RStudio 2022.02.2 (Build 485) running the ggplot2 package.
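The significance criterion can be written as a one-line comparison. The sketch below is a literal reading of the sentence above and rests on two assumptions that the text does not state: the readings are already blank-corrected, and the sample standard deviation is used. The function name and the numbers are hypothetical.

```python
from statistics import mean, stdev


def is_significant(replicates, blanks, factor=3.0):
    """True if the replicate average exceeds factor times the standard deviation of the blanks."""
    # Assumption: readings are blank-corrected; stdev() is the sample standard deviation.
    return mean(replicates) > factor * stdev(blanks)


# Hypothetical readings (arbitrary units), for illustration only.
toy_blanks = [1.0, 2.0, 3.0]                                   # sample SD = 1.0, threshold = 3.0
flagged = is_significant([4.0, 5.0, 6.0], toy_blanks)          # average 5.0
not_flagged = is_significant([1.0, 2.0, 3.0], toy_blanks)      # average 2.0
```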

Synthetic methods
General
All reagents were purchased from commercial suppliers (primarily Sigma-Aldrich and Combi-Blocks) and used without further purification. THF and DMF were dried by passing through an activated alumina column prior to use. All other solvents were used as received unless otherwise stated. Hygroscopic reagents (18-crown-6 ether, dimethylamine hydrochloride) were dried by co-evaporation with MeCN prior to use. TLC analyses were performed using Screening Devices or Merck F254 aluminum-backed silica plates and visualized with 254 nm UV light or staining with KMnO4. LC-MS analysis was carried out on a Shimadzu LC-20AD liquid chromatograph pump system with a Shimadzu M20A photodiode array detector, a Shimadzu LCMS2010EV mass spectrometer and XBridge C18 column (5 µm, 4.6 × 50 mm) at 40 °C using ESI in positive ion mode. For acidic runs, 0.1% HCOOH in H2O and 0.1% HCOOH in MeCN were used as eluent A and B, respectively. For basic runs, 0.4% w/v NH4HCO3 in H2O and MeCN were used as eluent A and B, respectively. The gradient for acidic and basic runs was 5:90:90:5:5% B at t = 0:4.5:6:6.5:8 min. The purity of a compound was determined by calculating the peak area percentage of UV detection at 200 nm. The purity of compounds bearing aromatic groups was also assessed at 230 nm and 254 nm, and such values were reported if the purity was lower at these wavelengths than at 200 nm. Unless stated otherwise, the reported purity was measured at 200 nm. HRMS spectra were determined with a Bruker micrOTOF mass spectrometer using ESI in positive ion mode. Reverse phase column chromatography was performed on Teledyne ISCO CombiFlash Rf 200 equipment with the same solvent systems used for LC-MS measurements. Normal phase flash chromatography was performed on Biotage Isolera or BUCHI Pure C-815 equipment. Preparative HPLC was performed on a BUCHI PrepChrom C-700 purification system equipped with an XBridge Prep C18 (5 µm, 19 × 100 mm) column.
Pre-packed columns were purchased from Screening Devices (C18 and UltraPure irregular silica) or BUCHI (FlashPure EcoFlex irregular silica). Microwave reactions were carried out using a Biotage Initiator. IUPAC names were generated with ChemDraw Professional 21.0 (PerkinElmer). Nuclear magnetic resonance (NMR) spectra were determined with a Bruker Avance II 500 MHz or a Bruker Avance III HD 600 MHz spectrometer. Chemical shifts are reported in parts per million (ppm) against the reference compound using the signal of the residual non-deuterated solvent (CDCl3 δ = 7.26 ppm (¹H), δ = 77.16 ppm (¹³C); DMSO-d6 δ = 2.50 ppm (¹H), δ = 39.52 ppm (¹³C); CD3OD δ = 3.31 ppm (¹H), δ = 49.00 ppm (¹³C)). NMR spectra were processed using MestReNova 14.0 software. The peak multiplicities are defined as follows: s, singlet; d, doublet; t, triplet; q, quartet; dd, doublet of doublets; ddd, doublet of doublets of doublets; dt, doublet of triplets; dq, doublet of quartets; td, triplet of doublets; tt, triplet of triplets; qd, quartet of doublets; p, pentet; dp, doublet of pentets; br, broad signal; m, multiplet.
For NMR listings, in addition to the specific instructions given by the journal in the guidelines for authors, the following additional procedures were used: 1) Multiplicity is not solely reported based on peak shapes, but also distinguishes the coupling to all non-equivalent protons that have similar J values; 2) If additional smaller couplings are observed but are too small for accurate quantitation because the precision is smaller than the digital resolution, a symbol D will be used; 3) The notation 'm' is used when accurate interpretation is obscured by (i) overlapping signals for different protons, or (ii) overlapping signal lines within the same proton signal; 4) For any rotamers or diastereomers, signals will be listed separately if resolved; 5) NMR signals that could only be detected with HSQC analysis are denoted with a # symbol; 6) NMR signals that could only be detected with HMBC analysis are denoted with a * symbol; 7) If one or more signals remain undetected after extensive 1D and 2D NMR analyses, this will be mentioned; 8) Signals for exchangeable proton atoms (such as NH and OH groups) are only listed if clearly visible (e.g., excluding the use of D2O or CD3OD) and if confirmed by a D2O shake and/or HSQC.
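The LC-MS gradient and the purity calculation described in the General section can be made concrete with a short sketch. It assumes linear ramps between the listed time points, which the text does not state explicitly, and the function names and peak areas are hypothetical.

```python
# (time in min, % eluent B) pairs from the gradient 5:90:90:5:5% B at t = 0:4.5:6:6.5:8 min.
GRADIENT = [(0.0, 5.0), (4.5, 90.0), (6.0, 90.0), (6.5, 5.0), (8.0, 5.0)]


def percent_b(t_min, gradient=GRADIENT):
    """Percentage of eluent B at time t_min, assuming linear ramps between gradient points."""
    if not gradient[0][0] <= t_min <= gradient[-1][0]:
        raise ValueError("time outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def purity_percent(peak_areas, product_index):
    """Peak area percentage of the product peak relative to all integrated UV peaks."""
    return 100.0 * peak_areas[product_index] / sum(peak_areas)


# Hypothetical integrated UV peak areas, for illustration only.
toy_purity = purity_percent([1.0, 98.0, 1.0], product_index=1)
b_midway = percent_b(2.25)   # halfway up the 5 -> 90% ramp
```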

SAFETY STATEMENT
Bromoethene is volatile (b.p. 16 °C) and a possible carcinogen. The appropriate safety measures were used as described in the experimental procedure. No further unexpected or unusually high safety hazards were encountered.
General procedure A - Ester hydrolysis
Aq. KOH (5.0 M, 2.0 eq) was added to a solution of the ester (1.0 eq) in EtOH (0.2 M). The solution was stirred at 80 °C for 1 h in a capped microwave vial. The reaction mixture was diluted with water, transferred to a separatory funnel, neutralized with 1.0 M HCl and extracted thrice with EtOAc. The combined organic phases were dried over Na2SO4, filtered and concentrated in vacuo to give the desired acid that was of sufficient purity for further reactions.

General procedure B - Amide coupling
HATU (1.2 eq) and DIPEA (3.0 eq) were added to a solution of a carboxylic acid (1.0 eq) in dry DMF (0.25 M). The reaction vial was flushed with nitrogen gas and the mixture was stirred at rt for 1 h. The amine (1.2 eq) was added. The reaction mixture was stirred overnight at rt and partitioned between sat. aq. Na2CO3 (25 mL) and EtOAc (25 mL). The phases were separated, and the aqueous phase was extracted with EtOAc (2 × 25 mL). The combined organic phases were washed with brine, dried over Na2SO4, filtered and concentrated under reduced pressure. Subsequent purification by column chromatography or preparative HPLC provided the desired amide.

General procedure C - Boc deprotection
The Boc-protected amine was dissolved in MeOH (1.0 mL). The mixture was diluted with water to 0.03 M and heated at 100 °C for 8 h under microwave irradiation. The reaction mixture was concentrated under reduced pressure. The residue was subjected to reverse phase chromatography (basic mode, 5-50% B). Lyophilization of the relevant fractions provided the desired amine.

General procedure D - Substitution
This procedure was adapted from Banning et al.⁴ To a vial charged with powdered KOH (3.0 eq), 18-crown-6 (0.10 eq) and the alcohol (2.0 eq), was added a solution of bromocyclopropane 5 (1.0 eq) in THF (0.2 M). The mixture was stirred at rt for 2 h, after which the reaction mixture was partitioned between EtOAc and sat. aq. NaHCO3. The phases were separated, and the aqueous layer was extracted twice with EtOAc. The combined organic layers were dried over Na2SO4, filtered and concentrated in vacuo. Column chromatography yielded the separated trans- and cis-isomers.

rac-Ethyl (1R,2R)-2-ethoxycyclopropane-1-carboxylate (2a) and rac-ethyl (1R,2S)-2-ethoxycyclopropane-1-carboxylate (2b)
Ethyl vinyl ether (2.40 mL, 25.6 mmol) was dissolved in Et2O (9 mL) and Rh2(OAc)4 (12 mg, 28 µmol, 0.2 mol%) was added. The mixture was stirred at rt while a solution of ethyl diazoacetate (1.61 g, 14.1 mmol) in PhMe (12 mL) was added with a syringe pump at a flow rate of 5 mL/h. The solution was stirred for an additional 1 h at rt and subsequently concentrated in vacuo. The residue was subjected to normal phase column chromatography (0-15% EtOAc in cHex) to yield the trans-diastereomer 2a as a colorless oil (900 mg, 40%) and the crude cis-diastereomer that was purified further with Kugelrohr distillation to yield the cis-diastereomer 2b as a colorless oil (700 mg, 31%). Relative stereochemistry was assigned based on the Jsum of the CH-proton(s) on the cyclopropane ring, which is larger for the cis isomer in comparison to the trans isomer.
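The reported yields can be cross-checked from the isolated masses. The sketch below assumes ethyl diazoacetate (14.1 mmol) is the limiting reagent, since ethyl vinyl ether is used in excess, and derives the product formula C8H14O3 from the compound name; the atomic masses are standard values and the function names are hypothetical.

```python
# Standard atomic masses (g/mol) for the elements needed here.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def molar_mass(formula_counts):
    """Molar mass (g/mol) from a dict of element counts."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula_counts.items())


def percent_yield(product_mg, product_molar_mass, limiting_mmol):
    """Isolated yield in percent; mg divided by g/mol gives mmol."""
    return 100.0 * (product_mg / product_molar_mass) / limiting_mmol


# Ethyl 2-ethoxycyclopropane-1-carboxylate, C8H14O3 (formula derived from the name).
PRODUCT_MW = molar_mass({"C": 8, "H": 14, "O": 3})

yield_trans = percent_yield(900, PRODUCT_MW, 14.1)   # 2a
yield_cis = percent_yield(700, PRODUCT_MW, 14.1)     # 2b
```

Rounded to whole percentages, these values reproduce the 40% and 31% reported for 2a and 2b.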

Bromoethene (S1)
Safety warning: Bromoethene is volatile (b.p. 16 °C) and a possible carcinogen. It was kept in solution and handled in a fume hood at all times. This procedure was adapted from Huang et al.⁶ A 250 mL reaction flask was charged with powdered KOH (12.7 g, 226 mmol) and EtOH (90 mL). The flask was equipped with a S8 Vigreux condenser connected to a 100 mL collecting flask. The collecting flask was charged with DCE (3.0 mL) and cooled with liquid N2. The reaction mixture was stirred at 45 °C and BrCH2CH2Br (15.0 mL, 174 mmol) was added dropwise over the course of 1 h. After an additional 2 h at 65 °C, 8.1 g of vinyl bromide (76 mmol, 44%) in DCE was obtained as calculated by ¹H NMR peak integration relative to DCE. This solution was used directly in the next step. ¹H NMR (500 MHz, CDCl3) δ 6.45 (dd, J = 15.0, 7.1 Hz, 1H), 5.99 (dd, J = 7.1, 1.9 Hz, 1H), 5.86 (dd, J = 15.0, 1.9 Hz, 1H).

Ethyl 2-bromocyclopropane-1-carboxylate (4)
The catalyst Rh2(OAc)4 (59 mg, 0.13 mmol, 0.25 mol%) was added to a flask equipped with a dry ice condenser and charged with bromoethene S1 (8.1 g, 76 mmol) in DCE (3.0 mL) (Safety warning: see experimental procedure above). The mixture was cooled to −10 °C using an ethylene glycol/dry ice bath. A solution of ethyl diazoacetate (6.0 g, 53 mmol) in PhMe (45 mL) was added with a syringe pump at a flow rate of 1 mL/h. After 20 h, the flow rate was adjusted to 0.5 mL/h. After complete addition, the reaction mixture was filtered over diatomaceous earth and concentrated in vacuo (Safety warning: see experimental procedure above) to yield a dark yellow oil (6.3 g) that was used in the next step without purification.

2-Bromo-N,N-dimethylcyclopropane-1-carboxamide (5)
This procedure was adapted from Prosser et al.⁷ A solution of the crude ester 4 (1.5 g) in aq. 1.0 M NaOH (12 mL) was heated at reflux for 30 min. The solution was cooled and partitioned between Et2O (50 mL) and 1.0 M NaOH (40 mL). The aqueous layer was neutralized with 3.0 M HCl and extracted with Et2O (3 × 50 mL). The combined organic layers were dried over Na2SO4, filtered and concentrated in vacuo to give a dark yellow oil (900 mg). To 800 mg of this oil was added SOCl2 (0.80 mL, 11 mmol), and the resulting mixture was stirred at rt overnight. Subsequently, the suspension was transferred to a stirring suspension of Me2NH·HCl (0.99 g, 12 mmol) and DIPEA (4.2 mL, 24 mmol) in THF (16 mL) at rt. The mixture was allowed to stir at rt for 2.5 h, after which it was partitioned between EtOAc (50 mL) and 0.5 M HCl (50 mL). The layers were separated and the aqueous layer was extracted with EtOAc (2 × 50 mL). The combined organic layers were dried over Na2SO4, filtered and concentrated in vacuo. Kugelrohr distillation of the residue (8 mbar, 190 °C) yielded the title compound as a yellow oil consisting of a mixture of diastereomers in an approximate cis/trans ratio of 1:1.6 (300 mg combined, 1.56 mmol, 12% extrapolated yield from 1). trans-Isomer: 1