OFraMP: a fragment-based tool to facilitate the parametrization of large molecules

An Online tool for Fragment-based Molecule Parametrization (OFraMP) is described. OFraMP is a web application for assigning atomic interaction parameters to large molecules by matching sub-fragments within the target molecule to equivalent sub-fragments within the Automated Topology Builder (ATB, atb.uq.edu.au) database. OFraMP identifies and compares alternative molecular fragments from the ATB database, which contains over 890,000 pre-parameterized molecules, using a novel hierarchical matching procedure. Atoms are considered within the context of an extended local environment (buffer region), with the degree of similarity between an atom in the target molecule and that in the proposed match controlled by varying the size of the buffer region. Adjacent matching atoms are combined into progressively larger matched sub-structures. The user then selects the most appropriate match. OFraMP also allows users to manually alter interaction parameters and automates the submission of missing substructures to the ATB in order to generate parameters for atoms in environments not represented in the existing database. The utility of OFraMP is illustrated using the anti-cancer agent paclitaxel and a dendrimer used in organic semiconductor devices.

Graphical abstract
OFraMP applied to paclitaxel (ATB ID 35922).

Supplementary Information
The online version contains supplementary material available at 10.1007/s10822-023-00511-7.

The distribution of atom coverage (blue) produced by OFraMP in combination with the 850,000 molecules currently in the ATB for 1250 query molecules with 49 to 51 atoms selected at random from the ChEMBL database. The cumulative total (red) shows the proportion of molecules with a particular coverage (%) or less.
Identification and partitioning of missing atoms into parameterisable fragments: Algorithmic details

General definitions
A molecule is modeled as a molecular graph, a simple graph $G = (V, E)$ whose nodes and edges correspond to atoms and bonds, respectively. Nodes are labeled by their atom type $t : V \to \Sigma$, where $\Sigma$ is the set of all atom types. It is expected that the parameters assigned to atoms will vary depending on the presence of nearby substituents. To account for this, atoms are considered in the context of their neighborhood: let the $k$-neighborhood $N_k(u)$ of a node $u \in V$ be the set of all nodes $v \in V$ for which a path $(u, \ldots, v)$ of length $\leq k$ exists.
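The $k$-neighborhood defined above can be illustrated with a bounded breadth-first search. The following is a minimal sketch, not the OFraMP implementation: the adjacency-dictionary representation, the function name `k_neighborhood`, and the toy chain are assumptions made for illustration. It also assumes that $u$ itself belongs to $N_k(u)$ (a path of length 0).

```python
from collections import deque


def k_neighborhood(adj, u, k):
    """Return all nodes reachable from u by a path of length <= k.

    adj is a hypothetical adjacency dict {node: [neighbors]}; u itself is
    included (path of length 0), which is an assumption of this sketch.
    """
    dist = {u: 0}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond distance k
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


# Hypothetical toy molecule: a chain of five atoms 0-1-2-3-4.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sorted(k_neighborhood(chain, 2, 1)))  # [1, 2, 3]
```

Increasing `k` widens the local environment considered for each atom.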

Partitioning of missing atoms into small fragments
If it is not possible to match all atoms to a fragment in the existing database, OFraMP collects all unparameterisable nodes $V' \subseteq V$ within a query molecule and returns the connected components of the subgraph induced by the union of $k$-neighborhoods $\bigcup_{u \in V'} N_k(u)$ for parameterisation by the ATB. However, by default the ATB imposes a maximum number of nodes $M = 50$ for parameter assignment based on higher level quantum calculations.
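The collection step described above (union of $k$-neighborhoods of the unmatched atoms, followed by extraction of connected components) might be sketched as follows. This is an illustrative sketch under assumptions: the function names, the adjacency-dictionary input, and the eight-atom chain are hypothetical and are not taken from OFraMP.

```python
from collections import deque


def k_neighborhood(adj, u, k):
    """Nodes within path length <= k of u (u included; an assumption)."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


def missing_fragments(adj, unmatched, k):
    """Connected components of the subgraph induced by the union of the
    k-neighborhoods of all unmatched nodes."""
    keep = set()
    for u in unmatched:
        keep |= k_neighborhood(adj, u, k)

    components, seen = [], set()
    for start in sorted(keep):
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                # only follow bonds that stay inside the induced subgraph
                if nb in keep and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(comp)
    return components


# Hypothetical chain of eight atoms 0-1-...-7; atoms 0 and 6 are unmatched.
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
print(missing_fragments(chain, {0, 6}, 1))  # [{0, 1}, {5, 6, 7}]
```

When two unmatched atoms lie close together, their neighborhoods merge into a single fragment, which is why the result can exceed the ATB size limit and need further partitioning.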
Therefore, OFraMP needs to partition molecular graphs which exceed this size restriction into sub-graphs. Our partitioning strategy is the following: first, the number of partitions should be minimised; second, partitions should overlap, and ring systems in the graph should be retained in at least one partition. Partitioning is achieved by examining whether the molecular graph is biconnected. A graph is biconnected if it remains connected after removing any one of its nodes. A biconnected component (or block) of a graph is a maximal biconnected subgraph. The smallest block is an edge (also called a bridge). It is easy to see that a ring system is a block in a molecular graph. If a graph is connected and has two blocks, they share one node, the cut node (Fig. S2a).
(a) Blocks and bridges (dashed lines) of a small example graph.
(b) BB-tree with block and bridge nodes of the same small example graph.
Figure S2: (a) Blocks and bridges in a simple graph and (b) its corresponding BB-tree. The red node is a cut node and shared between all three blocks or bridges.
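Blocks, bridges and cut nodes of a molecular graph can be found with the classical depth-first search of Hopcroft and Tarjan. The sketch below is one possible way to compute them and is not taken from the OFraMP source; the function name `blocks`, the recursive formulation, and the example graph (a three-membered ring with a two-bond tail) are assumptions for illustration.

```python
def blocks(adj):
    """Return (blocks, cut_nodes) of a simple undirected graph.

    adj is a hypothetical adjacency dict {node: [neighbors]}. Each block is
    returned as a set of nodes; a block with two nodes is a bridge.
    Recursive DFS, adequate for molecule-sized graphs.
    """
    index, low = {}, {}
    edge_stack, found, cut_nodes = [], [], set()
    counter = 0

    def dfs(u, parent):
        nonlocal counter
        index[u] = low[u] = counter
        counter += 1
        children = 0
        for v in adj[u]:
            if v not in index:
                children += 1
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= index[u]:
                    # The subtree of v cannot reach above u: close a block.
                    if parent is not None or children > 1:
                        cut_nodes.add(u)
                    block = set()
                    while True:
                        a, b = edge_stack.pop()
                        block.update((a, b))
                        if (a, b) == (u, v):
                            break
                    found.append(block)
            elif v != parent and index[v] < index[u]:
                # Back edge to an ancestor closes a ring.
                edge_stack.append((u, v))
                low[u] = min(low[u], index[v])

    for start in adj:
        if start not in index:
            dfs(start, None)
    return found, cut_nodes


# Hypothetical example: ring 0-1-2 with a tail 2-3-4.
ring_with_tail = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
found, cuts = blocks(ring_with_tail)
print(sorted(sorted(b) for b in found))  # [[0, 1, 2], [2, 3], [3, 4]]
print(sorted(cuts))                      # [2, 3]
```

In this example the ring forms a single block, while each bond of the tail is a bridge.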
Based on the idea of the block-and-bridge-preserving operator introduced by Horváth et al. [S1], we define the $k$-$s$-block-and-bridge-preserving (BBP) partitioning problem as:
Problem 1 ($k$-$s$-BBP-Partitioning). Given $k \geq 0$, $s \geq 1$ and a molecular graph [...]
(ii) all partitions are no larger than the maximum size, $\forall V' \in P : |V'| \leq s$,
(iii) the number of partitions $|P|$ is minimal,
(iv) every block in $G$ is in at least one partition $V' \in P$, and [...]
The block-and-bridge tree or BB-tree is a tree which maps tree nodes to blocks in the graph $G = (V, E)$ (Fig. S2b) [S1]. Nodes in the BB-tree are connected by an edge if they share a cut node.
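The adjacency rule for the BB-tree stated above (two tree nodes are joined if their blocks share a cut node) can be sketched in a few lines. The representation of blocks as a list of node sets and the function name `bb_tree` are assumptions for illustration; the sketch relies on the fact that two distinct blocks of a graph share at most one node, which is then a cut node.

```python
from itertools import combinations


def bb_tree(block_list):
    """Connect blocks (given as sets of graph nodes) that share a node.

    Returns an adjacency dict over block indices. Applied literally, this
    rule joins all blocks meeting at the same cut node to each other.
    """
    tree = {i: set() for i in range(len(block_list))}
    for i, j in combinations(range(len(block_list)), 2):
        if block_list[i] & block_list[j]:  # shared node is a cut node
            tree[i].add(j)
            tree[j].add(i)
    return tree


# Hypothetical blocks: two three-membered rings linked by two bridges.
example_blocks = [{0, 1, 2}, {2, 3}, {3, 4}, {4, 5, 6}]
print(bb_tree(example_blocks))  # {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
```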
To solve the $k$-$s$-BBP-Partitioning problem, we partition the molecular graph recursively (Alg. 1). If the current partition is too large and has more than one block, we partition the graph into two smaller partitions $V_1$, $V_2$ using its BB-tree representation. We then increase the overlap of the partitions by adding the $k$-neighborhoods of the cut node $w$. To determine how to partition the graph, we find a balanced cut edge $e$ in the BB-tree:
Problem 2 ($k$-BB-BalancedCut). Given $k \geq 0$, a molecular graph $G = (V, E)$, and its BB-tree $T = (V_T, E_T, f)$, find a cut edge $e \in E_T$, such that for the resulting partitions [...]
We solve the $k$-BB-BalancedCut problem using a simple tree traversal algorithm (Alg. 2).
We iterate over all nodes $u \in V_T$ of the BB-tree, sorted in ascending order by the number of their unprocessed neighbors, and mark $u$ as processed. Because we iterate over a tree, there is always either a node with exactly one unprocessed neighbor (starting with the leaves) or, in the final iteration, the root node with no unprocessed neighbors. The sizes of the partitions that would result from cutting the edge $e = (u, v)$ between $u$ and its unprocessed neighbor $v$ are computed. The size of the current partition $w_u$ corresponds to the size of the union of nodes in $G$ mapped by the nodes of the subtree of $T$ rooted at $u$. The size of the other partition $w_r$ corresponds to the rest of $G$. As it is required that the partitions overlap, the sizes of the $k$-neighborhoods of the cut node of $f(u)$ and $f(v)$ are added to determine whether the partitions are still smaller than $G$. Finally, we return the edge resulting in the minimal size difference between both partitions.
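The selection criterion described above can be illustrated with a deliberately simple brute-force sketch that tries every BB-tree edge in turn. It is not the traversal of Alg. 2: it recomputes both sides for each edge instead of processing nodes from the leaves inward, and it treats the overlap as a set union of the cut node's $k$-neighborhood with each side, which is an interpretation. All names (`balanced_cut`, `tree`, `f`) and the example data are hypothetical.

```python
from collections import deque


def k_neighborhood(adj, u, k):
    """Nodes within path length <= k of u (u included; an assumption)."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


def balanced_cut(tree, f, adj, k):
    """Brute-force search for the most balanced cut edge of a BB-tree.

    tree: adjacency dict of the BB-tree; f: tree node -> set of graph nodes.
    Returns (edge, part_a, part_b), or None if no edge yields two
    partitions that are both smaller than the whole graph.
    """
    n_total = len(adj)
    best, d_min = None, float("inf")
    edges = sorted({tuple(sorted((a, b))) for a in tree for b in tree[a]})
    for a, b in edges:
        # Tree nodes on a's side once the edge (a, b) is removed.
        side, stack = {a}, [a]
        while stack:
            t = stack.pop()
            for nb in tree[t]:
                if nb not in side and (t, nb) != (a, b):
                    side.add(nb)
                    stack.append(nb)
        part_a = set().union(*(f[t] for t in side))
        part_b = set().union(*(f[t] for t in tree if t not in side))
        # Overlap: add the k-neighborhood of the shared cut node to both.
        for w in f[a] & f[b]:
            nbhd = k_neighborhood(adj, w, k)
            part_a |= nbhd
            part_b |= nbhd
        w_a, w_b = len(part_a), len(part_b)
        if w_a < n_total and w_b < n_total and abs(w_a - w_b) < d_min:
            best, d_min = ((a, b), part_a, part_b), abs(w_a - w_b)
    return best


# Hypothetical molecule: rings 0-1-2 and 4-5-6 joined by bridges 2-3 and 3-4.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4],
       4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
f = {"A": {0, 1, 2}, "B": {2, 3}, "C": {3, 4}, "D": {4, 5, 6}}
tree = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
edge, part_a, part_b = balanced_cut(tree, f, adj, k=1)
print(edge, sorted(part_a), sorted(part_b))
# ('B', 'C') [0, 1, 2, 3, 4] [2, 3, 4, 5, 6]
```

In this example the two outer edges are rejected because, once the overlap is added, one side would be as large as the whole graph; the middle edge gives two partitions of five atoms each.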
Algorithm 2: $k$-BB-BalancedCut($T$, $G$, $k$)
foreach [...] if $w_u < |V|$ and $w_r < |V|$ and $|w_u - w_r| < d_{\min}$ then [...]
The $k$-neighborhoods of all $v \in V$ can be computed in a preprocessing step using a breadth-first search, which in the worst case has a complexity of $O(|V| + |E|)$ for each node. Overall, the time complexity for the preprocessing is $O(|V| \cdot (|V| + |E|))$. The algorithm visits each edge of the BB-tree exactly once, and the number of edges in $T$ is bounded by the size of $G$, with at most $O(|E|)$ edges. Extracting the minimum from the priority queue $Q$ requires $O(\log |E|)$ time using a Fibonacci heap, with all other operations on $Q$ running in constant time.
Overall, the time complexity of finding the optimal cut edge is $O(|E| \cdot \log |E|)$.
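As a rough worked simplification of these bounds (an added observation under an assumption, not a statement from the original text): if atoms have bounded valence, so that $|E| \leq c\,|V|$ for some small constant $c$, both costs can be expressed in terms of the number of atoms alone.

```latex
\begin{align*}
O\bigl(|V| \cdot (|V| + |E|)\bigr) &= O\bigl(|V| \cdot (1 + c)\,|V|\bigr) = O\bigl(|V|^{2}\bigr), \\
O\bigl(|E| \cdot \log |E|\bigr) &= O\bigl(|V| \cdot \log |V|\bigr).
\end{align*}
```

Under that assumption, the neighborhood preprocessing dominates the search for the cut edge.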
Applying the k-s-BBP-Partitioning algorithm to a large unparameterisable fragment of paclitaxel with k = 1 and s = 50 results in five overlapping partitions with the ring system contained in one of the partitions (Fig. S3).