Growing Spicy ONIOMs: Extending and generalizing concepts of ONIOM and many body expansions

The ONIOM method and many extensions to it provide capabilities to treat challenging multiscale problems in catalysis and material science. Our open‐source program Spicy is a flexible toolkit for ONIOM and fragment methods. Spicy includes a generalization of multicenter‐ONIOM, a higher‐order multipole embedding scheme, and fragment methods as useful extensions of our own n‐layered integrated molecular orbital and molecular mechanics (ONIOM), which allow applying ONIOM and high accuracy calculations to a wider range of systems. A calculation on the metallo‐protein hemoglobin demonstrates the versatility of the implementation.


| INTRODUCTION
Computational chemistry has long been restricted to molecules of only a few atoms. Drastic improvements both in the capabilities of computer hardware and the theoretical models themselves provided access to larger and more complex chemical systems and methods with ever-increasing accuracy. 1 Nowadays, many chemical problems are related to interactions and behaviors of, among others, biomolecular systems, soft matter, or complex catalysts, necessitating reliable and efficient computational models. Many of those problems inherently possess multiscale character, as spatially small effects in large systems are to be investigated, that is, catalytic reactions of small molecules in the active center of large enzymes, 2,3 interactions of small substrates with large zeolites, 4 or dye-polymer interactions. 5 The development of the fundamental theoretical models continues, and with linear scaling self-consistent field (SCF), [6][7][8][9][10][11] correlation, [12][13][14][15][16][17][18][19] and density matrix embedding methods, 20,21 the quantum mechanically accessible system size grew tremendously. Force fields now allow to simulate entire viruses 22 and interface reactions up to the millimeter scale. 23 A different idea is employed by the quantum mechanics/molecular mechanics (QM/MM) [24][25][26][27] and ONIOM [28][29][30] multiscale approaches that aim to combine advantages of different methods to provide a tailorable system partitioning and description. Large but well-behaved parts of the chemical system can be described with low-cost methods, such as force fields, while smaller regions, where quantum effects play an essential role, or high accuracy is required, can be described with more precise but also computationally more demanding quantum mechanical methods. Thus, QM/MM and ONIOM have been successfully employed to describe the photophysics of proteins and deoxyribonucleic acid (DNA), [31][32][33][34] processes in soft matter materials, 35 catalytic processes, [36][37][38] inorganic materials, [39][40][41] reactive chemistry, 42 and more. In recent years, many implementations for QM/MM and ONIOM methods have emerged or have been improved; LiCHEM 43 and QM/MM 44 implement polarizable QM/MM methods, which can improve the description of ground and excited states. PyChemShell, 45 ASE, 46 and Cuby 47 implement QM/MM with electrostatic embedding, as parts of a larger computational chemistry frameworks, with access to a multitude of interfaced programs. ORCA 48,49 has recently implemented a set of QM/MM and ONIOM methods with electronic embedding. 50,51 The MRCC program package 52 in its latest version supports ONIOM with an arbitrary number of layers and inclusion of solvent effects via the polarizable conductor model (PCM). 53 Gaussian 54 offers a mature ONIOM3 implementation with micro-cycle driven optimizations and electronic embedding.
Despite the success of existing QM/MM and ONIOM methods, there remain classes of chemical systems, that push these methods to their limits. We are interested in describing interactions in electronically active soft matter materials, often in conjunction with embedded (photo)catalysts. For example, the integration of (photo)catalysts or photosensitizers into tailored polymer membranes can synergistically improve the stability of the active species and kinetic aspects while simplifying the handling of compounds sensitive to oxygen exposure. 55,56 An accurate description of their electronic properties and rich photochemistry prohibits their sole treatment by molecular mechanics (MM) methods. At the same time, a naive separation into low-and high-level regions may be impossible, as important electronic effects may occur in the entire polymer. Another popular example for delocalized, cooperative effects is hemoglobin, which exhibits large-scale structural changes when oxygen binds to the four heme centers. 57,58 All hemes are equally important, and association/dissociation of oxygen in one heme involves structural changes in the entire protein. Studying all heme centers simultaneously in a QM/MM or ONIOM setup using accurate, quantum mechanical approaches is, however, computationally very demanding.
To this date, many modifications of the original ONIOM method have been proposed. Multicenter (MC)-ONIOM allows independent calculation of multiple, non-coupling sites in a macromolecular system. 59 Generalized-ONIOM combines the many body expansion (MBE) fragment method with ONIOM in an attempt to describe large layers with quantum mechanical methods. 60 In combination with the generalized many body expansion (GMBE) and fragment combination ranges (FCR), even more flexible system partitions become possible, while at the same time also providing a different approach to multiscale/multilayer methods. 61,62 Fragment methods allow splitting the full system into many smaller calculations, and the fragment size can be adapted to accurately describe encountered electronic effects, for example, in conductive polymers. In contrast to ONIOM, fragment methods do not imply a hierarchical order. The X-Pol method provides a formalism for polarizable electronic embedding, even in these more complicated setups. 63 ONIOM-XS allows to adapt the system partitioning as needed by dynamic reassignment of fragments to different layers during molecular dynamics (MD) or geometry optimizations. 64 Such an approach is especially useful for handling large structural changes, for example, following an electronic excitation or associative/dissociative processes in enzymatic reactions. The quality of electronic-embedding ONIOM strongly depends on the employed embedding charges. A powerful and flexible method to derive distributed higher-order multipoles is the distributed multipole analysis (DMA). 43,[65][66][67] Together with fragment methods, DMA can provide accurate electronic multipoles for embedding at the quantum mechanics (QM) level of theory. In combination, versatile and flexible simulation setups in the extended ONIOM formalism can be realized. However, these methods are not widely applied, as integrated implementations are not commonly available.
In this article, we present the Spicy program, a well-integrated and user-friendly driver to carry out large-scale ONIOM calculations, with support for flexible fragment selection, and generation of high-quality embedding charges by means of DMA. Care was taken to implement these methods in a generic way, without restricting Spicy to a certain class of systems, for example, proteins. Spicy follows the approach taken by ASE, 46 pysisyphus, 68 LiCHEM, 43 and wraps computational chemistry programs to gain access to a wide range of computational models, which allows combining methods, that are not available within a single suite. Section 2 starts with a brief introduction to ONIOM, followed by a generalization of MC-ONIOM to arbitrarily nested multicenters. Electronic embedding and generation of higher-order multipoles is discussed in Section 2.2. Sections 2.3 and 2.4 provide details on how fragment methods are combined with our MC-ONIOM implementation and how micro-cycle driven optimizations are carried out. We will also discuss details of the implementation and how Spicy is developed in Section 3. Finally, illustrative test calculations will be presented in Section 4. With Spicy, we provide a useful, modern open-source implementation written with a high level of abstraction that will serve as a foundation for further developments, and complement the advances of multiscale methods in recent years.

| THEORY
Originally, ONIOM emerged as a generalization of the integrated molecular orbital + molecular orbital (IMOMO) 28 and integrated molecular orbital + molecular mechanics (IMOMM) 69 schemes, which allow using different levels of theory for different parts of a chemical system. 28 Given a set of atoms A comprising the real (full) system, a proper subset A m & A is chosen, which defines the so-called model system, a region usually treated with a more accurate and expensive level of theory. This system separation may lead to dangling bonds in the model system if a bond between any atom a A r , A r ¼ A ∖ A m , and any atom b A m , existed. Sophisticated approaches to treat those boundaries between the regions have been proposed, for example, tailored effective core potentials (ECPs), [70][71][72] or freezing localized molecular orbitals. [73][74][75] The simplest approach is the introduction of link atoms, commonly hydrogen atoms, to saturate the open valence. 76,77 By positioning the link atom at r l ab along the vector of the cut bond r a À r b , no additional degrees of freedom are introduced.
A reasonable value for the factor g can be obtained as a ratio of the covalent radii R cov of the involved atoms but empirically tuned values for g were also proposed. 78 The link atom approach has several advantages; it does not require implementation of a special Hamiltonian or orbital optimization procedure, works for any combination of MM and QM methods, and does not require tuned or system-specific parameters, for example, capping ECPs. Therefore, it is also applicable in setups, where, for example, an orbital localization scheme is not implemented, or a MM layer needs to be embedded. Contrary to additive QM/MM (Equation (3)), the ONIOM energy is obtained from a subtractive scheme. While often formulated for two layers (ONIOM2, Equation (4)), the subtractive scheme easily generalizes into an arbitrary number of layers (ONIOMn). 28 Both IMOMM and IMOMO can be rewritten as a two-layered ONIOM expression (Equation (4)), and the combination of both (one outer layer molecular mechanics, and two layers quantum mechanics) can be written as a three-layered ONIOM-expression (Equation (5)). 28 Here, the subscripts r, i, and m refer to the real, intermediate, and model systems, and the c j refer to calculation levels, which typically become more accurate with increasing j. ONIOM is considered an extrapolation scheme for the energy of the real, large system at a high calculation level. It has been noticed early, that the scheme generalizes to an arbitrary amount of layers n, which is then called ONIOMn. 28 In analogy to the system's energy, other extensive properties P can be calculated by the same scheme. The nuclear derivatives of the energy, the gradient g, and Hessian H are not extensive properties but become extensive when projected into the basis of the parent system before applying the ONIOM extrapolation. 78 This projection is given as where the properties from a layer j are projected into the basis of its parent system i, J denotes the Jacobian, and g i and H i are the nuclear derivatives of layer j expressed in the basis of i. The link atoms introduced in the model system are not present in the real system. Thus, their contributions to the nuclear derivatives are redistributed to their respective partners a and b in the real and model system, so that no additional degrees of freedom are introduced. Therefore, the Jacobian projects from the link-atom-augmented model system where A m,l is the set of link atoms of the model system, to the real system with the atoms A. For a system with m ¼j A 0 m j atoms in the augmented model system and n ¼j A j atoms in the real system, the Jacobian is of the form 3m Â 3n and is written as and g o a is a link factor calculated by equation 2 is o a is a link atom, Þ , and E is the 3 Â 3 identity matrix. So far, inter-layer interactions are only treated in the real system at the low level of theory and the mechanical coupling introduced via link atoms. However, the atoms of the set A r are not visible to the atoms of A m , and important interactions, that is, electrostatic attractions and repulsions, hydrogen bonds, or polarization effects in the model system, are entirely ignored. A substantial amount of such interactions can be captured by electronic embedding schemes. 29 In such schemes, point charges assigned to each atom in set A r are used to embed the model system, mimicking the electric field of the environment. The costs of this approximation are twofold. First the costs to obtain a point charge model of the real system, and second those to account for the electric field in the calculation of the model system. The latter is possible in most quantum chemistry codes, as embedding charges are easily considered via an additional term in the one-electron operator in the Hamiltonian of a QM calculation b H Here, in the Hamiltonian for the model system b H m , e refers to the electrons of A 0 m , J to the nuclei of A 0 m , and N to the point charges of A r . Point charges q N can be determined by various methods. While charge models such as Mulliken, 79 Löwdin, 80 or MM-charges come nearly for free, others, for example, restricted electrostatic potential (RESP)-charges, 81 may be costlier than the wavefunction calculation itself. While electronic embedding often significantly improves the accuracy of the ONIOM extrapolation, it also comes with a few pitfalls: (i) Nucleus-centered point charges are not observables, and the charge model cannot be uniquely defined. Charges obtained from commonly employed models can vary widely, for example, between Mulliken-, 79 Löwdin-, 80 and RESP-charges. 81 (ii) Short distances between atoms of A 0 m and A r can result in overpolarization of model system atoms close to the border. This is especially problematic when a link atom between a pair of atoms a A r and b A m is present, as this link atom is closer to the point charge representation of atom a than chemically reasonable. Such problems are usually avoided by scaling or deleting point charges in A r , that are within a given distance (real space or in terms of bonds) to an atom in A m . 29 (iii) Positive partial charges close to atoms of A 0 m attract electrons. This can lead to unphysical electron density localization around those positive partial charges, as the atom of A r , that is replaced by a simple point charge would also have repulsive effects on electrons via Pauli repulsion. 82 (iv) The sum of the point charges of A r is usually a non-integer number. Only A and A 0 m are required to sum up to the respective system's charge. Nuclear gradients of the model system depend on the gradient of the embedding charges of the real system with respect to the nuclear coordinates. In principle, this derivative coupling requires derivatives of the embedding charges, which is only possible for a limited set of point charge models, for example, Löwdin charges. 83 In Sections 2.1 and 2.2, we will describe how Spicy generalizes the classical ONIOM extrapolation scheme by allowing more flexible system setups and how the accuracy and robustness of electronic embedding are improved. We will also describe how fragment methods are used to approximately describe ONIOM layers in Section 2.3, and how the micro-cycle driven optimization scheme is generalized to arbitrary ONIOM setups in Section 2.4.

| Arbitrary multicenter ONIOM
So far, ONIOMn has been defined as an extrapolation scheme with a strictly linear layout of calculation layers, that is, there is exactly one model layer embedded in its parent layer. In 2003, Hopkins and Tschumper proposed an ONIOM2 scheme that allows defining multiple, non-overlapping models. 59 Later it was also extended to overlapping model systems. 84 The intersection corrections, that appear in those terms resemble those of the GMBE, 61 which is discussed in Section 2.3. However, there is no reason to permit only two layers (see Figure 1).
An arbitrarily deep ONIOMn scheme, where each system can have any number of non-overlapping subsystems, is conveniently represented as a rooted tree; see Figure 1 for an example layout with four layers, and eight (sub)systems. We will refer to such MC-ONIOM layouts with n layers as MC-ONIOMn. Each (sub)system in a MC-ONIOMn tree is identified not only by its depth in the tree, but more specifically by a sequence of steps i k ð Þ nÀ1 k¼0 relative to the root, which is called a path. The length of the sequence d i ð Þ ¼j i ð Þ j determines the depth in the tree. In the following, we will use this path equivalent to the respective node and its properties. Therefore, max i ð Þ O j i ð Þ j¼ n À 1 in the MC-ONIOMn terminology, where O is the set of all node-identifying sequences of the MC-ONIOMn tree. For example, the node "BA" in Figure 1b can be reached by the sequence B, A ð Þ starting from the root, and is at depth d B, A ð Þ ¼ 2. We can now denote a set of children C i ð Þ for each node i ð Þ, which is the set of index sequences leading to the children of i ð Þ. We can also define a set of nodes D d at the same depth d in the tree (a) (b) F I G U R E 1 Representations of a MC-ONIOM layout with four levels and multiple subsystems in the first three levels. The rooted tree representation of a standard ONIOM3 setup would be a linear tree of three nodes without branching. In 0 colors represent the depth of subsystems in the ONIOM setup, while the disjoint ellipsoids of same color represent disjoint model systems in an embedding parent layer.
In an equivalent representation in 0, each node represents a (sub)system in ONIOM, starting with the real layer 0 as the root of the tree. Colors represent the depth of a subsystem in the tree.
Given the example in Figure 1, the set of nodes at depth d ¼ 2 is g . Employing this nomenclature, the notion of "non-overlapping subsystems" as stated earlier, can be formalized: Given any node i ð Þ with its set of atoms A 0 i ð Þ , of which some may be link atoms, the subsystems of i ð Þ form a disjunct family of sets of atoms over A 0 Each subset in the family of sets over A 0 i ð Þ may then be augmented as necessary with link atoms. The notion implies that link atoms A i ð Þ,l of A 0 i ð Þ become real atoms in A C i ð Þ if included in any of the subsystems. Thus, the link atom property is not inherited from one node to another.
The two calculations for each node (one for the root) are performed by walking the tree in depth-first order. A second depth-first walk is used to calculate extensive properties P in the spirit of the original ONIOM extrapolation scheme. For this purpose, every node i ð Þ and its children C i ð Þ can be understood as a MC-ONIOM2 setup. Starting from the base case, the terminal nodes of the tree, whose properties are not subject to extrapolation, all terms can be collected in a recursion of MC-ONIOM2 layouts where in analogy to Equations (4) and (5), c i ð Þ denotes the level of theory originally defined for node i ð Þ. If i ð Þ has no children (C i ð Þ ¼ ;), the recursion terminates. Equation (13) is therefore a depth first walk through the tree and P ðÞ is the desired target property, where ðÞ denotes the root of the tree. Again, the nuclear gradient and Hessian have to be projected into their parent system's basis to become extensive properties (see Equations (6) and (7)).

| Electronic embedding
As noted earlier, the original mechanical embedding used in ONIOM and QM/MM misses the interlayer electrostatic interactions. In principle, intermolecular electrostatic interactions can be well described by a multipole expansion of their electric fields. However, the convergence radius of such multipole expansions is usually impracticably large and requires the interaction spheres of the involved molecules to be well separated. 67 Unfortunately, well separated interaction spheres are nearly never encountered in typical ONIOM and QM/MM setups. Instead, often the opposite is observed: one of the molecular systems will be surrounded by the other, for example, the active center of an enzyme within a pocket of the protein backbone. For IMOMM setups, where partial-charge providing force fields are involved, the atom-centered partial charges of the MM part can be included in the one-electron operator of the QM calculation to account for this embedding effect (Equation (9)). 29 IMOMO-like ONIOM setups, with one QM layer embedding another one, are also possible, and the embedding partial charges of the real system are commonly obtained by a partial charge analysis. 83,85,86 This representation by point charges can be viewed as a distributed monopole expansion of the molecular electric field, dramatically reducing the convergence radii of multipole interactions and thus allowing to adequately treat the difficult situation of embedded molecules. In Spicy we employ a robust higher order expansion of the electric field as obtained by Stone's DMA, or its grid-based version GDMA. 65,66 DMA relies on the fact, that a multipole moment of order l a þ l b originates at the overlap centers of two basis functions of angular momenta l a and l b . The method also supports relocating these multipole moments to arbitrary expansion centers, for example, the atomic nuclei. A distributed expansion of the electric field up to quadrupoles has been shown to be in excellent agreement with the true electrostatic potential. 67 Unfortunately, most computational chemistry programs cannot incorporate higherorder multipole terms into their Hamiltonians and are limited to distributed point charges. Such restrictions are circumvented by approximating a higher-order DMA multipole expansion only through point charges. This way, the good agreement of the multipole expansion with the true embedding potential is maintained while the resulting monopoles can be taken into account in most quantum chemistry codes. Gao et al., 87 and shortly after that Devereux et al., 88 have employed arrangements of point charges, called the distributed charge model (DCM), to represent a multipole expansion up to the quadrupoles. The method is also employed by the QM/MM program LiCHEM, which allows performing polarizable QM/MM calculations with the AMOEBA force fields and incorporates AMOEBA's higher-order multipole moments in the QM calculation. 43,89 We will briefly recapitulate on the distributed multipole model, its conversion to a DCM, discuss possible edge cases, and how it applies to arbitrary MC-ONIOMn. An arrangement of six-point charges can represent a multipole expansion up to second order (quadrupoles) if the point charge coordinates are chosen non-collinearly. Spicy adopts the octahedral point charge model, where multipole moments up to quadrupoles are represented as with a small distance d q , and Q 0 being multipole tensor elements expressed in the coordinate system of the principal axes of the Cartesian quadrupole tensor. These values are obtained from an arbitrary Cartesian quadrupole tensor Q 2 through simple diagonalization The atomic dipoles Q 1 must be rotated into the same axes system using R: Those DCMs introduce higher order multipoles for d q ≠ 0, which are usually negligible for small values of d q such as the 0:25 a 0 as proposed by Gao. 87 The charge octahedron of Equation (14) is defined in the basis spanned by the principal axes of the Cartesian quadrupole tensor. A local molecular reference frame is required to rotate the octahedron, so that the multipole moments have the correct orientation relative to other nuclei. Following the approach of Devereux et al., a local reference frame may be constructed from triples of non-collinear atoms. 88 The algorithm to select an appropriate reference frame for atom f and up to two other atoms g and h ( Figure 2) is outlined below: 1. If f has two directly bonded partners g and h, and r fg and r fh are not collinear, then f ¼ b, g ¼ a, and h ¼ c. If they are collinear, choose another g and recurse.
2. If f has only one bond partner g, and g has another bond partner h and r fg and r fh are not collinear, then f ¼ a, g ¼ b, and h ¼ c. If they are collinear, choose another h and recurse. 3. If f has only one bond partner g, and g has no other bond partner, look for the nearest neighbor h in the neighbor list of f , then f ¼ b, g ¼ a, and h ¼ c. If they are collinear, choose another h and recurse.
4. If f has no bond partner, look for the two nearest neighbors of f in the neighbor list. If r fg and r fh are not collinear, then f ¼ b, g ¼ a, and h ¼ c. If they are collinear, choose another h and recurse. 5. If f has no bond partner and only one neighbor g in the neighbor list, only one axis r fg can be defined, which is invariant under rotation. 6. If f has no bond partner and no neighbor can be found in the neighbor list, the principle axes of the Cartesian quadrupole tensors coincide with the molecular coordinate system.
Each application of these rules assigns between one and three atoms to a reference frame. Doing so may assign multiple reference frames to the same atom. In those cases, only the first reference frame containing the atom is used. The basis vectors o x,y,z for the octahedral charge model of atoms a, b, and c, in the basis of the principle axes of the quadrupole tensor are obtained as Thus, the coordinates of the point charges C in the basis of the principal axes of the quadrupole tensor O for each octahedron can be transformed into the molecular coordinate system M, yielding C: Reference frame and its axes systems for an atom triple a, b, and c. The axes labeled x, y, and z are the molecular Cartesian coordinate system in the Cartesian orthonormal basis. The blue, yellow, and red vectors are the basis vectors of the orthonormal basis, which is spanned by the principal axes of the Cartesian quadrupole tensor for each of the atoms a, b, and c, as in Equation (17). This local, atom-specific coordinate system is used to define the octahedral arrangement of point charges as in Equation (14). e x,y,z usually are the Cartesian basis vectors and therefore M ¼ E.
In contrast to the original algorithm outlined by Devereux et al., 88 which only considers covalently bound systems, in ONIOM multiscale setups, also non-covalently bound systems have to be considered. The following example illustrates the problem: a protonated tertiary amine R 3 NH þ is in proximity to its bromide counter ion Br À . The interaction is electrostatically dominated, and Br À and R 3 NH þ are not considered bonded in the sense of ONIOM (which would imply the introduction of link atoms if the R 3 NH-Br bond is broken). Nevertheless, due to the high polarizability of the bromide, its electron distribution should not be isotropic. Instead, a non-zero dipole moment is expected to arise. This corresponds to case 4 in the algorithm above, resulting in the construction of a Br-H-N reference frame. An important choice is the distance, up to which electrostatic interactions are still considered (see cases 3 to 5). Monopole-monopole interactions decay with r À1 and Spicy considers them negligible beyond 1.5 nm, as commonly done in classical MM simulations. Therefore, a neighbor list for distances of at least 1.5 nm is required. The resulting DCM, for an ONIOM2 calculation on ethanal, obtained from the algorithm just outlined, is shown in Figure 3.
In the context of arbitrary MC-ONIOMn, a polarization scheme for all subsystems has to be defined. This scheme should ideally allow reasonable polarization for arbitrary deeply embedded layers, while ensuring the independence of the different ONIOM tree branches. Building on the depth-first walk through the tree (as discussed in Section 2.1) that performs the calculations on each node, an embedding scheme for an arbitrary subsystem is defined as follows: The set of virtual embedding atoms for node i ð Þ is called A p, i ð Þ , and the rules to construct this set define the embedding scheme.
Here, i : ð Þ denotes a subsequence of i ð Þ, which is obtained by removing the last element of i ð Þ, and the superscript in A c i: ð Þ means that the multipoles of these atoms are obtained by the original (high level) calculation of each layer. Therefore, Equation (19) defines a scheme, which walks a branch of the tree bottom-up, starting from the node of interest. All parent nodes in the branch of the node of interest contribute to its polarization, and the highest level multipoles available for each atom are used. Importantly, this scheme ensures that no inter-branch polarization between different nodes at the same level is encountered, allowing different branches to be calculated independently. To avoid overpolarization, a scaling function s N b ð ) can be applied to the atomic multipoles in set bonds that separates any atom a A p, i ð Þ from any other atom b A i ð Þ , while considering the shortest path between a and b.

| Fragment methods
Fragment methods offer an efficient approximation to fully quantum mechanical calculations while mitigating the steep scaling of the full system's description. Instead, they describe a chemical system in terms of smaller components or fragments, limiting individual quantum mechanical calculations to systems of feasible size. Originally, the combination of the MBE with ONIOM has been proposed as generalized ONIOM, to reduce the computational costs of ONIOM calculations with large QM layers. 60 A multitude of fragment methods exist in the literature, such as the MBE, 90 GMBE, 61 systematic molecule fragmentation (SMF), 91 systematic molecule fragmentation by annihilation (SMFA), 92 FCR, 62 and molecular fractionation with conjugated caps (MFCC). 93 Given a MBE truncated at n-mers, the calculation time scales linearly with the number of groups in the system for a fixed n, if an interaction cutoff is introduced. Among those fragment methods, GMBE and FCR are the most general, as they place no restrictions on the fragment composition. In Spicy, a combination of methods from SMFA, GMBE, and FCR is employed to define a rather general fragmentation scheme. Fragment methods may be used instead of a monolithic calculation for any node in the MC-ONIOMn tree. Applied to the large outer layers, this allows building ONIOM setups, where even the lower level outer layers can be treated quantum mechanically in an economic fashion.
Below, the implementation of fragment methods in Spicy is outlined. Given a set of atoms A, it can be partitioned into groups G ¼ g 1 , g 2 , …, g n f g , where each group is a set subset of the atoms g i & A, and A molecule can be represented as an undirected graph, with the atoms A being labeled vertices, and bonds being labeled edges. Atom labels include a formal electronic charge and a formal number of unpaired electrons in the α and β orbitals. Bond labels are an abstract bond order, which is either a single bond, a multiple bond (any bond with a bond order > 1), or a coordinative bond. Spicy employs a slightly extended version of the SMF rules for group generation, to obtain bond orders and groups, as outlined below. 92 1. Initially, all atoms/vertices of the molecular graph are disconnected (Figure 4a). Bonds/edges, labeled with the abstract bond order, are introduced by the following steps: , and the number of bonds both a and b have, indicate a subvalent pair, their bond order is changed to multiple. Common valencies are looked up from a valency table for each element. Each bond label may be overwritten by the user. Finally, a molecular graph with atom vertices and undirected, labeled bond edges is obtained (Figure 4b). b. If a and b were connected through 1a, and any of the atoms a and b is a metal,* their bond label is changed to coordinative.
where d ab is the distance between two atoms a and b and d s is an additional tolerance (usually d s ¼ 40pm), a and b share a single bond. 2. All single bonds that do not involve a hydrogen atom are deleted from the graph. The disconnected subgraphs are preliminary groups, and within each preliminary group, the sum of the formal charge and the formal number of unpaired electrons is checked. If any of both is non-zero, all group members' bonds are added again. This means that charged or spin-polarized preliminary groups assimilate their neighbors. The disconnected subgraphs resulting from this step are the final groups (Figure 4c). 3. The molecular graph is reduced to an undirected graph of groups (Figure 4d). The labeled vertices are the groups g i , and unlabeled edges represent the original connections of the groups. (d) F I G U R E 4 Partitioning procedure of a molecular graph, yielding groups G ¼ g 1 , g 2 , …, g n f g . In the initial physical system description (a), the molecule is described by A in ℝ 3 without bond information. Bonds and their abstract orders are guessed via modified SMF rules, allowing the graph representation (b). Preliminary disconnected subgraphs are formed by deleting all single bonds that do not involve hydrogen. Preliminary groups that are charged or spin-polarized (according to the label information of the atoms) are reconnected to their direct neighbors, and the final subgraphs/groups (c) are obtained. A simplified graph of groups can be constructed, by maintaining the overall topology, but dropping the edge labels (d).
The graph of groups is the basis to construct a set of fragments F ¼ f 1 , f 2 , …, f n f g . The fragments (auxiliary monomers in the language of GMBE) are defined in terms of groups and may arbitrarily overlap but must comprise the entire graph.
Different algorithms, which may or may not make use of connectivity of the graph of groups, can be used to generate fragments in terms of groups ( Figure 5). Spicy implements SMFA, two types of graph traversals with ego-graphs, and fragment construction from ncombinations of groups or fragments, generated from graph-traversal or SMFA. The ego-graph of length l around a given vertex is a subgraph of the original graph, where all the subgraph's vertices can be reached with a maximum of l steps along the edges, starting from the given vertex. Each vertex in the graph of groups is visited, and its ego-graph of a specified length is formed to construct fragments. Similar to the level in SMFA, the length of the ego-graph l acts as an accuracy parameter of the fragmentation method, as does n, when forming n-combinations. The top left approach in Figure 5, constructs the ego-graph for every node, regardless of whether the node is already part of a fragment, resulting in overlapping fragments. A second fragmentation approach removes any vertex from the graph of groups that has been assigned to a fragment before processing the next vertex, producing non-overlapping fragments. Problematic fragmentations that suffer from unphysical interactions can occur when a single, small group such as ÀCH 2 À is removed from a cyclic structure of the graph. To circumvent such cases, a ring avoidance rule has been implemented for SMFA and F I G U R E 5 Different algorithms to generate fragments from a graph of groups (center). Fragments are shown as color-coded nodes. Nodes not part of a fragment are shown in gray. Overlapping ego-graphs (top left) of length 1 for each group yield strongly overlapping fragments. The ring-avoidance rule prevents formation of the problematic fragments g 2 , g 4 , g 5 , g 7 f g , g 4 , g 5 , g 6 f g , g 5 , g 6 , g 7 f g , g 4 , g 7 , g 6 f g f g . Nonoverlapping ego-graphs (bottom left) of length 1 exclude each group, that has already been assigned to a fragment from further inclusion in other fragments. Thus, those fragments do not overlap. The ring-avoidance rule is also applied for non-overlapping ego-graphs, and the fragment g 4 , g 5 , g 6 , g 7 f g instead of g 4 , g 5 , g 6 f gis formed. SMFA (top right) applies a recursive elimination scheme of groups and reduces the fragment size down to a given size threshold. Fragments may also be formed by n-combinations of groups or other fragments. The bottom right shows the small subset of fragments, obtained when 2-combinations of the groups are formed. Many of such dimers may include problematic interactions of link atoms, if the groups are small. When g 1 , g 2 , g 3 f gis a prop-2-yl group, the dimer g 1 , g 3 f gwould introduce link atoms in close proximity to each other (marked by red Â), and similar for the dimer g 2 , g 7 f g.
ego-graph-based constructions, avoiding fragmenting rings of size 2n þ 2, where n is the level of SMFA, respectively the length of the ego-graphs. 91,92 Figure 5 contains an example where the ring-avoidance rule prevents deconstructing the four-membered ring. The different fragment construction algorithms have different strengths and weaknesses. Graphtraversal approaches such as SMFA and ego-graphs construct spatially strictly confined fragments, where each fragment only contains directly bonded groups. Graph-based fragmentation allows for computationally very efficient fragment construction and their number grows only approximately linearly with the system's size. Conceptually they are well suited to describe the covalent interactions of a molecule, but miss non-covalent interactions. Groups with a large distance in the graph of groups can have a small distance in real space, and the graph-based approaches necessarily miss those. For example, in Figure 5, long-range interactions between the cycle and the substituents on the left part would not be included. The n-combinations of groups are better suited for inclusion of non-covalent interactions and they may also include combinations of groups that are distant in the graph but close in real space. Yet their construction is computationally more demanding due to the large amount of possible combinations, and their number grows fast with the system size. Employing very small groups, for example ÀCH 2 À, as commonly constructed by SMF's group generation, can give rise to fragments with link atoms in very close proximity (see the first dimer in Figure 5). Link atoms that are too close, can introduce artificial unphysical repulsion between fragments and may prevent the convergence of QM calculations. This problem is avoided by using larger groups, such as whole amino-acid residues in a protein, or by forming n-combinations of non-overlapping ego-graph fragments. Spicy allows forming n-combinations of fragments, generated by one of the other methods while allowing to specify a cutoff distance, beyond which the n-combinations of involved groups will not be formed. The distance between a pair of groups g a and g b is d g a g b . It is defined as the minimal distance between any of their atoms.
Similar to the isolation of a sublayer in ONIOM, fragment construction is likely to leave dangling bonds in the fragments. Those are treated on the same footing as ONIOM subsystems, that is, the bonds are capped with link atoms, as shown in Equations (1) and (2). Originally, the GMBE provided an expression to obtain the energy for a set of arbitrarily overlapping fragments but it also applies to any extensive property P. Each fragment f in the initial set of fragments F contributes an intersectioncorrected property P f ð Þ to the full system's property P F ð Þ in terms of its constituting fragments.
The GMBE equation applies universally to any set of fragments and ensures that double counting is avoided by the principle of inclusion and exclusion. 61,62 Again, a graph-theoretical approach can be used to implement Equation (23) which can be viewed as a forest of trees that form intersections ( Figure 6). Each tree represents the calculation of an intersection-corrected property by forming intersections of fragments from the original set, until the intersections become empty. The depth of a node in each tree determines the sign in the GMBE sum, starting with a positive sign, and then alternating for each layer in each tree of the forest. The implementation of the GMBE sum as a forest is concisely formulated as hylomorphism of trees. While unfolding the trees (anamorphism), fragments f 0 arise that were not necessarily part of the original fragment set F, and their properties must also be calculated. A consistent electronic structure is ensured by annotating the groups with additional information such as charge and spin. This also allows different magnetic couplings between groups to be considered. Implementing GMBE as an anamorphism gives rise to a myriad of terms, resulting in very large trees, especially with strongly overlapping fragments, for example, whenever n-combinations are formed. However, most terms cancel when collapsing the trees and summing over the different signs (catamorphism). Calculating the GMBE can become tremendously demanding, and calculating the GMBE sum for dimers of only ≈ 100 groups can become nearly impossible. In practice, this hylomorphism becomes the bottleneck of GMBE calculations, way before the QM calculations. The FCR formalism recently introduced by Hellmers and König, 62 provides an alternative approach to express and implement GMBE. R F ð Þ is the set of all fragments that may be formed from the original set of fragments F in terms of the power sets P of the individual fragments: This new set R is called the fragment combination range, and provides an alternative way to write the GMBE sum over extensive properties A formulation in terms of FCRs has the advantage to be independent of the overlap of fragments, which needs to continue until empty sets of groups are encountered and avoids the unfavorable growth of the trees as in Equation (23) and Figure 6. By avoiding recursion over sums in Equation (25), the number of terms in the GMBE is reduced dramatically. In consequence, the required computation time grows only linearly with the number of fragments in the original set for a given fragmentation scheme, and much larger systems can be treated using the FCR formalism.
The computational aspects of GMBE can be improved when approximate fragment interactions are included via electronic embedding. While arbitrary long-range interactions could be taken into account, the number of required computations increases drastically with growing distance cutoff during the n-combinations of groups and fragments. Instead, these long-range interactions can be approximated by electronic embedding, which even allows decreasing the distance cutoff. Contrary to the electronic embedding of ONIOM layers, the fragments have no hierarchical order, and polarization does not occur unidirectionally, as the fragments influence each other. Embedding multipoles for each F I G U R E 6 A tree can be used to represent an intersection corrected property P from Equation (23). The circles represent fragments f i from the initial set of fragments F, and the color encodes the actual fragment. Intersections of those fragments are marked in red, and a branch of the tree terminates when an intersection becomes empty. The depth of a node in the tree n determines the sign with which the fragment that is obtained by this node contributes to the GMBE sum as À1 ð Þ n , and denotes that n þ 1 fragments intersect. fragment can either be obtained from gas-phase calculations on each fragment in R or in a self-consistent manner in the spirit of the X-Pol method. 63 In X-Pol, the multipoles for a given fragment are iteratively calculated in the field of the other fragments of the previous iteration until the residual mean square value of all atomic multipoles converges to a certain threshold. Due to the absence of density-matrix derivatives for the multipole moments, those contributions are neglected, but the resulting errors were found to be very small. 63 Each atom a may appear in multiple fragments, and, thus, several multipoles may be available for each atom. Spicy follows Richard's and Herbert's suggestion, that each atom's atomic multipoles should be arithmetically averaged over the different fragments. 61 Typically, the atomic multipoles in a X-Pol approach converge in <10 iterations.

| Optimizations
Direct geometry optimization of an entire ONIOM system is not different from optimizing a non-ONIOM system. Both MC-ONIOMn and GMBE provide gradients and Hessians (Equations (13) and (25)), enabling usage of standard secondorder optimization algorithms. However, for large systems, direct optimizations can become prohibitively expensive. First, Hessian diagonalization for a system comprising N atoms scales as O N 3 ð Þ, which can outweigh the costs to calculate the system's energy and gradient. This can, in principle, be solved by using optimizers that do not utilize Hessian information (conjugate gradient 94 ) or avoid construction of the Hessian explicitly (limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) 95 ). Second, the internal-Cartesian back-transformation can become expensive for large systems as it can involve multiple O N 3 ð Þ steps. And third, weakly bound systems and molecular clusters may be difficult to converge, and often require care in the coordinate setup or specialized fragment coordinates. 96 For multilayer schemes like QM/MM and ONIOM, the consequence is often that a large number of expensive optimization steps is required. While outer layers commonly use semiempirical or MM methods, where gradients are inexpensive to calculate, they often contain weakly interacting fragments or solvent molecules which require many steps until a stationary point is obtained. At the same time, the costs to evaluate the gradient of the entire ONIOM setup are often dominated by the inner layers, which may require only a few optimization steps but commonly employ expensive ab initio methods. The problem has long been recognized and was solved by algorithms that utilize micro-cycles, where the outer layer is optimized before taking a step in the inner layer. 97,98 In Spicy, we generalize micro-cycle driven geometry optimizations to arbitrary MC-ONIOMn setups while further disconnecting the optimization steps on different layers.
To optimize a layer i ð Þ independently from its sublayers, a subset q i ð Þ ⊆ q of the coordinates of the entire system q must be chosen. Changing the coordinates q i ð Þ must not influence the nuclear gradients of any layer that is not a parent layer Consequently, for a given layer i ð Þ, q i ð Þ must comprise the coordinates of atoms that are exclusively members of this layer. In addition, the coordinates of the link atom hosts in the parent layer must be included, whose missing valencies are mimicked by link atoms (see Figure 7).
However, for MC-ONIOMn, the situation is complicated, as two or more sublayers of a layer may have link atoms a  Figure 8). Thus, to account for possible couplings between subsystems, these subsystems must be optimized simultaneously. However, the calculations on centers of the same depth slice can still be performed independently. Consequently, valid micro-steps for an ONIOM tree can be taken in all cases if layers in the same horizontal slice D d are optimized simultaneously and Equation (26) is instead written as It is possible to use a different optimization strategy for each depth-slice of the ONIOM tree if care is taken that translational and rotational degrees of freedom are correctly accounted for. An advantage of this separation is, that different coordinate systems can be used for optimizing different slices, that is, translation rotation internal coordinates (TRIC) 96 or simple Cartesian coordinates for the large, outer layers, and redundant internal coordinates for the inner layers. Furthermore, an optimizer only treating a small inner layer can optimize to a transition state without interference from F I G U R E 7 Separation of coordinates for 2,2-dimethylpropanoic acid in an ONIOM2 setup. The high-level layer comprises the carboxy group (yellow), and the low-level layer also comprises the tert-butyl group (blue). To satisfy the condition ∂E A ð Þ =∂q ðÞ ¼ 0, q ðÞ must not include coordinates of C1, as coordinates of the link atom in the model system directly depend on those of C1 (Equations (1) and (2)). modes of outer layers. 99 An algorithm, suitable for such a micro-cycle driven optimization with independent optimizers for each depth slice is outlined below: 1. Setup coordinates q D d for each depth d according to Equation (27). Layers at the same depth slice will be optimized simultaneously. 2. Initialize a separate optimizer for each depth slice and specify a coordinate system, convergence threshold, and optimization algorithm. All atoms A ðÞ and their coordinates q are visible to each optimizer so that translational and rotational degrees of freedom are correctly handled. However, all coordinates except q D d are frozen for an optimizer at depth d. In the construction of internal coordinates, frozen atoms can be excluded, thus keeping the number of defined internal coordinates to a minimum. 3. Calculate the MC-ONIOMn gradient for the full system. 4. Initialize depth d to start at the deepest slice of the MC-ONIOMn tree, thus d ¼ n À 1. 5. Optimize q D d by performing the following steps: a. If d < 0 return. b. Request slice D dÀ1 to be completely optimized by recursively calling into step 5 with d À 1. c. Calculate the gradient of all slices D i ≤ d to obtain a valid gradient expression for q D d . d. Calculate and perform a geometry displacement step in q D d . e. Check for the convergence in q D d . If the optimization at this depth slice has converged, return. Otherwise recurse into step 5a.
Thus, every depth slice requires all parent depth slices to be fully optimized before performing a single geometry optimization step. When the optimizer returns, the deepest depth slice has converged, ensuring that all slices above have also converged.
Equation (27) is only strictly satisfied for mechanical embedding. In electronic embedding schemes, the displacement of atoms in parent slices will always affect the properties of sublayers through the electric field, polarizing the sublayers. As has been previously shown, it is necessary to take the change of the embedding charges with respect to the nuclear coordinates into account to calculate valid gradients with electronic embedding. 100 For DMA such nuclear derivatives have not been formulated yet, and our implementation must therefore ignore these small contributions, possibly at the expense of requiring more optimization steps. In contrast to electronic embedding schemes in QM/MM, where the QM part and MM part polarize each other, and another set of microcycles has to be performed, 97 the unidirectional nature of electronic embedding in ONIOM simplifies the problem. The algorithm outlined above also ensures that steps are always taken in the presence of a consistent polarizing electric field.

| IMPLEMENTATION
The Spicy program is written as a platform to develop, test, and implement ONIOM and fragment methods, available on GitLab (https://gitlab.com/theoretical-chemistry-jena/quantum-chemistry/Spicy), and licensed under the free AGPL-v3. Spicy is implemented in Haskell, which allows a very high-level programming style without sacrificing performance. Haskell's cheap green threads and software-transactional memory allow a heavily asynchronous programming style with a clear separation of tasks (e.g., see Sections 3.1 and 3.2). 101 Furthermore, the "massiv" array library, also building on Haskell's green threads, enables parallel and efficiently fused array operations. 102 For example, construction of local frames and the DCM comprises only a minor part of the calculation time. The rules outlined in Section 2.2 to select a suitable reference frame can be efficiently checked by utilization of a neighbor list that is obtained in O N ð Þ for N atoms and that is subsequently also used to determine the system's topology. Both the neighbor list and the topology are stored as a PATRICIA tree of the atom indices and provide efficient lookups in O min N, W ð Þ ð Þwith W ¼ 64. 103,104 Consequently, construction of the embedding charges comprises typically < 1% of the runtime costs.
While still in early stages of its development, Spicy already implements arbitrary non-overlapping MC-ONIOMn (Section 2.1), electronic embedding with distributed multipole expansions up to the quadrupoles (Section 2.2), the GMBE in terms of FCR together with a set of fragment generation algorithms (Section 2.3), direct and micro-cycle driven optimizations, a couple of interfaces to external computational chemistry programs (Section 3.1), and a VMD-inspired 105 selection language for layer and fragment setups. Emphasis is put on a general implementation that exposes important tuneable parameters and which avoids implementing special cases for specific layouts or method combinations.

| Interfaces to quantum chemistry programs
Spicy interfaces a couple of computational chemistry programs which are utilized to perform individual subsystem calculations. Interfaces to the xTB program, 106 Psi4, 107 and Turbomole 108 are currently available, offering access to all methods available in these programs. As most computational chemistry programs do not offer a C-application programming interface (API) or cannot easily be wrapped with a C-API, file-based interfaces are often required, an approach commonly taken by external interfaces. 46,47,68,96,109 Both input generation for external programs and output handling from the external sources are unified in common data types in Spicy. The common input data type allows specifying computational models, convergence options, and approximations in a structured data type. However, the definition is general enough to access arbitrary methods of the underlying computational chemistry engines. Another data type stores the calculation results from the external programs and comprises energies and nuclear derivatives thereof, distributed multipole moments, the orbital basis, and a one-particle density matrix or wavefunction. Multipole moments for quantum mechanical calculations are automatically obtained by passing the appropriate data to Stone's GDMA program 66 as formatted checkpoint files (.fchk). These formatted checkpoints are serialized from Spicy's unified data structures and thus enable the GDMA program to be used with programs that do not natively produce formatted checkpoint files.
Spicy requests subsystem calculations from an asynchronous calculation backend in a concurrently running thread, clearly separating the logic of ONIOM and fragment methods from the work to obtain required physical data. Calculation requests are queued, and the backend orchestrates the calculations. Furthermore, it allows extending the calculation backend to a network-distributed scheduling system, scattering the calculations over multiple machines, for example, on a high-performance-computing cluster.

| i-PI client and optimizations
Similar to the calculation of subsystems' properties (Section 3.1), Spicy does not implement the optimizers itself but interfaces external, well-tested, and proven engines. The i-PI path integral MD engine 110 uses a network-based communication protocol, which requests force evaluations from a computational chemistry program and provides new geometries to those clients based on a time propagation step. A superset of the original, client-sided protocol has been implemented in Spicy, which adds additional communication options to exchange Hessians and energies. The extended protocol is a superset, strictly compatible with the original implementation. Spicy interfaces the pysisyphus optimization engine 68 via this extended i-PI protocol, giving full access to pysisyphus' optimization algorithms and allow searching stationary points (minima and first-order saddle points) in internal and Cartesian coordinates. Pysisyphus is also used in micro-cycle driven optimizations, where one i-PI client thread per depth slice of the ONIOM tree is utilized. The very low costs of green threads and low latency, high bandwidth UNIX sockets ensure high efficiency in this asynchronous calculation model.
The i-PI protocol and Spicy's asynchronous communication model will also facilitate the integration of molecular dynamics engines.

| Package composition and build system
Spicy is written in standard Haskell2010 111 with a common set of extensions and Haskell dependencies and uses the Cabal build system. Thus, Spicy is a fairly standard and portable Haskell package. A test suite is regularly executed by continuous integration, fully statically linked build artifacts are available, and dependencies are updated regularly.
However, due to reliance on large computational chemistry packages, the dependency graph for an actual Spicy runtime is rather large and complex, comprising 211 Haskell and 360 non-Haskell dependencies at the moment. Installing and satisfying all dependencies is an additional challenge in software stacks that mix different ecosystems, such as C/C++, Fortran, Python, and Haskell. Additionally, composing a working runtime environment can be complicated by conflicting versions of MPI or BLAS/LAPACK implementations and more. Consequently, reproducing a computational environment at a different point in time or on a different machine is extraordinarily difficult. Spicy can optionally be built with Nix, a purely functional software management system aiming at full reproducibility. 112 By integration of nixpkgs and the NixOS-QChem overlay, Spicy and all its dependencies, including the computational chemistry programs, can be built fully reproducible, ensuring perfect transferability. 113-115

| TESTS AND APPLICATIONS
In scientific software, implementation errors can be particularly insidious; they can manifest as hard to detectable numerical errors, especially since reference values may simply not be available for new methods. For this reason, Spicy includes a suite of unit-and property tests which ensures that key parts of the ONIOM code exhibit correct behavior, as well as a selection of test calculations to check the proper interaction between Spicy and its calculation providers. These test suites are distributed with Spicy and can be automatically executed through the Nix and Cabal build systems.

| Verification tests
Spicy has been tested on a variety of chemical systems to cover several methods and use cases. Some of these tests feature new systems while others ensure consistency with previous results and implementations. Several test cases are presented in the following. If not specified otherwise, calculations were performed using the standard settings of the involved programs. In particular, geometries were converged using the following criteria: absolute maximum force ≤ 4:5 Â 10 À4 E h a 0 À1 , root mean square (RMS) of all forces ≤ 3:0 Â 10 À4 E h a 0 À1 , absolute maximum displacement ≤ 1:8 Â 10 À3 a 0 À1 , RMS of all displacements ≤ 1:2 Â 10 À3 a 0 À1 .
In their article accompanying the implementation of ONIOM3 in Gaussian, 54 Dapprich et al. 78 present a set of geometry optimizations. These calculations on ethanal, trifluoroethanal, propanal, and cyclobutene have been reproduced in Spicy, using Psi4 107 as the calculation provider. Reassuringly, results show excellent agreement between both implementations. Bond lengths and angles are accurately reproduced (see Table 1; Scheme 1). Minor differences are easily reconciled due to the fact that the original calculations used a non-standard modification of the B3LYP Note: Parameter names refer to the labels in Scheme 1. Distances are in picometers, angles in degrees. a "Model n" refers to a model system that includes the water molecule, hydroxyl group, and n CH 2 groups along the alkane chain. b Refers to a non-ONIOM calculation.
the cluster geometry converges within 11 steps in the inner layer and 230 cycles in the outer semiempirical layer, yielding a tremendous reduction of potentially expensive QM calculations of 89%. In this example, a speedup of around 300% (201 s versus 600 s) could be achieved on a local workstation with a Intel Xeon W-2223 processor.
We conclude that Spicy's micro-cycle driven optimizations are an excellent tool to reduce computational effort during ONIOM geometry optimization.
Another set of original calculations examines the validity of the GMBE approximations as implemented in Spicy. For this, a cluster of 25 water molecules was chosen, as the individual molecules make natural fragments and also have pronounced multipoles to heighten the importance of charge embedding. The system was treated at the GFN-2 level of theory. 118 Using GMBE truncated at the dimer level without any charge embedding incurs an error of 186mE h for a relative error in the GFN-2 energy below 0.15%. Introducing multipoles through an X-Pol scheme 63 reduces the error to 68mE h (0.05% relative error) but requires an iterative scheme which, in this case, converges in six iterations. Using gas phase multipoles instead gives nearly identical results (within 2:4mE h of the X-Pol method) while requiring only one additional calculation. Higher accuracy could of course be achieved by extending GMBE to include trimers.
Finally, a geometry optimization of cyclohexane demonstrates the robustness of the GMBE approach. Cyclohexane is fragmented using the overlapping ego-graph algorithm with length 1, leading to propane fragments. Despite its simplicity, this scheme performs very well, with all bond distances being replicated in sub-pm accuracy when compared to a non-GMBE calculation with the same method. Bond angles are also of high quality, with a RMSD of only 0.1 when compared to the non-GMBE reference.
To evaluate the scaling behavior of Spicy's fragment method implementation, a series of gradient calculations on a n-alkane of increasing chain length were conducted. Utilizing the SMF group generation discussed in Section 2.3, methyl and methylene groups are obtained. Fragments are then generated in a two-step procedure. First, nonoverlapping ego-graphs of length one are constructed resulting in CH 3 -CH 2and -CH 2 -CH 2 -fragments. In a second step, dimers of these preliminary fragments are formed within 750. Figure 9 illustrates the wall times of RI-MP2/cc-pVTZ calculations with respect to the size of the molecule. The wave function calculations were provided by Psi4 1.6.1 107 running on an Intel Xeon W-2155 CPU.
Clearly, Spicy can achieve linear scaling in this case. For n ¼ 20 the cost of a standard MP2 calculation was already 1774 s, that is nearly eight times higher than that of the fragment calculation. Even though the fragmentation scheme is extremely simple, the resulting error is only 0:6mE h , compared to the canonical calculation. A profiling analysis of the runtime revealed that ≈ 90% of the spent time is due to the wave function calculations and analyses in Psi4 and GDMA while the remaining 10% mainly comprise file handling and parsing in Spicy. Construction costs of neighbor lists, groups as well as fragments are negligible ( ≤ 2%).
In addition to these calculations, Spicy includes many additional test cases. Interested readers are referred to Spicy's online presence under https://gitlab.com/theoretical-chemistry-jena/quantum-chemistry/Spicy.

| Application to hemoglobin
As a demonstration of Spicy's extended capabilities and flexibility, this section presents an application of a complex ONIOM setup for the oxy-hemoglobin macromolecule (PDB code 6BB5 121 ). As a protein with multiple active sites, it is a typical use case for MC-ONIOMn. A partitioning for an ONIOM calculation of hemoglobin arises naturally. The real, outermost layer includes all atoms and must be treated with an efficient method; xTB's GFN-FF 106 is a sensible choice. Dividing according to the individual heme units leads to a MC-ONIOM setup (see Figure 10). All groups (as detected by SMF) within 0.5 nm of each metal center are collected into "pockets," forming four centers. These layers are treated with the GMBE formalism using GFN-2. 118 Figure 11 depicts one such pocket. Note that the automatically detected groups are quite small, rarely exceeding the size of a benzene ring. Finally, in each of these pockets, the actual heme units form the deepest sublayers. For these, Turbomole's implementation of HF-3c 122 provides the electronic structure. The converged wavefunctions of the heme centers were checked by SCF stability analyses. Figure 12 demonstrates the embedding of the inner heme layer in the electrostatic field. The inclusion of the electrostatic field induces an energetic shift of 0:16E h , or more than 4 eV, in the energy of the heme unit. All of these electrostatic interactions would have been lost completely in a simple mechanical embedding scheme.
This calculation of a macromolecule with over 9000 atoms could be conducted on a local workstation in a short period of time ( ≈ 20 min with the GFN-FF topology pre-generated, running on a Intel Xeon W-2155 CPU) while still delivering high quality electronic structure at the important reactive sites.

| OUTLOOK AND CONCLUSIONS
In this contribution, we have presented the Spicy program, a comprehensive free implementation of ONIOM and fragment methods. Importantly Spicy offers a combination of unique features.
• Spicy's ONIOM implementation allows setups with any number of layers.
• MC-ONIOM methods allow a tree-like structure of the multilayer setup with an arbitrary number of centers per layer to reduce computational costs for weakly interacting centers. • A multipolar electronic embedding scheme provides higher accuracy for electrostatic interactions.
• Support for complex layer definitions via an easy-to-use domain-specific language.
• A set of generation algorithms and implementation of FCRs allows utilization of a wide range of fragment methods.
• Fragment methods, the MC-ONIOM method and electronic embedding can be combined flexibly to obtain generalized or extended ONIOM-like methods. • A micro-cycle driven optimization scheme provides efficient structure optimizations on large systems with different coordinate systems and optimizers.
F I G U R E 1 2 One home unit of hemoglobin embedded in the polarizing field of the surrounding protein fragments. Multipoles were obtained at the cost-efficient GFN-2 level to enhance the accuracy of the Hartree-Fock description of the inner heme electronic structure. Blue color on the isosurface denotes positive electrostatic potential while a red tint signifies negative electrostatic potential.
• Utilizing a wrapper approach, more computational methods than available in a single program can be combined.
This previously unavailable flexibility in the system setup in combination with the large range of available methods in the underlying computational chemistry programs, enables calculations on systems, that would otherwise be inaccessible, as for example, force-field parameters may be missing.
Spicy relies on external, well-tested, and performant codes whenever possible, an approach that has gained traction in recent years and enables more rapid development, specialization, and tuning of individual components. 107,109,[123][124][125] With Psi4 and Turbomole, two performant quantum chemistry codes with extended capabilities for wavefunction and density functional theory (DFT) calculations are interfaced. xTB provides access to fast, semiempirical methods, applicable to a wide range of systems. By interfacing pysisyphus, a powerful optimization engine is available, and the GDMA program provides facilities for wavefunction analysis.
We have demonstrated Spicy's ability to produce correct results for MC-ONIOMn calculations, micro-cycle driven optimizations, and fragment calculations. To the best of our knowledge, the MC partitioning enabled the first calculation on the notoriously difficult hemoglobin, where all four heme centers were calculated using quantum-mechanical methods. GMBE was used to obtain high-level multipoles in the pockets surrounding the hemes and finally wavefunction calculations treated each heme separately in the electric field of the protein backbone, expanded up to quadrupoles.
Among other applications, Spicy will facilitate the computational investigations of photocatalysts in semiconducting polymers as well as self-healing polymers in our group.
While Spicy is a young program in its early stages of development, it already implements an extensive set of methods. Near future development efforts will concentrate on further generalization of fragment and ONIOM methods and implement a multilayer-FCR scheme 62 as a superset of the presented ONIOM and fragment schemes. The implementation of a network-distributed scheduling system will facilitate massively parallel fragment calculations, while at the same time, we aim to further reduce resource consumption by means of sparse linear algebra and streaming optimizations. Mid-term the development will focus on molecular dynamics and dynamic system partitioning in the spirit of ONIOM-XS. 64 In a long-term perspective we aim to provide multilayer excited state calculations in Spicy, that is, by explicitly accounting for embedding of excited states.
Spicy is built as a platform open to contributions, and we hope to provide a useful tool and basis for further application and developments in fragment-and ONIOM-like multiscale methods.

ACKNOWLEDGMENTS
Phillip Seeber would like to thank Luise Modersohn for the helpful discussions on efficient implementation of several algorithms and mathematical details and Prof. Christoph Jacob for helpful discussions and valuable hints on the implementation of GMBE. Open Access funding enabled and organized by Projekt DEAL.

FUNDING INFORMATION
The authors gratefully acknowledge the financial support provided by the German Research Foundation within the TRR CATALIGHTproject number 364549901-TRR234 (project C5), as well as CRC 1375 NOA (project A4).

CONFLICT OF INTEREST
The authors have no conflict of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in Spicy's GitLab repository at https://gitlab.com/ theoretical-chemistry-jena/quantum-chemistry/Spicy.