Computational Mapping of Dirhodium(II) Catalysts

Abstract The chemistry of dirhodium(II) catalysts is highly diverse, and can enable the synthesis of many different molecular classes. A tool to aid in catalyst selection, independent of mechanism and reactivity, would therefore be highly desirable. Here, we describe the development of a database for dirhodium(II) catalysts that is based on the principal component analysis of DFT‐calculated parameters capturing their steric and electronic properties. This database maps the relevant catalyst space, and may facilitate exploration of the reactivity landscape for any process catalysed by dirhodium(II) complexes. We have shown that one of the principal components of these catalysts correlates with the outcome (e.g. yield, selectivity) of a transformation used in a molecular discovery project. Furthermore, we envisage that this approach will assist the selection of more effective catalyst screening sets, and, hence, the data‐led optimisation of a wide range of rhodium‐catalysed transformations.


Full Computational Details
Optimised geometries for all rhodium(II) complexes were calculated in isolation with the Gaussian09 1 software package, using the standard BP86 2-4 density functional as implemented in Gaussian with the DZP basis set 6-31G(d) [5][6][7][8] on all atoms apart from rhodium, for which the Stuttgart/Dresden effective core potential MWB28 9 was used. Optimisations used 'tight' convergence criteria. Vibrational frequencies were not computed, so the energetic data do not include a correction for zero-point energy, although we would expect this to be quite small. In the absence of frequency calculations, stationary points have not been verified as minima. However, most ligands and complexes are large, and optimisation to transition states seems unlikely for these carefully built, low-symmetry starting geometries.
Geometry optimisations were started from crystal structure geometries of the complex of interest (see CSD refcodes in Table S3 below), or by careful structural modification of related complexes.
Conformational searches used the default MMX force field in PCModel. 10 GMMX was used for stochastic conformational searches, generally with default settings. 500-iteration conformational searches were performed on the dirhodium complex as well as its carbene complex (stop criteria defined as Emin found 10 times and duplicates found 50 times). These conformational searches were attempted for a subset of complexes to enhance conformational sampling, but were hampered by missing parameters (circumvented by replacing atoms with elements for which parameters existed and applying suitable restraints) and by difficulties with convergence, most likely due to the large number of connected rings arising from the dirhodium core. A selection of conformers was then re-optimised fully at the DFT level, as described above. A full re-parameterisation of the force field lay outside the scope of this project, but for 3d and 4g we explored the impact of conformational change on the descriptors. The results are summarised in Tables S1 and S2 below. While for 3d the crystal structure geometry led to the lowest energy conformer, this was not the case for 4g. However, the range of energies found for 4g was small, and the descriptors for the XRD-derived conformer and the lowest energy species found are reasonably similar. Inspection of the descriptors shows limited variation in structural and energetic parameters, but a larger range for the steric descriptors (He8 and |wV|) as well as for the energy for the diazo precursor to form the carbene complex (Ecoord), as might be expected. 11 While we recognise that conformational change will have a larger impact on the prediction of selectivity, reactivity and dynamic behaviour for some of the catalysts, in view of the computational problems with sampling conformer space reliably, we decided against including Boltzmann-averaged descriptors in the present version of this database.

Design of Descriptor Database
The 14 steric and electronic quantum chemical descriptors outlined below (Table S4) were then captured from optimised geometries. The coordination energy of the carbene generated from a symmetrical α-diazo malonamide precursor was calculated from converged energies (a.u.) using the equation below.   Figure S3). Acetonitrile complexes were initially calculated using the B3LYP functional; however, because convergence could not be obtained for several carboxamidate complexes, the functional was changed to BP86 for all other calculations. This functional may over-bind slightly but is computationally robust and converges reliably for all catalysts considered here.
3. Several steric descriptors were considered, including an adapted version of the He8 ring used in Bristol's LKBs, 12 Distance-Weighted Volume 13 and first-generation Sterimol parameters. 14 The interaction energies between the dirhodium complexes and a ring of 8 helium atoms, i.e. He8 ring interaction energies, were calculated as single-point energies at the BP86/6-31G(d)/MWB28 level of theory, where the He8 ring was aligned 1.9 Å from the rhodium core (average r(Rh-C) bond length). Distance-Weighted Volume was derived from the MolQuO web app (http://rodi.urv.es/~carbo/quadrants/index.html), aligning the quadrants with the Rh-C bond and then removing the carbene ligand from the optimised geometry (Figure S2). First-generation Sterimol parameters were calculated by aligning the L vector with the Rh-Rh bond and using a Python script available here: https://github.com/bobbypaton/Sterimol. Sterimol parameters were found to be prone to outliers due to the extreme size of many of the ligands and less capable of describing the steric environment around the Rh atoms, so they were not used in subsequent analysis. We note that the more recent versions of this descriptor, which explore conformational variation (wSterimol), 15 may be able to address this, provided conformational searches can be carried out (see above).
5. We also considered calculating free ligand descriptors to supplement this database, aligning more fully with Bristol's Ligand Knowledge Bases. However, existing LKBs focus on a single ligand and the effect of modifying this on both metal fragment and ligand coordination. In the present case, this is not particularly helpful, as we are interested in the effect of four ligands on the dirhodium "core". In addition, the ligands together determine the coordination site on Rh, so a ligand-centric "view" of this would be misleading. Finally, some ligands are capable of multiple coordination modes and, once metal coordination is removed, conformational space might lead to considerable variation which is not meaningful for the catalysis here. This does not preclude capturing these ligands, e.g. in the context of the recent LKB-bid database capturing wider bidentate ligand space, but we have not included such descriptors here. 16
The descriptors used for analysis are included as Table 1 in the manuscript. Descriptors tested can also be found in a data table supplied separately as part of the ESI.
Distance-weighted volume, or quadrant occupation (see discussion above for details), gives a measure of the steric bulkiness of the ligand and its influence over the metal centre, and was calculated using the formula below.

$$wV = \sum_{i} \frac{r_i^{\,n}}{d_i^{\,m}}$$
where $n = 3$ and $m = 1$, $d_i$ = distance of atom $i$ to the metal centre and $r_i$ = van der Waals radius of atom $i$.
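As a rough illustration of the distance-weighted volume, the sketch below sums r^3/d over a set of atoms (exponents 3 and 1, as stated above). It is not the MolQuO implementation: the function name, the small radii table (Bondi-type values, assumed here) and the toy coordinates are all hypothetical, and the split into quadrants is omitted.

```python
import math

# Assumed, illustrative van der Waals radii in Angstrom (Bondi-type values);
# the radii set actually used by MolQuO is not stated in the text.
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def distance_weighted_volume(atoms, metal_xyz, n=3, m=1):
    """Return the sum of r_i**n / d_i**m over the supplied atoms.

    atoms     : iterable of (element symbol, (x, y, z)) in Angstrom
    metal_xyz : (x, y, z) of the metal centre in Angstrom
    """
    total = 0.0
    for element, xyz in atoms:
        d = math.dist(xyz, metal_xyz)  # distance of the atom to the metal
        if d == 0.0:
            raise ValueError("atom coincides with the metal centre")
        total += VDW_RADII[element] ** n / d ** m
    return total


# Toy input (hypothetical coordinates, not taken from any complex in the database).
toy_atoms = [("C", (2.0, 0.0, 0.0)), ("H", (0.0, 3.0, 0.0))]
wv = distance_weighted_volume(toy_atoms, (0.0, 0.0, 0.0))
```

Because each atom is weighted by 1/d, atoms close to the metal dominate the sum, which is why the descriptor reports on the steric environment near the coordination site more than on the overall size of the ligand.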

For these dirhodium complexes, the frontier molecular orbitals correspond to metal orbitals available for bonding of additional ligands (e.g. the carbenes considered for descriptors) with the dirhodium core. This is illustrated in Figures S3-S5 below.
Modelled Rhodium(II) complexes
Principal component analysis (PCA) transforms the original variables (parameters/descriptors) into sets of orthogonal components that represent most of the variation in the original data. The principal components can then be plotted in two or three dimensions, where the data cluster according to the variance in the original data. This means the integrity of the chemical descriptors is retained, as the underlying data are not manipulated, and catalysts with similar properties will be neighbours, while dissimilar catalysts will be distant in the resulting PCA plot. The benefit of PCA is that the result is representative of the original chemical observations. This is because PCA represents the variance between data points, not the data points themselves, so the analysis does not project absolute values.
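The transformation described above can be sketched with a singular value decomposition. This is a minimal illustration rather than the workflow used for the database: the autoscaling step (zero mean, unit variance per descriptor) is an assumption, the function name `pca` is hypothetical, and the input is random toy data standing in for a catalysts-by-descriptors table.

```python
import numpy as np


def pca(descriptors, n_components=3):
    """PCA of a (catalysts x descriptors) table via SVD.

    Returns scores (catalyst coordinates), loadings (descriptor weights)
    and the fraction of variance captured by each retained component.
    """
    X = np.asarray(descriptors, dtype=float)
    # Assumed preprocessing: autoscale so that descriptors with different
    # units contribute on an equal footing.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components].T
    return scores, loadings, explained[:n_components]


# Toy data: 12 hypothetical catalysts described by 5 hypothetical descriptors.
rng = np.random.default_rng(0)
toy = rng.normal(size=(12, 5))
scores, loadings, explained = pca(toy)
```

Plotting the first two or three columns of `scores` gives the kind of map discussed here, in which neighbouring points have similar descriptor profiles; the columns of `loadings` show how strongly each descriptor contributes to each component.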
After capturing the descriptors outlined above, a relevant subset must be selected for use in principal component analysis, to ensure the subsequent model can be reliably interpreted and deconvoluted.
This means that the model must be generated from descriptors that can be related to chemical features important to the properties of the catalyst. Inclusion of redundant descriptors will make the PCA model harder to interpret, as they will not contribute clearly to clustering and will have low descriptor loadings, which indicate low covariance with a principal component. Low descriptor loadings will reduce the quality of the PCA plot by lowering the amount of variance captured across the first three principal components. The most easily interpreted model will be the one that uses the fewest descriptors to clearly cluster catalysts with similar properties, meaning the ultimate decision on the correct number of parameters is qualitative.
To reduce the number of PCA plots that need to be analysed manually, the best solution for each discrete number of descriptors can be calculated (Figure S7). PCA results can be superficially ranked by the total variance captured and by the mean squared error of projection (the amount of information lost through PCA), and the top solution for n descriptors is selected by the highest percentage variance captured (for solutions, see Table S5). For qualitative and unsupervised analyses there is often no correct number of parameters for a model; instead, models must also be ranked by their interpretability. Models that give even parameter loadings are favoured over those that bias results towards fewer parameters, as they are often better at highlighting complex relationships in the original data. 17
Figure S7. Metrics for optimal PCA solutions of n descriptors.
The percentage of variance captured decreases, and the mean squared error of projection increases, as more descriptors are added to the PCA model. This is because the solutions with fewer descriptors have less information to capture and can effectively be represented with only three principal components. While a PCA model that captures 100% of the available information looks appealing from a metric point of view, it does not consider the quality of the clustering in the model or how relevant the information captured is. The next step is to evaluate the PCA plots for their interpretability in the context of dirhodium(II) chemistry (Figures S8 and S9).
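One way to picture the ranking step is an exhaustive search over descriptor subsets, keeping for each subset size the combination whose first three principal components capture the largest share of the variance. The sketch below is an assumption about how such a search could be written, not the procedure used to generate Table S5; the function names, the autoscaling step, the minimum subset size and the toy data are all hypothetical.

```python
from itertools import combinations

import numpy as np


def variance_captured(X, n_components=3):
    """Fraction of total variance held by the first n_components PCs."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # assumed autoscaling
    var = np.linalg.svd(Z, compute_uv=False) ** 2
    return var[:n_components].sum() / var.sum()


def best_subsets(X, names, min_size=4, n_components=3):
    """For each subset size k, return (fraction captured, descriptor names)."""
    X = np.asarray(X, dtype=float)
    best = {}
    for k in range(min_size, X.shape[1] + 1):
        scored = (
            (variance_captured(X[:, list(cols)], n_components), cols)
            for cols in combinations(range(X.shape[1]), k)
        )
        frac, cols = max(scored)
        best[k] = (frac, [names[j] for j in cols])
    return best


# Toy data: 10 hypothetical catalysts, 6 hypothetical descriptors d1..d6.
rng = np.random.default_rng(1)
toy = rng.normal(size=(10, 6))
names = [f"d{j}" for j in range(1, 7)]
ranking = best_subsets(toy, names)
```

The complement of the captured fraction is the share of variance lost on projection, so both metrics come from the same singular values. The number of subsets grows combinatorially with the size of the descriptor pool, and the top-ranked subset for each size is only a candidate: as noted above, the final choice still rests on how interpretable the resulting plot is.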
Analysis of the first two principal components (PC1 and PC2) ( Figure S8)