Using Diffusion Maps to Analyze Reaction Dynamics for a Hydrogen Combustion Benchmark Dataset

We use local diffusion maps to assess the quality of two types of collective variables (CVs) for a recently published hydrogen combustion benchmark dataset1 that contains ab initio molecular dynamics (MD) trajectories and normal modes along minimum energy paths. This approach was recently advocated in2 for assessing CVs and analyzing reactions modeled by classical MD simulations. We report the effectiveness of this approach to molecular systems modeled by quantum ab initio MD. In addition to assessing the quality of CVs, we also use global diffusion maps to perform committor analysis as proposed in.2 We show that the committor function obtained from the global diffusion map allows us to identify transition regions of interest in several hydrogen combustion reaction channels.


INTRODUCTION
−6 For some molecular systems such as the alanine dipeptide, physically intuitive CVs such as dihedral angles perform very well, and for chemical reactions simple CVs based on distances are often assumed to be adequate.In such cases, they can also be readily combined with enhanced sampling techniques such as metadynamics 7,8 or umbrella sampling 9,10 to identify reaction barriers and transition pathways.
Several methods have been developed to identify CVs, including those using machine learning, 11 but they are often validated using molecular systems for which a lot is known already, especially for the alanine dipeptide 12−15 However, in general, it is not so clear how to identify relevant CVs for complex chemical or material systems in the condensed phase or when chemical reactions occur.For example, aqueous phase transformations can involve many solvent molecules that undergo concerted rearrangements that are mediated through long-range interactions. 16Furthermore, these studies usually utilize classical molecular dynamics (MD) which typically limits one to studying only conformational changes.For chemical reactions involving bond forming and breaking, a quantum mechanical treatment is needed but can be computationally slow or prohibitive and the CVs can be unintuitive for complex reaction mechanisms.
Diffusion maps have been shown to be an effective tool for identifying good CVs and computing committor functions and have been widely used in many MD applications, 2,17−19 and relevant to the topic explored here, for searching for rare transitions within dynamical information. 2,19,20While the diffusion coordinates (DCs) obtained from the eigenvectors of the diffusion map operator have been proposed as a means to obtain CVs, their interpretation in terms of the potential energy surface (PES) can be unclear.Recently, Trstanova and co-workers have suggested that DCs obtained from a "local" diffusion map can be constructed from samples along a trajectory to identify high-quality CVs as long as these samples satisfy a so-called quasi-stationary distribution (QSD). 2 However, this new diffusion map technique was applied and tested for toy models and molecular systems that can be well described by classical force fields.Even so, the CVs and reaction coordinates for simple systems can often be chosen by intuition, and such new formulations have not been fully tested on chemically complex reactive chemistry.
In this work, we apply the diffusion map technique proposed in 2 to a hydrogen combustion benchmark dataset that contains ab initio MD (AIMD) trajectories and normal modes along minimum energy paths (MEPs) traversing through transition states between reactant and products. 1Unlike conformational coordinates for systems such as alanine dipeptide, the main reaction coordinates of the hydrogen combustion reaction channels are less obvious as bonds break and form.To the best of our knowledge, this is the first time this approach has been used to select CVs and identify the main reaction coordinate for realistic chemical systems in which quantum mechanical effects are important.We evaluate and compare CVs obtained from two commonly used approaches, i.e, internal coordinates (ICs) and principal component analysis (PCA) to identify potentially good CVs by computing the correlations between different CVs and the leading DCs obtained from a local diffusion map constructed from MD snapshots within a metastable region of the configuration space.We also examine the possibility of using diffusion maps to compute committor functions which describe the probability of reaching either of two local minima from a particular configuration using an ensemble of short trajectories. 21,22The committor function is of central importance because it generalizes the concept of a transition state by explicitly accounting for dynamics on a highdimensional PES.
We first show that the constant scaling parameter used to define the kernel matrix from which the DCs are derived from the local diffusion maps, which works well for configurational problems like alanine dipeptide, can be made more robust for chemically reactive systems by making the scaling parameter a configuration-dependent quantity.With this, our results show that for most test problems, the leading DCs obtained from a local diffusion map and QSD criteria enable us to identify good PCA-based candidate CVs.However, we also observe that not all IC-based CVs that are highly correlated with the DCs are important ones for the hydrogen combustion reactions, although ICs might be an intuitive choice.Using a global diffusion map constructed from MD snapshots initiated from the transition state, we observe that the value of the committor function is close to 0.5 at the transition state for all systems, which is expected.And while the committor function separates the reactant and product regions well when projected into a low-dimensional space spanned by previously identified good CVs, for systems with larger number of degrees of freedom such a separation becomes less clear, indicating the potential limitation of such analysis for large systems.
The paper is organized as follows.In Section 2.1, we review several techniques to construct a diffusion map.In Section 2.2, we introduce the mathematical description of the overdamped Langevin dynamics and the QSD which is the required distribution of molecular configurations used to construct an effective diffusion map.In Section 2.3, we describe an algorithm to compute committor probabilities based on the diffusion map.We review two commonly used approaches for constructing CVs in Section 2.4 and provide details on the hydrogen combustion dataset in Section 2.5.In Section 3 of results, we report the good CVs for all hydrogen combustion reactions identified by diffusion maps and validate such an assessment by examining the PES of four representative reactions in different pairs of CVs.We also show the committor functions obtained from global diffusion maps for four representative reactions and discuss how well they characterize the reaction mechanism of these reactions.Finally, we offer concluding remarks and future directions in Section 4.

Diffusion Coordinates and Diffusion Map
Construction.We consider an overdamped Langevin dynamics where x is a molecular configuration, U(x) is the potential energy at x that takes into account quantum mechanical forces among different atoms, β is the inverse temperature, and W t is a standard Wiener process.The infinitesimal generator of the Markov process x ( ) t t 0 associated with ( 1) is related to the Kolmogorov operator For example, for any smooth observable f : in L 2 (Ω), the expected value of f(x t ) along a trajectory governed by the overdamped Langevin dynamics (1), denoted by ( ) t , satisfies the backward Kolmogorov equation where the region Ω is compact, the elliptic operator L has a discrete set of eigenvalues λ j and the corresponding eigenfunctions ψ j (x).The solution to (3) can be expressed in terms of eigenpairs (λ j , ψ j (x)) of L, i.e., = = g x t b e x ( , ) ( ) where the coefficient b j is derived from the duality with the forward Kolmogorov operator 19 and the eigenvalues of L satisfy that 0 = λ 1 < λ 2 ≤ λ 3 ≤ ... The nonzero eigenvalues and the corresponding eigenfunctions of L evaluated at x can be used to define a set of DCs e x e x ( ( ), ( ), . . . ) As a result, the distance between x and y can be measured in terms of the L 2 norm of the DCs associated with x and y.This is often referred to as the diffusion distance D t (x,y) , where If there is a large spectral gap between the λ k+1 and λ k+2 , the diffusion coordinate ( 5) is dominated by the first k components x e x ( ( ), ( ), . . ., ( ), ( )) As a result, the diffusion distance D t (x,y) can be computed from (6) by keeping the first k terms.The use of a k-component diffusion coordinate for a relatively small k allows us to achieve significant dimension reduction.
However, finding the DCs in high dimensions by computing the eigenvalues and eigenvectors of a discretized (by, e.g., a finite-element method) L defined in (2) is computationally intractable in general.Alternatively, one can obtain approximate DCs (5) by constructing a transition probability matrix

Journal of Chemical Theory and Computation
from configurations sampled along a trajectory using a Gaussian kernel.This kernel is referred to as a diffusion map. 2,23In the following, we briefly describe the basic steps of a diffusion map construction and how DC can be obtained from a diffusion map.
The initial step in the construction of a diffusion map associated with the operator (2) is to build a kernel matrix from configurations { } = x i i n 1 sampled along a trajectory of an overdamped Langevin dynamics (1), which is of the form i k j j j j j j j y The local parameter ε scales as distances between the samples.The kernel matrix A ϵ can be further normalized by, for example, configuration-dependent densities or the row sums of the kernel matrix, to yield 19

[ ] = [ ]
where is the ith row sum of A and α ∈ [0, 1] is an appropriate chosen parameter.
To obtain a transition probability matrix P ε,α from A , , we scale it by a diagonal matrix D, i.e., where the ith diagonal entry of D equals the ith row sum of A , .Importantly, in the limit of n → ∞ and ε → 0 This expression allows us to construct an approximation to L from the transition probability matrix of sampled configurations along a Langevin trajectory. 25Therefore, we can obtain approximate DCs (7) by computing eigenvalues and eigenvectors of the matrix on the left-hand side of (11).The parameter α ∈ [0, 1] determines the type of the continuous operator on the right hand side of (11) in the limit n → ∞ and ϵ → 0. In particular, when = 1 2 , A , converges to the backward Kolmogorov operator that appears in (2) and (3) (see 23 ).
Several variations of the diffusion map have been developed to improve the effectiveness of dimension reduction through the use of DCs.For MD simulations, one can use the energybased kernel suggested in 19

[ ] = [ ] A
A e e where U(x) represents the energy of the configuration x and β is the inverse temperature.A generalization of this kernel was established and proven theoretically to ensure a small set of DCs can capture the underlying manifold in. 26Nevertheless, in situations where sampling is highly varied due to some structural instability, finding a proper scaling parameter ε in the kernel (8) can be difficult due to large variations in local scales associated with the sampling.In such situations, it may be better to define a kernel matrix with configurationdependent scaling parameters { } = i i n 1 instead of a single constant, 17,27 for example i k j j j j j j j y A simple rule of selecting those configuration-dependent scaling parameters is to use the nearest neighborhood criterion, i.e., the th smallest value of the distances where the parameter 0 < r < 1 determines the size of neighborhood.A range of procedures for selecting the local scales have been proposed. 27,28t an abstract level, the application of the normalization defined in (9) or (12) to the locally scaled kernel ( 13) can be viewed as a special case of the weighted kernel matrix 27 i k j j j j j j j y For example, the weighting factor used in ( 9) is 12).Due to the presence of configurationdependent local scale parameter ε i such as the one defined in (14), this formulation yields more robust diffusion maps than the ones constructed from a constant scale kernel (8).From our experiments with ab-initio MD and potential energy data, we found that the diffusion map constructed from the formulation (15) provided more reasonable information than diffusion maps defined in (9) or (12).

Local Diffusion Map and Quasi-Stationary Distribution.
Trajectories from the Langevin dynamics (1) tend to stay in a metastable region for a very long time before exiting the region.It has been suggested that DCs obtained from a "local" diffusion map constructed from samples along such a trajectory can be used to identify high-quality CVs as long as these samples satisfy a so-called QSD. 2,29y definition, a QSD for the stochastic process (1) is a probability measure ν on a metastable region Ω that satisfies for any initial configuration x 0 ∈ Ω and any smooth function f : d and the random variable T is the first exit time of the process x t , i.e., It follows from ( 16) that the expected value of f(x) in Ω with respect to the measure ν can be approximately obtained from the expected value of f(x t ) along a sufficiently long Langevin trajectory that stays within Ω.
A practical question we need to address in using samples along a Langevin trajectory to construct a local diffusion map is to ensure that these samples are within a metastable region Ω and satisfy a QSD.A practical procedure for achieving such a goal is based on the connection between the first eigenfunction u 1 of the Kolmogorov operator (eq 2) and ν(x) described by the equation 29

Journal of Chemical Theory and Computation
The expression given in (18) suggests that we can determine whether samples along a Langevin dynamics satisfy a QSD by monitoring the first eigenvalue of the operator L associated with the diffusion map constructed from these samples.When the eigenvalue does not change much, we can consider the samples to satisfy a QSD.In practice, we construct diffusion maps from a trajectory data iteratively and keep track of their first few eigenvalues instead of just the first one to determine whether samples along a trajectory satisfy the QSD.

Global Diffusion Map and Committor Function.
The diffusion map is an effective tool for computing the committor function. 2,30For two metastable regions A and B, the committor function assigns to snapshot x ∈ Ω the probability of a trajectory starting from x to reach B first rather than A, namely where the random variable τ A (τ B ) is the first time when a trajectory initialized at x 0 hits A (B).Note that q(x) = 0 if x ∈ A and 1 if x ∈ B. Ideally, as a trajectory gets closer to B from A, the committor function continuously increases from 0 to 1 and the isocommittor surfaces (q(x) = 0.5) can be used to define transition regions. 31n a global region Ω that includes A and B, the committor function can be interpreted as a solution to the backward Kolmogorov equation with a boundary condition where L is the operator defined in (2).To find an approximate solution, one can exploit the asymptotic property of the diffusion map (11) and can compute an approximate committor function q by solving the following system of linear equations 2,30 where c and b represent the set of indices for configurations in the complement of A ∪ B and in the metastable region B, respectively.Furthermore, if local distance scales are highly variable for given snapshots, one can use the kernel defined with different local scales (15) for solving the eq 21.

Dimension Reduction and the Choice of CVs.
There are many ways to choose CVs.In this work, we focus on two particular types of CVs, ICs and PCA, that are commonly used to define reaction coordinates of molecular systems.ICs refer to bond lengths, bond angles, and dihedral angles of a bonded molecule.Compared to Cartesian coordinates, ICs have the desirable advantage that they are invariant under an overall translation or rotation of the molecule which do not alter the potential energy of the molecule.Selected ICs are sometimes used as CVs.A well-known example is the use of two dihedral angles of an alanine dipeptide molecule as CVs to examine conformational changes of the molecule. 2,27These will be examined with respect to DCs in Section 3.
PCA is one of the most common and useful dimensionality reduction techniques to represent complex information as lowdimensional data.We apply the PCA to identify directions in which the data are varied most, [e.g., the projection of data onto the 1st principal component (PC) with the largest variance, the second largest variance in the second PC, and so on].Given , where N is the number of samples, the PCs are defined as the eigenvectors of the covariance of X or the right-singular vectors via the singular value decomposition where X̅ is the mean overall samples, column vectors of Y are the PCs and the diagonal matrix Σ contains the singular values of X − X̅ .Each configuration can be expanded as a linear combination of the PCs contained in Y.The coefficients of the jth PC (y j ) for all configurations contained in X can be obtained from If the first few singular values in Σ are much larger than other singular values, each configuration in X can be well represented by the first few PCs.The coefficients associated with these PCs can be used as CVs.
2.5.Hydrogen Combustion Dataset.We consider the effectiveness of using a local diffusion map to assess the quality of CVs obtained for several reactions in the hydrogen combustion benchmark data published in. 1 Table 1 lists all 19 reactions contained in the hydrogen combustion benchmark data. 1 The benchmark data contain molecular configurations sampled along an MEP for each reaction, including the reactant, transition, and product states.The dataset contain 290,000 potential energies and 1,270,000 forces for hydrogen compounds in different molecular configurations.These configurations are generated through normal mode sampling and AIMD at different temperatures beginning from different points along the 0 K intrinsic reaction coordinate (IRC) or a The number of atoms involved in each reaction, the total number of degrees of freedom (DoF) in Cartesian coordinates, and total number of degrees of freedom in ICs (DoF int .)

Journal of Chemical Theory and Computation
equivalently MEP, including the transition state.In regards the AIMD simulations for the 19 reaction channels, each reaction has 10,000 snapshots obtained at four different temperatures: 500, 1000, 2000, and 3000 K generated from running AIMDs using the Q-Chem software pacakage 32 with the transition state as the starting point.As reported in, 1 some of the reactions are relatively simple, e.g., reactions 5 and 6, involving the association and dissociation of two atoms along certain directions which constitutes the only degree of freedom; we will not examine this type of reaction in this paper.Among the 19 reaction channels, we focus on reactions 09, 11, 14, and 16 as representatives for association, O-transfer, H-transfer, and substitution, respectively.These reactions also include molecules with more atoms such that the number of degrees of freedom in ICs can be as large as 18.
We should also note that in, 1 the MEP is plotted in terms of two coordination numbers (CNs) that are considered as internal reaction coordinates (IRC).A CN is defined as the number of bonded or closest neighbors of a central atom or molecule of interest.In the hydrogen combustion study, 1 the ith atom in a molecule is defined as where r j is the distance between atom i and atom j and r 0,j is the equilibrium distance between atom i and j.Each Fermi-Dirac function in (24) is close to zero if r j − r 0,j is large and close to 1 if r j − r 0,j is small.Therefore, (24) effectively gives the number of neighboring atoms of atom i that are close to be in equilibrium positions.In, 1 one or two CNs were chosen as CVs that incorporated the IRC MEP that connects a local minimum of the PES (reactant) to another local minimum (product).
To generate additional samples near the reactant and product states (local minima), which are often available in a practical setting, we ran some additional MD simulations using QChem 32 starting from either the reactant or product state.We used density functional theory (DFT), specifically the ωB97X-V functional and the cc-pVTZ basis set to perform potential energy calculations and MD simulations.For all MD simulations, we performed Langevin dynamics at 300 K with a 0.12 femtosecond (fs) time step; we set the reactant provided in the IRC data 1 as the initial configuration of the dynamics.For reactions 11, 14, and 16, we generated 4000 snapshots.We could only generate 105 snapshots for reaction 09 before dissociation occurred.

Checking QSD for Local Diffusion Map Construction.
As we discussed in Section 2.1, a local diffusion map constructed from molecular configurations within a metastable region can be used to assess the quality of CVs as long as the configurations used to construct the diffusion map satisfies a QSD. 33ue to the connection between the distribution of configurations along an MD trajectory and the eigenfunction  associated with the first eigenvalue of corresponding Kolmogorov operator (2) as discussed in Section 2.1, we can check whether the sampled MD snapshots satisfy QSD by monitoring the spectrum of the diffusion map constructed from these snapshots.To be specific, every m = 100 MD steps, which constitutes an iteration of the QSD checking procedure, we construct a diffusion map based on the weighted kernel matrix (15) with weights being free energies and local scale parameters from (14) with r = 0.1, using samples along the trajectories available at that point and compute the first few eigenvalues of the corresponding approximate Kolmogorov operator L ε .These eigenvalues are shown in Figure 1 for reactions 11, 14, and 16.Because each MD trajectory consists of N = 4000 snapshots, N/m = 4000/100 = 40 iterations were performed to monitor the change in the eigenvalues of L ε .
Figure 1 shows that the first 5 eigenvalues of the diffusion map for reaction channels 11 and 16 do not change much during the entire simulation.For reaction channel 14, the dominant eigenvalues of the diffusion map begin to fluctuate around a few mean values, indicating that the subsequent trajectory samples no longer fulfill the QSD defined on the metastable region.Based on those measures, we pick the subset of the snapshots up to the point when the first 5 eigenvalues of the diffusion map begin to change more significantly.The configurations within such a subset are deemed to satisfy QSD.For reaction 9, because there are only 105 snapshots in the AIMD trajectory, we use all of them because they appear to be within a metastable region.

Implementation Details.
In all experiments, we used ICs of sampled configurations to construct local diffusion maps.For global diffusion maps used in Section 3.4 to perform committor analyses, we found that root mean-squared deviation aligned Cartesian coordinates, as defined in, 34 were sometimes more effective.When using ICs to construct a diffusion map kernel matrix, it is important to note that the absolute difference in two angles should never be larger than π when the difference of two ICs is used to evaluate a kernel matrix element defined in (8).For example, if θ 1 = π/6 and θ 2 = 2π − π/6, the absolute difference between θ 1 and θ 2 should be π/3 instead of 2π − π/3, i.e., when the difference between two angles exceeds π, we need to subtract 2π from the difference.Otherwise, the Euclidean difference between the two sets of ICs can be artificially increased by the extra π in the angle difference.This increase can yield distorted DC.
We perform PCA on sampled ICs.Ideally, we would like to use configurations along or close to the MEP because singular values resulting from PCA are likely to decrease faster and the dominant PCs obtained from such an analysis are likely to represent the main reaction mechanism well.For example, Figure 2 shows that the singular values obtained from the PCA performed on the configurations sampled along the MEP decreases much faster than those obtained from the configurations sampled within a metastable state of reaction 19.However, because the MEP is unknown in general, this approach is not practical.In the following, we use MD snapshots sampled within a metastable region to perform the PCA even though such an analysis is not optimal in the sense that more PCs may be required to capture the reaction mechanism.
In addition to using diffusion maps to access the quality of CVs, we validate the quality assessment by plotting the PES in these CVs to see if a saddle point can be observed on such a surface.The presence of a saddle point would indicate that the chosen CVs are good CVs for describing the reaction mechanism near the transition state.To plot the PES with respect to two CVs c 1 and c 2 , we evaluate the potential energy at where z ts denotes coordinates of the transition state and u 1 , u 2 are either two elementary basis vectors, i.e., columns of an identity matrix when ICs are chosen as the CVs, or two PCs when PCA-based CVs are to be examined.We plot the energies  We highlight some of the relatively high correlation coefficients in red.

Table 2 reports the correlations of the IC and PCA-based
CVs with the first two DCs (labelled as DC2 and DC3) for each representative reaction.In the construction of the diffusion map, we set r = 0.1 in (14), and use the kernel ( 9) for (a) and (b) (3 and 4 atoms) and the energy-based kernel (12) for (c) and (d) (5 and 6 atoms).We observe that for each reaction several ICs can be highly correlated with the same DC.For example, for reaction 9, the absolute values of the correlation coefficient between IC2 and DC2 and between IC3 and DC2 are over 0.9, respectively.For reaction 11, IC3 and IC6 are highly correlated with DC2.For reaction 16, IC4 and IC9 are highly correlated with DC2.For reaction 14, IC2, IC3, IC6, and IC7 are all highly correlated with DC2.By contrast, we find that each DC tends to be highly correlated with only one of the PCA-based CVs.
To validate the diffusion map-based assessment of CVs shown in Table 2, we plot PESs in different combinations of CVs.We evaluate the energy at each pair of CVs using the formula 26 where z st is the transition state specified in the hydrogen combustion dataset. 1 For PCA-based CVs, u 1 and u 2 are chosen to be the first and second PC vectors or the first and the third ones.For IC-based CVs, u 1 and u 2 are standard basis vectors, i.e., columns of an identity matrix, with the position of 1 indicating which IC is chosen.When the projected transition state corresponds to a saddle point on this PES, we consider the CVs in which the PES is shown as good CVs. Figure 3a,b shows that a saddle point can be found in the projected PES in both (PC1,PC2) and (PC1,PC3) for reaction 9. PC1 appears to define the main reaction coordinate for reaction 9 along which the barrier is located at the full transition state.This is consistent with the high correlation coefficient shown in Table 2. PC2 and PC3 define the direction orthogonal to the reaction coordinate along which the barrier is a local minimum.We also observe from Figure 3d that the main reaction coordinate appears to be along IC2 which corresponds to the O1−H bond length.This is consistent with the relatively high correlation coefficient between IC2 and DC2 in Table 2 although the correlation coefficients between IC3 (H−O1−O2 angle) and DC2 is slightly higher.A saddle point can clearly be seen in Figure 3d when the PES is plotted in terms of IC2 and IC3, whereas no such saddle point can be seen in Figure 3c where the PES is plotted in IC1 (O1−O2 bond length) and IC3 (H−O1−O2 angle).
Furthermore, we observe that the three components of the first PC, which correspond to the contribution of IC1, IC2, and IC3 to the PC vector, respectively, are −0.088,0.996, and 0.0142.This indicates that IC2 contributes the most to the first PC and consistent with the observation that the CVs defined in terms of PC1 and IC2 correspond to the same reaction coordinate.No clear saddle point can be seen in Figure 3c where the PES is plotted in terms of IC1 (the O−O bond distance) and IC3 (H−O1−O2 angle).This observation indicates that the O1-O2 bond distance is not a good CV for characterizing the reaction.This is also consistent with observation that IC1 has a relatively low correlation coefficient with DC2 and DC3.
For reaction 11, Table 2b shows that PC1 and PC2 are highly correlated with DC2 and DC3, respectively, indicating that they are good CVs.This observation is consistent with the PES shown in Figure 4a where a saddle point can be clearly seen at the transition state.Furthermore, it appears that the main reaction coordinate is mostly aligned with PC1, although it is not strictly parallel to PC1, and PC2 also contributes to the reaction coordinate.We see from Table 2b that the correlation coefficients between PC3 and DC2, and between PC3 and DC3 are relatively smaller, indicating that PC3 may not be a good CV.This is also evident from Figure 4b which shows that no clear saddle point can be seen from the PES plotted in PC1 and PC3.Table 2 also shows that both IC3 (H1−H2 bond length) and IC6 (H2−H1−O1−O2 dihedral angle) are highly correlated with DC2, and IC5 is moderately correlated with both DC2 and DC3.We see in Figure 4c that a saddle point is present in the PES plotted in IC3 and IC6, and the main reaction coordinate appears to be well described by IC3 (H1−H2 bond length), which is also the largest component in PC1 in magnitude.No clear saddle point is observed in Figure 4d where the PES is plotted in IC3 and IC5.These plots are consistent with the observed correlation coefficients between different ICs and DCs.They indicate that ICs that are highly correlated with DCs are indeed good CVs.
Table 2c shows that PC1 and PC3 are highly correlated with DC2 and DC3 for reaction 16, which is somewhat surprising.However, this may be explained by the observation that second and third singular values of the mean subtracted snapshot matrix are σ 2 = 2.45 and σ 3 = 1.55, which are very close.Both are an order of magnitude smaller than σ 1 = 24.28 and two to three times larger than σ 4 = 0.83. Figure 5b shows that PC1 and PC3 are indeed better CVs to describe the reaction mechanism.In fact, the main reaction coordinate seems to be along the PC1 direction.Such information cannot be obtained from the PES plotted in PC1 and PC2 shown in Figure 5a where no saddle point can be found.Table 2c also shows that IC4 (O1−H3 bond length), IC7(H3−O1−O2 torsion angle), and IC9 (H3−O1−O2-H2 dihedral angle) have relatively high correlations with either DC2 or DC3 indicating that they may be good CVs. Figure 5d confirms that the main reaction coordinate appears to be in the direction of IC4 (O1−H3 bond length) with some contribution from IC7 (H3−O1−O2 angle).A saddle point can clearly be identified at the transition state.On the contrary, no reaction mechanism can be inferred from the PES plotted in IC1 and IC3 in Figure 5c.Both IC1 and IC3 have low correlations with DC2 and DC3.
Finally, Table 2d shows that PC1 and PC2 are highly correlated with DC2 and DC3, respectively, for reaction 14, indicating that they may be good CVs.This prediction is confirmed in Figure 6a in which the PES is plotted in PC1 and PC2.A saddle point can be observed at the full-dimensional transition state.However, the main reaction coordinate does not appear to be aligned with either PC1 or PC2 but a combination of them.For this reaction, the first 4 singular values of the mean subtracted snapshot matrix are σ 1 = 9.41, σ 2 = 5.59, σ 3 = 3.99, and σ 4 = 2.26.Because σ 1 is not significantly larger than σ 2 , both PC1 and PC2 (and possibly PC3 also) are important in describing the main reaction coordinate.We see from Figure 6b that no saddle point can be clearly observed in the PES plotted in PC1 and PC3, which indicates that PC2 is a better CV than PC3, an observation that is consistent with the conclusion drawn from the correlation coefficients reported in Table 2.
Reaction 14 has many more degrees of freedom (12) compared to other reactions considered earlier.In Figure 6c, we plot the PES in IC4 (O1−O3 bond length), which is highly correlated with DC3, and IC7(H2−H1−O1 angle) which is highly correlated with DC2, as we can see from Table 2.A saddle point can clearly be seen at the transition state indicating that both IC4 and IC7 are good CVs as predicted by their high correlations with DC3 and DC2, respectively.The main reaction coordinate does not seem to be aligned with either one of them but a combination of the two.On the contrary, no saddle point can be observed in Figure 6d where the PES is plotted in IC12 (O4−O2−H2−O3 dihedral angle) and IC5 (O2−O4 bond length).Neither one of these ICs is highly correlated with DC2 or DC3 as we can see from Table 2d.

Committor Analysis.
We use the AIMD dataset in 1 to build global diffusion maps and compute committor functions as described in (21).In our experiments, we set r = 0.002 in (14) to determine local scale parameters and construct a global diffusion map from snapshots at a given temperature.We verified that values of the computed committor functions at transition states are close to 0.5.Figure 7a shows the computed committor function for reaction 9 evaluated at configurations along AIMD trajectories generated at T = 3000 K.The committor function is plotted in the plane formed by two ICs: IC2 (O1−H distance) and IC3 (H−O1−O2 angle) that are highly correlated with the first diffusion coordinate as reported in Table 2.The variation of the IC1 (O1−O2 distance) among all the snapshots is less than 0.16 Å; therefore, the 2D view of the committor function in Figure 7a fully captures the main features of committor function.We can clearly see that the values of the committor function are closer to 1 to the left of the transition state, while they are closer to 0 to the right of the transition state.
We also plot the projected locations of the reactant and product in this two-dimensional space and observe that they are in reactant and product regions that are clearly separated by the transition state.From this figure, we can clearly see that O1−H distance is globally the main reaction coordinate, which is consistent with the observation we made in Figure 3d in a local region around the transition state.We also see that the information provided in the committor function is consistent with the free energy surface plot shown in Figure 7b in which the configurations to the left of the transition state have much lower energies, whereas the configurations to the right of the transition state have slightly lower energies compared to the free energy at the transition state.
Figure 8a shows the committor functions in the PC1−PC2 plane at selected configurations along the AIMD trajectories of reaction 11 generated at T = 500 K.The configurations are selected by restricting the coefficients of the other four PCs (out of a total of six) to be close to those obtained at the transition state.We again observe that the committor function has similar behavior near the transition state as before, and that in this case, PC1 is the main reaction coordinate as is consistent with the PES plot as shown in Figure 4b.However, we also observe that away from the transition state, the main   2. We used eq 9 to construct the diffusion map with = 1 2 and computed the committor probabilities.

Journal of Chemical Theory and Computation
reaction coordinate is not completely determined by PC1 as observed when we plot the projected locations of the reactant and product in the PC1−PC2 plane, in which the reactant and product PC coefficients are not close to those at the transition state.Therefore, the plotted locations of the reactant and product in the PC1−PC2 plane do not completely describe their proximity to the transition state in this figure .Because the committor function is only evaluated at configurations along the AIMD trajectory, we do not have the committor function value at configurations in which the PC1 and PC2 coefficients are fixed at those associated with the reactant or product while the coefficients of all other PCs' are fixed at those associated with the transition state.
In the case of reaction channel 14, the selected configurations with ICs close to those at the transition state except IC4 and IC7 have committor function values close to zero to the upper right of the transition state and close to one to the lower left of the transition state in the IC4−IC7 plane as shown in Figure 9a.This is consistent with the PES around the transition state plotted in Figure 6b.Near the transition state, the main reaction coordinate appears to be mostly determined by IC7 with a small contribution from IC4.However, away from the transition state, the main reaction path is determined by a different linear combination of IC4 and IC7 depending on the location of the path.
Figure 10 shows the committor function and free energies of selected molecular configurations along the AIMD trajectories associated with reaction 16 at T = 1000 K.The configurations are selected to either have their ICs close to those associated with the transition state except IC4 and IC7 or have their PC coefficients close to those associated with the transition state except the coefficients of PC1 and PC3.The committor function and free energy are plotted in the plane of IC4 and IC7 and in the plane of PC1 and PC3 which are determined to be good CVs based on their correlation with the first two DCs as shown in Table 2.This is consistent with the corresponding potential energy plot near the transition state shown in Figure 5d although the number of configurations shown in these plots is limited.

CONCLUSIONS
Inspired by the recent work presented in, 2 we considered the use of diffusion maps to analyze several reaction channels involved in a hydrogen combustion system.For this purpose, the hydrogen combustion dataset is ideal for analyzing diffusion maps and committor functions as the molecular   2. Snapshots near the TS are thresholded with respect to the ICs so that the two-dimensional projection of snapshots is performed approximately.We used eq 12 to construct the diffusion map and computed the committor probabilities.For reaction 14, the committor value at the TS is close to 0.5 only at 500 K. species are relatively small, the energies (and forces) are generated from reliable DFT using the ωB97X-V DFT functional 35 with the cc-pVTZ basis set, and configurations near reaction barriers as well as configurations in metastable regions are well sampled for a variety of reaction channels.In particular, we use local diffusion maps constructed from configurations sampled along AIMD trajectories within the metastable region to identify and assess CVs obtained from ICs and PCA for a chemically reactive system.Unlike a system such as alanine dipeptide for which we can use a constant scaling factor in the construction of a diffusion map kernel, we found that configuration-dependent scaling was important in the construction of local diffusion maps for reactive systems such as hydrogen combustion.
We found that the correlations between the first few PCs and the first two DCs tend to be high.The two PCs that have the highest correlations with respect to the first two DCs were found to be good CVs for characterizing the reaction path near the transition state and validated by the presence of a saddle point on the PES.Although diffusion maps are in general considered as nonlinear dimension reduction techniques, whereas PCA is a linear dimension reduction technique, in a local diffusion map the leading DCs can correlate well with the leading PCs as can be seen in Figures 3−6, and also indicated by the fast decay of the singular values.However, globally, the leading PCs from a single PCA may not be able to fully characterize the entire reaction path, and even a local diffusion map may not be sufficient to characterize the main reaction coordinate.We also observed that, while several ICs can be highly correlated with the first two DCs, they are not uniformly all good CVs.Nonetheless, the use of diffusion maps allows us to narrow down the IC choices and provides some alternatives that we otherwise would not have considered.
Using AIMD trajectories initiated from the transition states, we also constructed global diffusion maps that can be used to compute approximations to the committor functions associated with different reaction channels.When we examined the committor function in the plane of two CVs identified by a diffusion map, we found that for all reaction channels, the value of the committor function at the transition state is close to 0.5.For reaction channels that contain only a few degrees of freedom, the reactant and product are clearly separated by the selected CVs, and the main reaction path characterized by the committor function agrees well with that identified from the free energy surface in the selected CVs.However, for reaction channels that involve more degrees of freedom, the global reaction coordinate appears to be more complicated, and the reactant and product may not be easily separated on a two-  2. We used eq 12 to construct the diffusion map and computed the committor probabilities.
dimensional slice of committor function defined in a highdimensional space.Even so, insights into the CVs provided by the diffusion maps have proved useful enough such that we will report the transition state free energies found for all hydrogen combustion reaction channels in a related publication.Finally, the results presented in this work provide an important baseline for comparison for future work.

Figure 1 .
Figure 1.The x-axis and the y-axis represent the QSD checking iteration number and the amplitudes of eigenvalues of diffusion maps, defined in (7), respectively.The changes of the first few dominant eigenvalues of the diffusion maps with respect to the QSD checking iteration number are shown for reactions 11 (left), 14 (middle), and 16 (right).The diffusion maps are constructed every m = 100 AIMD steps of a 4000-step trajectory from snapshots sampled along the trajectory.

Figure 2 .
Figure 2. The x-axis and the y-axis represent the index and the magnitude of singular values (in a log scale) obtained from (22), respectively.Normalized singular values of two matrices containing configurations sampled within a metastable region (blue) and along the MEP (red) of reaction 19 are shown.The normalization is performed by dividing all singular values by the largest singular value.

3 .
Assessing the Quality of CVs.We construct the diffusion map and use it to obtain two DCs: DC2 and DC3.We use these DCs as a means to assess different definitions of CVs by examining the correlation of CVs with the DCs.To be specific, we compute the Pearson correlation coefficient between CVs and the DCs as follows where cov(C, D) denotes the covariance between random variables C and D, and σ C (σ D ) denotes the standard deviation of C (D).If C or D is chosen as an IC, it can be easily obtained directly from the Cartesian coordinates of the snapshots used to construct the diffusion map.For the PCA, we use the variable in eq 23.

Figure 3 .
Figure 3. PESs of reaction 9 in pairs of PCs and pairs of ICs show relatively high (yellow) and low (blue) energy regions.The unit of energy is kcal mol −1 .In the lower panels, the unit of the angle H−O1−O2 is radian and that of the bond length is Å.

Figure 4 .
Figure 4. PESs of reaction 11 in pairs of PCs and pairs of ICs show relatively high (yellow) and low (blue) energy regions.The unit of energy is kcal mol −1 .In the lower panels, the unit of angles is radian and that of the bond length is Å.

Figure 5 .
Figure 5. PESs of reaction 16 in pairs of PCs and pairs of ICs show relatively high (yellow) and low (blue) energy regions.The unit of energy is kcal mol −1 .In the lower panels, the unit of angles is radian and that of the bond length is Å.

Figure 6 .
Figure 6.PESs of reaction 14 in pairs PCs and pairs of ICs show relatively high (yellow) and low (blue) energy regions.The unit of energy is kcal mol −1 .In the lower panels, the unit of angles is radian and that of the bond length is Å.

Figure 7 .
Figure 7. Committor analysis for reaction 9.The unit of angle H−O1−O2 is radian and that of bond length O1−H is Å.(a) Committor probabilities and (b) free energies at 3000 K are plotted in the space of (IC2, IC3) listed in Table2.We used eq 9 to construct the diffusion map with =

Figure 8 . 1 2
Figure 8. Committor analysis for reaction 11.(a) Committor probabilities and (b) free energies at 500 K are plotted in the space of (PC1, PC2).We used eq 9 to construct the diffusion map with = 1 2 and computed the committor probabilities.

Figure 9 .
Figure 9. Committor analysis for reaction 14.The unit of angle H2−H1−O1 is radian and that of bond length H1−O3 is Å.(a) Committor probabilities and (b) free energies are shown in the space of IC4 and IC7) listed in Table2.Snapshots near the TS are thresholded with respect to the ICs so that the two-dimensional projection of snapshots is performed approximately.We used eq 12 to construct the diffusion map and computed the committor probabilities.For reaction 14, the committor value at the TS is close to 0.5 only at 500 K.

Figure 10 .
Figure 10.Committor analysis for reaction 16.The unit of angle H3-O1-O2 is radian and that of bond length O1−H3 is Å. (a,c) Committor probabilities and (b,d) free energies at 1000 K are projected onto the spaces of (IC4, IC7) and (PC1, PC3) listed in Table2.We used eq 12 to construct the diffusion map and computed the committor probabilities.

Table 1 .
19 Reactions Contained in the Hydrogen Combustion Benchmark Dataset a

Table 2 .
Absolute Values of Correlation Coefficients between ICs and PCA-based CVs and DC2, DC3 for Representative Hydrogen Combustion Reaction Channels 9, 11, 14, and 16 a