Exploring Carbamazepine Polymorph Crystal Growth in Water by Enhanced Sampling Simulations

In this work, the polymorphism of the active pharmaceutical ingredient carbamazepine (CBZ) was investigated by using molecular dynamics simulations with an enhanced sampling scheme. A single molecule of CBZ attaching to flat surfaces of different polymorphs was used as a model for secondary nucleation in water. A novel approach was developed to compute the free energy profile characterizing the adsorption of molecules with orientation aligned with the crystal structure of the surface. The distribution of states that showed alignment was used to rescale the adsorption free energy to include only the contribution that is consistent with crystal growth. The resulting free energy surfaces showed favorable thermodynamics for the most stable form, Form III and the second most stable form, Form I. The primary crystallization product, a dihydrate, was found to be less favorable, implying a nonclassical crystallization pathway. We suggest that a major contribution determining the energetics is the hydrophobicity of the surface. This thermodynamic ranking provides valuable information about the molecular pathways of polymorph growth and will further contribute to the understanding of the crystallization process of CBZ, which is imperative since polymorph formation can alter the physical properties of a drug significantly.


Dimerization in water
The fundamental unit of all anhydrous polymorphs of CBZ is the hydrogen-bonded dimer, in which the amide groups of both monomers are hydrogen-bonded in a ring motif (cf. Figure S1 (a)), but other patterns are seen in some of the forms.In solution, it can form dimers of solutes in different ways depending on the solvent, both through interactions of the amide group by hydrogen bonding and the weak interactions of the aromatic rings.As shown by Hunter et al. 1 in solutions of deuterated methanol -a polar solvent -CBZ dimerizes in a way that the contact of the non-polar rings with the water is minimized.In deuterated methyl chloride instead, the dimer mentioned in the previous paragraph appears, since the formation of hydrogen bonds between the CBZ molecule is now a more favorable interaction.To determine whether the same occurs under the conditions used in our systems, a simulation in water at a supersaturated concentration was run.
This simulation did not show the dimer as in Figure S1 (a), there were hydrogen bonded configurations, but none of them exhibited the ring motif, there were only single hydrogen bonds.As seen in Figure S2, the center-of-mass distance is also much lower than the average distance of the amide groups to each other, which implies that if a dimer is formed, it is by the interaction of the non-polar ring system, which is expected since water is a polar solvent, similar to methanol, and disrupts the formation of hydrogen bonds between monomers.
Indeed, upon visual inspection a π-stacked dimer is seen that resembles the pattern in Form

Testing of forcefields
Two conventional force fields, OPLS-AA (Optimized Potentials for Liquid Simulations) 2 and the generalized Amber force field (GAFF) 3 were tested using different charge calculation schemes (AM1BCC and RESP) before choosing the most adequate one.The molecules in both the solution and the solid form were investigated and the performance of the force field was assessed by computing density, the partition coefficient logP, and the energy of formation.
The RESP charges were obtained from the electrostatic potential calculated in Gaussian 16 4 with a HF/6-31G* basis set for a molecule in a dielectric continuum representing the water environment (ϵ = 78.3553),and exported to Gromacs input with the ACPYPE utility. 5

Partition coefficient logP
The partition coefficient log P ow for CBZ was calculated using where ∆G w is the solvation free energy in water, ∆G o the one in octanol, R the universal gas constant (8.31446261815J •mol −1 • K −1 ), T the temperature (300 K) and e Euler's number. 6FF/RESP GAFF/AM1BCC OPLS 0 Since the solubility of water in octanol is non-negligible, 7 the solvation free energy in octanol was calculated both for the pure compound ('dry octanol') and for octanol with 27 mol% water ('wet octanol').Parameters and coordinate files for both dry and wet octanol were taken from the MDPOW package (https://github.com/Becksteinlab/MDPOW)for OPLS, which uses refined parameters for octanol developed by Zangi.8 For GAFF, all topologies were generated using antechamber.9 Results are plotted in Figure S3. Fr the GAFF/RESP calculations, the RESP charges of CBZ were calculated both in water as described before and in octanol using a dielectric continuum representing n-octanol (ϵ = 9.8629) and used for the calculations in the respective solvents.The charges obtained from the octanol environment were used both for dry and wet octanol.Values show good agreement with experimental data for GAFF/AM1BCC and OPLS, slightly less so for GAFF/RESP.
The latter is also much more sentitive to the choice of solvent, which makes sense considering the charge calculation.However, the calculated value for GAFF/RESP in wet octanol still comes reasonably close to the experimental value.

Density in the solid form
The densities of the four anhydrous forms of CBZ were calculated from simulations in the crystal phase.Unit cells for all forms were obtained from the Cambridge Structural Database (CSD) and were then replicated in a modified Atomic Simulation Environment (ASE) 10 script, yielding simulation boxes that contained between 216 and 432 residues.For Form II, hydrogen atoms had to be added in OLEX2 11 using the command hadd.Each form was first minimized using a steepest descent algorithm and equilibrated in a NVT ensemble for 100 ps using the velocity rescaling thermostat. 12They were then run in a NPT ensemble with anisotropic pressure scaling for 1 ns, using the the same thermostat as before and the Berendsen barostat. 13A change in box size was observed upon switching to anisotropic scaling, however, the fluctuations during the simulations were small (<= 5%, see table S1 for GAFF/RESP), so the crystals are stable in simulation.The average density of each system in anisotropic pressure scaling was calculated, the results are plotted in Figure S4.
The general trend in all force fields is the same: Forms I and III show higher densities (> 1.3 g • cm −3 ), while both Form II and Form IV lie below or around 1.24 g •cm −3 .This makes sense looking at the crystal packing in I and III vs. in II and IV.Both of the latter show gaps in the unit cell, Form II in along c and Form IV along a.In contrast, Form I and lit.
GAFF/RESP GAFF/AM1BCC OPLS 1.20 Form III show a more compact and efficient packing, which makes higher densities plausible.
This trend is also seen in the colored 'x' markers, which are the densities as obtained from the CIF files of each respective form.The best agreement with the CIF numbers is seen for Form II and Form III, but overall, the densities are reproduced within reasonable limits and especially the GAFF/RESP force field performs well.

Energy of formation
A last test was to evaluate the energy of formation in GAFF/RESP and compare it to experimental data on the thermodynamic stability as given by Grzesiak et al.The energy of formation is calculated as the potential energy difference between the energy per molecule in the (minimized) solid and the energy of an isolated molecule in vacuum as Grzesiak et al. report thermodynamic stabilities in the order III > I > IV > II. 14The energies of formation calculated from the minimizations of the solids were −123.for Form IV, which puts them in the order III > I > II > IV.It seems therefore that GAFF/RESP reproduces the experimental order reasonably well.

Setup of slabs
The crystal structures of the relevant surfaces are shown in fig.sS5-S8.In all simulations, z is defined as the component that is perpendicular to the surface.In Form I this is the x component (100 surface), since it has been reported to be the main direction of crystal growth in Form I upon sublimation. 15The 100 surface of Form I and the 001 surface of Form II show the same type of stacking of molecules and both forms show needle-like crystals, so it can be inferred that crystal growth in Form II will proceed along z, thus this direction was chosen.In Form III and IV the y component (010 surface) was originally chosen due to the orientation of the surface, since the molecules are mostly flat on these surfaces, but more were added for Form III due to the reported prismatic morphology of the crystals.The 001 direction was chosen for the dihydrate since it has been reported to be the main direction of crystal growth in this form. 16abs of the crystal forms were created using the same unit cells as before, replicating them in the modified ASE script and adding water on top of the slabs using the gmx solvate utility of Gromacs. 17The 011 and 102 surfaces of Form III were cut manually and the resulting slab was rotated using gmx editconf to have z perpendicular to the surface.Since this rotation can introduce small misalignments in the crystal, they were then equilibrated for 1 ns in a NPT simulation, before adding water on top and equilibrating the full system.

Stability of slabs
To evaluate the stability of the unrestrained slabs in water, each surface slab was run for 300 ns in isotropic NPT (v-rescale thermostat, Parrinello-Rahman barostat 18 ) at 300 K and 1 bar, starting from the same box as for the metadynamics run, but without the additional molecule in solution.The box was equilibrated until the water density is close to the limit for TIP3P and the surfaces were inspected visually before and after the simulation.developing a step in the intact crystal structure.However, the crystal lattice is still visibly preserved in all surfaces, for the 010 surface of Form III, Form II and the dihydrate, the disorder at the surface is even minimal.

Convergence and sampling of metadynamics simulations
Convergence of the metadynamics simulations is evaluated by comparing the negative of the stored grid (i.e. the applied bias averaged over the simulation) to the potential of mean force (PMF) obtained from force integration over the trajectory.Good matching of the two is seen for all forms, with minor deviations (see S18-S20).The completeness of the sampling is assessed by the coverage of the collective variable space during the simulation, i.e. how much of the surface separation distance is covered.All forms show good sampling between −0.5 and 1.5 nm, where the upper and lower walls were applied.Form I does not go below 0 nm, since the SSD is not defined as in the other forms, but as the minimum distance of the solute center of mass to any atom on the surface.

Sampling of the accessible configuration space
To be able to compare between different simulations, one needs to ensure that the full volume of the solution region is sampled, i.e. the full configuration volume in the 'unbound' state.
The good sampling of the SSD and the PMF approaching a constant value at larger distances is an indicator that the volume has been sampled adequately, but it was also calculated explicitly by computing the solvent-excluded surface of a PDB file consisting of all solute positions over the course of the simulation (more exactly all its center-of-mass positions, which were approximated by choosing the position of the N1 atom).This procedure is analogous to the calculation of the unbound volume in a paper by Bertazzo et al., 19 and was performed using the NanoShaper software. 20A key difference is that Bertazzo et al.only calculate the volume of the states defined as unbound and use this to correct their free energy of binding and obtain a standard free energy.In our case, we are calculating the full volume covered by the solute over the course of the simulation as a proxy for the coverage of the 'unbound' state, since we only want to ensure the correct entropy contribution in solution.
The calculated volume was then compared to the available volume between the surface of the slab and the upper wall at 1.5 nm, correcting for the van der Waals radius of the nitrogen atom (about 1.8 Å).However, since the wall is a restraint and not a hard constraint, larger SSD values are possible, so the volume between the surface and the maximum SSD value was also calculated, again corrected for the van der Waals radius of the nitrogen atom.
In principle, the available volume should be slightly higher than this, since the solute is not restrained at the surface, but 0.5 nm below it (lower wall), but most of the volume between the surface and the lower wall is already covered by molecules of the slab, so the contribution of this volume to the total volume is minor (depending on the structure of the slab, diffusion into the crystal structure is possible however).Table S2 shows the comparison of the calculated SES volume (V SES ), the volume between surface and upper wall (V s−uw ) and the volume between the surface and the maximum SSD value (V s−mcv ).Figures S21-S29 show the calculated volume compared to the positions of surface and upper/lower walls in the simulation box.All results indicate a good coverage of the available volume, meaning the configuration space in solution is sampled fully and the correct entropy contribution is obtained.
Table S2: Solvent-excluded surface volume of the N1 atom of the solute during the simulation and available volume.V s−uw is the volume found between the surface and the upper wall (1.5 nm from the surface) and V s−mcv is the volume between the surface and the maximum SSD (both are corrected for the van der Waals radius of N1).All values are given in Å3 .

Form
Surface

Finding states
The criteria to determine whether a state falls into crystal growth or not are described in the method section of the main paper.The distributions of the angles between the sixmembered rings and the center-of-mass distances of those chosen from the angle distribution of the bound state are discussed there as well.Table S3 shows supplementary information for these states regarding the frequency of correct angle alignment and correct overall alignment in the bound data set.

Distribution of states and rescaled PMF integration
In fig.sS30 to S38, the distributions over the SSD of the found crystal growth states in the bound data set are shown, together with the rescaled PMF calculated from them.The part of the PMF used for integration of the free energy is marked as filled.As described in the methods section, the free energy change in the rescaled PMF is calculated from where the index b iterates over the part where the rescaled PMF is below zero and Z free is the free volume.The error estimate is obtained by integrating first over W r + ∆W r , then over W r − ∆W r and estimating the total error as the difference between the two divided by four (max-min/4).fa;skdfaklsdjf;laksjdf

Alternate states
As described in the method section of the main paper, there are three distinct ways the molecule can be oriented at the surface (cf.fig.S42).
(a  configuration is simply regarded as orientation (iii), since it does not matter what orientation the amide has in this one, since it is not in the vicinity of another amide group.
Form I shows most states in full alignment (bottom left corner), with a minor contribution from state (ii), where the amide group is rotated (bottom right).Very few states are in an orientation where the ring system is rotated (upper half of the plot) and both the (ii) and (iii) peaks are very scattered.Form II shows an almost even partitioning of states between the different orientations, with slightly more in orientation (iii).All peaks are quite sharp,         (iii), although some states do appear in full alignment.All peaks show a higher spread than in the 011 surface.The 100 surface of Form III has most states in either full alignment or   with the amide group rotated (ii).Few states are found in orientation (iii) in the upper half of the plot.All peaks show some spread in both directions of the axes.For the 010 surface, and orientation (iii), with the full alignment slightly favored.The peaks show some spread in y direction (C7-N1 vector), but not much in x (C15-O1 vector).
The 001 surface of Form III on the other hand shows almost no states with the ring system rotated (iii), but most of them in full alignment or with the amide group rotated (ii), again favoring the former slightly.The spread of peaks is low.Form IV shows a preference for fully aligned states, with only very minor contributions from (ii) and (iii), but the spread of peaks is large, indicating that the solute has a lot more conformational flexibility.In the dihydrate (DH), orientation (ii) seems to be favored, with a second peak showing for the full alignment.The spread of peaks is moderate.

Figure S1 :
Figure S1: Hydrogen-bonded dimer/fundamental unit (a), π-stacked dimer found in Forms III and IV (b) and dimer motif in the supersaturated solution (c).

Figure S2 :
Figure S2: Distribution of distances in the supersaturated solution of CBZ in water.O1 and N2 are the names of the heavy atoms involved in hydrogen bonding.

Figure S3 :
FigureS3: Matching of logP data from the simulation and experimental logP for CBZ (logP = 2.45).Calculated results were obtained both from pure octanol (dry octanol) and octanol with 27 mol% water (wet octanol).
force fields were the same as in the simulations of CBZ in water.

Figure S5 :
Figure S5: Surfaces for Forms I and II of CBZ (top-down view).

(a)
Figure S6: (010) surface for Forms III (side view) and IV (top-down view) of CBZ.

Figure S8 :
Figure S8: Additional surfaces modeled for Form III of CBZ (side view).

Figure S9 :
Figure S9: Snapshots of Form I surface (100) before (a) and after (b) the NPT simulation.The periodic image in the x direction is shown on top for both pictures.
Figure S10: Snapshots of Form II surface (001) before (a) and after (b) the NPT simulation.The periodic image in the z direction is shown on top for both pictures.
Figure S11: Snapshots of the 011 surface of Form III before (a) and after (b) the NPT simulation.The periodic image in the z direction is shown on top for both pictures.
Figure S12: Snapshots of the 102 surface of Form III before (a) and after (b) the NPT simulation.The periodic image in the z direction is shown on top for both pictures.
Figure S13: Snapshots of the 100 surface of Form III before (a) and after (b) the NPT simulation.The periodic image in the x direction is shown on top for both pictures.
Figure S14: Snapshots of the 010 surface of Form III before (a) and after (b) the NPT simulation.The periodic image in the y direction is shown on top for both pictures.
Figure S15: Snapshots of the 001 surface of Form III before (a) and after (b) the NPT simulation.The periodic image in the z direction is shown on top for both pictures.
(a) before NPT (b) after NPT

Figure S16 :
Figure S16: Snapshots of Form IV surface (010) before (a) and after (b) the NPT simulation.The periodic image in the y direction is shown on top for both pictures.
Figure S17: Snapshots of the dihydrate surface (001) before (a) and after (b) the NPT simulation.The periodic image in the z direction is shown on top for both pictures.

Figure S18 :
Figure S18: CV sampling and matching of grid bias stored by PLUMED (light blue) and PMF obtained by force integration (dark blue) for Form I (a,b), Form II (c, d) and the 011 surface of Form III (e, f).

Figure S19 :
Figure S19: CV sampling and matching of grid bias stored by PLUMED (light blue) and PMF obtained by force integration (dark blue) for the 102 (a, b), the 100 (c, d) and the 010 (e, f) surfaces of Form III.

Figure S20 :
Figure S20: CV sampling and matching of grid bias stored by PLUMED (light blue) and PMF obtained by force integration (dark blue) for the 001 surface of Form III (a, b), Form IV (c, d) and the dihydrate (e, f).

Figure S21 :
Figure S21: Solvent-excluded surface (pink) of the positions of the solute center of mass during the simulation for the Form I slab (one periodic copy shown).Black indicates the position of the surface (middle) and the lower and upper wall (bottom and top).

Figure S22 :
Figure S22: Solvent-excluded surface (pink) of the positions of the solute center of mass during the simulation for the Form I slab (one periodic copy shown).Black indicates the position of the surface (middle) and the lower and upper wall (bottom and top).

Figure S23 :
Figure S23: Solvent-excluded surface (pink) of the positions of the solute center of mass during the simulation for the Form III 100 slab (one periodic copy shown).Black indicates the position of the surface (middle) and the lower and upper wall (bottom and top).

Figure S24 :
Figure S24: Solvent-excluded surface (pink) of the positions of the solute center of mass during the simulation for the Form III 010 slab (one periodic copy shown).Black indicates the position of the surface (middle) and the lower and upper wall (bottom and top).

Figure S25 :
Figure S25: Solvent-excluded surface (pink) of the positions of the solute center of mass during the simulation for the Form III 001 slab (one periodic copy shown).Black indicates the position of the surface (middle) and the lower and upper wall (bottom and top).

Figure S26 :
Figure S26: Solvent-excluded surface (pink) of the positions of the solute center of mass during the simulation for the Form III 011 slab (one periodic copy shown).Black indicates the position of the surface (middle) and the lower and upper wall (bottom and top).

Figure S27 :
Figure S27: Solvent-excluded surface (pink) of the positions of the solute center of mass during the simulation for the Form III 102 slab (one periodic copy shown).Black indicates the position of the surface (middle) and the lower and upper wall (bottom and top).

Figure S28 :
Figure S28: Solvent-excluded surface (pink) of the positions of the solute center of mass during the simulation for the Form IV slab (one periodic copy shown).Black indicates the position of the surface (middle) and the lower and upper wall (bottom and top).

Figure S29 :
Figure S29: Solvent-excluded surface (pink) of the positions of the solute center of mass during the simulation for the dihydrate slab (one periodic copy shown).Black indicates the position of the surface (middle) and the lower and upper wall (bottom and top).

Figure S30 :Figure S31 :
Figure S30: Distribution of found crystal growth states over the collective variable (a) and integration over rescaled PMF (b) in Form I.

Figure S32 :
Figure S32: Distribution of found crystal growth states over the collective variable (a) and integration over rescaled PMF (b) in Form III 011.

Figure S33 :
Figure S33: Distribution of found crystal growth states over the collective variable (a) and integration over rescaled PMF (b) in Form III 102.

Figure S34 :
Figure S34: Distribution of found crystal growth states over the collective variable (a) and integration over rescaled PMF (b) in Form III 100.

Figure S35 :
Figure S35: Distribution of found crystal growth states over the collective variable (a) and integration over rescaled PMF (b) in Form III 010.

Figure S36 :
Figure S36: Distribution of found crystal growth states over the collective variable (a) and integration over rescaled PMF (b) in Form III 001.

Figure S37 :
Figure S37: Distribution of found crystal growth states over the collective variable (a) and integration over rescaled PMF (b) in Form IV.

Figure S38 :Figure S39 :
Figure S38: Distribution of found crystal growth states over the collective variable (a) and integration over rescaled PMF (b) in DH.

Figure S40 :
Figure S40: Discarded 'satellite' states from the distribution of center-of-mass distances for all states with fully aligned angles in the bound data set for (a) the 001 surface of Form III and (b) the 102 surface of Form III.Note that the solute, marked in red, should align with the grey molecule at the bottom.

Figure S41 :
Figure S41: States from the two secondary maxima in Form IV.In both (a) and (b), the solute in red shows good alignment with the grey molecules in the repeating layers of the slab, so these were counted as aligned.

Figure S42 :
Figure S42:The three orientations of the solute molecule at the surface in alignment with the crystal structure: (i) full alignment of all atoms in the molecule of the slab to the solute, (ii) full alignment of the ring system, but the amide group in the solute is rotated 180 • with respect to the amide group at the surface, (iii) the ring system in the solute is rotated 180 • with respect to the ring system in the slab.

Figure S43 :Figure S44 :
Figure S43: Additional vectors evaluated to characterize the orientation of the solute molecule on the surface, here shown in full alignment.(a) C7-N1 vector used to identify whether the ring system is rotated (b) C15-O1 vector used to identify the orientation of the amide group.

Figure S45 :
Figure S45: Probability distribution of aligned states with respect to the vectors defined in fig.S43 for Form II.

Figure S46 :
Figure S46: Probability distribution of aligned states with respect to the vectors defined in fig.S43 for Form III 011.

Figure S47 :
Figure S47: Probability distribution of aligned states with respect to the vectors defined in fig.S43 for Form III 102.

Figure S48 :
Figure S48: Probability distribution of aligned states with respect to the vectors defined in fig.S43 for Form III 100.

Figure S49 :
Figure S49: Probability distribution of aligned states with respect to the vectors defined in fig.S43 for Form III 010.

Figure S50 :
Figure S50: Probability distribution of aligned states with respect to the vectors defined in fig.S43 for Form III 001.

Figure S51 :
Figure S51: Probability distribution of aligned states with respect to the vectors defined in fig.S43 for Form IV.

Figure S52 :
Figure S52: Probability distribution of aligned states with respect to the vectors defined in fig.S43 for DH.

Table S1 :
Change in box vectors and angles in % from NVT to anisotropic NPT in GAFF/RESP.

Table S3 :
Results for crystal growth states in the bound data set for all forms.