Basicities of Strong Bases in Water: A Computational Study

Aqueous pKa values of strong organic bases – DBU, TBD, MTBD, different phosphazene bases, etc – were computed with CPCM, SMD and COSMO-RS approaches. Explicit solvent molecules were not used. Direct computations and computations with reference pKa values were used. The latter were of two types: (1) reliable experimental aqueous pKa value of a reference base with structure similar to the investigated base or (2) reliable experimental pKa value in acetonitrile of the investigated base itself. The correlations of experimental and computational values demonstrate that direct computations do not yield pKa predictions with useful accuracy: mean unsigned errors (MUE) of several pKa units were observed. Computations with reference bases lead to MUE below 1 pKa unit and are useful for predictions. Recommended aqueous pKa values are proposed for all investigated bases taking into account all available information: experimental pKa values in acetonitrile and water (if available), computational pKa values, common chemical knowledge.


INTRODUCTION
A core characteristic of a base B is its basicity, referring to the following equilibrium

$$\mathrm{BH^+ + S \overset{K_a}{\rightleftharpoons} B + HS^+} \tag{1}$$

and expressed as the pKa value of its conjugate acid BH+: 10

$$\mathrm{p}K_a = -\log \frac{a(\mathrm{HS^+})\, a(\mathrm{B})}{a(\mathrm{BH^+})} \tag{2}$$

The pKa values are different in different solvents. Of all solvents in use, water is by far the most important, and basicity data in water are important for several reasons. Firstly, many of these bases are also used in water. Secondly, water is the "champion solvent" in terms of the availability of pKa data for medium-strength bases. As diverse a range as possible of bases with available pKa values in any one solvent is very useful for developing prediction and computation methods, such as QSAR. The pKa data of strong bases in water are currently scarce, so additional data would be very welcome. Thirdly, for any base, especially the well-known ones, it is beneficial to know its basicity in the most important solvents, and water certainly is one of those.
There are significant gaps in our knowledge concerning the basicity of strong and superstrong bases in water. Strong bases such as phosphazenes and amidines are difficult to study in aqueous solution because of the low solubility of these nonpolar compounds and their very high basicity. The relatively high acidity (proton donicity) of water results in a levelling effect: bases with pKa values in water higher than ca. 13 are all almost fully protonated in water, even if their basicities actually differ by orders of magnitude.
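The levelling effect described above can be illustrated with a minimal sketch. It assumes an ideal dilute solution (activity coefficients ignored), neglects the hydroxide contributed by water autoionization, and uses a hypothetical total base concentration of 1 mmol/L; the function name and the pKa values tried are illustrative only.

```python
import math

KW = 1.0e-14  # autoprotolysis constant of water at 25 C


def fraction_protonated(pka: float, c_total: float) -> float:
    """Fraction of base present as BH+ for B + H2O <=> BH+ + OH-.

    pka     -- aqueous pKa of the conjugate acid BH+
    c_total -- total base concentration in mol/L (hypothetical input)
    """
    kb = KW / 10.0 ** (-pka)  # Kb = Kw / Ka
    # Equilibrium: x^2 / (c - x) = Kb, with x = [BH+] = [OH-].
    # Numerically stable form of the positive quadratic root.
    x = 2.0 * kb * c_total / (kb + math.sqrt(kb * kb + 4.0 * kb * c_total))
    return x / c_total


if __name__ == "__main__":
    for pka in (10.0, 13.0, 14.0, 17.0):
        print(pka, fraction_protonated(pka, 1.0e-3))
```

Under these assumptions a base with pKa 14 and one with pKa 17 are both more than 99 % protonated, so a measurement of the degree of protonation can hardly distinguish them, although their basicities differ by three orders of magnitude.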
A correlation equation between pKa values in MeCN and in water (Eq. 3) was proposed by Kaljurand et al. 11 Equation 4, which expresses the aqueous pKa through the pKa in MeCN, gives some insight into the basicities of strong bases in water. For example, in the same paper 11 the pKa values of DBU and t-BuP1(pyrr) in MeCN are measured as 24.34 and 28.42, respectively. Using equation 4 gives the corresponding pKa values in water as 14.6 and 17.7. These estimated aqueous pKa values can only be considered very approximate because of the quite high scatter of points around the regression line in Eq. 3 and, especially, because the highest aqueous pKa value used in the regression analysis was that of phenyltetramethylguanidine, leading to a strong extrapolation. Nevertheless, these estimates show that the superbasic region lies mainly around and above the basicity of the hydroxide ion (the pKa of H2O in water can be calculated to be 15.74 using equations 1 and 2 as well as the autoprotolysis constant of water 12 $K_\mathrm{w} = 10^{-14}$).

Experimental measurements in that region are very difficult and need several approximations, such as measuring in solutions with high concentrations of alkali hydroxide or in mixed solvents. 13 Another issue is the low solubility of many strong bases in water. Surfactants have been used to overcome this problem, 14 which somewhat alters the properties of the solvent. Given these difficulties, it is unlikely that accurate aqueous pKa values can be measured for strong bases unless a breakthrough in pKa measurement methods is made.
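The arithmetic behind the numbers quoted in this paragraph can be checked in a few lines. The sketch below assumes a molar concentration of pure water of 55.5 mol/L for the pKa of water. The slope and intercept it prints are only those implied by the two rounded (MeCN, water) pairs quoted in the text; they are not the published regression coefficients and should not be read as such.

```python
import math

KW = 1.0e-14    # autoprotolysis constant of water, as quoted in the text
C_WATER = 55.5  # mol/L, assumed molar concentration of pure water

# pKa of H2O acting as an acid in water: Ka = Kw / [H2O]
pka_water = -math.log10(KW / C_WATER)

# (pKa in MeCN, estimated pKa in water) pairs quoted in the text
dbu = (24.34, 14.6)
tbup1_pyrr = (28.42, 17.7)

# Straight line through the two quoted points (illustrative only)
implied_slope = (tbup1_pyrr[1] - dbu[1]) / (tbup1_pyrr[0] - dbu[0])
implied_intercept = dbu[1] - implied_slope * dbu[0]

if __name__ == "__main__":
    print(f"pKa(H2O) in water: {pka_water:.2f}")
    print(f"implied slope: {implied_slope:.2f}")
    print(f"implied intercept: {implied_intercept:.1f}")
```

A slope well below unity means that basicity differences measured in MeCN are compressed when transferred to water, which is in line with the stronger solvation of cations in water.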
In this paper we use computational methods as well as available experimental data to obtain estimates of reasonable reliability for the aqueous pKa values of a series of strong and very strong neutral organic bases. Among others, the following are included: DBU, TBD, MTBD, TMG, t-BuP1(pyrr), t-BuP1(dma), EtP2(dma) and t-BuP4(dma) (see Figure 1 for base structures).
Computational methods are free of the abovementioned problems: compounds with low solubility and high basicity can be studied. During the last decade continuum solvation models (CSM) 15 became an important tool for addressing solvation phenomena, enabling researchers to establish Gibbs free energies of solvation and to calculate pKa values with reasonable accuracy. Just a few examples: substituted phenols and carboxylic acids in water 16 using CPCM, 17 substituted phenols in dimethyl sulfoxide and acetonitrile 18 using IEF-PCM, 19 and various CH and NH superacids in 1,2-dichloroethane 23 using SMD, 24 as well as guanidine-based superbases in acetonitrile using IPCM 20 and amines in aqueous solution using SVPE, 21 PCM, 21 IEF-PCM, 21 CPCM, 22 SMD 22 and SM8 22 methods. The results indicate that the accuracy of CSM-based pKa predictions is often in the range of 0.4–0.7 pKa units, although sometimes worse accuracy is observed. 21 The results of Liptak and Shields 16 are especially encouraging. They demonstrated that even a pure continuum approach can be a method of choice when modeling solvation in water, a difficult solvent that typically calls for the so-called cluster-continuum representation 25 because of strong specific solvation and short-range solute-solvent interactions.
Eckert et al. 26 applied the COSMO-RS procedure, 27 which combines polarized continuum theory with a statistical thermodynamics treatment, to calculate pKa values for different classes of organic acids in acetonitrile. The method predicts pKa values of substituted phenols in MeCN with an MUE of 0.8 pKa units. A similar MUE was later found by Heldebrant et al. for carboxylic acids. 28 Klamt et al. 29 used COSMO-RS to predict pKa values of organic and inorganic acids in water. An RMS deviation of 0.5 pKa units between the pKa estimates from linear regression and the corresponding experimental values is reported. For bases of low and medium strength in aqueous solution an RMS accuracy of 0.66 pKa units was reported. 30
For computing aqueous pKa values special care should be taken in choosing the appropriate thermodynamic cycle. In a recent report 31 Ho and Coote explored different pKa calculation strategies and concluded that the direct thermodynamic cycle involving the deprotonation equilibrium is generally unsuitable for pKa calculations in water. In contrast, the proton exchange scheme using an acid with an established pKa value as a reference yielded reasonably accurate results and should therefore be considered a more viable alternative.
The purpose of the present study is the prediction of aqueous pKa values for a number of strong guanidine and phosphazene bases using the popular COSMO-RS, CPCM and SMD protocols and different thermodynamic cycles.

EXPERIMENTAL
Computational Methods
pKa computations with the COSMO-RS approach 32 were carried out as in Ref. 33 using Turbomole 34 version 6.5 and COSMOthermX 35 version C30 with parametrization 1401. The two-step COSMO-RS computation protocol 32 was used. COSMO BP/TZVP geometry optimizations within the RI approximation were carried out first in the conductor limit with the Turbomole software package 34 for the studied base and the corresponding conjugate acid. As the second step, COSMO-RS calculations were performed for the resulting solvated structures, taking water as a real solvent and computing the deviations from the ideal conductor by evaluating the differences in electrostatic and H-bonding energies according to the default procedure implemented in the COSMOtherm software. 35 All stable conformers were taken into account and statistically weighted, as is customary in the COSMO-RS procedure.
From the first step of the COSMO-RS protocol a σ-surface is obtained that can be used to quantitatively describe the charge delocalization in ions. 36 In the case of cations the Weighted Average Negative Sigma (WANS) 36 parameter is used:

$$\mathrm{WANS} = \frac{\int_{\sigma=-\infty}^{0} \sigma\, p(\sigma)\, d\sigma}{A \int_{\sigma=-\infty}^{0} p(\sigma)\, d\sigma}$$

where σ is the polarization charge density, p(σ) is the probability function of σ and A is the surface area of the cation. The more extensive the charge delocalization in a cation, the lower its WANS value.
The CPCM 17 and SMD 24 calculations of pKa values of bases B were based on the thermodynamic cycles presented in Scheme 1 and Scheme 2, involving the gas-phase acidities ($\Delta G_{\mathrm{acid,g}}$) of BH+, which are equal to the gas-phase basicities (GB) of B. The gas-phase basicity GB of the base B is defined as the Gibbs free energy of the deprotonation equilibrium of the conjugate acid BH+.
To calculate absolute aqueous pKa values from the direct thermodynamic cycle the following equation was applied together with the corresponding expansion for $\Delta G_{\mathrm{acid,s}}$:

$$\mathrm{p}K_a = \frac{\Delta G_{\mathrm{acid,s}}}{RT \ln(10)}$$

$$\Delta G_{\mathrm{acid,s}} = \mathrm{GB} + \Delta G_s(\mathrm{B}) + \Delta G_s(\mathrm{H^+}) - \Delta G_s(\mathrm{BH^+}) + RT \ln(24.46)$$

The relative pKa calculations are based on the proton exchange cycle presented in Scheme 2 with the following equation, where $\mathrm{B_{ref}}$ denotes the reference base:

$$\mathrm{p}K_a(\mathrm{BH^+}) = \mathrm{p}K_a(\mathrm{B_{ref}H^+}) + \frac{\mathrm{GB(B)} - \mathrm{GB(B_{ref})} + \Delta G_s(\mathrm{B}) - \Delta G_s(\mathrm{BH^+}) - \Delta G_s(\mathrm{B_{ref}}) + \Delta G_s(\mathrm{B_{ref}H^+})}{RT \ln(10)} \tag{9}$$

The absolute aqueous free energy of solvation of the proton with the appropriate standard state correction, $\Delta G_s(\mathrm{H^+})$, is based on the results of Tissandier et al. 37 and equals −265.9 kcal mol⁻¹. The term RT ln(24.46) reflects the change in the standard conditions from 1 atm to 1 mol L⁻¹ and provides the necessary correction to the GB values.

The geometries were optimized both in solution (CPCM/HF/6-31G* with default cavities based on UFF radii and SMD/M05-2X/6-31G* with default cavities based on intrinsic atomic Coulomb radii) and in the gas phase using the same functional and basis set combination. The $\Delta G_s(\mathrm{B})$ and $\Delta G_s(\mathrm{BH^+})$ values are defined as the differences in SCF energy of the structure in solution and in the gas phase. 38 For the SMD calculations both electrostatic and non-electrostatic SCF energy terms were taken into account. The latter term represents cavity formation, dispersion interactions and the changes in solvent structure, and is usually denoted as the CDS energy.

When available, experimental GB values were used in this study for pKa calculations. For the bases with unknown experimental gas-phase basicity, GB values were calculated at the B3LYP/6-311G** level of theory. All geometry optimizations, both in the gas phase and in solution, were followed by frequency calculations to confirm that the optimized structures are true minima on the potential energy surface. All thermal corrections were calculated for the standard state of 1 atm at 298.15 K.
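The direct and proton exchange cycles described in this section can be sketched numerically as follows. This is a minimal illustration, not the authors' workflow: it assumes the usual expansion of the solution-phase deprotonation energy (gas-phase basicity plus solvation energies of B and H+, minus that of BH+, plus the RT ln(24.46) standard-state term), all energies in kcal/mol, and every numerical input in the example is a hypothetical placeholder except the proton solvation energy and temperature quoted in the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1
T = 298.15            # K, temperature stated in the text
DGS_PROTON = -265.9   # kcal/mol, aqueous solvation free energy of H+ from the text


def pka_direct(gb, dgs_b, dgs_bh, dgs_h=DGS_PROTON, t=T):
    """Absolute pKa of BH+ from the direct cycle (assumed form)."""
    rt = R_KCAL * t
    # 1 atm -> 1 mol/L standard-state correction applied to GB
    dg_acid_s = gb + dgs_b + dgs_h - dgs_bh + rt * math.log(24.46)
    return dg_acid_s / (rt * math.log(10.0))


def pka_relative(pka_ref, gb, dgs_b, dgs_bh, gb_ref, dgs_bref, dgs_brefh, t=T):
    """pKa of BH+ from the proton exchange cycle BH+ + Bref <=> B + BrefH+."""
    rt = R_KCAL * t
    dg_exch = (gb - gb_ref) + (dgs_b - dgs_bh) - (dgs_bref - dgs_brefh)
    return pka_ref + dg_exch / (rt * math.log(10.0))


if __name__ == "__main__":
    # Hypothetical placeholder energies (kcal/mol), not data from the paper
    base = dict(gb=240.0, dgs_b=-6.0, dgs_bh=-50.0)
    ref = dict(gb=235.0, dgs_b=-7.0, dgs_bh=-52.0)
    print(pka_direct(**base))
    print(pka_relative(11.77, base["gb"], base["dgs_b"], base["dgs_bh"],
                       ref["gb"], ref["dgs_b"], ref["dgs_bh"]))
```

In the relative function the proton solvation energy and the standard-state term drop out entirely, and errors in the solvation energies of the two bases cancel to the extent that the bases are similar. This is the formal reason why a reference-based scheme can be less sensitive to systematic errors than the direct one.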
All CPCM, SMD and GB calculations were carried out with the Gaussian09 software package. 39
To supplement the computational methods, an alternative scheme was used to predict pKa values in water based on reliable experimental pKa values in MeCN and the Gibbs free energies of solvation of all the species both in water and MeCN:

$$\mathrm{p}K_a(\mathrm{H_2O}) = \mathrm{p}K_a(\mathrm{MeCN}) + \mathrm{Eff_{solv}} + X_{\mathrm{corr}} \tag{10}$$

where $\mathrm{Eff_{solv}}$ is the solvation effect between MeCN and H2O in pKa units:

$$\mathrm{Eff_{solv}} = \frac{\Delta G_{\mathrm{solv}}(\mathrm{H_2O}) - \Delta G_{\mathrm{solv}}(\mathrm{MeCN})}{RT \ln(10)}$$

The $\Delta G_{\mathrm{solv}}$ in a given solvent is defined in the case of bases as

$$\Delta G_{\mathrm{solv}} = \Delta G_s(\mathrm{B}) + \Delta G_s(\mathrm{H^+}) - \Delta G_s(\mathrm{BH^+})$$

The solvation Gibbs free energies of neutrals and cations were calculated (see Supporting Information) using the SMD/M05-2X/6-31G* method. The G values of the proton were taken from experiments. 40,41 The $X_{\mathrm{corr}}$ in equation 10 is a correction term derived from the same calculation for PhTMG, for which reliable pKa values are known both in MeCN and in water.
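The MeCN-based scheme can be sketched in the same spirit. The code below assumes that Eq. 10 is additive (aqueous pKa = pKa in MeCN + solvation effect + correction term), that the solvation effect is the difference of the solvation terms in the two solvents divided by RT ln(10), and that the correction term is chosen so that the scheme reproduces the experimental aqueous pKa of the reference base. All function names are hypothetical, and all numerical inputs except the aqueous pKa of PhTMG (11.77, quoted in the text) are placeholders.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1
T = 298.15            # K


def dg_solv(dgs_b, dgs_h, dgs_bh):
    """Solvation contribution to deprotonation of BH+ in one solvent (kcal/mol)."""
    return dgs_b + dgs_h - dgs_bh


def eff_solv(dg_solv_water, dg_solv_mecn, t=T):
    """Solvation effect between MeCN and water, expressed in pKa units."""
    return (dg_solv_water - dg_solv_mecn) / (R_KCAL * t * math.log(10.0))


def x_corr(pka_water_ref, pka_mecn_ref, eff_ref):
    """Correction term from a reference base with known pKa in both solvents."""
    return pka_water_ref - pka_mecn_ref - eff_ref


def pka_water(pka_mecn, eff, xcorr):
    """Aqueous pKa from the MeCN pKa (assumed additive form of Eq. 10)."""
    return pka_mecn + eff + xcorr


if __name__ == "__main__":
    # Placeholder solvation energies (kcal/mol); proton values are placeholders too
    eff_ref = eff_solv(dg_solv(-8.0, -265.9, -55.0), dg_solv(-6.0, -260.2, -50.0))
    xc = x_corr(11.77, 20.8, eff_ref)  # 20.8 is a placeholder MeCN pKa
    eff_base = eff_solv(dg_solv(-7.0, -265.9, -52.0), dg_solv(-5.5, -260.2, -48.0))
    print(pka_water(24.34, eff_base, xc))
```

By construction the scheme returns the experimental aqueous value for the reference base itself, so the correction term absorbs systematic errors in the computed solvation energies and in the proton solvation energies.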

RESULTS AND DISCUSSION
Altogether 27 strong neutral bases were investigated, with base strength varying by 16 pKa units. Table 1 presents the aqueous pKa values computed with the COSMO-RS, CPCM and SMD methods along with experimental aqueous and MeCN pKa data as well as gas-phase basicities from the literature where available. The pKa values of some bases with reliable pKa values available in MeCN were computed according to Eq. 10. Because of the measurement difficulties mentioned above, reliable experimental data in water can be found only for the less basic region of the investigated bases.

Correlation Analysis and Errors of Computational Methods
Table 2 presents the data of linear regressions between all computational methods used and the experimental values, together with an error analysis of the computational methods. When interpreting the correlation analysis data it is important to keep in mind that in the higher basicity region, e.g. if the pKa of a base is higher than ca. 12, the experimental values can also contain significant errors.
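The statistics discussed in this section (regression slope, intercept, R² and MUE) can be computed as in the following minimal sketch. It assumes that computed values are regressed on experimental values; the axes may be swapped in the original analysis, and the data in the example are hypothetical.

```python
def regression_stats(experimental, computed):
    """Least-squares line computed = slope * experimental + intercept,
    squared correlation coefficient and mean unsigned error (MUE)."""
    n = len(experimental)
    if n != len(computed) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(experimental) / n
    mean_y = sum(computed) / n
    sxx = sum((x - mean_x) ** 2 for x in experimental)
    syy = sum((y - mean_y) ** 2 for y in computed)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(experimental, computed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)
    mue = sum(abs(y - x) for x, y in zip(experimental, computed)) / n
    return slope, intercept, r_squared, mue


if __name__ == "__main__":
    # Hypothetical pKa values, for illustration only
    exp_pka = [10.0, 11.0, 12.0, 13.0]
    calc_pka = [10.5, 11.0, 11.5, 12.0]
    print(regression_stats(exp_pka, calc_pka))
```

The example illustrates why R² and MUE have to be read together: a series can correlate perfectly and still have a low slope and a sizeable MUE, which is the situation discussed for the group-wise correlations.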
From the first section of Table 2 it is evident that, if the dataset is not divided into compound groups, none of the methods reproduces the experimental values satisfactorily. COSMO-RS is the best by a narrow margin, with R² = 0.74 and a Mean Unsigned Error (MUE) of 1.04 pKa units. It is evident from Figure 2 that the dataset contains, in broad terms, two compound groups (phosphazenes and amidines/guanidines), which is also reasonable chemically and structurally. This reasoning is supported by the WANS values, which are below 2.4 for phosphazenes and above 2.4 for amidines/guanidines. To put the WANS values into perspective, the WANS values for some common small cations are as follows: H
Among the amidines/guanidines group there seem to be two outliers, DBU and MTBD, which by their published experimental pKa values are seemingly better grouped with the phosphazenes. However, the WANS values of their cations do not support their exclusion from the amidines group.
Taking the two groups into account, additional group-wise correlations were made. For phosphazenes the correlation improved drastically for all CPCM and SMD computations, with R² around 0.9 and the best MUE around 0.7 pKa units if computational schemes relative to phosphazenes are used. Both the direct SMD models and the schemes relative to PhTMG give considerably worse MUEs, up to 3.6 pKa units. The direct CPCM model still gives good results, with MUE = 1.05. COSMO-RS differs from the CPCM and SMD models by a worse R² value of 0.67, but still seems to have acceptable errors (MUE = 1.03). The poor correlation is mostly due to the least basic phosphazenes (4-NO2 and 2,5-Cl2 substituted PhP1(pyrr)), which deviate strongly; no concrete reason for this could be found.
For amidines/guanidines the correlation and error characteristics remain poor (R² = 0.2 to 0.5) because of the deviation of DBU and MTBD.
Note a to Table 2: the experimental values exclude those obtained from correlation analysis from another solvent (HP1(dma) and HP1(pyrr)). For DBU the experimental value of 11.9 is used.
Inspection of the results of the correlation between calculated and experimental aqueous pKa values presented in Table 2 reveals that the regression line slopes are rather low for all computational methods. Similar observations regarding low slope values for aqueous pKa calculations using the implicit solvation approach have been reported on several occasions. 51 It was shown by Adam 51 that adding one explicit water molecule to the anions of phenols increased the slope of the pKa regression line from 0.50 to 0.88, while for aliphatic carboxylic acids adding two water molecules to the anions changed the slope from 0.50 to 1.01. In contrast, the pKa regression for unhydrated anilinium ions was characterized by a slope of 0.70, which is higher than the slope for unhydrated phenols and carboxylic acids and seems to be insensitive to hydration. Kelly et al. 51 studied the effects of adding explicit water molecules to the anions of monoprotic acids and also concluded that, in terms of the slope of the pKa regression equation, the performance of the pure polarized continuum model improves after implementing the cluster-continuum approach. The same authors argued that adding explicit water molecules is usually justified in the case of small anions and anions with significant charge localization. They also noted that the addition of water molecules does not always lead to improved calculation accuracy and that for several acids reliable pKa values were obtained using a pure continuum treatment of the aqueous medium. 51 The results of Chipman 51 obtained for neutral OH and cationic NH acids are consistent with those reported by Adam. 51 It is evident that the implicit solvation treatment of the latter group of acids yields aqueous pKa values that are in reasonable agreement with experiment, while for the former group, characterized by small to medium-sized anions with a high degree of charge localization, the pure polarized continuum approach fails. 51
In this respect it is important to note that the present study deals exclusively with alkylated guanidine and phosphazene bases. The ionic species involved (the protonated bases) are bulky and the charge is extensively delocalized in the cations. This is evidenced by their WANS values being in general below 4 (only phenyl guanidine is above 5). WANS values of charge-localized cations are significantly higher, as evidenced by the WANS examples given above. Under these circumstances the cluster-continuum protocol has not been considered a mandatory choice in this study. However, the slope value for the group of phosphazene bases is still low and this finding deserves further attention. In particular, it would be important to discriminate between deficiencies of the implicit solvation approach and other possible reasons, most importantly the uncertainties of the experimental pKa and GB values.

Assigning Recommended pKa Values to the Bases
The following criteria were taken into account when assigning the recommended pKa values:
1. The experimental data of moderately basic compounds (pKa around 11 or below) are much more reliable than computational values. At the same time, experimental values of bases with high basicity are not very reliable and, owing to the specifics of the pKa measurement methods, tend to be underestimated rather than overestimated.
2. Computations using reference bases are generally more reliable than direct computations, because the errors in solvation energy partially cancel, and the reliability increases with increasing similarity of the reference base and the investigated base.
3. Computations via MeCN pKa values according to Eq. 10 are more reliable than computations via gas-phase basicities (Eq. 9), because (a) the same base is used, (b) pKa values in MeCN are more reliably known than GB values and (c) MeCN as a medium is more similar to water than the gas phase is. The only counterargument is that solvation energies in two different solvents are used, instead of just one solvent as in Eq. 9.
4. The basicity order in water can differ significantly from that in the gas phase but not too much from that in acetonitrile.
5. Several sources of experimental pKa data have very limited or completely missing experimental parts. This precludes judging their reliability and decreases their trustworthiness.
6. Correlations between the computational and available experimental data range from poor to fair. In addition, they cover the low to medium basicity range only. These two factors together make these correlations of little use for correcting or adjusting the predicted pKa values of strong bases, and consequently they were not used.
The assigned recommended values are presented in the last column of Table 1. Comments on some of the more important bases follow. The recommended pKa value of 13.5 for DBU has been assigned taking into account all computations but ignoring the experimental values of 11.5 and 11.9. The experimental value of 11.77 for PhTMG and a comparison of these two bases in MeCN imply that both experimental aqueous pKa values of DBU are most probably underestimated. The sources of the experimental values do not contain any description of the experimental pKa determination.
The recommended pKa values of MTBD and TBD are primarily based on Eq. 10. Their basicity order does not match that of most computations, nor the basicity order in the gas phase, but it matches the experimental basicity order in MeCN. More efficient solvation of TBDH+ is the reason for TBD being more basic than MTBD in MeCN. The same is expected in water. They fall nicely in the correct area: for MTBD COS- Assigning the new recommended values to DBU and MTBD removes the problem described above, namely that these bases fall seriously off the correlations between experimental and computational values.
The published aqueous pKa values of HP1(dma) and HP1(pyrr) (13.32 and 13.93, respectively) were obtained from correlation analysis from values in MeCN and THF. 14 The results of the present calculations do not support those values. Without exception, all computed values are higher. This can be due to the much less hindered basicity center in the cations and thus more efficient solvation stabilization of the protonated forms of these bases than in the case of the phosphazene bases that were used for the correlation analysis in Ref. 14. The recommended values are based first of all on Eq. 10, but are also well supported by the CPCM and SMD calculations if phosphazenes are used as reference bases. With the new recommended values these two phosphazene bases drift away from the phosphazene series of the correlations described in the previous section. The reason is that the basicity centers of these two bases are less sterically screened than those of any other phosphazenes in this study.
Figure 3 displays the correlations between the pKa values given as recommended in Table 1 and those computed by the relative SMD and CPCM methods (Eq. 9, using PhTMG and PhP1(pyrr) as reference bases). Only those compounds (12 phosphazenes and 7 amidines/guanidines) that either had an experimental value prior to this study or whose recommended values were obtained using Eq. 10 were used in the correlation. The SMD method gives slope values between 0.78 and 1.16. The R² for phosphazenes is good (0.92); that for amidines/guanidines is worse (0.76). For CPCM the slope values for both groups are identical (0.80). The R² values are very similar as well (0.89 and 0.90).

Correlations of Aqueous pKa Data with MeCN pKa Data
In order to gain further insight into the quality of the computed aqueous pKa values, they were correlated with the experimental pKa values in MeCN. MeCN was chosen as a reference solvent because (1) reliable experimental pKa values of nearly all investigated bases are available from the literature, (2) there is a fairly good correlation between pKa values in water and acetonitrile, as equation 3 suggests, and (3) the true ionic basicities can be measured in acetonitrile, unlike in THF, where the actually measured values refer to ion-pair basicities. 52
Figure 4 shows that the basicity region investigated in this work is almost fully covered by measured pKa values in MeCN (except for EtP2(dma) and t-BuP4(dma), which are too basic to be measured directly in MeCN and whose MeCN pKa values have been estimated from THF data 49 ). The computed aqueous pKa values correlate quite well with the experimental data in MeCN, thereby indirectly confirming that the computed aqueous pKa values do not contain major errors. The correlation between the experimental values in water and MeCN shows that the dispersion of points around the regression line increases with increasing basicity.

CONCLUSIONS
In the pKa range above 12 neither experiments nor computations by any single approach are sufficiently reliable for assigning pKa values to bases. The best estimates of pKa values are obtained by combining knowledge from experiments in water, in other solvents and in the gas phase with different computational methods and chemical reasoning, taking into account the expected reliability of the experiments and computations as well as the chemical properties of the species involved.

Figure 1. Structures of some of the investigated bases.

Figure 3. Correlation between the pKa values obtained with the SMD (a) and CPCM (b) relative schemes and recommended pKa values. Relative methods use PhP1(pyrr) and PhTMG as references for phosphazenes and amidines/guanidines, respectively.

Figure 4. Correlation between pKa values in water and experimental values in MeCN. Relative CPCM and SMD methods use PhP1(pyrr) and PhTMG as references for phosphazenes and amidines/guanidines, respectively. t-BuP4(dma) is left out for clarity.

Table 2. Statistical data of regression analysis between experimental and computational pKa data for both compound groups. a Experimental values from Table 1.