pK50—A Rigorous Indicator of Individual Functional Group Acidity/Basicity in Multiprotic Compounds

In this work, we show that the apparent pKa measured by standard titration experiments is an insufficient measure of acidity or basicity of organic functional groups in multiprotic compounds—a frequent aspect of lead optimization in pharmaceutical research. We show that the use of the apparent pKa in this context may result in costly mistakes. To properly represent the group’s true acidity/basicity, we propose pK50—a single-proton midpoint measure derived from a statistical thermodynamics treatment of multiprotic ionization. We show that pK50, which may be directly measured in specialized NMR titration experiments, is superior in tracking the functional group’s acidity/basicity across congeneric series of related compounds and converges to the well familiar ionization constant in the monoprotic case.


■ INTRODUCTION
Protic ionization strongly influences properties and behavior of chemical compounds dissolved in polar solvents. Of these, water is the most important solvent in a number of sciences, such as biochemistry, pharmaceutical science, medicinal chemistry, ecology, toxicology, and agrochemistry. The level of interest in ionization phenomena can be judged by the large number of publications on this topic in the recent scientific literature. Long gone are the times when most molecules of interest were monoprotic in which a single ionization site provided unambiguous interpretation of pK a data and a welldefined "acid" or "base" label. Modern molecules are mostly multiprotic, featuring complex dissociation patterns. The underlying issues have been summarized and illustrated in recent publications. 1−4 The reader is encouraged to review these first, since not all the details are discussed below.
In general, a compound with n ionization sites exists in solution as a collection of 2 n ionization states called microstates or microspecies. Each pair of related microstates differing by one proton is joined by an elementary chemical equilibrium characterized by an ionization microconstant. This observation was noted in the early 20th century, 5 has received a solid theoretical treatment, 6−8 and has been confirmed by numerous experiments. 9 It is crucial to have a clear intuitive under-standing of the microstate distribution. The "2 n " number results from all possible combinations of the n sites being either protonated or deprotonated. Each of the microstates can be represented by a binary vector of length n, e.g., (0, 0, 1, 0, 1, 1, ..., 1), where 1 and 0 indicate the presence and absence of a proton at a given ionization site, respectively. In the case of n = 3, the associated eight microstates (0,0,0), (0,0,1), ..., (1,1,1) can be depicted as the 3D vertices of a cube connected by 12 edges (i.e., 1-proton transitions, Figure 1).
In the more general case, microstates of an n-protic molecule can be portrayed as vertices of a hypercube in n dimensions. The number of hypercube vertices is 2 n , and its number of edges is n2 n−1 . For n = 10, the number of microstates is equal to a staggering value of 1024! Thus, one can see why, in general, a complete picture of microscopic ionization (i.e., the determination of all microconstants) for complex multiprotic molecules is virtually impossible to obtain experimentally. 8 Standard titration experiments are able to resolve only the global formal charge states of the molecule in question, i.e., the macrostates yielding no more than n of the usually reported apparent pK a constants (macroconstants). Resolution of the microstates requires the use of highly specialized experimental techniques, normally not performed in routine work. 6−13 One of the reports studied aqueous ionization of cetirizine (an antihistamine drug sold in the USA under the trade name ZYRTEC) as its subject. 9 We used numerical results of this impressive work, and in Figure 2, we present all the ionization microstates of cetirizine arranged analogously to the scheme depicted in Figure 1. a It is apparent that the knowledge of all the microconstants is the source of very rich and detailed information about all the protonation microstates, such as their relative contributions to the respective parent macrostates as well as, for a given microstate, relative probability of dissociation for each of their attached protons. For example, the neutral macrostate labeled by the zero formal charge (HCet 0 ) is strongly dominated (99.1%) by a zwitterionic microstate with one proton bound to the piperazine nitrogen distal to the aromatic rings. No such dominance exists, however, for the +1 macrostate where not one but two microstates make nontrivial contributions. By reviewing the published experimental data, it is evident that the case of multiple microstates with comparable contributions happens more often than expected. The occurrence of this phenomenon naturally increases with the number of ionizable groups. Therefore, the prevalent view of multiprotic ionization occurring sequentially (i.e., one ionizable group at a time) is generally incorrect, even in the approximate sense. The observed macroconstants, 2.10, 3.01, and 8.17, are complex mathematical functions of the relevant microconstants 8 and are assignable only to the formal charge states of the cetirizine molecule acting as the substrate, i.e., H 3 Cet ++ , H 2 Cet + , and HCet 0 , respectively. Attempts to "assign" each of these numbers to an individual ionizable group are doomed to failure. Although one can argue that the 8.17 macroconstant approximately "belongs" to the distal nitrogen due to the  Detailed view of the dissociation macro-(in the left column) and microstates (in the right column) compiled from the recently published results measured for cetirizine�a triprotic molecule denoted symbolically by "Cet". 9 Individual microstates have been arranged in a way corresponding to Figure 1. Green percentage numbers next to microstates indicate relative contribution (probability) of this microstate to its parent macrostate. Red numbers next to reaction arrows indicate pK a microconstants. Numbers next to protonated groups indicate dissociation probability of that group in the given substrate microstate. strong dominance of one particular microstate, a similar argument fails for the other two macroconstants, as evident from Figure 2. Stark evidence further supporting this truth comes from an impressive experimental work of Maxwell and Partington 14 who measured all six ionization macroconstants for mellitic acid (Figure 3).
These pK a values range from 1.4 to almost 7, spanning 5.6 orders of magnitude and resulting, naturally, from complex interactions between 64 of the acid's microstates (see Supporting Information 1). b "Assigning" these different pK a values to individual carboxyl groups is obviously nonsense since they are all equivalent! The above considerations lead us to the following important conclusion: In general, macroscopic pK a values obtained by standard titration for multiprotic molecules are not properties of individual ionizable groups.
Thus, we have arrived at the central problem addressed by this article: if routinely measured macroscopic pK a values are not properties of individual groups, then how can one discern group contributions to multiprotic ionization? We next propose new concepts that successfully address this question.

■ THEORY
In contrast to macroconstants, microconstants are properties of individual groups. The problem is that microconstants are also functions of the molecular environment. For example, basicity of the distal nitrogen in cetirizine strongly depends on the protonation states of the other two groups ( Figure 4). Therefore, questions like "is this amine group strongly or weakly basic?" cannot be answered by microconstants alone, since the answer must be the unsatisfying "it depends".
An average microconstant, on the other hand, might provide an answer. A careful comparison of Figures 2 and 4 provides an idea on how to properly calculate such average. For the distal amine group, there are four microstates where this group is protonated and the other four where this group is deprotonated (Figures 4 and S1 in Supporting Information 3). One can now imagine a formal "dissociation reaction" where a population of the four protonated microstates loses a proton, yielding a population of the four deprotonated microstates. An "equilibrium constant-like" expression which we named the averaged single-proton acidity (ASPA) for a given ionizable group G is thus defined by the following Square brackets indicate molar concentrations. The "sum of microstates" is an abbreviation for the sum of respective molar concentrations at a given pH. Because the numerator and denominator in the above equation represent mixtures of different formal charge states, ASPA is not a constant, but a function of pH. To the best of our knowledge, the concept of ASPA is novel.
Another even more useful concept, laboriously derived by protein scientists from statistical thermodynamics considerations and referred to as "site titration curve", 15,16 has been adopted by us to the world of small molecules and labeled by averaged site protonation (ASP) The above equation represents fraction protonated of a given, specific ionizable group G. ASP also varies with pH between 0 and the number of dissociable protons on G (usually 1).
Finally, an individual ionizable group property that is a pHindependent constant is called the single-proton midpoint by us and assigned an intuitive symbol of "pK 50 " The single-proton midpoint corresponds to an intrinsic "group pK a ". This statement can be supported on statistical thermodynamics grounds. 15,17,18 Instead of a series of complex  The fraction protonated for the monoprotic acid is given by At its titration midpoint, we have ASP HA = 50%, leading to the [HA] = [A − ] equality, thus Our analogy leads to one more useful relationship. Equations 4 and 5 put together lead to the explicit pH dependence for the fraction protonated of the monoprotic acid In the general, multiprotic case, the following relationship between ASPA and ASP of the same group G can be shown as Although this equation is reminiscent of eq 7 where ASPA appears instead of the K a , we would like to underline the pH dependence of ASPA, unlike in the monoprotic case.
The analogy between pK 50 (group) and pK a (monoprotic acid) is understood by the virtue of similarity between eqs 3 and 6.
All of the above equations and principles were implemented in our computer program ADMET Predictor in 2012 as an extension to the S+pK a predictive model. 4,19 The model uses artificial neural network ensembles to predict microconstants for 10 major types of ionizable functional groups (hydroxyacids, amide acids, aliphatic amines, heteroaromatic amines, heteroaromatic acidic NH, thioacids, thione bases, N-oxides, acidic CH, and π-excessive carbobases). The predicted values depend on the input tautomer. More details are published in ref 4. The same software was used to perform all subsequent calculations as well as micro-and macroconstant predictions.

■ RESULTS AND DISCUSSION
The ionization constant, K a , as well as ASPA values span a very broad spectrum of magnitudes. Therefore, it is much more convenient to use negative logarithms of these quantities, denoted by pK a and pASPA, respectively. The pASPA profiles derived from the measured microconstants for cetirizine 9 are shown in Figure 5.
The pASPA values increase (i.e., averaged group acidity decreases) with pH for all three ionizable groups. The blue curve representing the distal nitrogen starts at about 4 and plateaus at approximately 8.2 for high pH. This is quite understandable in the view of Figure 4: at low pH, most of the cetirizine molecules are fully protonated and the acidity of the distal nitrogen is at its highest (conversely, its basicity is the lowest) mainly due to repulsive interactions received from the protonated proximal nitrogen. As the pH increases, the population of cetirizine molecules is being gradually deprotonated and average acidity (i.e., average tendency to deprotonate) of the distal nitrogen gradually decreases (its basicity increases) up to the value 8.17, representing its microconstant in the one-proton macrostate. In fact, a general rule for each ionizable group is that its high pH acidity asymptotes at its one-proton microconstant. Similarly, low pH asymptotes are equal to the respective microconstants at the fully protonated macrostate.
What do pASPA curves look like for a system as complex as mellitic acid? Figure 6 reveals that these are quite simple. The pASPA profiles for all six carboxyl groups overlap perfectly, as expected from the molecule's symmetry. Also expected is the gradual decrease in the individual −COOH acidity with pH, commensurate with the average protonation state of mellitic acid.
Switching attention to ASP, the ASP profiles obtained from measured microconstants for cetirizine are shown in Figure 7.
As expected, the proximal amine and carboxyl deprotonate quite early at pH ∼ 3. The distal nitrogen stays protonated until much higher pH, and at 7.4, its ASP is 85%. In addition, its ASP midpoint, the pK 50 , is at 8.16, very close to the apparent pK a of the HCet 0 macrostate (8.17). This is commensurate with the respective microstate dominance, as discussed above and illustrated in Figure 4. In contrast, the pK 50 values of the proximal N and carboxyl are markedly different from the macroscopic pK a they dominate. Thus, the "true" basicity of proximal amine is represented by pK 50 = 2.35, while the experimental H 3 Cet ++ pK a was measured at 2.10. The analogous numbers for the carboxyl group's acidity are 2.72 and 3.01, respectively. We have observed a general trend for significantly interacting groups: their pK 50 values are closer to each other than the respective apparent pK a assigned to protonation macrostates they dominate.
The predicted pK 50 values for carboxyl groups in mellitic acid obtained from the ASP profile shown in Figure 8 are identical and equal to a reasonable value of 3.55. Keep in mind that it is not possible to obtain this measure of group's acidity from the apparent pK a shown in Figure 3.
The pASPA and ASP quantities reduce to the expected simple pK a and simple fraction ionized for monoprotic molecules, providing another confirmation of their validity (see Figures S3 and S4 in Supporting Information 3). Additionally, in some relatively rare cases of "pure" ionization macrostates, i.e., the macrostates very strongly dominated by just one microstate, their apparent pK a values are numerically very close to the pK 50 of one of the ionizable groups. Such is the case of the HCet 0 macrostate in cetirizine, but Figure S5 in Supporting Information 3 provides an even more illustrative example. In cefaclor, the acidities of its three functional groups are separated by 4−5 orders of magnitude, resulting in three "pure" macrostates.
How does the ASP relate to the known fraction ionized profile? The latter (a.k.a. Bjerrum plot) is defined as a statistical average of the number of bound ionizable protons Figure 6. Negative logarithm of the pASPA for mellitic acid derived from microconstants calculated by the ADMET Predictor. Profile curves for all six carboxylic acid groups overlap perfectly. Each carboxyl group's pASPA varies from 2.34 at low pH to 5.88 at high pH. dependent on pH. As Figure S6 in Supporting Information 3 illustrates, the molecular fraction ionized profile is a simple sum of the individual ASP plots arranged in the order of decreasing pH.
It is important to note that these predicted values can be verified experimentally with techniques that have atomic resolution and are sufficiently focused on individual ionizable groups. Such is the case for NMR titrations of multiprotic molecules. 20 Al Khzem et al. performed elaborate heteronuclear multiple-bond correlation titrations of aminoglycosides using 1 H, 13 C, and 15 N chemical shifts. 21−23 Out of these, the 15 N spectra were most promising in isolating ionizations of individual aliphatic amine groups. The researchers measured chemical shifts, δ in ppm, of each aliphatic amine group at different pH values and from the resulting δ(pH) plots determined "inflection points of sigmoidal curves". The thusobtained "individual pK a values not available by potentiometric methods", in the authors' language, were in fact approximate pK 50 values of the amines. To better illustrate that such is the case, we have extracted raw 15 N shifts for all five primary amine groups of tobramycin, an aminoglycoside antibiotic, from  these δ values to the [0,1] interval using minimax scaling. In this form, the experimental points are directly comparable to our ASP profiles predicted for tobramycin, as shown in Figure  9. Relevant details can be found in the Excel spreadsheet of Supporting Information 2. The agreement between experimental points and predicted curves is quite good. Next, we have extracted experimental pK 50 for tobramycin from Table 1 in ref 23 (last row labeled "15N HMBC NMR") and compared these against our pK 50 predictions. 4,19 Results shown in Table 1 below once again illustrate good agreement. We have observed similarly good agreement for other aminoglycosides (data provided in Table S1 through S4 in Supporting Information 3).
Precise and accurate prediction of individual ionizable pK 50 has important, real-world implications in drug design and development. Relying on the apparent pK a "assignments" to a functional group instead may result in costly mistakes. As an example, one of our customers reported the following: "Our lead compound consisted of a tertiary amine group on one end, a complicated scaffold in the middle, and a substituted phenol ring on the other end. We have measured pK a around 9.5 and "assigned" it to the amine. It was crucial to lower the amine's basicity, and thus, (at great expense) we have synthesized a large number of derivatives with that goal in mind and measured their pK a . To our surprise, nothing worked�the "amine's pK a " did not change or even went up! Obviously, we cannot reveal the customer's proprietary chemical structures, data, or even the derivative pattern. Instead, we have prepared an illustrative example of a made-up surrogate compound, made-up substituents, made-up derivatives, and predicted 4,19 pK a . In particular, we have replaced the "complicated scaffold in the middle" by a simple aliphatic chain; see Figure 10. Table 2 contains predicted pK a for the model compound and its derivatives.
The numbers illustrate a general trend observed by the customer: regardless of the amine substitution, the pK a barely changed, in stark contrast to expectations. For example, the strained aziridine ring in the third row should have significantly lowered the amine basicity. Instead, the pK a increased by 0.13. The morpholine derivative was expected to lower the amine pK a by ∼2 log units, but the apparent "basicity" increased by 0.14.
The answer to this behavior is revealed by examining the model compound's ionization microstates in Figure 11.
The primarily aminic pK a (at the 71.9% level) is predicted at 10.46, while the value 9.61 in Table 2 is primarily phenolic. Furthermore, the pK a "identity" is not preserved across derivatives�as shown in Figure 11, a higher value of 10.07 is predominantly phenolic, while the lower one may be labeled as aminic. Such an "identity switch" occurs four more times. This illustrates the fallacy of functional group "assignments" using apparent pK a �the apparent pK a is a result of complex interactions of all ionizable groups in a given compound and cannot be assigned to just one group. Indeed, at a 53.5%/ 46.5% split for the dimethyl derivative, the aminic/phenolic decision cannot be determined as solely acidic or basic. In contrast, Table 3 reveals that pK 50 values of the tertiary amine change according to expectations (e.g., morpholine pK 50 is markedly lower than the diethyl one).
Having a specific ΔpK a goal in mind, the customer could then focus on one or two of the most promising derivatives, saving much time and money. Indeed, the customer did perform such suggested computational analysis post mortem and completely agreed with our proposal and conclusions.
The last in our list of examples is Table S5 (Supporting Information 3) containing 24 compounds used in SAMPL6

Journal of Chemical Information and Modeling
pubs.acs.org/jcim Article competition, 24 their measured macroscopic pK a , corresponding predicted 19 pK a , and predicted pK 50 for major contributing groups.

■ CONCLUSIONS
Standard titration measurements of multiprotic compounds lead to apparent pK a values that represent macroscopic charge state transitions which are statistical averages of the appropriate ionization microstates and as such do not, in general, correspond to quantitative ionization of individual functional groups. Alkhzem et al. note: "potentiometric titration and UV methods have been used to determine ionization constants (pK a ). However, in polyamines, no direct link can be drawn from an individual amino group to a specific pK a value (i.e, pK 50 ) by those techniques." 22 Even if an apparent pK a happens to be numerically close to one functional group's pK 50 due to the dominance of a single microstate leading to an "assignment" of the pK a to this group, there is no guarantee that such an "assignment" will carry through a congeneric series of related compounds. In fact, as our last example shows, the "assignment" may experience an "identity switch" across the congeneric series due to changes in Figure 10. General representation of the customer's lead compound and our simplified model representing it in calculations. Note that the fluorine substituent does NOT represent R1! Figure 11. Predicted pK a and ionization microstates of the model compound and its dimethyl derivative.

Journal of Chemical Information and Modeling
pubs.acs.org/jcim Article microstate dominance. This phenomenon may make lead optimization difficult and costly. The novel concept of ASPA is an intuitive measure of instantaneous acidity/basicity of an ionizable group in a multiprotic molecule. Since the group's environment (i.e., other ionizable groups present in the molecule) depends on pH, so does its instantaneous acidity/basicity. The pASPA profiles show a general trend of higher acidity/lower basicity at low pH that gradually decreases/increases with growing pH, respectively.
The ASPA led us quite naturally to a previously known 15 concept of the probability of an ionizable group being protonated (a.k.a. fraction protonated) which we named the ASP. Similar to ASPA, the ASP depends on pH because the group's environment depends on pH. The ASP profiles usually resemble sigmoid titration curves. It is the midpoint of an ASP profile that provides an unambiguous and pH-independent measure of the true group's acidity�the pK 50 . The influence of other groups on the ASP midpoint is effectively averaged, and thus, the pK 50 carries through congeneric series in an expected manner regardless of microstate dominance shuffling, unlike the apparent pK a . Moreover, pK 50 values are directly measurable in elaborate NMR titration experiments with atomic resolution. We suggest that pK 50 should replace apparent pK a in applications involving acidity of individual functional groups.
Since experimental determination of ionization microconstants and pK 50 is still too expensive for routine molecular design, in our humble opinion, predictive computer modeling will remain the main avenue of obtaining these useful quantities. There are many software programs that predict microconstants via either linear free energy relationships, quantitative structure−property relationships, or ab initio quantum chemical calculations. See refs 2 and 3 for detailed discussion of these methods. Simple source code modifications may tailor any of these programs toward computing pK 50 values as well.
■ ASSOCIATED CONTENT
Predicted microstates of mellitic acid (TIF) Data used to draw Figure 9 (XLSX) Illustration of the fundamental concept behind ASPA, predicted pK a and microstate probabilities, predicted pASPA profile for acetic acid, predicted pK 50 , predicted statistical average, comparison of observed vs predicted pK 50 , and measured macroscopic pK a (PDF) This work has been funded by Simulations Plus, Inc. in its entirety.

Notes
The authors declare no competing financial interest. The pK a values were predicted for the model compound and its derivatives.

■ ACKNOWLEDGMENTS
We sincerely thank Robert D. Clark who suggested the name "pK 50 ".
■ ADDITIONAL NOTES a To avoid confusion, we would like to stress that the following discussion of cetirizine ionization patterns is in its entirety based on measured microconstants. For comparison, predicted pK a values for this compound are presented in Figure S2 in Supporting Information 3. Although quoted ionization macroconstants for mellitic acid are measured, its microscopic ionization patterns presented in Supporting Information 1 are predicted.