A pH-Dependent Coarse-Grained Model for Disordered Proteins: Histidine Interactions Modulate Conformational Ensembles

Histidine (His) presents a unique challenge for modeling disordered protein conformations, as it is versatile and occurs in both the neutral (His0) and positively charged (His+) states. These His charge states, which are enabled by its imidazole side chain, influence the electrostatic and short-range interactions of His residues, which potentially engage in cation−π, π–π, and charge–charge interactions. Existing coarse-grained (CG) models often simplify His representation by assigning it an average charge, thereby neglecting these potential short-range interactions. To address this gap, we developed a model for intrinsically disordered proteins (IDPs) that accounts for the properties of histidine (H). The resulting IDPH model is a 21-amino acid CG model incorporating both His charge states. We show that interactions involving previously neglected His0 are critical for accurate modeling at high pH, where they significantly influence the compaction of His-rich IDPs such as Histatin-5 and CPEB4. These interactions contribute to structural stabilizations primarily via His0–His0 and His0–Arg interactions, which are overlooked in models focusing solely on the charged His+ state.


Modifications in Coarse-Grained Modeling of Short-Range Interactions

KH Model
We note that our implementation of the KH model differs slightly from the original model 1. While the original model implies some repulsive contributions between specific amino-acid pairs (i.e., εij < 0), we chose instead to assign these pairs a value of 0. Such pairs therefore interact only through excluded-volume contributions, for which the energy parameter is set to 1 kcal/mol.

Mpipi Model
We note key differences between our implementations of the IDPH and Mpipi models and the reported Mpipi model 2. In the Mpipi model, the electrostatic charge of charged amino acids is scaled by a factor of 0.75. For example, Arg and Lys are assigned a charge qi,j of +0.75, His +0.375, and Asp and Glu −0.75. Our implementation of Mpipi includes this 0.75 factor; however, the factor is not used in the IDPH model. To examine the effect of this factor, we repeated the calculations of this work with a model we term 'Mpipi Fullcharge'. To examine the effect of representing the His charge correctly (i.e., a neutral charge at pH > pKa), we designed a control, the 'Control Mpipi' model. Similarly, to isolate the effect of the electrostatic charge of His in IDPH, we use the 'Control IDPH' model, in which the His0–His0 and His0–Arg energetic terms are turned off relative to IDPH.
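The charge assignments described above can be summarized in a short sketch. It is only an illustration: the function name, the residue labels, and the unscaled His value of +0.5 are assumptions (the latter chosen because 0.5 × 0.75 reproduces the +0.375 quoted for Mpipi), and the sketch does not cover the explicit His0/His+ states of IDPH.

```python
# Formal (unscaled) charges. The His value of +0.5 is an assumed average
# charge, chosen because 0.5 * 0.75 reproduces the +0.375 quoted in the text.
FORMAL_CHARGES = {"ARG": 1.0, "LYS": 1.0, "HIS": 0.5, "ASP": -1.0, "GLU": -1.0}


def model_charges(scale=1.0, his_neutral=False):
    """Return per-residue charges for a hypothetical model variant.

    scale       -- global charge factor (0.75 in Mpipi, 1.0 when unscaled)
    his_neutral -- if True, His is set to 0 (the pH > pKa control case)
    """
    charges = {res: scale * q for res, q in FORMAL_CHARGES.items()}
    if his_neutral:
        charges["HIS"] = 0.0
    return charges


mpipi = model_charges(scale=0.75)                            # scaled charges
control_mpipi = model_charges(scale=0.75, his_neutral=True)  # neutral His
unscaled = model_charges(scale=1.0)                          # no 0.75 factor
```

Keeping the scale factor and the His state as separate switches mirrors the way the controls are described: one varies the global charge factor, the other varies only the His charge.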
While Mpipi originally employs a different potential for specific amino-acid pairs (such as those involving isoleucine), our implementation does not include these pair-specific values. Instead, we used a single unified potential to represent all contacts. This should not affect our insights regarding the Histatin-5 variants, as they do not include Ile residues. Furthermore, the CG models implemented in this work (HPS, KH, Urry, and FB) employ a Lennard-Jones-like (LJ) potential (refer to the Methods section in the main text) to represent contacts between the side chains of interacting amino acids. However, the Mpipi model originally employs the Wang-Frenkel (WF) potential. To use the same LJ potential consistently for the Mpipi model as well, the σ values reported for the WF potential (see Fig. S1A) were scaled by a factor of ~1.105 to align the minimum position of the LJ potential (employing the rescaled σ) with that of the WF potential (see Fig. S1B).
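The ~1.105 scaling of σ used for the Mpipi model can be checked by locating the minimum of the WF potential. The sketch below is a hedged illustration, not the implementation used in this work: it assumes the generalized WF form with μ = 2, ν = 1, and a cutoff of 3σ (parameter values not stated in this section), and the function names are hypothetical. The LJ-like potential itself is defined in the Methods and is not reproduced here.

```python
def wf_potential(r, sigma=1.0, eps=1.0, mu=2, nu=1, rc_over_sigma=3.0):
    """Wang-Frenkel potential (assumed form), zero beyond the cutoff rc."""
    rc = rc_over_sigma * sigma
    if r >= rc:
        return 0.0
    x = (rc / sigma) ** (2 * mu)
    # Prefactor that normalizes the well depth to -eps.
    alpha = 2 * nu * x * ((1 + 2 * nu) / (2 * nu * (x - 1))) ** (2 * nu + 1)
    attractive = (sigma / r) ** (2 * mu) - 1
    cutoff_term = ((rc / r) ** (2 * mu) - 1) ** (2 * nu)
    return eps * alpha * attractive * cutoff_term


def wf_rmin(sigma=1.0, mu=2, nu=1, rc_over_sigma=3.0):
    """Analytical position of the WF minimum, obtained from dphi/dr = 0."""
    x = rc_over_sigma ** (2 * mu)
    ratio = (1 + 2 * nu) / (1 + 2 * nu * x)
    return rc_over_sigma * sigma * ratio ** (1 / (2 * mu))


# Numerical cross-check: scan r on a fine grid and pick the lowest energy.
grid = [0.9 + 1e-4 * k for k in range(20000)]  # r/sigma from 0.9 to ~2.9
r_numeric = min(grid, key=wf_potential)
r_analytic = wf_rmin()
print(f"WF minimum: analytic {r_analytic:.4f} sigma, numeric {r_numeric:.4f} sigma")
```

Under these assumptions the WF minimum sits at about 1.105σ. If the LJ-like potential has its minimum at r = σ (also an assumption here), multiplying σ by this ratio places the two minima at the same distance.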

Calibration of His Interaction Strengths in IDPH
Figure S2: Calibration of histidine short-range interactions in IDPH. (A) The strength of the pairwise interaction between residue i and its partner j, εij (see Methods), for aromatic–aromatic (π–π) pairs as a function of the QM average energies of π–π interactions for the same pairs 3. The strengths of Phe–Phe, Phe–Tyr, Tyr–Tyr, His0–Tyr, and His0–Trp were kept the same in the IDPH model as in Mpipi. Only the strength of His0–His0 was refined to match the calibration line for the other interactions. (B) Calibration of the strengths of cation–π interacting pairs. Arg–His0, His+–His0, His+–Phe, His+–Tyr, and His+–Trp were refined to match the calibration line of arginine as a cation (Arg–Phe, Arg–Tyr, and Arg–Trp). The interactions of Lys were kept as in the Mpipi model, following their observed low abundance both in the Mpipi model 2 and in our work 3. Experimental Rg values, including errors (when reported), can be found in the following works 2,4-10.

Correlation between IDPH and Mpipi even for similar Rg of sfAFP suggests differences in interactions
Figure S3: The difference in the probability of a contact between residue i and residue j (∆Pij) as predicted by IDPH against Mpipi. High pH is colored in crimson and low pH in pink. For convenience, the values were sorted by the magnitude of ∆Pij in the high-pH case.
Values smaller than 0.01 (in absolute value) were set to zero to highlight cases where the changes in probability are more significant.
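The processing of ∆Pij described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions: the sign convention (IDPH minus Mpipi), the function name, and the dictionary layout are hypothetical, and the probabilities are made-up toy numbers rather than simulation results.

```python
def contact_differences(p_idph, p_mpipi, threshold=0.01):
    """Return {(i, j): dP} with small |dP| zeroed, sorted by descending |dP|.

    p_idph, p_mpipi -- dicts mapping residue pairs (i, j) to contact probability
    """
    diffs = {}
    for pair in p_idph:
        d = p_idph[pair] - p_mpipi.get(pair, 0.0)  # assumed sign convention
        diffs[pair] = d if abs(d) >= threshold else 0.0
    ordered = sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return dict(ordered)


# Toy probabilities (made-up numbers, for illustration only).
p_idph = {(1, 5): 0.30, (2, 9): 0.105, (3, 7): 0.02}
p_mpipi = {(1, 5): 0.10, (2, 9): 0.100, (3, 7): 0.08}
result = contact_differences(p_idph, p_mpipi)
```

In this toy case the pair (2, 9) differs by only 0.005 and is therefore set to zero, while the other two pairs are retained and ordered by the size of their change.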

Control IDPH Model: Importance of His0 Short-Range Interactions
To validate the importance of the short-range interactions of His, we checked whether simply assigning the physical charge state of His within Mpipi (i.e., a neutral charge instead of +0.375) is enough to explain the experimentally observed dimensions, as shown in

Effect of pH on Monomeric CPEB4 Dimensions and Interactions
The dimensions of the NTD CPEB4 monomer have been reported experimentally from DLS liquid exclusion chromatography 12. The reported hydrodynamic diameter of 10.9 nm for a monomer at high pH corresponds to a hydrodynamic radius (Rh) of ~55 Å. While this seems to fit the theoretical Rh of 55 Å for 448 residues using R0 = 3.53 and ν = 0.449 (where Rh ≈ R0·N^ν, see Ref 13), Figure 1B in this reference suggests that, for dimensions around an Rh of 55 Å, the experimental radii are overestimated by ~25% (calculated ~40 Å versus experimental ~50 Å). The use of denaturants has previously been shown to overestimate the size of IDPs, whereas low salt concentrations dramatically decrease the dimensions, because electrostatic screening enables compaction 14,15. Therefore, we cannot directly compare the experimental value with our calculated Rh for CPEB4, which is ~43 Å at high pH (for the assessment of Rh calculation methods, refer to Ref 16, where method #4, Nygaard-KR, performs best for proteins with an Rg of ~4-5 nm). Since Rg and Rh are correlated (see Fig. S9 below), the main text discusses Rg as the dimensional observable for comparison.
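The arithmetic behind the Rh values quoted for CPEB4 can be reproduced with a few lines. This is a sketch under stated assumptions: 448 is taken as the number of residues N, R0 is taken to be in Å (no unit is given in the text), and the function name is hypothetical.

```python
def rh_from_scaling(n_residues, r0=3.53, nu=0.449):
    """Hydrodynamic radius (assumed Å) from the power law Rh = R0 * N**nu."""
    return r0 * n_residues ** nu


rh_theory = rh_from_scaling(448)  # scaling-law estimate for N = 448
rh_dls = 10.9 * 10.0 / 2.0        # 10.9 nm diameter converted to a radius in Å
ratio_ref13 = 50.0 / 40.0         # experimental / calculated radii quoted above
print(f"scaling law: {rh_theory:.1f} Å, DLS: {rh_dls:.1f} Å, ratio: {ratio_ref13:.2f}")
```

Both routes give a radius close to 55 Å, and the ratio of 1.25 corresponds to the ~25% overestimate quoted above.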
For the Mpipi implementation, see Fig. S1A for the σ values reported for the WF potential and Fig. S1B for the alignment of the minimum position of the LJ potential (employing the σ values scaled by ~1.105) with that of the WF potential.

Figure S4: Experimental Rg of the Histatin-5 variants as a function of His content. Values were taken from the following work 11.

Figure S5: MSE values for the correlation between the computational Rg of the Histatin-5 variants, simulated using various models, and their experimental Rg. The correlation is quantified by the MSE value for variants 1-7 (dark grey circles) or variants 1-6 (light grey circles). The electrostatic charge of His is given in brackets to allow comparison of the different models; otherwise it is +0.5.

Figure S6: The role of His0 short-range interactions. The results are plotted for the Control IDPH model, as discussed in the main text and in Supporting Section S1.

Fig. S10. Contact maps for the IDR CPEB4 variants employing Mpipi with a His electrostatic charge of +0.375, or without it as the Control, display infrequent His–His or His–Arg interactions. For the Control at pH < 7, a full charge is used for His instead of +0.375 or +0.5. Otherwise, Mpipi can be used as the control for low pH. At pH > 7, the Control IDPH model is employed (i.e., the His0–His0 and His0–Arg terms were turned off).