Improved predictions of phase behaviour of intrinsically disordered proteins by tuning the interaction range

The formation and viscoelastic properties of condensates of intrinsically disordered proteins (IDPs) is dictated by amino acid sequence and solution conditions. Because of the involvement of biomolecular condensates in cell physiology and disease, advancing our understanding of the relationship between protein sequence and phase separation (PS) may have important implications in the formulation of new therapeutic hypotheses. Here, we present CALVADOS 2, a coarse-grained model of IDPs that accurately predicts conformational properties and propensities to undergo PS for diverse sequences and solution conditions. In particular, we systematically study the effect of varying the range of the nonionic interactions and use our findings to improve the temperature scale of the model. We further optimize the residue-specific model parameters against experimental data on the conformational properties of 55 proteins, while also leveraging 70 hydrophobicity scales from the literature to avoid overfitting the training data. Extensive testing shows that the model accurately predicts chain compaction and PS propensity for sequences of diverse length and charge patterning, as well as at different temperatures and salt concentrations.


Introduction
Biomolecular condensates may form via phase separation into coexisting solvent-rich and macromolecule-rich phases.Phase separation (PS) is driven by multiple, often transient, interactions which are in many cases engendered by intrinsically disordered proteins (IDPs) and low-complexity domains (LCDs) of multi-domain proteins [1][2][3][4][5] .The amino acid sequence dictates both the propensity of IDPs to phase separate and the viscoelastic properties of the condensates.Moreover, condensates of some IDPs reconstituted in vitro tend to undergo a transition to a dynamically arrested state, in which oligomeric species can nucleate and ultimately aggregate into fibrils 2,3,[6][7][8][9][10][11][12][13][14] .As accumulating evidence suggests that these processes may be involved in neurodegeneration and cancer [15][16][17][18] , understanding how the amino acid sequence governs PS and rheological properties of condensates is a current research focus.Due to the transient nature of the protein-protein interactions that underpin PS, quantitative characterization of biomolecular condensates via biophysical experimental methods is challenging, and hence molecular simulations have played an important role in aiding the interpretation of experimental data on condensates reconstituted in vitro 19 .However, molecular simulations of the PS of IDPs require a minimal system size of ∼100 chains and long simulation times to sample the equilibrium properties of the two-phase system.To enhance the computational efficiency of these simulations, the atomistic representation of the phase-separating protein is typically coarse-grained to fewer interaction sites while the solvent is modelled as a continuum.
A widely used class of coarse-grained models of IDPs describes each residue as a single site centered at the Cα atom.Charged residues interact via salt-screened electrostatic interactions whereas the remaining nonionic nonbonded interactions are incorporated in a single short-range potential characterized by a set of "stickiness" parameters.The stickiness parameters are specific to either the single amino acid or pairs of residues and were originally derived by Dignon et al. from a hydrophobicity scale 20 .Other models, based on the lattice simulation engines LaSSI 21 and PIMMS 22 , classify the amino acids into a reduced number of residue types with distinct stickiness, ranging from binary categorizations into stickers and spacers 4, 23,24 to more detailed descriptions and parameterizations 25,26 .Recently, the accuracy of the stickiness parameters has been considerably improved.This has been achieved by leveraging (i) experimental data on single-chain properties, (ii) statistical analyses of protein structures, and (iii) residue-residue free energy profiles calculated from all-atom simulations [26][27][28][29][30][31] .Notably, automated optimization procedures to improve of the stickiness parameters have been proposed by us and others 26,28,29,32 .In our previous work 32 , the procedure maximized the accuracy of the model with respect to experimental data reporting on conformational properties of IDPs, namely, small-angle X-ray scattering (SAXS) and paramagnetic relaxation enhancement (PRE) NMR data.To ensure the transferability of the model across sequence space, we trained the model on a large experimental data set and employed a Bayesian regularization approach 32,33 .As the regularization term, we defined the prior knowledge on the stickiness parameters in terms of 87 hydrophobicity scales reported in the literature.The resulting optimal parameters (previously referred to as M1 32 ) were shown to capture the relative propensities to phase separate of a wide range of IDP sequences.However, we also observed that a systematic increase in simulation temperature of about 30 °C is needed to quantitatively reproduce the experimental concentration of the dilute phase coexisting with the condensate on an absolute scale.Herein, we refer to this model as the first version of the CALVADOS (Coarse-graining Approach to Liquid-liquid phase separation Via an Automated Data-driven Optimisation Scheme) model (CALVADOS 1).
In this class of coarse-grained models of IDPs, nonionic interactions are modelled via a Lennard-Jones-like potential, which decays to zero only at infinite residue-residue distances.For computational efficiency, the potential is typically calculated up to a cutoff distance, r c , and interactions between particles that are farther apart are ignored.Although this truncation may introduce severe artefacts, in the different implementations of the models, the value of r c has varied considerably between 1 and 4 nm 20,[29][30][31][32]34,35 . Here,we systematically investigate the effect of the cutoff of nonionic interactions on single-chain compaction and PS propensity.We find

Amendments from Version 1
In the revised version of the article, we discuss the effect of the cutoff of the ionic interactions on both single-chain and phase properties.We also comment on the effect of temperature on the potential that describes ionic interactions in the model.We improved the clarity of the article by (i) specifying the source of the M1 parameters, (ii) adding a description of the error bars in Figure 3B in the figure caption, (iii) clarifying the y-axis label of Figure 4B in the figure caption, (iv) correcting the values of the x-axis tick labels of Figure 5, and (v) detailing how we used sequence lengths to determine optimal sampling frequencies and simulation lengths.
Any further responses from the reviewers can be found at the end of the article REVISED that decreasing the cutoff from 4 to 2 nm results in a small increase in the radius of gyration whereas the PS propensity significantly decreases.We exploit this effect to improve the temperature-dependence of the CALVADOS 1 model by tuning the cutoff of the nonionic potential.Further, we perform a Bayesian optimization of the stickiness parameters using a cutoff of 2.4 nm and an augmented training set.We show that the updated model (CALVADOS 2) has improved predictive accuracy.

Molecular simulations
Molecular dynamics simulations are conducted in the NVT ensemble using the Langevin integrator with a time step of 10 fs and friction coefficient of 0.01 ps −1 .Non-bonded interactions between residues separated by one bond are excluded from the energy calculations.Functional forms and parameters for bonded and nonbonded interactions are reported in the "Bonded and nonbonded interactions" subsection.Single chains of N residues are simulated using HOOMD-blue v2.9.3 36 in a cubic box of side length 0.38 × (N − 1) + 4 nm under periodic boundary conditions, starting from the fully extended linear conformation.Conformations are saved every ∆t ≈ 3 × N 2 fs if N > 100 and ∆t = 30 ps otherwise.Each chain is simulated in ten replicas for a simulation time of 600 × ∆t.The initial 100 frames of each replica are discarded, so as to obtain 5,000 weakly correlated conformations for each protein (Figure 1).The functional form of ∆t was inferred from calculations of the autocorrelation function of the radius of gyration, R g , for proteins of various N. Direct-coexistence simulations are performed using openMM v7.5 37 in a cuboidal box of side lengths [L x ,L y ,L z ] = [25, 25, 300], [17, 17, 300] and [15, 15, 150] nm for Tau 2N4R, Ddx4 LCD, and for the remainder of the proteins, respectively.In the starting configuration, 100 chains are aligned along the z-axis and with their middle beads placed in the xy-plane at random (x, y) positions which are more than 0.7 nm apart.Multi-chain simulations are carried out for at least 2 µs, saving frames every 0.5 ns (Figure S1, S2, and S3).After discarding the initial 0.6 µs, the slab is centered in the box at each frame as previously described 32 and the equilibrium density profile, ρ(z), is calculated by averaging over the trajectory of the system at equilibrium.The densities of the dilute and protein-rich phases are estimated as the average densities in the regions |z| < z DS − t/2 and |z| > z DS + 6t nm, where z DS and t are the position of the dividing surface and the thickness of the interface, respectively.z DS and t are obtained by fitting the semi-profiles in z > 0 and z < 0 to where ρ a and ρ b are the densities of the proteinrich and dilute phases, respectively.The uncertainty of the density values is estimated as the standard error obtained from the blocking approach 38 implemented in the BLOCKING software (github.com/fpesceKU/BLOCKING).

Bonded and nonbonded interactions
In this study, we used the following truncated and shifted Ashbaugh-Hatch potential 39 ,   where ϵ = 0.8368 kJ mol −1 , r c = 2 or 4 nm, and u LJ is the Lennard-Jones potential: σ and λ are arithmetic averages of amino acid specific parameters quantifying size and hydropathy, respectively.For σ, we use the values calculated from van der Waals volumes by Kim and Hummer 40 whereas, for λ, we use the recently proposed M1 parameters 32 and the values optimized in this work.
Salt-screened electrostatic interactions are modeled via the Debye-Hückel potential, where q is the average amino acid charge number, e is the elementary charge, πBc is the Debye length of an electrolyte solution of ionic strength c s and B(ϵ r ) is the Bjerrum length.Electrostatic interactions are truncated and shifted at the cutoff distance r c = 4 nm, irrespective of the value of r c used for Eq. 1.We use the following empirical relationship 41 to model the temperature-dependent dielectric constant of the implicit aqueous solution.As previously observed 31 , accounting for the temperature-dependence of ϵ r has a small effect on the predictions of the model.Indeed, the relative change in D upon an increase in temperature from 4 to 50 °C is only −3%.Similarly, at c s = 150 mM, the Debye-Hückel energy between like-charged residues at the cutoff distance, u DH (r = 4 nm), is 2.6 J mol −1 at 4 °C and 2.8 J mol −1 at 50 °C.The Henderson-Hasselbalch equation is used to estimate the average charge of the histidine residues, assuming a pK a value of 6 42 .
The amino acid beads are connected by harmonic potentials, of force constant k = 8033 kJ mol -1 nm -2 and equilibrium distance r 0 = 0.38 nm.

Optimization of the stickiness scale
The optimization of the stickiness parameters, λ, is carried out to minimize the cost function using an algorithm which is analogous to the one we previously described 32 .
2 χ R g and 2 χ P RE quantify the discrepancy between model predictions and experimental data, and are defined as where σ exp is the error on the experimental values, Y is either PRE rates or intensity ratios and N labels is the number of spin-labeled mutants used for the NMR PRE data.In the expression for the cost function, the coefficients are set to η = 0.1 and θ = 0.05.The prior is the distribution of λ, P(λ), derived from a subset of the hydrophobicity scales reported in is obtained via multivariate kernel density estimation, as implemented in scikit-learn 44 , using a Gaussian kernel with bandwidth of 0.05.This prior is 20-dimensional and contains information on the λ-distribution of the single amino acid as well as on the covariance matrix inferred from our selection of 70 hydrophobicity scales (Figure 2).
In the first step of the optimization procedure, the λ values for all the amino acids are set to 0.5, λ 0 = 0.5, and these parameters are used to simulate the proteins of the training set (Table 1 and Table 3).We proceed with the first optimization cycle, wherein, at each k-th iteration, the λ values of a random selection of five amino acids are nudged by random numbers picked from a normal distribution of standard deviation 0.05 to generate a trial λ k set.For each ith frame, we calculate the Boltzmann weight as , is lower than 60%.Otherwise, the acceptance probability follows the Metropolis criterion, where ξ k is a unitless control parameter.Each optimization cycle is divided into ten micro-cycles, wherein the control parameter, ξ, is initially set to ξ 0 = 0.1 and scaled down by 1% at each iteration until ξ < 10 −8 .From the complete optimization cycle, we select the λ set yielding the lowest estimate of ℒ. Consecutive optimization cycles are performed from simulations runs carried out with the intermediate optimal λ set.To show that the procedure is reproducible and that the final λ set is relatively independent of the initial conditions, we performed an additional optimization procedure starting from the M1 model, λ 0 =M1 32 .The optimization performed in this work differs from our previous implementation 32 also for the following details: (i) nine additional sequences have been included in the training set (Table 1 and Table 3); (ii) single chains are simulated as detailed in the "Molecular simulations" Subsection; (v) the average radius of gyration is calculated as

Results and discussion
When applying a cutoff scheme, we neglect the interactions of residues separated by a distance, r, larger than the cutoff, r c .For the most strongly interacting residue pair (between two tryptophans), the nonionic potential of the CALVADOS 1 model at r c = 2 nm takes the value of -5 J mol -1 , that is, only a small fraction of the thermal energy (Figure 3A).However, the Lennard-Jones potential falls off slowly whereas the number of interacting partners increases quadratically with increasing r.Therefore, in a simulation of a protein-rich phase, decreasing the cutoff from 4 to 2 nm (Figure 3A) can imply ignoring a total interaction energy per protein of several times the thermal energy.We first look into the effect of the choice of cutoff on the conformational ensembles of isolated proteins.We simulated single IDPs of different sequence length, N = 71-441, and average hydropathy, ⟨λ⟩ = 0.33-0.63.The average radii of gyration, ⟨R g ⟩, calculated from simulation trajectories are systematically larger when we use r c = 2 nm, compared to the values obtained using r c = 4 nm.CALVADOS 1 was optimized using the longer r c and estimating the ensemble average R g values as the root-mean-square R g , g R is systematically larger than ⟨R g ⟩, decreasing r c to 2 nm results in a slight improvement of the agreement between the calculated ⟨R g ⟩ values and the experimental data (Figure 3B).
To gain further insight into the effect of the cutoff, we performed simulations of single chains of α-Synuclein, hnRNPA1 LCD, PNt and Tau 2N4R (Table 1 and Table 2) using r c = 2, 2.5, 3 and 4 nm.Irrespective of the sequence, ⟨R g ⟩ decreases monotonically with increasing r c .However, the effect on compaction appears to be larger for long sequences and high content of hydrophobic residues, both of which result in an increased number of shorter intramolecular distances.For example, upon increasing the r c from 2 to 4 nm, the ⟨R g ⟩ of α-Synuclein (N = 140, ⟨λ⟩ = 0.33) decreases by 2.3% whereas the effect is more pronounced for hnRNPA1 LCD (N = 137, ⟨λ⟩ = 0.61) and Tau 2N4R (N = 441, ⟨λ⟩ = 0.38), with a decrease in ⟨R g ⟩ of 4.0% and 7.7%, respectively.
To investigate the effect of the cutoff distance on PS propensity, we performed direct-coexistence simulations of 100 chains of hnRNPA1 LCD, LAF-1 RGG domain (WT and shuffled sequence with higher charge segregation), and Tau 2N4R (Table 4).From the simulation trajectories of the two-phase system at equilibrium, we calculate c sat , i.e. the protein concentration in the dilute phase coexisting with the condensate.The higher the c sat value, the lower the propensity of the IDP to undergo PS.As expected from the increased contact density in the condensed phase, the choice of cutoff has a considerably larger impact on c sat than on chain compaction: decreasing r c from 4 to 2 nm results in an increase in c sat of over one order of magnitude.In contrast to what we observed for the ⟨R g ⟩, the decrease in c sat does not show a clear dependence on sequence length and average hydropathy.4).
From the multi-chain trajectories of hnRNPA1 LCD, LAF-1 RGG domain (WT and shuffled sequence) and Tau 2N4R obtained using r c = 4 nm, we estimate that the increase in nonionic energy per protein upon decreasing the cutoff from 4 to 2 nm is 13±1 kJ mol -1 (mean±standard deviation), respectively (Figure 4A).Assuming that the number of interactions neglected by the shorter cutoff is proportional to the sequence length and the amino acid concentration in the condensate, the small variance in the energy increase across the different IDPs finds explanation in the fact that the simulated systems display similar values of N 2 × c con (Figure 4A), where c con is the protein concentration in the condensate.The ratio U (r c = 2 nm)/U (r c = 4 nm) of the nonionic energies for r c = 2 and 4 nm is also largely system independent (Figure 4B).Moreover, decreasing the temperature by ∼30 K in the range between 310 and 323 K has a rather small impact on the relative strength of the electrostatic interactions with respect to the thermal energy, due to the decrease in the dielectric constant of water with increasing temperature (Figure 4C).Therefore, we speculate that the effect of decreasing r c can be compensated by simulating the system at a lower temperature (Figure 4B).
With these considerations in mind, we use the CALVADOS 1 model with r c = 2 nm to run direct-coexistence simulations of IDPs for which c sat has been measured experimentally (Table 4), i.e. variants of hnRNPA1 LCD, hnRNPA1 LCD * at various salt concentrations, and LAF-1 RGG domain variants.As we have shown that decreasing the range of the nonionic interactions disfavours PS, we perform these simulations at the experimental temperatures, which are lower by ∼30 K than those required to reproduce the experimental c sat values when the model is simulated with r c = 4 nm (Figure 3E-G).The two-fold decrease in r c enables the model to quantitatively recapitulate the experimental c sat data at the temperature at which the experiments were conducted.Notably, we show this for diverse sequences, across a wide range of ionic strengths, and for variants with different charge patterning and numbers of aromatic and charged residues.These results suggest that the range of interaction of the Lennard-Jones potential may be too large 77 .While the r −6 dependence is strictly correct for dispersion interactions between atoms, the nonionic potential of our model incorporates a variety of effective nonbonded interactions between residues, and hence the Lennard-Jones potential is not expected to capture the correct interaction range 31 .
Since CALVADOS 1 was developed using r c = 4 nm, we examined whether reoptimizing the model with the shorter cutoff could result in a comparably accurate model.As detailed in the Methods Section, we performed a Bayesian parameter-learning procedure 32 using an improved algorithm, an expanded training set (Table 1), and r c = 2 nm. Figure S5 shows that the new model tends to underestimate the c sat values of the most PS-prone sequences.We hypothesize that during the optimization the reduction of attractive forces due to the shorter cutoff is overcompensated by an overall increase in λ.We tested this hypothesis by performing the optimization with increasing values of r c , in the range between 2.0 and 2.5 nm, and found that the c sat values predicted from simulations performed with r c = 2.0 nm tend to increase with the r c used for the optimization (Figure 5).
Using r c = 2.4 nm for the optimization resulted in a model with improved accuracy compared to CALVADOS 1 (Figure 6), especially for the PS of LAF-1 RGG domain and the −23S+23T variant of A1 LCD.To test the robustness of the approach, the optimization was carried out starting from λ 0 = 0.5 for all the amino acids (Figure 6) and from λ 0 =M1 (Figure S6 and S7).The difference between the resulting sets of optimal λ values (Figure S7A) is lower than 0.08 for all the residues and exceeds 0.05 only for S, T and A. The model obtained starting from λ 0 = 0.5 is more accurate at predicting PS propensities and will be referred to as CALVADOS 2 hereafter.
The λ values of CALVADOS 1 and 2 differ mostly for K, T, A, M, and V, whereas the smaller deviations (|∆λ| < 0.09) observed for Q, L, I, and F (Figure 6A) are within the range of reproducibility of the method (Figure S7A).Although CALVADOS 1 was optimized using r c = 4 nm, predictions of single-chain compaction from simulations performed using r c = 2 nm are more accurate for CALVADOS 1 than for CALVADOS 2.   4.
This result can be explained by the opposing effects of decreasing the cutoff and calculating R g values as In fact, the 2 〈 〉 g R values predicted by CALVADOS 1 are strikingly similar to the ⟨R g ⟩ values predicted by CALVADOS 2 (Figure 7A).
The correlation between experimental and predicted R g values for the 67 proteins of Table 1 and Table 2 is excellent for both CALVADOS 1 and 2 (Figure 7B).On the other hand, CALVADOS 2 is more accurate than CALVADOS 1 at predicting PS propensities, as evidenced by Pearson's correlation coefficients of 0.93 and 0.82, respectively, for the experimental and predicted c sat values of the 26 sequences of Table 4 (Figure 7C).
Capturing the interplay between short-range nonionic and long-range ionic interactions is essential for accurately modeling the PS of IDPs 58,[78][79][80] .Our results show that the decrease in the range of the nonionic potential reported in this work does not significantly perturb the balance between ionic and nonionic forces.In fact, CALVADOS 1 and 2 accurately predict the PS propensities of A1 LCD * at various salt concentrations, as well as the c sat of variants of A1 LCD and LAF-1 RGG domain with different charge patterning (Figure 3E-G and Figure 6D-F).Moreover, CALVADOS 1 and 2 recapitulate the effect of salt concentration and charge patterning on the chain compaction of A1 LCD * 68 and p27-C constructs 67 , respectively (Figure 8A and 8B).
In the model, ionic interactions are also truncated and shifted.At the cutoff distance of 4 nm, the ionic energy decreases with increasing salt concentration and amounts to ±2.7 J mol -1 at c s = 150 mM and 20 °C.However, this energy is ∼ 43 times larger at c s = 10 mM, which suggests that the model may considerably underestimate the strength and range of charge-charge interactions at low salt concentrations.To investigate this aspect, we performed single-chain and direct-coexistence simulations using a longer cutoff of 6 nm for the ionic interactions.The change in cutoff has a small effect on both the R g (Figure S8A) and the c sat values predicted for systems at c s = 150 mM (Figure S8C).Conversely, simulations at low salt concentration are considerably affected by the increase in cutoff.For the PRE data of A2 LCD at c s = 5 mM, we observe an improvement in the agreement with experiments (Figure S8B).Instead, the accuracy of the phase behaviour predicted for A2 LCD at c s = 10 mM decreases significantly as the c sat value shows a ∼100-fold increase (Figure S8C).Since the vast majority of the available R g and c sat data in our training and test sets was measured at c s ≈ 150 mM, we are currently unable to further assess or improve the accuracy of the model at low salt concentrations.
As additional test systems, we considered constructs of the 1-80 N-terminal fragment of yeast Lge1, which have been recently investigated using turbidity measurements 71 .CALVADOS 2 correctly predicts that the WT Lge1 1-80 construct undergoes PS at the experimental conditions, albeit with a hundred times larger c sat (50 ± 6 µM at c s = 100mM) compared to experiments (< 1 µM).In agreement with experiments, CALVADOS 2 predicts that mutating all the 11 R residues to K increases c sat by over one order of magnitude whereas mutating the 14 Y residues of the 1-80 fragment to A abrogates PS (Figure 8C).

Conclusions
In the context of a previously developed Cα-based IDP model (CALVADOS), we show that neglecting the long range of attractive Lennard-Jones interactions has a small impact on the compaction of a single chain while strongly disfavouring PS.The effect can be explained by the smaller number of neglected pair interactions  4).  1) and test set (Table 2), respectively.Error bars represent the experimental error relative to exp g R .χ 2 values in the legend are averages over 67 different sequences or solution conditions (Table 1 and Table 2).(B and C) Comparison between experimental and predicted (B) R g (Table 1 and Table 2) and (C) c sat values for CALVADOS 1 (orange) and CALVADOS 2 (blue).Pearson's r coefficients are reported in the legend.Small squares in C show the same data as in Figure 6C-F whereas the large upward triangle, downward triangle, and circle show values for A2 LCD, FUS LCD, and Ddx4 LCD, respectively, at the conditions reported in Table 4. for a residues in an isolated chain compared to the dense environment of a condensate.Moreover, we find that the effect of reducing the range of interaction by a factor of two is relatively insensitive to sequence length and composition.Therefore, decreasing the cutoff of the Lennard-Jones potential of the Cα-based model engenders a similar generic effect on chain compaction and PS as a corresponding increase in temperature.We take advantage of this finding to solve the temperature mismatch of the CALVADOS model.Namely, we decrease the cutoff of the nonionic interactions from 4 to 2 nm and obtain accurate c sat predictions at the experimental conditions, whereas simulations at temperatures higher by 30 °C were required in the original implementation.Finally, we used the shorter cutoff to reoptimize the stickiness parameters of the model against experimental data reporting on single-chain compaction.The small expansion of the chain conformations is overcompensated by an overall increase in stickiness so that the resulting model tends to underestimate the experimental c sat values.By systematically increasing the cutoff used in the development of the stickiness scale, we find that performing the optimization using r c = 2.4 nm results in a model (CALVADOS 2) which yields accurate predictions from simulations run using r c = 2 nm at the experimental conditions.We present CALVADOS 2 as an improvement of our previous model by testing on sets of experimental R g and c sat data comprising 16 and 36 systems, respectively, which were not used in the parameterization of the model.This is a very important study as it shows to what extent the truncation of the short range interactions affect the dynamics of single chains and condensates of IDPs and how this feature can be used to balance the excess thermal energy needed to calibrate the original CALVADOS implementation.
I have the following comments.
Similarly, as it was done for the short-range interactions, the electrostatic interactions are also truncated (at a cutoff distance of 4 nm which is about 4 fold the Debye length at 300 K and 150 mM ionic strength).I wonder what the influence of this cutoff is.The authors could comment on that.
From equations 3 and 4, I understand that a temperature-dependent Debye length was used.How much does this length change when changing the temperature (relative to the cutoff)?I think it is important to comment on that in the paper.We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
could comment on that.
Response: We agree with the reviewers that the effect of the cutoff on the ionic interactions is an interesting aspect to examine.We first tested the effect on the single chain data.Although some proteins (A2 LCD and CAHSD) were simulated at low ionic strength (c s = 5-70 mM), we found that increasing the cutoff value from 4 to 6 nm leads to relative changes in the predicted R g values below 2.5% (Figure S8A).However, the effect on the predicted PRE data is more pronounced.In particular, for A2 LCD we observe a ~30% decrease in the X 2 PRE value (Figure S8B).
We also tested the effect on the phase behavior of A2 LCD (c s = 10 mM), A1 LCD* (c s = 150 mM), FUS LCD (c s = 150 mM), LAF-1 RGG domain (c s = 150 mM), and its shuffled variant (c s = 150 mM).As expected, increasing the cutoff from 4 to 6 nm has a negligible effect on the phase separation at high c s whereas a considerable increase in saturation concentration, c sat , is observed for simulations of A2 LCD at c s = 10 mM (Figure S8C).Coincidentally, we found that the c sat predicted for A2 LCD using the shorter cutoff is in better agreement with the reference experimental value.We now discuss this point in the results section: In the model, ionic interactions are also truncated and shifted.At the cutoff distance of 4 nm, the ionic energy decreases with increasing salt concentration and amounts to ±2.7 J mol -1 at c s = 150 mM and 20 ºC.However, this energy is ~43 times larger at c s = 10 mM, which suggests that the model may considerably underestimate the strength and range of charge-charge interactions at low salt concentrations.To investigate this aspect, we performed single-chain and direct-coexistence simulations using a longer cutoff of 6 nm for the ionic interactions.The change in cutoff has a small effect on both the R g (Figure S8A) and the c sat values predicted for systems at c s = 150 mM (Figure S8C).Conversely, simulations at low salt concentration are considerably affected by the increase in cutoff.For the PRE data of A2 LCD at c s = 5 mM, we observe an improvement in the agreement with experiments (Figure S8B).Instead, the accuracy of the phase behaviour predicted for A2 LCD at c s = 10 mM decreases significantly as the c sat value shows a ~100-fold increase (Figure S8C).Since the vast majority of the available R g and c sat data in our training and test sets was measured at c s ≈150 mM, we are currently unable to further assess or improve the accuracy of the model at low salt concentrations.
From equations 3 and 4, I understand that a temperature-dependent Debye length was used.How much does this length change when changing the temperature (relative to the cutoff)?I think it is important to comment on that in the paper.

Response:
The Debye length, D, used in the model shows a weak temperature dependence.For example, the relative change in D upon an increase in temperature from 4 to 50 ºC is only -3%.Ionic interaction energies between like-charge residues at a cutoff distance of 4 nm, u DH (r = 4 nm), also show a weak temperature dependence.At c s = 150 mM, u DH (r = 4 nm) is 2.6 J mol -1 at at 4 ºC and 2.8 J mol -1 at 50 ºC.We now discuss this point in the methods section: As previously observed [doi:10.1038/s43588-021-00155-3], accounting for the temperature-dependence of Er has a small effect on the predictions of the model.Indeed, the relative change in D upon an ○ increase in temperature from 4 to 50 ºC is only -3%.Similarly, at c s = 150 mM, the Debye-Hückel energy between like-charged residues at the cutoff distance, u DH (r = 4 nm), is 2.6 J mol -1 at at 4 ºC and 2.8 J mol -1 at 50 ºC.

What are the black lines in Fig 3B?
Response: We have clarified in the figure caption that the black error bars represent the relative error of the experimental measurement, i.e. σ exp / R g exp .

○
Intro 2nd paragraph: What is M1? M1 has not been defined.

Response:
We have changed the text to specify that M1 is the name used to refer to the CALVADOS 1 optimal stickiness parameters in our previous publication on the model (Tesei et al.DOI: 10.1073/pnas.2111696118).
○ Methods: simulation length of 6 x 0.3 x N 2 ps.What was the motivation to choose this particular simulation length?
Response: We have changed the text to clarify how we chose sampling frequencies and simulation lengths based on sequence length, N. Briefly, we saved every Δt ≈ 3 x N 2 fs if N>100, and Δt = 30 ps otherwise, to ensure that the radii of gyration for consecutive frames were weakly correlated irrespective of sequence length, N (Figure 1).The quadratic N-dependence of Δt was inferred from the lag time at which the autocorrelation function of the R g approximates 1/2, for a subset of proteins of different N. We simulated ten replicas per sequence, each for a simulation time of 600 x Δt .After discarding the initial 100 frames of each replica, we obtained 5,000 weakly correlated conformations for each protein.In the context of the optimization procedure, we observed that this number of frames is sufficient for accurately reweighting the trajectories, when the fractions of effective frames exceeds 60%.The updated version provides some interesting discussion, the data are all shared on GitHub and Zenodo, and all software parameters are made available in a convenient format.The paper is wellwritten, the methods well-described and logically motivated and the figures clear.
What more could one want from a paper?I strongly support indexing in its current form.
As a note, the conciseness of this review should not be seen as a lack of attention to detail.However, would any suggestions or comments I make materially affect the conclusions, clarity, or availability of data?I do not think so, and as such, it is in the author's best interest to use their time on the next set of questions than fine-tune what is already a very strong manuscript.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and does the work have academic merit?Yes

Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate?Yes Are all the source data underlying the results available to ensure full reproducibility?Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Coarse-grained simulations of disordered proteins I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 1 .
Figure 1.Values of the autocorrelation function (ACF) of the g R for a lag time of one, two and three frames as a function of sequence length, N. The autocorrelation is calculated for the sequences of Table 1 and Table 2 simulated using (A) CALVADOS 1 and (B) CALVADOS 2 for ∼ 6 × 0.3 × N 2 ps if N > 100 and for 18 ns otherwise.600 simulation frames are saved every ∼ 0.003 × N 2 ps if N > 100 and every 30 ps otherwise.The initial 100 frames are discarded from each simulation.

Figure 2 .
Figure 2. (A-T) Probability distributions of the stickiness parameters, P (λ), obtained from 70 min-max normalized hydrophobicity scales selected from the set by Simm et al.43 .Blue bars are histograms with bin width of 0.1.Black lines are obtained as 1D projections of the multivariate kernel density estimation implemented in scikit-learn44 , using a Gaussian kernel with bandwidth of 0.05.(U) Covariance matrix of the 70 min-max normalized hydrophobicity scales selected from the set by Simm et al.43 .The upper triangle of the matrix shows the covariance calculated directly from the 70 min-max normalized hydrophobicity scales whereas the lower triangle of the matrix shows the covariance calculated from the multivariate kernel density estimation averaging over the 70 min-max normalized hydrophobicity scales.

Figure 3 .
Figure 3.Effect of cutoff size on predictions of radii of gyration, g R , and saturation concentration, sat c , from simulations performed using the CALVADOS 1 parameters.(A) Nonionic Ashbaugh-Hatch potentials between two W residues with cutoff, r c , of 4 (blue solid line) and 2 nm (orange dashed line).The inset highlights differences between the potentials for r s ≤ r ≤ r c .(B) Relative difference between experimental and predicted radii of gyration, 〈 〉 g R , for r c = 4 (blue) and 2 nm (orange).2rχ values reported in the legend are calculated for all the sequences in Table1.Error bars represent the experimental error relative to exp g R . (C) 〈 〉 g R of α-Synuclein, hnRNPA1 LCD, PNt and human full-length tau (Table1 and Table 2) from simulations performed with increasing cutoff size, r c , and normalized by the value at r c = 2 nm.(D) Saturation concentration, c sat , for hnRNPA1 LCD, the randomly shuffled sequence of LAF-1 RGG domain, LAF-1 RGG domain and human full-length tau for increasing values of r c and normalized by the c sat at r c = 2 nm.(E-G) Correlation between c sat from simulations and experiments for (E) A1 LCD variants, (F) A1 LCD * WT at [NaCl] = 0.15, 0.2, 0.3 and 0.5 M and (G) variants of LAF-1 RGG domain (Table4).

Figure 4 .
Figure 4. (A) Comparison between nonionic energy difference per protein (∆U = U (r c = 2 nm) − U (r c = 4 nm), hatched) and N 2 × c con (orange), where N is the sequence length and c con is the molar protein concentration in the condensate.(B) Ratio between nonionic energies calculated with r c = 2 and 4 nm (open circles) compared to the ratio of the thermal energy, ′ RT RT , at T′ = T − 20 K and at T (orange), where R is the gas constant.(C) Increase in electrostatic energy relative to the thermal energy upon decreasing the temperature by 30 (black) and 20 K (orange).The data shown in this figure are obtained from simulations of hnRNPA1 LCD, LAF-1 RGG domain (WT and shuffled sequence) and Tau 2N4R performed at T = 310, 293, 323, and 277 K, respectively, and using r c = 4 nm.Error bars are standard deviations over trajectories of the systems at equilibrium.

Figure 5 .
Figure 5. Saturation concentrations, sat c , as a function of the cutoff used to optimize the model.c sat values are calculated from simulations performed using r c = 2.0 nm whereas the models are optimized using r c = 2.0, 2.2, 2.4, and 2.5 nm.Horizontal dotted lines represent experimental c sat values from the references reported in Table4.

Figure 7 .
Figure 7. (A) Relative difference between experimental and predicted radii of gyration for CALVADOS 1 (orange) and CALVADOS 2 (blue).Full and hatched bars show ( calc g R

Figure 8 .
Figure 8.Comparison between experimental R g values and predictions of CALVADOS 1 (orange) and CALVADOS 2 (blue) for (A) A1 LCD * at different salt concentrations (50 mM < c s < 500 mM) and (B) p27-C constructs of different charge patterning (0.1 < κ < 0.8).Experimental conditions and references are reported in Table 2. (C) Predictions of CALVADOS 2 direct-coexistence simulations of the PS of constructs of the 1-80 N-terminal fragment of yeast Lge1 simulated at c s = 100 mM.Protein concentration profiles are shown as a function of the long side of the simulation cell for WT (blue), -11R+11K variant (orange), and -14Y+14A variant (green).
phase transition of FUS liquid droplets and reversible hydrogels into irreversible hydrogels impairs RNP granule function.Neuron.2015; 88(4): 678-690.PubMed Abstract | Publisher Full Text | Free Full Text /doi.org/10.21956/openreseurope.16180.r29852© 2022 Gräter F et al.This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.Frauke Gräter 1 Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany 2 Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany Camilo Aponte-Santamaria 1 Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany 2 Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, GermanyTesei and Lindorff-Larsen present CALVADOS 2 which is an improvement of CALVADOS 1, their original coarse-grained model to simulate the dynamics of intrinsically disordered protein chains or condensates of them.They systematically studied the effect of the cutoff of the short-range interactions on the compactness of single chains and the phase behavior propensity.They optimized the model using a large set of intrinsically disordered proteins of varying amino acid length and for which experimental data exist.

○
Fig 4B, orange y-axis: title confusing.It is not the normalized temperature ratio but the kinetic energy ratio.

Table
1 and Table 2 simulated using (A) CALVADOS 1 and (B) CALVADOS 2 for ∼ 6 × 0.3 × N 2 ps if N > 100 and for 18 ns otherwise.600 simulation frames are saved every ∼ 0.003 × N 2 ps if N > 100 and every 30 ps otherwise.The initial 100 frames are discarded from each simulation.

Table 4 . Proteins and conditions used for the direct-coexistence simulations performed in this study and references to the experimental data.
Shaded rows highlight systems which are not included in the correlation plot of Figure 7C.

the work clearly and accurately presented and does it cite the current literature? Yes Is the study design appropriate and does the work have academic merit? Yes Are sufficient details of methods and analysis provided to allow replication by others? Yes If applicable, is the statistical analysis and its interpretation appropriate? Yes Are all the source data underlying the results available to ensure full reproducibility? Yes Are the conclusions drawn adequately supported by the results? Yes Competing Interests: No
Intro 2nd paragraph: What is M1? M1 has not been defined.