Intrinsically disordered linkers determine the interplay between phase separation and gelation in multivalent proteins

Phase transitions of linear multivalent proteins control the reversible formation of many intracellular membraneless bodies. Specific non-covalent crosslinks involving domains/motifs lead to system-spanning networks referred to as gels. Gelation transitions can occur with or without phase separation. In gelation driven by phase separation, multivalent proteins and their ligands condense into dense droplets, and gels form within droplets. System-spanning networks can also form without condensation or demixing of proteins into droplets. Gelation driven by phase separation requires lower protein concentrations and seems to be the biologically preferred mechanism for forming membraneless bodies. Here, we use coarse-grained computer simulations and the theory of associative polymers to uncover the physical properties of intrinsically disordered linkers that determine the extent to which gelation of linear multivalent proteins is driven by phase separation. Our findings are relevant for understanding how sequence-encoded information in disordered linkers influences phase transitions of multivalent proteins.


eLife's transparent reporting form
We encourage authors to provide detailed information within their submission to facilitate the interpretation and replication of experiments. Authors can upload supporting documentation to indicate the use of appropriate reporting guidelines for health-related research (see EQUATOR Network), life science research (see the BioSharing Information Resource), or the ARRIVE guidelines for reporting work involving animal research. Where applicable, authors should refer to any relevant reporting standards documents in this form.
If you have any questions, please consult our Journal Policies and/or contact us: editorial@elifesciences.org.

Sample-size estimation
• You should state whether an appropriate sample size was computed when the study was being designed
• You should state the statistical method of sample size computation and any required assumptions
• If no explicit power analysis was used, you should describe how you decided what sample (replicate) size (number) to use
Please outline where this information can be found within the submission (e.g., sections or figure legends), or explain why this information doesn't apply to your submission:
The concept of sample size does not apply to our study. However, we did have to think deeply about the range of valencies that we "titrated" in our simulations as well as the range of linker lengths. These choices were made to connect with the experimental work performed in the Rosen lab.

Replicates
• You should report how often each experiment was performed
• You should include a definition of biological versus technical replication
• The data obtained should be provided and sufficient information should be provided to indicate the number of independent biological and/or technical replicates
• If you encountered any outliers, you should describe how these were handled
• Criteria for exclusion/inclusion of data should be clearly stated
• High-throughput sequence data should be uploaded before submission, with a private link for reviewers provided (these are available from both GEO and ArrayExpress)
Please outline where this information can be found within the submission (e.g., sections or figure legends), or explain why this information doesn't apply to your submission:
Each simulation was performed at least 50 separate times using a different random seed for each trial. The goal was to generate reproducible statistics as judged by the similarities between calculated phase diagrams from one simulation to the next.
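As an illustration of the replicate protocol described above, the following minimal Python sketch runs 50 independent trials, each with its own random seed, and reports the mean and standard error of the mean for one observable: the fraction of molecules in the single largest cluster, which is among the quantities reported in this study. It is not the simulation engine used for the study. The function names (`largest_cluster_fraction`, `run_replicate`, `replicate_statistics`), the molecule and crosslink counts, and the random-crosslink toy model are all assumptions chosen only to make the example self-contained.

```python
import math
import random
import statistics
from collections import Counter


def largest_cluster_fraction(n_molecules, crosslinks):
    """Fraction of molecules in the largest connected cluster (union-find)."""
    parent = list(range(n_molecules))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for a, b in crosslinks:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    sizes = Counter(find(i) for i in range(n_molecules))
    return max(sizes.values()) / n_molecules


def run_replicate(seed, n_molecules=200, n_crosslinks=150):
    """Hypothetical stand-in for one simulation run.

    Assumption: crosslinks are drawn at random between molecule pairs;
    this is a toy model, not the coarse-grained lattice simulation.
    """
    rng = random.Random(seed)  # independent, reproducible stream per replicate
    crosslinks = [
        tuple(rng.sample(range(n_molecules), 2)) for _ in range(n_crosslinks)
    ]
    return largest_cluster_fraction(n_molecules, crosslinks)


def replicate_statistics(seeds):
    """Mean and standard error of the mean across independent replicates."""
    values = [run_replicate(seed) for seed in seeds]
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


seeds = list(range(50))  # 50 distinct seeds, one per replicate
mean, sem = replicate_statistics(seeds)
print(f"largest-cluster fraction: mean = {mean:.3f}, SEM = {sem:.3f}")
```

Seeding a separate generator per trial keeps each replicate reproducible on its own, so any single run can be repeated without rerunning the full set.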

Statistical reporting
• Statistical analysis methods should be described and justified
• Raw data should be presented in figures whenever informative to do so (typically when N per group is less than 10)
• For each experiment, you should identify the statistical tests used, exact values of N, definitions of center, methods of multiple test correction, and dispersion and precision measures (e.g., mean, median, SD, SEM, confidence intervals); and, for the major substantive results, a measure of effect size (e.g., Pearson's r, Cohen's d)
• Report exact p-values wherever possible alongside the summary statistics and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.
Please outline where this information can be found within the submission (e.g., sections or figure legends), or explain why this information doesn't apply to your submission: (For large datasets, or papers with a very large number of statistical tests, you may upload a single table file with tests, Ns, etc., with reference to sections in the manuscript.)
Most of the results we show are either explicit 1- and 2-dimensional histograms or the result of calculating first moments of specific quantities such as the fraction of molecules in the single largest cluster or the relative densities of molecules vs. the entire lattice. Given the large sizes of the simulation boxes, the large numbers of coarse-grained molecules that we deploy in the simulation cells, and the overall statistical efficiency of the sampling, the standard errors about the mean are typically in the second and third decimal places.

Group allocation
• Indicate how samples were allocated into experimental groups (in the case of clinical studies, please specify allocation to treatment method); if randomization was used, please also state if restricted randomization was applied
• Indicate if masking was used during group allocation, data collection and/or data analysis
Please outline where this information can be found within the submission (e.g., sections or figure legends), or explain why this information doesn't apply to your submission:
These criteria do not apply to our coarse-grained simulations because we do not allocate samples into experimental groups.

Additional data files ("source data")
• We encourage you to upload relevant additional data files, such as numerical data that are represented as a graph in a figure, or as a summary table
• Where provided, these should be in the most useful format, and they can be uploaded as "Source data" files linked to a main figure or table
• Include model definition files including the full list of parameters used
• Include code used for data analysis (e.g., R, MatLab)
• Avoid stating that data files are "available upon request"
Please indicate the figures or tables for which source data files have been provided: