Mechanistic roles of protein disorder within transcription

Understanding the interactions of proteins involved in transcriptional regulation is critical to describing biological systems because they control the expression profile of the cell. Yet these proteins belong to a subset that is less well characterized biophysically; they frequently contain long disordered regions that are highly dynamic. A key question is therefore why: what functional roles does protein disorder play in transcriptional regulation? Experimental data exemplifying these roles are starting to emerge, with common themes being the enabling of complexity within networks and of quick responses. Most recently a role for disorder in mediating phase transitions of membrane-less organelles has been proposed.


Introduction
A third of eukaryotic proteins contain long disordered regions, that is, regions that do not have a well-defined tertiary structure but can have varying degrees of fluctuating secondary structure and tertiary contacts. This prevalence of protein disorder, and the observation from bioinformatics studies that it can be evolutionarily conserved, have led to the suggestion that it plays important biological (functional) roles. But what are these roles? Early bioinformatic studies provided clues by cataloging proteins that are predicted to be intrinsically disordered (IDPs) or to contain long intrinsically disordered regions (IDRs) [1]. It emerged that IDPs and IDRs are overrepresented in signaling processes such as transcription [1]. In fact, over half of eukaryotic transcription factors are predicted to be mostly disordered, as a result of their depletion in 'order-promoting' residues (such as bulky hydrophobic amino acids) and enrichment in 'disorder-promoting' residues (such as polar and charged residues, glycine and proline) [2]. Protein disorder is not restricted to long 'linker' regions, but often includes identified interacting domains. Activation domains, which act to recruit the transcriptional machinery, are particularly disordered [2]. And whilst most DNA binding domains are ordered, many others are disordered. For example, the second largest category of transcription factors (and the largest that operates solely within eukaryotes) is the bZIP family, whose basic disordered DNA binding domains fold into helices only upon DNA binding [3,4]. AT-hooks, which bind in the minor groove of AT-rich sequences, are also predicted to be almost entirely disordered [2]. Furthermore, the regions directly flanking structured DNA binding domains exhibit significant disorder [5].
IDRs are extremely dynamic, adopting a myriad of highly heterogeneous conformations that rapidly interconvert over a range of timescales [6,7]. They may have varying levels of fluctuating secondary structure, and even tertiary contacts. In some cases transient helical content can be quite high, for example, 70% in a segment of the cMyb transactivation domain [8]. In others, IDRs are fairly well described by random coil models, or as more 'structured' molten globules, for example, the NCBD domain of the transcription co-activator CBP [9]. Describing the structural heterogeneity of IDPs is challenging, but over the last few years methods for combining multiple types of experimental measurement to generate representative ensembles have emerged [10]. A database of structural ensembles, pE-DB (http://pedb.vib.be/), is accessible online and currently holds 24 entries, none of which are transcription factors or coactivators [11].
IDRs frequently make specific interactions with macromolecular binding partners, that is, other proteins or nucleic acids. These reactions can be associated with folding of the disordered region, which is known as coupled folding and binding [12,13,14]. In reality all of these bound proteins are probably best viewed as lying at some point along a spectrum of disorder, just as is the case for the unbound state (Figure 1).
Clearly, regions that are disordered in the free and/or bound states of transcription factors play important biological roles. Removing these regions is expected to alter key parameters such as binding affinity and 'switch-like' properties, and has been shown to do so in many cases. This is slightly distinct from uncovering a functional role of the biophysical property of protein disorder itself. It is the current evidence for some of these potential biological advantages of disorder within transcription factors that is reviewed here. Many of the principles discussed can apply equally to proteins with other cellular roles, especially those involved in signalling. Finally, it is worth noting that although we concentrate here on the potential mechanistic roles of disorder, disorder may also present genetic advantages through improved possibilities for alternative splicing and enhanced evolutionary rates.

Fast binding
Transcriptional processes must respond quickly to changes in conditions, despite potentially low concentrations of the proteins involved. Consequently, one might assume that the reactions will be characterized by high association rate constants. The limited kinetic evidence assembled so far indicates this may be the case. Examples of fast binding reactions are found with peptides of the disordered transcription factors that bind to the structured KIX domain of the co-activator CBP [17]. Indeed, if electrostatic enhancements are neglected, cMyb-KIX is the fastest reported reaction for formation of a structured complex [18]. Kinetic experiments using more ordered versions of cMyb have revealed similar binding rates, suggesting that the process follows an induced fit mechanism [17,19]. In pure induced fit mechanisms all proteins within the structural ensemble are able to bind to the partner protein, and subsequently fold (Figure 2). Similar behavior was observed this year in simulations of cMyb-KIX binding [20]. This contrasts with a pure conformational selection mechanism, where only pre-folded proteins within the ensemble are binding-competent (Figure 2) and binding rates are decreased accordingly [12]. Since both induced fit mechanisms and fuzzy complexes remove this obstacle, they may later prove to be favored within transcription. Both actually have the potential to modestly increase association rates over those of folded proteins by increasing the capture radius of the disordered protein and/or increasing the reactive surface of the target [21,22,23]. Whether this represents a functional advantage for disorder is less clear, since pronounced enhancements in rates could in any case be achieved by altering charge interactions. So far there is little evidence that disordered proteins in general are characterized by binding rates different from those of folded proteins [18,24].
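The contrast between the two limiting mechanisms can be made concrete with a minimal sketch. It assumes that, under pure conformational selection, the folded and unfolded forms interconvert rapidly compared with binding, so that the apparent association rate constant is simply the rate constant of the folded form scaled by the pre-folded fraction. The function names, the rate constant and the folded fractions are hypothetical values chosen for illustration and are not taken from the studies cited above.

```python
def kon_conformational_selection(k_on_folded: float, fraction_folded: float) -> float:
    """Apparent association rate constant when only pre-folded molecules can bind.

    Assumes the folded/unfolded pre-equilibrium is fast compared with binding,
    so the binding-competent fraction simply scales the rate constant.
    """
    if not 0.0 <= fraction_folded <= 1.0:
        raise ValueError("fraction_folded must lie between 0 and 1")
    return k_on_folded * fraction_folded


def kon_induced_fit(k_on_encounter: float) -> float:
    """Apparent association rate constant when every ensemble member can bind.

    In the pure limiting case the rate constant does not depend on how much
    of the ensemble is pre-folded.
    """
    return k_on_encounter


if __name__ == "__main__":
    K_ON = 1.0e7  # M^-1 s^-1, hypothetical value chosen for illustration
    for fraction in (0.05, 0.25, 0.75):  # hypothetical pre-folded fractions
        cs = kon_conformational_selection(K_ON, fraction)
        fit = kon_induced_fit(K_ON)
        print(f"folded fraction {fraction:.2f}: CS {cs:.2e}, IF {fit:.2e} M^-1 s^-1")
```

Under these assumptions, raising the pre-folded fraction speeds up binding only in the conformational selection limit, which is why similar binding rates for more ordered variants are read as pointing towards induced fit.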

Weak binding
Figure 1 (partial caption): [...] has been observed to form additional α-helical structure upon binding the Mediator subunit Gal11 (pink), but not to bind in a defined orientation [59]. The structural ensemble was generated by combining NOE and spin-labelling data. PDB code 2LPB. (c) Cartoon of a domain (green) that remains highly disordered whilst interacting with its partner (black), e.g. glutamine-rich domains of the transcriptional activator Sp1 and TAF4 (a component of TFIID) [60].
For proteins that undergo coupled folding and binding, binding affinities will be lowered by the disordered nature of the binder because of the energetic cost of folding it. On average, equilibrium dissociation constants (K_d) are higher for complexes formed from IDRs than for folded counterparts [22,24,25]. Tighter binding has been reported in cases where the protein is pre-structured before binding; for example, more helical versions of the transcription factor p53, generated by chemical stapling or mutagenesis, bind to MDM2 with higher affinity [26,27]. Cellular studies further demonstrate that the resulting altered p53 dynamics lead to failure to induce cell cycle arrest upon DNA damage [27]. Given typical physiological concentrations of DNA binding sites and transcriptional regulators, weaker binding allows partial binding occupancies in vivo, which should be advantageous for signaling purposes. However, disorder is not necessary: occupancy can be controlled by altering cellular concentrations, and binding affinities of both folded and disordered proteins range over many orders of magnitude, so it is possible for an IDP to form extremely stable complexes and vice versa [22,24].
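The link between affinity and partial occupancy can be illustrated with a minimal sketch for simple 1:1 binding at equilibrium, assuming the free concentration of the regulator is known (or that it is in excess over its sites). The function name and all concentrations are hypothetical and chosen only to show the trend.

```python
def fractional_occupancy(free_ligand: float, k_d: float) -> float:
    """Fraction of binding sites occupied for simple 1:1 binding at equilibrium.

    Both arguments must be in the same concentration units.
    """
    if free_ligand < 0.0:
        raise ValueError("free_ligand must be non-negative")
    if k_d <= 0.0:
        raise ValueError("k_d must be positive")
    return free_ligand / (free_ligand + k_d)


if __name__ == "__main__":
    REGULATOR = 100e-9  # M, hypothetical free regulator concentration
    for k_d in (1e-9, 100e-9, 10e-6):  # M, hypothetical dissociation constants
        occ = fractional_occupancy(REGULATOR, k_d)
        print(f"K_d = {k_d:.0e} M -> occupancy {occ:.3f}")
```

When K_d is far below the regulator concentration the site is essentially saturated and insensitive to changes in concentration; occupancy responds most strongly when K_d is comparable to the concentration, which is the regime the weaker binding of IDRs would make accessible.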

A timely off-switch for a specific interaction
Specificity is important to control off-target effects in the cell, and is broadly related to the extent of specific stabilizing interactions present in the complex, that is, the enthalpic gain of binding. This would generally dictate a low dissociation rate constant (k_off) for specific binding, whereas for signaling a high k_off is desirable. Zhou has pointed out that both might be achieved if the enthalpic gain is balanced by an entropic loss due to structure formation [28]. In support of this, assembled literature data have suggested that the interfaces of complexes formed by disordered proteins have a similar composition to those of folded proteins, but are on average associated with a 2.5 kcal/mol larger entropic cost [25]. Theoretically this cost might manifest in either k_on or k_off; however, compiled data indicate that the energetic difference results from changes in the lifetime of complexes [24]. This matches well with evidence from kinetic studies that disorder content largely modulates k_off (rather than k_on) of cMyb from its partner CBP-KIX [17,19], and with coarse-grained molecular dynamics simulations suggesting the same for the pKID region of CREB with CBP-KIX [29]. The effect is not always solely on k_off, however. For example, for less structured mutants of ACTR, changes in k_on and k_off contributed equally to destabilization of the complex with its partner NCBD [30]. Such differences are related to the mechanism of coupled folding and binding, and in particular to the extent of structure formation in the transition state for binding.
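To give a sense of scale for the 2.5 kcal/mol figure, the sketch below converts a free energy difference into a fold change in equilibrium or rate constant using exp(ΔΔG/RT). It assumes a temperature of 298.15 K and, as a limiting case, that the whole cost falls on k_off with k_on unchanged; the reference k_off is hypothetical.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal mol^-1 K^-1


def fold_change(ddg_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Fold change in a constant corresponding to a free energy difference."""
    if temperature_k <= 0.0:
        raise ValueError("temperature_k must be positive")
    return math.exp(ddg_kcal_per_mol / (GAS_CONSTANT * temperature_k))


def complex_lifetime(k_off: float) -> float:
    """Mean lifetime of a complex, taken as the reciprocal of k_off."""
    if k_off <= 0.0:
        raise ValueError("k_off must be positive")
    return 1.0 / k_off


if __name__ == "__main__":
    factor = fold_change(2.5)  # entropic cost quoted in the text [25]
    k_off_folded = 1.0  # s^-1, hypothetical reference value
    k_off_disordered = k_off_folded * factor  # assumes k_on is unchanged
    print(f"fold change in K_d and k_off: {factor:.0f}")
    print(f"lifetime, folded reference: {complex_lifetime(k_off_folded):.3f} s")
    print(f"lifetime, disordered case: {complex_lifetime(k_off_disordered):.3f} s")
```

Under these assumptions a 2.5 kcal/mol cost corresponds to a complex that dissociates some tens of times faster than its folded counterpart, a substantial shortening of lifetime without any change in the interface itself.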

(Dynamic) allosteric coupling
It is becoming clear that coupling between distant binding sites cannot always be interpreted in terms of visible propagated structural changes but is sometimes mediated by changes in protein dynamics and flexibility [31]. Most examples documented so far are for partially stable folded domains where loops or backbone/side-chain dynamics are suppressed upon binding either ligand. In such cases the protein stiffening upon binding the second ligand is reduced, thereby reducing entropic penalties [17,32]. However, it has been suggested that this phenomenon, known as dynamic allostery, might be particularly effective for disordered proteins [33]. An enlightening example is the anti-toxin Phd, which is a transcriptional repressor with an intrinsically disordered C-terminal domain. Coupled folding and binding of this domain with its partner toxin Doc is associated with a stabilization of the folded DNA binding domain of Phd that enhances its DNA binding affinity [34]. More recently the same group have been able to demonstrate a critical role for disorder in modulating conditional cooperativity for the binding interaction between Phd and its cognate DNA sites. Strong repression activity of Phd requires two Phd dimers bound to nearby operator sites. Because of the proximity of the operator sites, this state requires conformational restriction of the two disordered tails, which comes with a large entropic penalty [35]. The tails thus mediate a negative cooperativity of binding to the two sites by providing an entropic barrier. Pleasingly, mutagenesis studies showed that the size of this barrier increases with the disordered nature of the tail [35]. In the presence of Doc, however, the tail becomes more structured and the negative cooperativity is switched to positive cooperativity [35].

Figure 2 (panels: Conformational selection; Induced fit). Mechanisms of coupled folding and binding. In some cases IDR binding is associated with a folding reaction. Two extreme mechanisms have been outlined for these processes. In conformational selection only correctly folded members of the structural ensemble are binding-competent and folding precedes binding. In induced fit all members of the ensemble are able to bind, and subsequently complete any remaining folding. (Current Opinion in Structural Biology)

Flexible linkers and autoinhibitory tails
Transient self-interaction of disordered regions of proteins with their own folded domains can also modulate binding behavior by competing with the ligands for the folded domain. For example, the disordered N-terminal domain of PC4, a cofactor that recruits general transcription factors, also acts to reduce DNA binding affinity by forming transient interactions with the structured DNA-binding domain [41]. Hub proteins, such as the co-activator CBP, commonly have long flexible linkers between their domains. In this context flexibility may be important in enabling the required contacts to be made around promoter regions with differing topologies [42,43]. In some cases the linkers may also harbor as yet unknown binding motifs [43]. The properties of disordered linkers and tails, for example their compactness, are governed by amino acid sequence [44]. Importantly, they can be altered by post-translational modifications such as phosphorylation; for example, multi-site phosphorylation of the disordered N-terminus of PC4 enhances its previously described autoinhibitory effect [45]. Flexible regions may be important for assisting sliding of DNA binding domains along DNA, thus enhancing the search for cognate sites. Simulations of homeodomains have shown that non-specific interactions of the N-terminal tail with DNA allow the protein to 'monkey bar' between separate DNA strands [46]. Specific and non-specific DNA binding of largely structured DNA binding domains are characterized by slightly different conformations [47].

Promiscuous binding
It has been proposed that disorder might enable greater binding promiscuity, which would be a potential functional advantage for proteins involved in signaling. Rather notoriously, the same short region of p53 has been shown to bind in at least four different conformations, and by different mechanisms, with different partner proteins [48]. Despite interest in this potential role, few structurally characterised examples exist within the literature, and the ability to bind to multiple targets is certainly not restricted to IDPs.

Phase transitions
Recently a role for intrinsic disorder in mediating phase transitions has been described. Several membrane-less organelles, or nuclear bodies, exist within the nucleus: nucleoli, nuclear speckles, paraspeckles, PML bodies and Cajal bodies. They are formed via phase transitions similar to those in polymer condensation theory, and are described as dynamic liquid-like droplets, consisting of nucleic acids and proteins in continual exchange with the surrounding milieu [49,50,51]. The proteins found in these assemblies are frequently highly disordered, with low complexity sequences, and include transcriptional regulators [49,50,51,52,53]. They are also often associated with DNA or RNA binding. IDRs provide the opportunity for extensive multivalent interactions that are thought to drive formation of nuclear bodies [50]. Nuclear bodies play a functional role by sequestering proteins to modulate their concentrations; for example, MDM2, which controls p53 levels, is stored in the nucleolus [51]. They also provide a localized environment where reactants may be brought into close proximity under altered environmental conditions [54]. For example, PML bodies are involved in recruiting CBP, p300, RNA pol II, and transcription factors and co-repressors such as p53 and HIPK2 [55]. It has also been suggested that locally high concentrations of EWS domains (containing (G/S)Y(G/S) and SYGQQS repeats) upon DNA binding might lead to a phase transition, forming an assembly that interacts with RNA and may even be part of the transcriptional machinery [53].

Conclusion
It is almost 30 years since the activation domains of transcription factors were described as 'negative noodles' [56], and bioinformaticians have now demonstrated conclusively that proteins involved in transcription have unusually high disorder contents [2]. Recent work shows that disordered proteins can have markedly different structural ensembles, that is, we have seen different 'flavors' of disorder. In light of this we should consider what role each type of protein disorder plays within transcriptional processes. When is disorder advantageous, or even necessary? Proposed roles generally revolve around features advantageous within signaling networks: speed, complexity and adaptability of response (Figure 3). Emerging work is starting to describe functional roles for disorder, but further experimental and simulation studies are needed to demonstrate these roles directly, and much remains to be uncovered. Transcriptional regulators and factors are highly desirable drug targets in disease. They have traditionally been considered 'undruggable' [57], but the recent selective targeting of a disordered region of TFIID highlights the opportunities [58]. Understanding their mechanisms of interaction may suggest optimal points at which to interfere with transcription, as well as assisting the development of artificial transcription factors.

17. Shammas SL, Travis AJ, Clarke J: Allostery within a transcription coactivator is predominantly mediated through dissociation rate constants. Proc Natl Acad Sci U S A 2014, 111:12055-12060. This paper uses kinetic studies to characterise the fast association process between the transcription coactivator CBP-KIX and various ligands. It also describes a general mechanism of allostery between the two ligand binding sites of CBP-KIX based on its reduced dynamics upon binding either ligand.

25. Teilum K, Olsen JG, Kragelund BB: Globular and disordered - the non-identical twins in protein-protein interactions. Front Mol Biosci 2015, 2:1-6. An extensive analysis of the thermodynamic data associated with protein-protein interactions made by folded and disordered proteins, including analysis of the nature of the interaction surfaces. Complexes formed by disordered proteins are weaker, the difference in affinities arising from the entropic term.
phd-doc operon. Reducing the disorder content of the tail by coupled folding and binding with the toxin Doc alleviates the effect resulting in conditional co-operativity, depending upon Doc concentrations.

44. Das RK, Ruff KM, Pappu RV: Relating sequence encoded information to form and function of intrinsically disordered proteins. Curr Opin Struct Biol 2015, 32:102-112. This recent review describes our current understanding of how conformational ensembles are encoded by amino acid sequences, aiming to classify IDRs by type.