Information catalysis and intrinsically disordered proteins in large complexes

Background: Recent studies indicate that intrinsically disordered proteins (IDPs) do not preferentially bind to chaperones in vivo, suggesting that the role of chaperones is either to prevent pathological conformations or to aid in the assembly of large complexes. This, in turn, suggests that large IDP complexes must form under the control of a sophisticated regulatory system of chemical cognition (in effect, an information catalysis) that, while it may have evolutionarily exapted existing chaperones, is likely to have evolved other, specialized mechanisms or modalities of process modulation.
Methods: We model this using recently developed 'statistical' approaches from information theory that exploit the fact that information is itself a form of free energy.
Results: Information catalysis is found to arise directly from the 'chain rule' of information theory, via a statistical mechanical argument in which metabolic free energy both powers the transmission of information and applies available entropy as a tool to correct malformed complexes by local heating.
Conclusions: The regulatory mechanisms or modalities may well be IDPs within the complexes themselves, an extension of the chaperone concept explaining the observed prevalence of disorder in large complexes.
Keywords: catalysis, information theory, rate distortion, regulation

© 2012 Rodrick Wallace; licensee Herbert Publications Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Rodrick Wallace. Journal of Proteome Science & Computational Biology 2012, http://www.hoajonline.com/journals/pdf/2050-2273-1-4.pdf. doi: 10.7243/2050-2273-1-4


Background
The observation by Hegyi and Tompa [1] that intrinsically disordered proteins (IDPs) display no preference for chaperone binding in vivo is striking: IDPs are extremely sensitive to proteolysis in vitro, but show no enhanced degradation rates in vivo. Inferring the general from the particular, they suggest that the primary reason for chaperone binding, where it does occur, is not the assistance of folding but the promotion of assembly with partners, since IDPs that bind to chaperones tend to bind to other proteins as well. These results might promote the extension and generalization of the chaperone concept. It seems appropriate, then, to suggest that one prime reason for IDP interaction with chaperones is to prevent amyloid formation. Others may be transport through physiological membranes and assistance for partner binding, i.e., assembly of complexes.
Hegyi and Tompa note that IDPs have been observed in vitro to be very effective in binding, primarily in that they bind to their partners at an increased speed. Their general avoidance of chaperones may be related to this. When they do bind to chaperones, however, the reason might be that in vivo assembly of large complexes is slowed by non-specific interactions, in which case chaperone assistance may help.
For small, rapidly binding complexes, Wallace [2] has described IDP reaction dynamics via a statistical mechanics approach to a 'symmetry spectrum' derived from a groupoid generalization of the wreath product of groups [3] that characterizes 'conventional' nonrigid molecule theory [4,5]. For large complexes, however, it seems likely that an even more general approach will be needed, one that reflects the operation of an elaborate regulatory system of chemical cognition analogous to what has been used to describe the immune system [6,7] or higher order neural and social function [8,9]. We will suggest that, while 'ordinary' chaperones may have been evolutionarily exapted (in the sense of Gould [10]) into regulatory function for large IDP complexes, other, less familiar, molecular regulators might well remain to be found. Tompa and Csermely [11], for example, have even gone so far as to suggest that IDPs within complexes may serve as self-chaperones, a significant generalization of the chaperone concept.
We begin with some formal development, leading to the idea of cognitive control in large complexes. From the perspective of Atlan and Cohen [6], who introduce a cognitive paradigm for the immune system, cognition involves comparison of a perceived signal with an internal, learned or inherited, picture of the world, and, upon that comparison, choice of a single response from a larger repertoire of possible responses. This inherently involves the transmission of information, since choice always necessitates a reduction in uncertainty ([12], p. 21). Such cognition is, in a sense, routine, since even a thermostat would be cognitive from this perspective. The essential point is that large enough biological structures can follow a large multiplicity of possible 'reaction paths', and focus must thereupon shift from the details of the chemical machinery itself to the details of its behavior in the context of impinging external signals.

Symbolic dynamics of IDP complex formation
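Symbolic dynamics is a 'coarse-grained' perspective on physical systems that discretizes their time trajectories in terms of dynamically accessible regions so that it is possible to do statistical mechanics on symbol sequences ([13], Ch. 8) that can be said to constitute an 'alphabet'. Within that 'alphabet', certain 'statements' are highly probable, and others far less so. The simple (ideal) oscillating reaction described by the equations

$$dX/dt = \omega Y, \quad dY/dt = -\omega X$$

has the solution

$$X(t) = \sin(\omega t), \quad Y(t) = \cos(\omega t)$$

so that $X(t)^2 + Y(t)^2 \equiv 1$, and the system traces out an endless circular trajectory in time. Divide the $X$-$Y$ plane into two components, the simplest possible coarse graining, calling the half-plane to the left of the vertical $Y$ axis $A$ and that to the right $B$. Sampled once per half-period $\pi/\omega$ (the full period being $2\pi/\omega$), this system traces out a stream of $A$'s and $B$'s having a very precise grammar and syntax: ABABABAB... Many other such statements might be conceivable, e.g., AAAAAA..., BBBBB..., AAABAAAB..., ABAABAAAB..., and so on, but, of the infinite number of possibilities, only one is actually observed, that is, only one is 'grammatical'.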
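The coarse-graining of the ideal oscillator can be sketched numerically. The following is a minimal illustration, assuming the solution $X(t) = \sin(\omega t)$ and a sampling point at the midpoint of each half-period; the function names and the value of `omega` are hypothetical choices made only for this example. Whether the stream begins with A or B depends on the starting phase; what matters is the strict alternation.

```python
import math

def symbol(x: float) -> str:
    """Coarse-grain by the sign of X: left half-plane -> 'A', right -> 'B'."""
    return "A" if x < 0 else "B"

def symbol_stream(omega: float, n_symbols: int) -> str:
    """Sample X(t) = sin(omega * t) once per half-period pi / omega."""
    half_period = math.pi / omega
    symbols = []
    for k in range(n_symbols):
        # Sample at the midpoint of each half-period to avoid the boundary X = 0.
        t = (k + 0.5) * half_period
        symbols.append(symbol(math.sin(omega * t)))
    return "".join(symbols)

# omega = 2.0 is an arbitrary illustrative value, not taken from the paper.
stream = symbol_stream(omega=2.0, n_symbols=8)
print(stream)
```

Of the $2^8$ conceivable eight-symbol strings, the deterministic dynamics admit only the alternating ones, which is the sense in which the oscillator has a single 'grammatical' statement.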
More complex dynamical reaction models, incorporating diffusional drift around deterministic solutions, or elaborate structures of complicated stochastic differential equations having various domains of attraction (different sets of 'grammars'), can be described by analogous means ([14], Ch. 3).
Rather than taking symbolic dynamics as a simplification of more exact analytic or stochastic approaches, it is possible to generalize the technique to more comprehensive structures. Complicated cellular processes may not have identifiable sets of stochastic differential equations like noisy, nonlinear mechanical clocks, but, under appropriate coarse-graining, they may still have recognizable sets of grammar and syntax over the long term. Proper coarse-graining may, however, often be the hard scientific kernel of the problem.
The fundamental assumption for complicated biological reactions like the formation of large IDP complexes is that reaction trajectories can be classified into two groups, a very large set that has essentially zero probability, and a much smaller 'grammatical' set. For the grammatical/syntactical set, the argument is that, given a set of elaborate trajectories of length $n$, the number of grammatical ones, $N(n)$, follows a limit law of the form

$$H = \lim_{n \to \infty} \frac{\log[N(n)]}{n}$$

such that $H$ both exists and is independent of path. If convergence occurs for some finite $n_H$, then the process is said to be of order $n_H$. This is a critical foundation of, and limitation on, the modeling strategy adopted here, and constrains its possible realm of applicability. It is, however, fairly general in that it is independent of the serial correlations along reaction pathways. The basic argument is shown in figure 1, where an initial IDP/partner configuration, $S_0$, can either converge on a normal large IDP complex $S_f$ via the set of high probability reaction paths to the left of the filled triangle, or it can converge to a thermodynamically competitive pathological state $S_{path}$ to the right, for example an amyloid intermediate or fiber.
The astute observer will have noted that we are, via coarse-graining and symbolic dynamics, assigning classic information sources to the two sets of thermodynamically competitive 'grammatical' pathways. The essential question is how a regulatory catalysis (the generalized chaperones of Hegyi and Tompa [1]) can act in such a circumstance to raise the probability of convergence on $S_f$.
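The limit law $H = \lim_{n \to \infty} \log[N(n)]/n$ can be made concrete with a toy grammar. The sketch below assumes, purely for illustration, that a two-symbol sequence is 'grammatical' when it contains no two consecutive B's; this rule is not taken from the paper, and real reaction grammars would be far richer. For this rule the count $N(n)$ grows like a power of the golden ratio, so $\log[N(n)]/n$ converges to a finite $H$ even though grammatical strings become a vanishing fraction of all $2^n$ strings.

```python
import math

def count_grammatical(n: int) -> int:
    """Count length-n strings over {A, B} with no two consecutive B's (n >= 1)."""
    ends_a, ends_b = 1, 1  # length-1 strings: "A" and "B"
    for _ in range(n - 1):
        # An 'A' may follow anything; a 'B' may only follow an 'A'.
        ends_a, ends_b = ends_a + ends_b, ends_a
    return ends_a + ends_b

def rate(n: int) -> float:
    """Finite-n estimate log[N(n)] / n of the source uncertainty H."""
    return math.log(count_grammatical(n)) / n

golden_ratio = (1 + math.sqrt(5)) / 2
for n in (5, 20, 80, 320):
    print(n, round(rate(n), 4))
print("limit:", round(math.log(golden_ratio), 4))
```

The finite-$n$ estimates approach the logarithm of the golden ratio from above, which is the behavior the limit law requires of any admissible grammar.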

The dual information source of a cognitive regulatory process
The first step in answering that question lies in describing a large class of regulatory chaperone activity in terms of another information source. To reiterate, Atlan and Cohen [6], in the context of a study of the immune system, argue that the essence of cognition is the comparison of a perceived signal with an internal, learned picture of the world, and then choice of a single response from a large repertoire of possible responses. Such choice inherently involves information and information transmission since it always generates a reduction in uncertainty. Thus structures that process information are constrained by the asymptotic limit theorems of information theory, in the same sense that sums of stochastic variables are constrained by the Central Limit Theorem, allowing the construction of powerful statistical tools useful for data analysis.
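More formally, a pattern of incoming input $S_i$ is mixed with a pattern of ongoing activity $W_i$ to create a path of combined signals $x = (a_0, a_1, \ldots, a_n, \ldots)$, where each $a_i = f(S_i, W_i)$ for some unspecified function $f$. The $a_i$ are seen to be very complicated composite objects in this treatment, which we may choose to coarse-grain so as to obtain an appropriate 'alphabet'.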
In a simple spinglass-like model, $S$ would be a vector, $W$ a matrix, and $f$ would be a function of their product at 'time' $i$. The path $x$ is fed into a highly nonlinear decision oscillator, $h$, a 'sudden threshold machine' pattern recognition structure, in a sense, that generates an output $h(x)$. While the combining algorithm, the form of the nonlinear oscillator, and the details of grammar and syntax can all be unspecified in this model, the critical assumption that permits inference of the necessary conditions constrained by the asymptotic limit theorems of information theory is that, again, the finite limit

$$H \equiv \lim_{n \to \infty} \frac{\log[N(n)]}{n}$$

both exists and is independent of the path $x$. Call such a pattern recognition-and-response cognitive process ergodic. Not all cognitive processes are likely to be ergodic in this sense, implying that $H$, if it indeed exists at all, is path dependent, although extension to nearly ergodic processes seems possible [9].
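Invoking the spirit of the Shannon-McMillan Theorem, as choice involves an inherent reduction in uncertainty, it is then possible to define an adiabatically, piecewise stationary, ergodic (APSE) information source $X$ associated with stochastic variates $X_j$ whose joint and conditional Shannon uncertainties satisfy the classic relations

$$H[X] = \lim_{n \to \infty} \frac{\log[N(n)]}{n} = \lim_{n \to \infty} H(X_n | X_0, \ldots, X_{n-1}) = \lim_{n \to \infty} \frac{H(X_0, \ldots, X_n)}{n}. \qquad (3)$$

This information source is defined as dual to the underlying ergodic cognitive process.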
Adiabatic means that the source has been parameterized according to some scheme, and that, over a certain range, along a particular piece, as the parameters vary, the source remains as close to stationary and ergodic as needed for information theory's central theorems to apply. Stationary means that the system's probabilities do not change in time, and ergodic, roughly, that the cross sectional means approximate long-time averages. Between pieces it is necessary to invoke various kinds of phase transition formalisms, as described more fully in e.g., [8].
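For a stationary, ergodic source, the conditional uncertainty $H(X_n | X_0, \ldots, X_{n-1})$ and the per-symbol joint uncertainty $H(X_0, \ldots, X_n)/n$ share the same large-$n$ limit. A minimal sketch of this, assuming a hypothetical two-state Markov source whose transition probabilities are invented for illustration only:

```python
import math

# Hypothetical transition matrix; P[i][j] is the probability of moving i -> j.
# These numbers are illustrative assumptions, not values from the paper.
P = [[0.9, 0.1],
     [0.4, 0.6]]

def stationary(P):
    """Stationary distribution of a two-state Markov chain."""
    a, b = P[0][1], P[1][0]
    return [b / (a + b), a / (a + b)]

def entropy(p):
    """Shannon uncertainty (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

pi = stationary(P)

# For a Markov chain started in its stationary distribution, the conditional
# uncertainty given the whole past equals the one-step value below.
conditional = sum(pi[i] * entropy(P[i]) for i in range(2))

def joint_rate(n: int) -> float:
    """H(X_0, ..., X_n) / n, using H(X_0, ..., X_n) = H(pi) + n * conditional."""
    return (entropy(pi) + n * conditional) / n

for n in (1, 10, 100, 10000):
    print(n, round(joint_rate(n), 5), round(conditional, 5))
```

The initial-state term $H(\pi)/n$ vanishes as $n$ grows, so the two expressions agree in the limit, as the stationary and ergodic conditions require.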

Information catalysis
In the limit of large $n$, $H = \lim_{n \to \infty} \log[N(n)]/n$ becomes homologous to the free energy density of a physical system at the thermodynamic limit of infinite volume. More explicitly, the free energy density of a physical system having volume $V$ and partition function $Z(\beta, V)$ derived from the system's Hamiltonian (the energy function) at inverse temperature $\beta$ is [6]

$$F(\beta) = \lim_{V \to \infty} -\frac{1}{\beta} \frac{\log[Z(\beta, V)]}{V} \equiv \lim_{V \to \infty} \frac{\log[\hat{Z}(\beta, V)]}{V},$$

with $\hat{Z} = Z^{-1/\beta}$. The latter expression is formally similar to the first part of equation (3), a circumstance having deep implications: Feynman [17] describes in great detail how information and free energy have an inherent duality. Feynman, in fact, defines information precisely as the free energy needed to erase a message. The argument is surprisingly direct [18], and for very simple systems it is easy to design a small (idealized) machine that turns the information within a message directly into usable work, that is, free energy. Information is a form of free energy, and the construction and transmission of information within living things consumes metabolic free energy, with inevitable losses via the second law of thermodynamics.
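The scale of the free energy associated with a message can be illustrated with the standard Landauer bound, under which erasing one bit at absolute temperature $T$ costs at least $k_B T \ln 2$. The sketch below is only an order-of-magnitude illustration; the choice of 310 K as a physiological temperature and the function name are assumptions made here, not quantities from the paper.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def erasure_free_energy(bits: float, temperature_kelvin: float) -> float:
    """Minimum free energy in joules needed to erase `bits` of information."""
    return bits * BOLTZMANN * temperature_kelvin * math.log(2)

# 310 K is an assumed physiological temperature, chosen for illustration.
per_bit = erasure_free_energy(1, 310.0)
print(f"{per_bit:.3e} J per bit")
```

This is an idealized lower bound; as the text notes, real construction and transmission of information inside a cell incurs further second-law losses.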
Information catalysis, in the circumstance of figure 1, arises most simply via the 'information theory chain rule' [15]. Given $X$ as the information source representing the reaction paths of figure 1, and $Y$, an information source dual to the sophisticated chemical cognition of the generalized chaperone mechanisms of Hegyi and Tompa [1], one can define jointly typical paths $z_i = (x_i, y_i)$ having the joint information source uncertainty $H(X, Y)$ satisfying

$$H(X, Y) = H(X) + H(Y|X).$$

Of necessity, then,

$$H(X, Y) \le H(X) + H(Y).$$

These relations imply that, by means of the identification of information as a form of free energy, at the expense of adding the considerable energy burden of the regulatory apparatus, represented by its dual information source $Y$, it becomes possible to canalize the reaction paths of $X$. Within a cell, however, there will be an ensemble of possible reactions, driven by available metabolic free energy, so that, taking $\langle Q \rangle$ as representing an average of the variable $Q$,

$$\langle H(X, Y) \rangle \le \langle H(X) \rangle + \langle H(Y) \rangle.$$

This suggests an explicit free energy mechanism for reaction canalization, at the considerable expense of maintaining an embedding regulatory environment.
That is, quite counterintuitively, entropic loss (small $\kappa$) can be a powerful tool for regulating complex biological phenomena, in much the same sense that Tompa and Csermely [11] propose that entropy transfer can be used by generalized chaperones to trigger proper conformation in pathologically folded protein complexes.
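The chain rule underlying this argument, $H(X, Y) = H(X) + H(Y|X) \le H(X) + H(Y)$, can be checked directly on a small example. The sketch below assumes a hypothetical joint distribution over two binary variables in which $X$ and $Y$ are strongly correlated; the numbers are invented for illustration and carry no biochemical meaning.

```python
import math

# Hypothetical joint distribution p(x, y): rows index x, columns index y.
joint = [[0.40, 0.10],
         [0.05, 0.45]]

def H(probs):
    """Shannon uncertainty in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_x = [sum(row) for row in joint]        # marginal of X
p_y = [sum(col) for col in zip(*joint)]  # marginal of Y

h_xy = H([p for row in joint for p in row])
h_x, h_y = H(p_x), H(p_y)

# H(Y|X) = sum over x of p(x) * H(Y | X = x)
h_y_given_x = sum(
    p_x[i] * H([p / p_x[i] for p in joint[i]]) for i in range(len(joint))
)

print("H(X,Y)        =", round(h_xy, 4))
print("H(X) + H(Y|X) =", round(h_x + h_y_given_x, 4))
print("H(X) + H(Y)   =", round(h_x + h_y, 4))
```

The gap between $H(X) + H(Y)$ and $H(X, Y)$ is the mutual information between the two sources; it is zero only when the regulator $Y$ is statistically independent of the reaction paths $X$.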

The entropy transfer model
As noted, Tompa and Csermely [11] have suggested a localized heating model for IDP chaperone activity likely to be of considerable importance in large complex formation. The basic idea is that a large complex can become trapped in a local free energy minimum far from the physiologically required conformation. An IDP chaperone, having many-to-one binding flexibility, can 'cool' itself by binding to the misformed complex, transferring entropy to kick the complex out of its misformed state and allowing it to continue a global search for the proper shape. The basic relation is the classic

$$\Delta S = \Delta Q / T$$

where $\Delta S$ is the entropy change, $\Delta Q$ the heat exchanged, and $T$ the local temperature. Searching out and finding such misfolded complexes is, from the perspective of this analysis, a cognitive process, and the relation of entropy transfer to that cognition is subtle, requiring some formal development that revolves around the rate distortion function.
In general, transmission of information in real systems is not without error, and it is useful to ask about the minimum average error achievable under a specific level of transmission channel noise, and the minimum channel capacity needed to achieve that average distortion, say $D$. The Rate Distortion Theorem shows that, for all possible distortion measures (surprisingly), there is a minimum channel capacity, $R(D)$, that permits transmission with average distortion no greater than $D$. For a Gaussian channel, affected by random noise with zero mean and variance $\sigma^2$, the relation has been calculated for the squared distortion measure as [15]

$$R(D) = \frac{1}{2} \log\left[\frac{\sigma^2}{D}\right], \quad 0 \le D \le \sigma^2; \qquad R(D) = 0, \quad D > \sigma^2.$$
All such rate distortion functions $R(D)$ can be shown to be convex in $D$, that is, reverse J-shaped [15]. This is an exceedingly powerful general condition [19,20].
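The Gaussian case gives a convenient way to see the reverse J-shape. The sketch below uses the standard textbook form $R(D) = \frac{1}{2} \log(\sigma^2 / D)$ for $0 < D \le \sigma^2$ and zero beyond, with natural logarithms (rates in nats) as an assumed convention; the function name and sample values are illustrative only.

```python
import math

def rate_distortion_gaussian(distortion: float, sigma2: float) -> float:
    """Minimum rate (nats) for mean squared distortion `distortion`."""
    if distortion <= 0:
        raise ValueError("distortion must be positive")
    if distortion >= sigma2:
        return 0.0  # no information needs to be sent at this tolerance
    return 0.5 * math.log(sigma2 / distortion)

sigma2 = 1.0  # illustrative noise variance
for d in (0.01, 0.1, 0.5, 1.0):
    print(d, round(rate_distortion_gaussian(d, sigma2), 4))

# Midpoint check of convexity on an example pair of distortions.
a, b = 0.1, 0.5
midpoint_rate = rate_distortion_gaussian((a + b) / 2, sigma2)
chord_rate = 0.5 * (rate_distortion_gaussian(a, sigma2)
                    + rate_distortion_gaussian(b, sigma2))
print(midpoint_rate <= chord_rate)
```

The required rate climbs steeply as the tolerated distortion shrinks toward zero, which is the feature the subsequent free energy argument relies on.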
Given that the cognitive chaperone system wishes to achieve a minimum distortion $D$ in the complex it chooses to heat up, where does the metabolic free energy to direct cognition come from, or, more relevant, how much is needed, and, once used for cognitive purposes, where does it go? The Second Law of thermodynamics requires that free energy transfer (almost) always involves losses due to heating, that is, entropy increase.
Again, the simplest model, given an available metabolic free energy intensity $M$, is that the probability density for a particular value of the cognitive channel capacity $R$ (a free energy measure, based on Feynman's arguments) will be given by equation (11), and demands for metabolic free energy can rise rapidly with required channel capacity, or its equivalent, a lessening of average distortion between what is wanted and what is observed by the cognitive regulatory system, the generalized chaperones.
The essential inference is that a most elegant and parsimonious use of the waste energy necessarily generated by such a cognitive process would be to heat large molecular complexes trapped at some malconformed intermediate, the proposed entropy transfer mechanism of Tompa and Csermely [11].
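One simple model consistent with this description, assumed here purely for illustration and not asserted to be the paper's own expression, is a Boltzmann-like exponential density $P[R, M] \propto \exp[-R/(\kappa M)]$, with $\kappa$ a small scaling constant standing in for entropic translation losses. Under that assumption the mean channel capacity is $\langle R \rangle = \kappa M$, so the metabolic free energy needed to sustain a target capacity scales as $M = \langle R \rangle / \kappa$. The function names and parameter values below are hypothetical.

```python
import math

def mean_capacity(kappa: float, M: float, n_steps: int = 200_000) -> float:
    """Numerically average R under the ASSUMED density exp(-R/(kappa*M))/(kappa*M)."""
    scale = kappa * M
    r_max = 50.0 * scale  # the tail beyond this point is negligible
    dr = r_max / n_steps
    total = 0.0
    for i in range(n_steps):
        r = (i + 0.5) * dr  # midpoint rule
        total += r * math.exp(-r / scale) / scale * dr
    return total

def required_free_energy(target_capacity: float, kappa: float) -> float:
    """Metabolic intensity M giving mean capacity `target_capacity` (assumed model)."""
    return target_capacity / kappa

print(round(mean_capacity(kappa=0.01, M=100.0), 6))
for kappa in (0.1, 0.01, 0.001):
    print(kappa, required_free_energy(1.0, kappa))
```

In this assumed model, a tenfold drop in $\kappa$ means a tenfold rise in the free energy needed for the same capacity, with the difference dissipated as heat that is then available for the entropy transfer mechanism.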

Discussion
Wallace [2], using a groupoid extension of conventional nonrigid molecule theory, introduced a literally astronomically large spectrum of possible symmetry classifications for small IDP/partner complexes. The size of the appropriate symmetry group (or groupoid) must grow exponentially in the number of amino acid residues within the flexible IDP frond. For 30 to 100 residues, the nonrigid symmetry set is indeed astronomical, and can only be addressed by a statistical mechanics argument. The understanding of large IDP/partner complexes, as inferred from the work of Hegyi and Tompa [1], faces even greater difficulties, since it appears compounded by a generalized chaperone regulatory structure that is likely to be another example of sophisticated chemical cognition, akin to the immune system. Given the Wallace results, cognitive biochemical processes regulating large IDP/partner complexes are not likely to yield to exact 'chemical' description, not only from considerations of symmetry group magnitude, but because their dynamics are particularly contingent on signals that may themselves arise from higher level, embedding, cognitive regulatory processes. However, such behaviors, in terms of the dual information source, are nonetheless constrained by the asymptotic limit theorems of information theory, and this may allow construction of regression model-like statistical tools useful for scientific inference, focusing on the behaviors of the chaperone system rather than on a detailed description of its mechanical function under all circumstances. The analogy is to describing the behavior of a computer in terms of its program, rather than attempting to provide a full cross-sectional description of the state of each logic gate after each clock cycle.

Conclusions
Known chaperones may have undergone evolutionary exaptation (co-option of one adaptation/correlation for another purpose [10]) so as to contribute to regulating large IDP/partner complexes, as Hegyi and Tompa [1] speculate. This does not preclude the evolution or exaptation of other processes, mechanisms, or chemical species, into a similar role. That is, there may well be many other generalized chaperones for the control of large IDP/partner complexes.
Indeed, following Tompa's lead, a direct argument can be made as follows: Hegyi et al. [21] found that larger complexes need to have more disorder for successful assembly. Tompa and Csermely [11] directly stated that IDPs can themselves be chaperones. It is not at all far-fetched that they might also be involved in the assembly of large complexes. This suggests that some measure of self-chaperoning is provided by IDPs within complexes themselves, a significant extension of the chaperone concept in the spirit of the information catalysis we have suggested here, and would naturally explain the prevalence of disorder in large complexes, serving as an internal chaperoning element.

Competing interests
There are no competing interests.