Non-equilibrium statistical physics, transitory epigenetic landscapes, and cell fate decision dynamics

Statistical physics provides a useful perspective for the analysis of many complex systems; it allows us to relate microscopic fluctuations to macroscopic observations. Developmental biology, but also cell biology more generally, are examples where apparently robust behaviour emerges from highly complex and stochastic sub-cellular processes. Here we attempt to make connections between different theoretical perspectives to gain qualitative insights into the types of cell-fate decision making processes that are at the heart of stem cell and developmental biology. We discuss both dynamical systems as well as statistical mechanics perspectives on the classical Waddington or epigenetic landscape. We find that non-equilibrium approaches are required to overcome some of the shortcomings of classical equilibrium statistical thermodynamics or statistical mechanics in order to shed light on biological processes, which, almost by definition, are typically far from equilibrium.

that we have a (high-dimensional) function, $f(x)$, such that the rate of change in $x$ is given by

$$\frac{dx}{dt} = f(x), \tag{1}$$

with "cell states", $x^*$, corresponding to the locally stable stationary points of the dynamical system, calculated from $f(x^*) = 0$; that is, those solutions for which the eigenvalues of the corresponding Jacobian (assuming linear stability analysis suffices) all have negative real parts [5,6]. Where linear stability analysis is not sufficient to assess the (local) stability, other, more involved methods need to be invoked.
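As a minimal sketch of the linear stability analysis described above, the following locates stationary points of a small system and checks the sign of the real parts of the Jacobian eigenvalues. The two-gene toggle switch `f`, its parameters `a` and `n`, and the helper names are illustrative assumptions, not a model taken from the text.

```python
import numpy as np

def f(x, a=2.0, n=4):
    """Hypothetical toggle switch: two genes that repress each other."""
    x1, x2 = x
    return np.array([a / (1.0 + x2**n) - x1,
                     a / (1.0 + x1**n) - x2])

def jacobian(func, x, eps=1e-6):
    """Central finite-difference approximation of the Jacobian of func at x."""
    x = np.asarray(x, dtype=float)
    J = np.zeros((x.size, x.size))
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = eps
        J[:, k] = (func(x + step) - func(x - step)) / (2.0 * eps)
    return J

def find_fixed_point(func, x0, tol=1e-10, max_iter=100):
    """Newton iteration for func(x*) = 0, started from x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dx = np.linalg.solve(jacobian(func, x), -func(x))
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

def is_linearly_stable(func, x_star):
    """True if all Jacobian eigenvalues at x_star have negative real part."""
    eigenvalues = np.linalg.eigvals(jacobian(func, x_star))
    return bool(np.all(eigenvalues.real < 0.0))

committed = find_fixed_point(f, [2.0, 0.1])   # one gene high, the other low
symmetric = find_fixed_point(f, [0.9, 0.9])   # both genes at equal levels

print(committed, is_linearly_stable(f, committed))
print(symmetric, is_linearly_stable(f, symmetric))
```

In this toy system the asymmetric stationary point plays the role of a "cell state", whereas the symmetric one has an eigenvalue with positive real part and so would not qualify under the criterion above.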
The stochastic differential equation (SDE) counterpart to Eq (1) takes the form

$$dx = f(x)\,dt + g(x)\,dW_t, \tag{2}$$

where $g(x)$ captures the functional form of the stochastic contribution to the dynamics, and $dW_t$ is a Wiener process increment. The stability of stationary points of deterministic dynamical systems under the influence of stochastic dynamics is in general difficult to assess analytically [7,8].
Instead, in most cases, approximations or simulations are required.
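One common simulation approach is the Euler-Maruyama scheme, sketched below for a scalar SDE of the form of Eq (2). The bistable drift, the constant noise amplitude, and the step size are hypothetical choices made purely for illustration.

```python
import numpy as np

def euler_maruyama(f, g, x0, dt, n_steps, rng):
    """Simulate dx = f(x) dt + g(x) dW_t with the Euler-Maruyama scheme."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))   # Wiener increment over one step
        x[k + 1] = x[k] + f(x[k]) * dt + g(x[k]) * dW
    return x

def drift(x):
    """Hypothetical bistable drift with stable points at x = -1 and x = +1."""
    return x - x**3

def noise(x):
    """Hypothetical constant (additive) noise amplitude."""
    return 0.3

rng = np.random.default_rng(0)
path = euler_maruyama(drift, noise, x0=1.0, dt=0.01, n_steps=5000, rng=rng)
print(path.mean(), path.std())
```

Repeating such runs over many seeds gives an empirical picture of how long trajectories remain near a deterministic stationary point, which is the kind of question that is hard to answer analytically.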
What we are interested in are new ways of determining and analysing the functional form of f (x) (and g(x)). This would open up the possibility of making more and better mechanistic models in cell biology. With the exception of a handful of simple models of e.g., embryonic [9][10][11][12] and haematopoietic stem cells [13], we have precious few mathematical models of the relevant differentiation systems. Analysis of data is thus largely descriptive, but even from these descriptions we can learn or distill some important lessons that could inform mechanistic modelling in the future [14]. Three such examples include molecular noise, the dynamics of gene regulation, and the time it takes for cell-fate decisions to take place. First, noise in gene expression, or cell to cell heterogeneity, appears to be closely associated with the transition between cell states [15][16][17]. Second, there is clear evidence that the regulation at the gene expression level is highly dynamic and shaped by factors at the epigenetic, transcriptomic, proteomic, and post-translational modification levels [18][19][20][21]. We cannot describe this in terms of static gene regulatory networks, and instead need to develop explicitly dynamical descriptions; even then we need to take into account the uncertainty in these networks [22]. Third, the timing of a transition appears to indicate that differentiation is a non-Markovian process [23].
In the following we will outline a set of qualitative frameworks for the analysis of cell differentiation dynamics, developing their connections, and follow one of these, non-equilibrium statistical mechanics, further in order to characterise transitions between states or cell fates.

Theoretical descriptions of cell fate decisions
The theory of dynamical systems offers a set of tools that allow us to investigate developmental processes. There is already a set of well studied problems, including different stem cell differentiation systems [9][10][11][12][13], segmentation in insect development [24], neural tube formation [25], and Turing patterns [26,27]. But for the vast majority of systems of concrete biological interest we lack such mechanistic descriptions.
In addition to developing these models from the bottom up, we can also take a more abstract perspective, again grounded in the theory of dynamical systems. Below we outline two such approaches.

Figure 1: Example of a Cusp Catastrophe and its relationship to developmental dynamics. The two control variables are a differentiation marker, which leads to loss of stem-like properties, and factors affecting cell-identity, such as the relative abundance of competing transcription factors. (Figure labels: Stem Cell; Differentiation; Cell-type specific.)

Qualitative dynamics of cell differentiation
If we identify cell fates with the stationary points, x * , of a dynamical system, then, even if we do not know the structure and form of the dynamical system, i.e., we do not know the mathematical form of f (x) in Eq (1) or Eq (2), we can still make some general qualitative statements. Work in this area, especially by René Thom [28], was in fact partly inspired by problems in developmental biology.
Catastrophe Theory [28][29][30] was developed to characterise the qualitative behaviour exhibited by dynamical systems as system parameters change. A central tenet of this approach, borne out empirically as well, is that in many cases even high-dimensional systems can be understood in terms of dynamics of a much lower dimensional system. Here the evocative term "catastrophe" refers to a sudden change in the qualitative nature of the set of solutions of a dynamical system, caused by a smooth or small change in some model parameter, system variable, or control input.
In the example in Figure 1 such a change is observed in the number of fates that are accessible, as cell-type specific markers are varied. There are regions in parameter space where either one of the fates (Fate 1 or Fate 2) is realised, and regions where both can co-exist. As the differentiation factor is reduced, only a single state, the stem-like state, can exist. In the language of catastrophe theory this is an example of a cusp catastrophe.
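The change in the number of accessible fates can be sketched with the canonical cusp potential $V(x; a, b) = x^4/4 + a x^2/2 + b x$. Reading $a$ as the differentiation axis and $b$ as the cell-type specific bias is an assumption made here for illustration; the figure itself does not specify a functional form.

```python
import numpy as np

def stable_states(a, b, tol=1e-9):
    """Stable stationary points of dx/dt = -dV/dx for the cusp potential.

    Stationary points solve x**3 + a*x + b = 0; they are stable where
    the second derivative of V, 3*x**2 + a, is positive.
    """
    roots = np.roots([1.0, 0.0, a, b])
    real_roots = roots[np.abs(roots.imag) < tol].real
    return np.sort(real_roots[3.0 * real_roots**2 + a > 0.0])

print(stable_states(a=1.0, b=0.0))    # single state (stem-like regime)
print(stable_states(a=-1.0, b=0.0))   # two co-existing fates
print(stable_states(a=-1.0, b=1.0))   # strong bias: only one fate remains
```

The boundary between the one-fate and two-fate regions is where the discriminant $4a^3 + 27b^2$ changes sign, which traces out the cusp in the $(a, b)$ plane.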
Despite the appeal of this framework there are considerable limitations, including the fact that many of the theoretical results are limited to gradient systems, although stability properties can apply more widely [31]. Certainly a central cornerstone of catastrophe theory, structural stability [29], will have wider implications. We mean by this that the qualitative behaviour of a mathematical model or theoretical system remains stable even when the model is changed slightly. Such structural stability of a model would be desirable, as we know that real-world systems differ from our models, and often quite considerably so. Clearly we should therefore strive in our modelling to only consider structurally stable systems.

Models of epigenetic landscapes
Following Waddington's groundbreaking conceptual work, which has had great influence on René Thom's work [28], the well-known epigenetic landscape was long primarily seen as a useful metaphor for developmental processes [32]. In the 21st century, however, it has increasingly been seen as a computational tool in its own right [33][34][35][36][37][38][39]. There is also a rich mathematical literature to draw on, which has only rarely been tapped into so far [30], notably related to Morse theory [5,40].
Here, however, we follow in the first instance the statistical physics perspective developed above. We shall also restrict ourselves to gradient-like systems, i.e., where $f(x)$ in Eqs (1) and (2) can be written as

$$f(x) = -\nabla U(x),$$

where $U(x)$ is a potential [5] (or quasi-potential [41] under suitable circumstances). For stochastic systems (for deterministic systems the resulting densities are typically sums of Dirac $\delta$ functions), we have for the stationary probability density of the solution of Eq (2) to be in the interval $[x, x + dx)$, with the noise amplitude absorbed into $U$,

$$P(x)\,dx \propto \exp\left(-U(x)\right)dx;$$

details can be found in [35][36][37].
Local minima of the potential correspond to (locally) stable attractors of the dynamics and can thus be related to cell states [33,42]. Morse theory [29,43], which applies to gradient systems, allows us to relate the different fixed points of gradient systems in the case of deterministic dynamics. Notably, any two locally stable fixed points are separated by (at least) one saddle point, i.e., a fixed point where the Hessian of the potential, $U(x)$, has both positive and negative eigenvalues [5,29].
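The classification of fixed points of a gradient system by the eigenvalues of the matrix of second derivatives (the Hessian) of $U$ can be sketched as follows. The two-dimensional double-well potential $U(x, y) = (x^2 - 1)^2 + y^2$ is a hypothetical example chosen so that the fixed points are known in closed form.

```python
import numpy as np

def hessian_U(point):
    """Hessian of the hypothetical potential U(x, y) = (x**2 - 1)**2 + y**2."""
    x, _ = point
    return np.array([[12.0 * x**2 - 4.0, 0.0],
                     [0.0, 2.0]])

def classify(point):
    """Classify a fixed point of dx/dt = -grad U by its Hessian eigenvalues."""
    eigenvalues = np.linalg.eigvalsh(hessian_U(point))
    if np.all(eigenvalues > 0.0):
        return "minimum"          # locally stable: candidate cell state
    if np.all(eigenvalues < 0.0):
        return "maximum"
    if np.any(eigenvalues == 0.0):
        return "degenerate"       # linear analysis is inconclusive
    return "saddle"               # mixed signs: separates two minima

for fixed_point in [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]:
    print(fixed_point, classify(fixed_point))
```

Here the two minima at $(\pm 1, 0)$ are separated by the saddle at the origin, a small instance of the general statement above.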

Statistical mechanics of cell differentiation
A different approach is rooted in theoretical physics and has found widespread use in e.g., ecology [44], signal transduction [45][46][47], and gene regulation [48,49], but it has also been shown to be helpful in developmental and stem-cell biology [12,23,50,51]. Statistical physics links microscopic behaviour of e.g., molecules or atoms, to macroscopic observables such as pressure.
In the context of cell biology, the precise molecular composition characterises microstates, whereas the cell-types are the observable macrostates. Just as in statistical physics, each macrostate is associated with a large number of corresponding microstates. In biological terms, the microstates associated with a given macrostate represent all molecular configurations (chromatin states, mRNA and protein concentrations, transcription factor activities, etc.) corresponding to a given cell state (Figure 2).
Statistical mechanics is concerned with the long-term behaviour of a system [52,53], in particular assigning probabilities of different macrostates being realised. We will start by defining some of the terminology. First, we use Latin and Greek letters to denote micro- and macrostates, respectively; here and below we follow the exceptionally clear terminology set out by Attard [53]. We assume that we can meaningfully define weights for the different microstates, $\omega_i$. We then have for the weights of the corresponding macrostates, $\omega_\alpha$,

$$\omega_\alpha = \sum_{i \in \alpha} \omega_i.$$

Summing over all microstates and macrostates must necessarily give the same value, $W$, that is we have

$$W = \sum_i \omega_i = \sum_\alpha \omega_\alpha,$$

where $W$ is the total weight (to be made precise below). For notational convenience we use the sum symbol, $\sum$, rather than the integral (as would be more appropriate for continuous state spaces). From the weights we obtain the probabilities of micro- and macrostates,

$$p_i = \frac{\omega_i}{W}, \qquad p_\alpha = \frac{\omega_\alpha}{W},$$

with which we can, for example, obtain expectation values for functions, e.g., across a macrostate, by averaging

$$\langle r \rangle_\alpha = \frac{1}{p_\alpha} \sum_{i \in \alpha} p_i\, r_i.$$

Here $r$ could denote cell size, or gene expression level associated with a given cell type, if $\alpha$ refers to cell types. Higher moments, including variances, are calculated similarly.
The entropy plays a central role in statistical mechanics, and here is defined as

$$S = \log W = \log \sum_i \omega_i, \tag{8}$$

or, if we consider the entropy of a macrostate, as

$$S_\alpha = \log \omega_\alpha = \log \sum_{i \in \alpha} \omega_i. \tag{9}$$

If we have a uniform distribution over the microstates we can assign unit weight to each microstate, $\omega_i = 1, \forall i$. The entropy of a macrostate is then simply $S_\alpha = \log n_\alpha$, where $n_\alpha$ is the number of microstates corresponding to macrostate $\alpha$.
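The bookkeeping of weights, probabilities, and entropies can be sketched numerically. The sketch assumes that a macrostate weight is the sum of the weights of its member microstates, and that entropies are logarithms of weights (consistent with $S_\alpha = \log n_\alpha$ in the uniform case); the six microstate weights and their assignment to three macrostates are made-up numbers.

```python
import numpy as np

# Hypothetical microstate weights and their macrostate labels (0, 1, 2).
weights = np.array([4.0, 2.0, 1.0, 1.0, 0.5, 0.5])
labels = np.array([0, 0, 1, 1, 1, 2])

W = weights.sum()                                      # total weight
macro_weights = np.bincount(labels, weights=weights)   # sum of member weights

p_micro = weights / W            # microstate probabilities
p_macro = macro_weights / W      # macrostate probabilities

S_total = np.log(W)              # total entropy (assumed form: log W)
S_macro = np.log(macro_weights)  # macrostate entropies (assumed form)

# Uniform case: unit weights, so each macrostate entropy is log(n_alpha).
n_alpha = np.bincount(labels)
S_uniform = np.log(np.bincount(labels, weights=np.ones(len(labels))))

print(macro_weights, p_macro)
print(S_total, S_macro, S_uniform)
```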
One of the central results of statistical mechanics is encapsulated in the second law of thermodynamics, which states that entropy never spontaneously decreases [52,53]. Thus spontaneous change will only ever occur if a change in state leads to an increase in entropy. So in this picture, a change from state $\alpha$ to $\alpha'$ will only be observed if $S_{\alpha'} > S_\alpha$; or, in the case of uniform weight over microstates, if $n_{\alpha'} > n_\alpha$.

Epigenetic landscapes, entropy, and cell fate transitions
We next link the epigenetic landscape with entropy, first by considering equilibrium statistical mechanics, then by considering transitions between states. We start by developing the total entropy of the system, Eq (8), further,

$$S = \log W = \sum_\alpha p_\alpha \left[ \log \omega_\alpha - \log \frac{\omega_\alpha}{W} \right] = \sum_\alpha p_\alpha S_\alpha - \sum_\alpha p_\alpha \log p_\alpha. \tag{11}$$

Here the second term is the conventional representation, which captures the uncertainty with respect to which macrostate, $\alpha$, the system is in. The first term captures the internal uncertainty associated with the macrostates; this is often ignored because in many applications only differences in entropy matter, and the whole term can then be viewed as an additive constant. But, for example, when considering a system coupled to a reservoir, this term does matter profoundly. It is also important for the case where we consider different macrostates, which below include alternative definitions of cell types.
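A quick numerical check of this two-term split is sketched below. It assumes $S = \log W$ and $S_\alpha = \log \omega_\alpha$, under which the total entropy reads $\sum_\alpha p_\alpha S_\alpha - \sum_\alpha p_\alpha \log p_\alpha$; the weights and both groupings into macrostates are hypothetical.

```python
import numpy as np

def entropy_terms(weights, labels):
    """Return (internal, between) terms of the total entropy.

    internal = sum_a p_a * log(w_a)   (uncertainty within macrostates)
    between  = -sum_a p_a * log(p_a)  (uncertainty about which macrostate)
    """
    macro_weights = np.bincount(labels, weights=weights)
    p_macro = macro_weights / weights.sum()
    internal = np.sum(p_macro * np.log(macro_weights))
    between = -np.sum(p_macro * np.log(p_macro))
    return internal, between

weights = np.array([4.0, 2.0, 1.0, 1.0, 0.5, 0.5])   # hypothetical weights
fine = np.array([0, 0, 1, 1, 1, 2])                  # three macrostates
coarse = np.array([0, 0, 1, 1, 1, 1])                # two macrostates

for labels in (fine, coarse):
    internal, between = entropy_terms(weights, labels)
    print(internal, between, internal + between, np.log(weights.sum()))
```

The sum of the two terms is the same for both groupings, while the split between them is not, which is why the first term cannot be dropped when comparing alternative definitions of macrostates.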

Macrostates for cell biology
Identifying macrostates for cell biology is, perhaps surprisingly, non-trivial. Microstates are versions of gene expression states, associated with a macrostate [23,50,51]. Many of these microstates will never be attained [36]. We assume that the whole state space can, in principle, be specified and we denote it by $\Gamma$. The whole set of macrostates, called a collective, has to cover all potential states in a non-overlapping manner, that is

$$\bigcup_\alpha \alpha = \Gamma \qquad \text{and} \qquad \alpha \cap \alpha' = \emptyset \quad \text{for } \alpha \neq \alpha'.$$

The second condition is typically easy to meet; the first is slightly more problematic: we have to assign microstates that are potentially never observed [36] to appropriate macrostates. We discuss an almost certainly incomplete list of suitable macrostates for cell biology in the following.
Phenotypic Definition: If we have a set of objective phenotypic markers (morphology or cell surface markers), $S = \{s_1, s_2, \ldots, s_C\}$, we may use this as a base from which to define macrostates, $\alpha$. We then have, however, three types of microstates: (i) microstates that are observed in these cell-types; (ii) microstates that are never observed, which have weight $\omega_i \to 0$ and can be ignored without biasing or distorting, e.g., the calculation of state-transition probabilities; and (iii) microstates that do not conform to these previously defined cell-types and are thus not assigned. The latter can, however, have finite probability. Their assignment to macrostates is a priori complicated: they may correspond to new cell-types or sub-types; they may correspond to intermediate cell states [14,54]; or they may be fleetingly visited as cells explore the molecular states available [36]. These states need to be considered with some care if we want to base macrostates on phenotypic cell definitions.
Data-Driven Definition: Alternatively, observed microstates can be subjected to statistical analysis, for example unsupervised learning, to group them together, with macrostates then assigned to the resulting clusters [14,15,55]. The ambiguity of clustering [56] (especially whether to lump small clusters together, or split larger, more extended clusters) is, of course, encountered in this approach. But because of the practical irrelevance of microstates that are never encountered (see above) this approach seems sensible and unproblematic.
Dynamical Systems Definition: We can use ideas from the theory of dynamical systems [5]. For the deterministic case we can group all microstates, $x_i$, which for $t \to \infty$ go to the same stationary state into the same macrostate. This definition assigns every point in $\Gamma$ to one and only one macrostate. However, generalisation to stochastic dynamics is not straightforward; furthermore, it does not capture the role of saddle fixed points, which may play an important role in defining intermediate cell states [14].
Mixed Macrostates: The advantage of the form for the entropy given in Eq (11) is that we can combine different collectives of macrostates, here denoted by $\alpha$, $\beta$, and $\gamma$. We can calculate the weight of such mixtures using the usual laws for joint probabilities, e.g., we have

$$\omega(\alpha, \beta) = \sum_{i \in \alpha \cap \beta} \omega_i.$$

With this, it becomes possible to combine the macrostate definitions above and overcome their individual limitations. We can also, through simple relabelling of macrostates, simplify the notation and have a single subscript to denote the new "mixed macrostate".
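The dynamical systems definition and the idea of mixed macrostates can be sketched together. The one-dimensional drift, the marker threshold, the unit weights, and the reading of a joint weight as the sum over microstates lying in both macrostates are all assumptions made for this illustration.

```python
import numpy as np

def f(x):
    """Hypothetical deterministic drift with attractors at x = -1 and x = +1."""
    return x - x**3

def basin_label(x0, attractors=(-1.0, 1.0), dt=0.01, n_steps=5000):
    """Macrostate = index of the attractor reached from x0 (explicit Euler).

    A start exactly on the unstable point x = 0 never leaves it and is
    assigned by tie-break, echoing the caveat about saddle-type points.
    """
    x = float(x0)
    for _ in range(n_steps):
        x += dt * f(x)
    return int(np.argmin([abs(x - a) for a in attractors]))

micro = np.array([-1.8, -1.1, -0.9, -0.2, 0.3, 0.8, 1.2, 1.7])
weights = np.ones_like(micro)                      # hypothetical unit weights

alpha = np.array([basin_label(x) for x in micro])  # dynamical collective
beta = (np.abs(micro) > 1.0).astype(int)           # hypothetical marker collective

# Joint ("mixed") weights: sum the weights of microstates in both alpha and beta.
joint = np.zeros((2, 2))
np.add.at(joint, (alpha, beta), weights)
print(joint)
```

Summing `joint` over either index recovers the weights of the corresponding single collective, so relabelling each `(alpha, beta)` pair as one mixed macrostate leaves the total weight unchanged.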

Non-stationary epigenetic landscapes
One difficulty in developing a statistical mechanics for stem cell biology is that much of the appeal of statistical mechanics lies in the use of entropic arguments to determine the (most probable) system states. The second law of thermodynamics, in particular, states that entropy never decreases spontaneously, and that the maximum entropy state is the one realised with high probability [52,53]. If the macrostates are not coherently defined then entropy and $\sum_{i \in \alpha} p_i \log p_i$ cannot be used to assign the most probable states.
Reports, for example, that entropy across cell-populations becomes maximal around the transition state are thus not necessarily in violation of thermodynamics [57]: clearly the most probable states (i.e., the states with highest probability as $t \to \infty$) will correspond to fixed points and their vicinity. High entropy at transition states could either reflect poor definitions of cell states; or this could simply reflect that the concepts from equilibrium thermodynamics and statistical mechanics are of limited use in this regime.
Both explanations seem eminently plausible, and non-equilibrium statistical mechanics and thermodynamics may offer solutions to the second problem, in particular. We will sketch out two such solutions: first, a brief introduction to transition probabilities between states and how we can use them to reconcile some of the experimental results; second, a brief consideration of dynamic epigenetic landscapes.

Transition probabilities and entropy of transitions
In non-equilibrium theories we consider explicit change over time. One convenient way is to consider transitions over time $\tau$, $j \xrightarrow{\tau} i$, as states of interest. We have for the weight of a microstate $j$

$$\omega_j = \sum_i \omega(i, j|\tau),$$

that is, the weight of a microstate, $j$, is equal to the sum over the weights of transitions out of $j$ into any microstate (including $j$ itself). The weight of a transition between (suitably defined) macrostates $\beta \to \alpha$ (where $\alpha$ and $\beta$ can be in the same or different collectives) is then given by

$$\omega(\alpha, \beta|\tau) = \sum_{i \in \alpha} \sum_{j \in \beta} \omega(i, j|\tau).$$

Crucially, even if transitions between microstates were deterministic, transitions between macrostates will still be stochastic, because specification of macrostates, $\alpha$ and $\beta$, does not precisely define the start and end microstates, $i \in \alpha$ and $j \in \beta$ [53].
In order to normalise the weights, we have to sum over all states, or equivalently, all transitions that can occur, and we get

$$W = \sum_j \omega_j = \sum_{i,j} \omega(i, j|\tau).$$

With this we get for the probability of a transition from state $i$/$\alpha$ to state $j$/$\beta$ to occur at some time in the future, $\tau$,

$$p(j, i|\tau) = \frac{\omega(j, i|\tau)}{W}, \qquad p(\beta, \alpha|\tau) = \frac{\omega(\beta, \alpha|\tau)}{W}.$$

We then have for the conditional probability of ending up in state $\beta$ given that the system starts in state $\alpha$

$$p(\beta|\alpha, \tau) = \frac{\omega(\beta, \alpha|\tau)}{\omega(\alpha)}.$$

Figure 3: The epigenetic landscape, and our quasi-potential representation thereof, will change over time and in response to the cellular environment. This is illustrated here, where the progression along a developmental trajectory leads to the dissolution of old and the creation of new potential minima, aka cell types. Whether this evolution of surfaces is necessarily smooth is not clear. (Axis label: Differentiation factors increasing.)
Now analogously to Eq (9) we can define a new entropy for the transitions between macrostates,

$$S^{(2)}(\alpha, \beta|\tau) = \log \omega(\alpha, \beta|\tau),$$

which we refer to as the second entropy. The advantage of this formalism is that for non-equilibrium systems and equilibrium systems alike, the most likely state transition, given the current state, $\beta$, can be obtained by determining the state $\hat{\alpha}(\tau|\beta)$ which maximises the second entropy,

$$\hat{\alpha}(\tau|\beta) = \arg\max_\alpha S^{(2)}(\alpha, \beta|\tau).$$

This shifts the focus of the analysis from states to transitions, and may offer a better perspective on cell differentiation than a conventional equilibrium statistical mechanics perspective.
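The transition formalism can be sketched with a small made-up example. The sketch assumes that the weight of a transition from microstate $j$ to $i$ is the weight of $j$ times a transition probability, that macrostate transition weights are sums over member microstates, and that the second entropy is the logarithm of the macrostate transition weight; the four microstates, the matrix `T`, and the labels are hypothetical.

```python
import numpy as np

weights = np.array([3.0, 1.0, 1.0, 2.0])   # hypothetical microstate weights
labels = np.array([0, 0, 1, 1])            # two macrostates

# T[i, j]: hypothetical probability of moving from j to i in time tau.
# Each column sums to one, so every microstate goes somewhere.
T = np.array([[0.7, 0.2, 0.1, 0.0],
              [0.2, 0.6, 0.1, 0.1],
              [0.1, 0.1, 0.6, 0.2],
              [0.0, 0.1, 0.2, 0.7]])

# Transition weights: row = destination i, column = origin j.
w_trans = T * weights[None, :]

# Indicator matrix M[a, i] = 1 if microstate i belongs to macrostate a.
M = (labels[None, :] == np.arange(2)[:, None]).astype(float)
w_macro = M @ weights                     # macrostate weights
w_macro_trans = M @ w_trans @ M.T         # rows: destination, columns: origin

p_cond = w_macro_trans / w_macro[None, :]        # p(destination | origin, tau)
second_entropy = np.log(w_macro_trans)           # assumed form of S^(2)
most_likely = second_entropy.argmax(axis=0)      # best destination per origin

print(p_cond)
print(most_likely)
```

Because the logarithm is monotonic, maximising the second entropy over destinations for a fixed origin picks out the same macrostate as maximising the conditional transition probability.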

Transitory landscapes
This brings us to the final point, and one that may have been implicitly presaged by Waddington [32], and is also apparent in the work of Thom [28] and others (see [30] for a recent review): the epigenetic landscape should not be viewed as a fixed object in time, but one which changes dynamically [36]. For mathematical models of developmental systems (typically of the essential gene regulation networks) it is possible to calculate the corresponding epigenetic landscapes as quasi-potentials, $U(x)$ [30,34,35]. These networks change, with some interactions becoming more prominent while others fade as, e.g., transcription factor activity changes in response to external signals [58][59][60], and so do the quasi-potentials, taking the system through a qualitative change point.
In this view (Figure 3) the landscape changes in response to signalling, and the resulting minima of the quasi-potential, $U(x; \zeta)$, which now explicitly depends on an external signal or differentiation factor, shift position in state space. In this view we make the stationary states of the system dependent on $\zeta$, that is

$$x^* = x^*(\zeta),$$

for the $x^*$ that solve $\nabla U(x^*; \zeta) = 0$.
The set of solutions, x * , does not need to behave in a continuous manner with the control variable, as is clear, for example, if ζ induces a bifurcation. A priori it is not clear if the potential has to vary smoothly.
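The discontinuous response of the stationary state to a smoothly varying control variable can be sketched by letting a state relax in a slowly changing landscape. The tilted double-well quasi-potential $U(x; \zeta) = x^4/4 - x^2/2 + \zeta x$, the scalar $\zeta$, and the relaxation settings are hypothetical choices for illustration only.

```python
import numpy as np

def grad_U(x, zeta):
    """Gradient of the hypothetical potential U = x**4/4 - x**2/2 + zeta*x."""
    return x**3 - x + zeta

def track_minimum(zetas, x0=1.0, step=0.05, n_relax=2000):
    """Follow the occupied minimum by gradient descent as zeta is varied."""
    x = x0
    path = []
    for zeta in zetas:
        for _ in range(n_relax):
            x -= step * grad_U(x, zeta)
        path.append(x)
    return np.array(path)

zetas = np.linspace(0.0, 0.6, 13)   # smooth, evenly spaced change in signal
path = track_minimum(zetas)

for zeta, x_star in zip(zetas, path):
    print(f"zeta = {zeta:.2f}  x* = {x_star:+.3f}")
```

For this potential the right-hand minimum disappears in a fold bifurcation at $\zeta = 2/(3\sqrt{3}) \approx 0.385$, so the tracked state jumps to the remaining minimum between two neighbouring values of $\zeta$ even though the potential itself varies smoothly.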
For a transitory landscape, we could potentially treat the landscape associated with ζ as the macrostate, but we have to note here that the control variable, ζ, will in practice rarely be scalar: differentiation into a more specialised cell-type typically depends on more than a single molecular factor [61].

Conclusions
From a merely conceptual tool the Waddington or epigenetic landscape has been slowly morphing into a framework for the qualitative and quantitative analysis of real biological systems. There are two areas where further investigation and development are likely to bear fruit, and which we discussed above.
First, ideas from dynamical systems theory, Morse theory, catastrophe theory, and especially concepts related to structural stability, have important implications for the mathematical analysis of dynamical systems.
A crucial challenge to their widespread use is, however, that (i) they are typically restricted to gradient systems; (ii) they are only valid for deterministic systems. There is some reason to be hopeful that analysis of deterministic systems can be meaningful for our understanding of stochastic dynamical systems. But this may require detailed case-by-case analysis, as some hallmarks of deterministic dynamical systems, such as certain types of bifurcations, may not translate to their stochastic counterparts, or vice versa.
The second point relates to applying statistical mechanics to cell differentiation. There is obvious appeal to doing so as has been detailed before. There are two shortcomings to this, however: (i) equilibrium statistical mechanics rests on assumptions that almost certainly do not hold in the context of living and changing systems; (ii) much of the appeal of statistical mechanics stems from the fact that entropic considerations can point towards the state a system will be in. Defining the relevant macrostates is problematic; and translating empirical entropy estimates into e.g., the likelihood of a given cell-state being obtained, is not possible in the conventional framework.
There is, however, as we have sketched out here, some scope to resolve these outstanding issues by adopting a non-equilibrium perspective, and better definitions of cellular macrostates.
The concept of a transitory landscape [62] may be an attractive way of combining the dynamical systems perspective pioneered by Thom [28] and others [30,63] with the statistical mechanics perspective, especially if an appropriate non-equilibrium framework is used.