Mechanistic models of cell-fate transitions from single-cell data

Our knowledge of how individual cells self-organize to form complex multicellular systems is being revolutionized by a data outburst, coming from high-throughput experimental breakthroughs such as single-cell RNA sequencing and spatially resolved single-molecule FISH. This information is starting to be leveraged by machine learning approaches that are helping us establish a census and timeline of cell types in developing organisms, shedding light on how biochemistry regulates cell-fate decisions. In parallel, imaging tools such as light-sheet microscopy are revealing how cells self-assemble in space and time as the organism forms, thereby elucidating the role of cell mechanics in development. Here we argue that mathematical modeling can bring together these two perspectives, by enabling us to test hypotheses about specific mechanisms, which can be further validated experimentally. We review the recent literature on this subject, focusing on representative examples that use modeling to better understand how single-cell behavior shapes multicellular organisms.


I. INTRODUCTION
Single-cell technologies are producing large amounts of data about multicellular development, which have created an urge to develop tools to extract information from those datasets. In that respect, machine learning has proven to be very useful in many applications, where it has exhibited substantial predictive power. This raises the question of why bother trying to make mechanistic models. However, as it has already been debated [1,2], mechanistic and machinelearning approaches are not mutually exclusive. The power of machine learning is the ability to find patterns in vast amounts of data in an inductive way, while mechanistic approaches work at the level of human intuition, making their strength deductive. Both perspectives can complement each other to make better theories and predictions [3]. This mutual interaction has been present for decades: the first (mechanistic) neuron model [4] served as inspiration for the generation of the network approach that has culminated in the development of deep neural networks. There is renewed interest in this synergistic point of view, as evidenced by the efforts towards the development of tools ranging from the selection of simpler and more informative models [5] to the analysis of complex systems using machine-learning strategies [6], and including the proposal of benchmarks for modeling and data analysis [7,8].

II. EXPERIMENTAL SINGLE-CELL BREAKTHROUGHS
Two main breakthroughs are providing an unprecedented push to study the impact of fate decisions at the single-cell level on the development of multicellular organisms. The first one is the fast development of high-throughput techniques to measure gene expression at single-cell resolution, which are leading to new insights into the cell-fate transitions underlying multicellular development [9,10]. The data is increasing in detail, as shown by the addition of key information such as spatial resolution [11][12][13]. While these techniques are not perfect and many challenges are still open [14], the generation of richer data with continuously improving quality offers promising avenues of research in developmental biology. The second breakthrough is the establishment of gastruloids as novel experimental models of development [15]. These developmental models have shown the ability to reproduce with unprecedented detail the main features of embryo formation, from 3D organization [16] to somitogenesis [17]. The benefits of these systems are their increased experimental controllability, as well as the scalability of the number of samples that can be analyzed under different perturbations.

III. BUILDING MECHANISTIC MODELS AT THE SINGLE-CELL LEVEL
In spite of the increase in the amount of data available, mechanistic models suffer from what has been called the "curse of parameter space" [18], namely a lack of knowledge of the in vivo parameter values of most biochemical and mechanical processes within live cells. This problem can be circumvented by (i) using dynamical behavior as a constraint of the models [19], (ii) applying the tools of dynamical systems (in particular bifurcation theory) to explore systematically how the different biological processes impact qualitatively the behavior of the system in both model and experiments [20], and (iii) using minimal models to describe the spatiotemporal self-organization of tissues [21,22]. In what follows, we describe several case examples illustrating the use of those methods, in particular of minimal models that can capture the fundamental principles of the cell-fate transitions that underlie the formation of organisms.

A. Geometric description of cell-fate dynamics
The simplest paradigm of organism development comes from the epigenetic landscape picture introduced by Waddington, in which cells start from a pluripotent state and roll down arXiv:2102.03422v1 [q-bio.CB] 5 Feb 2021 2 through developmental valleys towards one of several alternative committed states ( Fig. 1) [23]. This qualitative description is amenable to be interpreted with the formalism of dynamical systems [24,25]. Starting from this conceptual "landscape model", Corson and Siggia used the geometrical perspective of dynamical systems to explain the characteristic fate patterning of vulval development in C. elegans [26]. Fate commitment in these cells involves both positional EGF cues from an anchor cell and lateral cell-cell communication among the vulval precursor cells via Notch ( Fig. 2A). To un- derstand how the system integrates the two types of signals, the authors ignored all biochemical details of the system, focusing on the fate patterning across a range of conditions. With this information, they created the simplest model that captures the shape of the landscape over which the cells evolve towards the committed fates (Fig. 2B). Using the effective degrees of freedom of the system to direct behaviors of the data allowed the authors to explore systematically the properties of the model, and study in particular the epistatic interactions between the exogenous EGF cue and the Notch-based cell-cell communication (Fig. 2C).
Despite the reduced dimensionality of the model, which only consists of eight cells with two effective degrees of freedom each, we can extract several lessons. First, the model exploits the qualitative description of dynamical systems to gain insights at multiple levels of complexity, suggesting in the process alternative interpretations to our current understanding and proposing insightful experiments to test the current state of knowledge. The second lesson is the ability of this approach to discern among one of several theories, usually considered mutually exclusive, when interpreting experimental observations. In this case, the model captures the joint effect that extrinsic cues and mutual cell-cell communication have on the system, conciliating an apparent contradiction between the two mechanisms. The simultaneous dependence on diverse factors is a natural assumption given the complexity of biology. Other studies have reached similar conclusions for phenomena ranging from the refinement of cell-fate patterning [27] to robustness inactivating mechanisms [28,29]. Being able to select or integrate multiple models points at the benefits that single-cell resolution provides.

B. A single-cell dynamical landscape
From a cell-centric perspective, the genome contains all the necessary information describing the cell-fate landscape. However, cells are not isolated systems. They feel external stimuli and gradients, and are sensitive to temperature and to mechanical interactions, both with the environment and between them. All these factors impact the individual cells, cre-ating complex landscapes. In principle, we could imagine the global landscape of the full dynamical system containing all its degrees of freedom. However, we do not use this picture for several reasons. First, time-dependent extrinsic factors will keep the landscape dynamic. Besides, such an image completely obscures the elegant geometric understanding that we have of dynamical systems, and hence prevents us from extracting any possible intuition from it. Finally, and more practically, individual cells inside an organism are amenable to be interpreted as identical cell blocks, conferring the complex collective space with a valuable assumption of symmetry.
Considering each cell's dynamical system as a building block of the population, we find two common approaches to study cell-fate decisions at the collective level: continuous models (CM) and agent-based models (ABM). Continuous models exist since the inception of the mathematical study of morphogenesis with the pioneering work of Turing [30]. In such an approach, we take the limit of the cells to be a continuum. This modeling approach is very amenable to analytical work, and was for years the only approach used. However, complexities arise when introducing cell division and inhomogeneities in the system. As an alternative, with the rise of computational power, it has become possible to simulate cells as individual entities following simple rules. This approach has the benefit that it naturally accounts for the individuality of cells, describing complex intrinsic properties flexibly. The main drawback of this simulation framework is that it is hard to reach an analytical understanding that goes beyond specific parameter values. Careful considerations are required to avoid over-complications of the model and extract reliable conclusions from their use. The two techniques are not exclusive and can be complementary. For example, Manukyan et al [31] use a CM to explain the patterning of reptile scales, while proving that the approach maps to an ABM depending on the length scale at play. The use of one method or the other will depend on the modeling purpose, with the question that we want to assess fitting naturally one or the other.
As an illustrative example of this combined approach, Saiz et al [32] investigated the robustness of cell fate proportions during the early development of the mouse embryo. The current knowledge of this process points to a complex interaction of different interconnected pathways combined with intercellular communication ( Figure 3A). Despite the complexity of the genetic network, the dimensionality of the biochemical model can be reduced substantially through adiabatic considerations, leading to a single degree of freedom per cell. This simple biochemical model is enough to explain the balanced distribution of fates between epiblast and primitive endoderm in terms of only cell-cell communication via FGF/ERK signaling, which leads to an effect akin to lateral inhibition in neighboring cells without the need of intracellular mutual inhibition between the master regulators of the two cell fates ( Figure 3B). To include the effects of cell mechanics and proliferation, the authors used an ABM where the cells obey the minimal biochemical model described above ( Figure 3C). The model reproduced the experimentally observed robustness to perturbations in the initial embryo size, using both scaling experiments in which half-sized and double-sized embryos were As we have discussed in the preceding section, the cellfate transition landscape changes between cells under the effect of external signals and cell-cell interactions. But there is an additional detail left: the stochastic nature of biology. Noise is ubiquitous in cells, from the sensing of external signals to gene expression [33,34]. The effects of noise at the single-cell level have been studied for more than two decades [33,[35][36][37][38]. A standard way to approach this issue so far has been to include noise sources as part of the modeling approach, usually working on small gene and protein circuits described by low-dimensional stochastic dynamical systems. However, a fundamental understanding of the large-scale interplay between the noise and the behavior of cell-fate decisions is still missing. The unique character of single-cell experiments should help us reach deeper insights into this problem. In particular, the recent deluge of data is challenging established concepts such as that of cell identity [39,40], and provides unprecedented insight into the dynamics of cellfate decisions in commitment [37,41,42] and reprogramming [42,43].
Previous proposals highlight the potential benefit of using the formalism of non-equilibrium statistical mechanics to account for this inherent stochasticity [44,45]. This route can nevertheless be challenging. A step in this direction was taken by Stumpf et al [46], who used single-cell data and a combination of machine learning and mechanistic modeling to examine the in vitro differentiation of stem cells as they evolve towards the neuroectoderm fate ( Figure 4A). Cell trajectories towards the committed state, as inferred from single-cell transcriptome analysis, are highly stochastic (Figure 4B). The single-cell data also allowed the authors to infer a regulatory network whose components change along the developmental path, activating and deactivating three different regulatory modules ( Figure 4C, see labels 1-2-3 in the plot). According to this analysis, cells evolve through three macrostates, which contain several microstate transitions. A careful comparison between different alternative models reveals that the transitions are not consistent with Markovian dynamics ( Figure 4D), but they rather correspond to a non-Markovian stochastic process ( Figure 4E). This combination of stochastic modeling and statistical data analysis shows a way forward in our quest to understanding cell-fate transitions in development using high-throughput single-cell experimental methods.

IV. FINAL REMARKS
Single-cell methods, particularly those that offer spatial resolution [11][12][13], will be of great relevance to disentangle many open questions on the dynamics of the cell-fate decision landscapes. However, we need to be mindful of the most effective way in which we can leverage those techniques to advance our understanding of the fundamental mechanisms behind multicellular self-organization. Looking for the most minimalistic approach to explain current knowledge is at the heart of any modeling approach. However, although simplicity is readily appreciated [26,27,32], the continuous increase of detailed knowledge and the rising amount of data generated at single-cell resolution makes it very tempting to construct over-complicated models, which might not be the best way forward.
The reasons for keeping things simple are twofold. First, over-detailed attitudes may obscure our understanding and curtail our ability to both search for trustful explanatory mechanisms and open up further avenues of research. Second, despite the amount of data that single-cell techniques provide for us, we should always assess their explanatory capacity; overwhelmingly complex models can be impossible to validate even with extensive experimental efforts. Many studies have focused on the assessment of both issues, both looking systematically for simpler models compatible with the data [5,7,47] and assessing the limits of the information that we can obtain from those models and from the available data [48]. These are the approaches that will generate understanding from the current revolution in data gathering.

ACKNOWLEDGMENTS
This work was supported by the Spanish Ministry of Science, Innovation and Universities and FEDER (under projects FIS2017-92551-EXP and PGC2018-101251-B-I00, and by the "Maria de Maeztu" Programme for Units of Excellence in R&D, grant CEX2018-000792-M), and by the Generalitat de Catalunya (ICREA Academia programme and grant 2017 SGR 1054). G.T. is funded by PhD grant FPU18/05091 from the Spanish Ministry of Science, Innovation and Universities.