A common molecular logic determines embryonic stem cell self‐renewal and reprogramming

Abstract During differentiation and reprogramming, new cell identities are generated by reconfiguration of gene regulatory networks. Here, we combined automated formal reasoning with experimentation to expose the logic of network activation during induction of naïve pluripotency. We find that a Boolean network architecture defined for maintenance of naïve state embryonic stem cells (ESC) also explains transcription factor behaviour and potency during resetting from primed pluripotency. Computationally identified gene activation trajectories were experimentally substantiated at single‐cell resolution by RT–qPCR. Contingency of factor availability explains the counterintuitive observation that Klf2, which is dispensable for ESC maintenance, is required during resetting. We tested 124 predictions formulated by the dynamic network, finding a predictive accuracy of 77.4%. Finally, we show that this network explains and predicts experimental observations of somatic cell reprogramming. We conclude that a common deterministic program of gene regulation is sufficient to govern maintenance and induction of naïve pluripotency. The tools exemplified here could be broadly applied to delineate dynamic networks underlying cell fate transitions.


Asynchronous Simulations
Here we compare the results of our methodology with the approach of simulating a single model under an asynchronous update scheme. For this investigation, we selected the minimal model from the 0.717 cABN (the network with the fewest interactions), which is shown in Fig. S8a. We ran simulations of this model using BooleanNet, a freely available Boolean network simulator [1], and examined the resetting dynamics produced under different network perturbations. For each experiment described below, 10000 independent simulations were run under an asynchronous update scheme in which one component is chosen at random to update at each step. We set the initial state to be equal to the GOF18 EpiSC state following discretisation (Fig. S1f).
First, we investigated the effect of single factor forced expression in EpiSC resetting. We calculated the proportion of simulations that reached the naïve state, and then determined the average step at which the naïve state was reached. This enabled us to compare the relative potency of each single factor overexpression to Control (2i+LIF only with not overexpression) (Fig. S8b), and to evaluate model predictions against experimental results (Fig. 1e). The model predicted that forced expression of any of Nanog, Sox2, Tbx3, Sall4, Esrrb, Klf2 and Klf4 would guarantee that the naïve state be reached. These perturbations subsequently differed in the average trajectory length to the naïve state. As for the analysis run for the cABN, Klf2, Klf4 and Esrrb emerged as the most potent inducers of resetting with the lowest number of average steps required to reach the naïve state. In contrast, Sall4, Sox2, and Oct4 were incorrectly predicted to enhance resetting. Finally, forced expression of Gbx2 and Stat3 were predicted to marginally increase the number of simulations reaching the naïve state, or to decrease the number of steps required to reach the naïve state, respectively. Only for Gbx2 could we observe experimentally a mild yet significant increase in colony number (Fig. 1e), thus we conclude that overall, 7 out of 11 predictions are supported by experiment, as the model incorrectly predicts the effect of Stat3, Sall4, Sox2, and Oct4.
We were able to interrogate these simulations further to examine gene activation kinetics by calculating the number of steps taken for each gene to reach a stable average expression across all simulations, which indicates the relative order of gene activation towards the naïve state (Fig. S8c). In the control case only a fraction of the simulations reached the naïve state, therefore when we calculated the Average Expression of each factor across all simulations we observed, for instance, that Esrrb expression reaches ~0.8. Conversely, factors like Gbx2 appear to activate in all simulations, also those not reaching the naïve state. Forced expression of Klf4 allows all simulations to reach the naïve state, and indeed the average expression of all factors is 1. Moreover, a stable state is reached more rapidly in the case of Klf4 forced expression, as indicated by the red lines on the right panels. Compared with the experimental data (Fig. 2b), in the control case, the model incorrectly predicts only the early activation of Gbx2 and agrees with the kinetics of all other factors investigated.
We similarly investigated the effect of single factor knockdown in EpiSC resetting. Using the number of simulations that reach the naïve state as a measure (Fig. S8d), the model predicts that the removal of Gbx2 and Tfcp2l1 do not have effect on EpiSC resetting, that removal of Klf4, Tbx3 and Sall4 had a significant partial effect on resetting efficiency, while Nanog, Esrrb, Oct4, Stat3, Sox2 and Klf2 are required for resetting. When compared to the experimental results (Fig. 6b), only Klf4 was incorrectly predicted.
Finally, we investigated dual factor forced expression in EpiSC resetting (Fig. S8e). We tested whether combinations of factors would have a synergistic effect over single factors alone using the average number of steps required to reach the naïve state as a measure (note that in all cases, all simulations reach the naïve state). Compared to our experimental results (Fig. 3a, right panel), the model incorrectly predicts that Klf4/Tbx3 together are no more efficient than either factor alone, and that Esrrb/Tfcp2l1 together demonstrate an additive effect, as shown by a reduction in the average number of steps required to reach the naïve state. Overall, 5 out of 7 predictions are supported experimentally (Fig. S8e, lower panel).
The set of predictions made by 0.717 minimal model under an asynchronous update scheme were compared to those generated from the 0.717 cABN (Fig. S8f). Overall, the minimal model demonstrates a high predictive accuracy of 75.86% across this set of tests, but this accuracy is lower than the predictive accuracy of the 0.717 cABN, which is 84.92%. Based on this comprehensive comparison, we conclude that 0.717 cABN captures models with behaviours relevant to the real biological process with individual models possessing high level of predictive accuracy. This also highlights the utility of our cABN approach in identifying candidate network models that can be subsequently used for simulation-based investigation, and undoubtedly can be further refined based on experiments.

2) Appendix Supplementary Figures
Appendix Figure S1 (C) Expression of exogenous transcription normalised to actinβ from one representative experiment out of two. (D) A summary of the predictions from the 0.832 and 0.782 cABNs of whether the indicated forced expression was more efficient than empty vector control, compared with experiment. The predictive accuracy of the models increases in the 0.782 cABN. Experimental data was based on data shown in Fig 1E. (E) Comparison summary of predictions and experimental results for resetting potency between gene pairs. Each row compares the prediction from the 0.782 cABN (left box) with experiment (right box), showing whether the first factor (row) is more/less potent (green/red) than the second factor (column). We show experimental results where there was a significant difference between the resulting colony number (Student's t-test, p<0.05). In none of the cases tested were the predictions in disagreement with the experimental results (i.e. prediction of gene X being more efficient than gene Y, when in fact gene Y was significantly more efficient than X). The experimental data was based on data shown in Fig 1E. (F) Schematic summary of (E), illustrating the relative potency between individual factors confirmed by experiment, where the arrow points from a more potent to a less potent factor.
Appendix Figure S3 Appendix Figure S4, related to Fig 3, 4a, Table S3 for direct comparison between experimental results and predictions generated using the 0.717 cABN.

3) Appendix Supplementary Tables
Appendix Table S1: siRNAs used in this study.