Single-cell data-driven mathematical model reveals possible molecular mechanisms of embryonic stem-cell differentiation

Embryonic development is widely studied due to its application in disease treatment. The published literature demonstrated that Krüppel-like factor 8(KLF8) plays an important role in modulating mesendoderm to definitive endoderm (DE) differentiation. However, it is not clear how KLF8 interacts with other key genes and affects the differentiation process. To qualitatively and quantitatively explore the molecular mechanisms of KLF8 during the differentiation of human embryonic stem cells (hESCs) in detail, we developed a mathematical model to describe the dynamics between KLF8 and two other significant genes, E-cadherin(CDH1) and Zinc-finger E-box-binding homeobox1(ZEB1). Based on the single-cell RNA-seq data, the model structure and parameters were obtained using particle swarm optimization (PSO). The bifurcation analysis and simulation results reveal that the system can exhibit a complex tristable transition, which corresponds to the three states of embryonic development at the single-cell level. We further predict that the novel important gene KLF8 promotes the formation of DE cells by reciprocal inhibition between CDH1 and KLF8 and promotion of the expression of ZEB1. These results may help to shed light on the biological mechanism in the differentiation process of hESCs.


Introduction
Embryonic stem cells (ESCs) are pluripotent stem cells that are derived from the inner cell mass of a blastocyst which is an early-stage preimplantation embryo that can be propagated by culturing in an undifferentiated state while maintaining the capacity to generate any cell type in the body [1]. Due to their plasticity and potentially unlimited capacity for self-renewal, human embryonic stem cells (hESCs) play an important role in animal cloning, organ transplants, etc. For instance, embryonic stem cell therapies have been proposed for regenerative medicine and tissue replacement after injury or disease [2,3].
The transition from embryonic stem cells to three different types of cells in the ectoderm, mesoderm and endoderm has been the first step in studying how pluripotent cells exit the pluripotent state and give rise to lineage-specific progenitors. From the perspective of molecular biology, the transition is determined by different levels of gene expression. Thus, understanding the regulatory mechanism of the genes underlying differentiation and the cell fate decision of hESCs has become necessary.
In the last few years, genome-wide profiling approaches have started to uncover the molecular programs that drive the developmental processes of hESCs. A number of research works have revealed the key genes or mechanisms underlying the differentiation of hESCs [4][5][6][7][8][9][10]. For example, Adamo et al. revealed that LSD1 could regulate the balance between self-renewal and differentiation in hESCs [4]. Cheryle et al. demonstrated that stable endoderm progenitors can be established from hESCs by the constitutive expression of SOX7 or SOX17, producing extraembryonic endoderm and definitive endoderm progenitors, respectively [9]. Ivanova et al. found that OCT4 regulates and interacts with the BNP4 pathway to specify four developmental fates and identified general and cell-line-specific requirements for NANOG, OCT4, and SOX2 in the hESC [10].
Recently, due to the development of single-cell sequencing technology, a new type of data named "single-cell RNA-seq data (scRNA-seq data)" is available and we can now quantify the gene expression of individual cells and analyze the heterogeneity among cells [11,12]. Through analyzing the scRNA-seq data of hESCs, Chu et al. revealed that the Krüppel-like factor 8 (KLF8) plays a key role in modulating the mesendoderm, which is an intermediate state before the definitive endoderm and mesoderm are formed, to DE state differentiation [5].
However, the dynamic molecular mechanisms underlying the differentiation of individual human embryonic stem cells are still poorly understood. Moreover, how KLF8 dynamically interacts with other important genes in the core regulatory network and governs the transition from ESCs to the three different cell types in the ectoderm, mesoderm and endoderm remains unclear [13][14][15].
Mathematical modeling has been demonstrated as a powerful tool for investigating the dynamic mechanisms underlying signal regulatory network. To systematically analyze the dynamical regulatory mechanism underlying hESC differentiation, in this paper, we developed a mathematical model that is based on a possible core regulatory network underlying the differentiation process of hESCs. scRNA-seq data and the particle swarm optimization (PSO) algorithm are used to identify the parameters of the mathematical model. Then, we performed the simulation and dynamic bifurcation analysis that is based on the proposed model and investigated the molecular mechanisms underlying the core regulatory module. More importantly, we proved the reasonability of the proposed regulatory mechanisms through the dynamic analysis results and introduced several strategies that may be feasible for controlling the process of hESC differentiation.

The core transcriptional regulation underlying human embryonic stem-cell differentiation
It has been documented that epithelial-mesenchymal transition (EMT) underlies cell fate conversions in both reprogramming and differentiation along an endoderm cell fate [16]. HESCs begin differentiation with a near-synchronous EMT, and the differentiation of hESCs is considered an EMT process [17]. In fact, hESCs are epithelial cells with a high expression of E-cadherin (CDH1), which plays a central role in maintaining epithelial cell-cell adhesion and polarity. Previous work [18,19] has shown that down regulation of CDH1's transcription is thought to be a primary mechanism that contributes to the onset of EMT. The CDH1 has also been demonstrated to be the the paradigm of epithelial genes. Loss of CDH1 is normally determined to be the hallmark of EMT [20].
In the differentiation of hESCs, KLF8 is recognized to be an essential regulator in modulating mesendoderm to endoderm differentiation. It was observed that overexpression of KLF8 increases the mobility, which was evident by the up regulation of TWIST1, an EMT marker, which indicated that KLF8 plays a specific role in promoting the transition from mesendoderm to endoderm state [5]. Indeed, KLF8 induces EMT by directly binding to the CDH1 promoter through GT boxes and repressing the expression of CDH1 [21].
To obtain the appropriate model, we first used single-cell RNA-seq data to compute Spearman's correlation coefficients among three genes. The calculated Spearman's correlation coefficients between ZEB1 and KLF8, ZEB1 and CDH1, and CDH1 and KLF8 are 0.4879, -0.4664 and -0.4851, respectively.
According to the above descriptions, it is obvious that ZEB1 inhibits CDH1 and KLF8 inhibits CDH1. No evidence shows how KLF8 influences ZEB1 and CDH1. It has also been suggested that KLF8 functions as a promoter of DE cells by upregulating many other DE cell marker genes, such as CXCR4 [5]. We assumed several models to describe their different relations and determined that good hypotheses are that ZEB1 is promoted by KLF8 and CDH1 inhibits the expression of KLF8. The connections among the three genes are contained in the following Figure 1.

The preprocess of single-cell RNA-seq data
The single-cell RNA-seq data we used in this research is obtained from NCBIs Gene Expression Omnibus and are accessible through GEO series accession number GSE75748, which characterizes differentiation from the mesendoderm(a state before the formation of DE cells and mesoderm cells) to definitive endoderm cells [5]. This data set contains a large number of genes expression data collected at 6 time points with 758 cells.
Based on the single cell RNA-seq data, we first reconstructed a single-cell order using the Wave-Crest toolbox under R software. The cell order assigns a specific time point to each cell within the time interval [0 h, 96 h]. There are 758 time points evenly distributed from 0 h to 96 h (each pair of adjacent time points is 96 h/757 apart according to the Wave-Crest algorithm). These 758 time points correspond to the 758 reordered cells. The cell order produces a time-series expression of genes, which was used to calculate the Spearman's correlation coefficients (Table 1). Then, we obtained a pseudo trajectory of the gene-specific expression through polynomial fitting. The normalized scRNA-seq data is obtained by evenly selecting 758 points of the polynomial curve with the same time points. It has been demonstrated that the time series expression of the three considered genes reflect the dynamical differentiation well [5]. After further investigation, we discovered that the parameters estimated by the data not only capture the differentiation from the mesendoderm state to the DE state, but, in fact, changing some of the parameters can simulate the differentiation from ESCs to all three germ layers. We speculate that the function of genes and the interactions between genes at different stages of differentiation remain unchanged. The differentiation trend is mainly determined by specific biological processes.

Mathematical modeling of the core regulatory module
We built a mathematical model based on mass action law and Michaelis-Menten equation [27,28], which is described in the following ordinary differential equations (ODEs). The relationships among the three selected genes has been thoroughly discussed in the network construction, and each term of the equations has been marked accordingly.
Here k 1 , k 3 , k 5 and k 7 represent the maximum production rate of CDH1 regulated by ZEB1, the maximum production rate of CDH1 regulated by KLF8, the maximum production rate of ZEB1 regulated by KLF8 and the maximum production rate of KLF8 regulated by CDH1, respectively. k 2 , k 4 , k 6 and k 8 represent the Michaelis constants that correspond to the production rates k 1 , k 3 , k 5 and k 7 , respectively. d 1 , d 2 and d 3 represent the degradation rates of CDH1, ZEB1 and KLF8, respectively. a and r represent weight coefficients to reflect the degree of activation and inhibition, respectively.
To reduce the parameters and make the following analysis more convenient, we nondimensionalized the model by making the following substitutions, whereby we assume that the Hill coefficients are equal to 2.
Therefore, the simplified ODE model is as following, and all the analysis is based on the nondimensionalized ODEs. where The model includes three variables and night parameters. To identify the parameters in the model, we converted the parameter identification into the problem of optimization by minimizing the following objective function. Mathematically, the objective function is defined as the error between the simulation results and the time series experimental data. The formulation can be expressed as y D k (t j ) represents the measured data of component k at time-point t j , which, in our case, is the normalized data obtained from the polynomial curves. y k (t j , P) represents the k th component of the solutions of ODE at time t j with parameter set P. Here J = 758, since there are 758 pseudo time points. The numerical solution of the ODEs is solved by MATLAB, with the initial value set to be the approximation of the first data point of the normalized data and a, r assumed to be 1. The particle swarm optimization(PSO) [29] algorithm is used to optimize this object function within the optimization intervals that are carefully chosen. The obtained optimization parameters are listed in Table S1.

The consistency between the simulation results and measured data
We used the proposed model to simulate the dynamic process with time. Figure 2a,c,e demonstrates that the simulation results are consistent with the obtained trajectory of single-cell RNA-seq data, which shows that our model could reflect the main biological process underlying the differentiation of ESC. The fitting of 758 scRNA-seq data is shown in Appendix D. Since the RNA-seq data we obtained was collected from large amount of cells which may lead to the errors between the real initial gene expression level and the initial expression level we estimated, we gave the initial value random perturbations within 0.3 (Figure 2b,d,f). The results show that the model well reflects the fluctuation of the expression data with different initial values. The simulation is independent of the selection of initial values to a certain extent.
3.2. Bifurcation analysis reveals the threshold dynamics of the relative degradation rate corresponding to different stages of differentiation We presented dynamic simulations of the system with different values of p 4 . The curves in Figure 3a indicate that the dimensionless concentration of CDH1 gradually decreases after a short increase until a steady state is attained during the time evolution. The steady state remains the same with an increase of p 4 . Intriguingly, the steady state significantly increases into an extremely high state when p 4 is larger than 0.133. Conversely, a relatively lower state is maintained when p 4 is less than 0.133. It is obvious that the system will sustain more steady states with the change of p 4 . Through the nondimensionalization process, we know that p 4 is the ratio of the degradation rate of ZEB1 to the degradation rate of CDH1, which we named the "relative degradation rate". We hypothesize the existence of thresholds for the relative degradation rate, which eventually distinguishes different levels of CDH1 expression.
The threshold of the relative degradation rate is clearly indicated by Figure 3b. Through qualitative and quantitative analyses, we suggest that a high expression of CDH1, in which relative degradation rate is greater than 0.225, corresponds to the ectoderm state of ESCs. With the relative degradation rate decreasing, the inhibition of ZEB1 targets on CDH1 is enhanced, at least partly, because of the relatively lower degradation rate of ZEB1, which results in weaker adhesion among cells. Thus, EMT is activated to a certain extent, and the system undergoes SN bifurcation and another stable steady state occurs, which we believe corresponds to the mesoderm state. As the relative degradation rate decreases further (< 0.096), EMT is further activated and the system undergoes another SN bifurcation. Then, the lower expression state of CDH1 occurs, which corresponds to the formation of DE cells. In short, the threshold we discovered is the necessary condition for cell formation in different germ layers. The results also indicate that mesendoderm cells can repossess pluripotency and differentiate into ectodermal cells, and the differentiation process is reversible under certain conditions. There are reasons to believe that the proposed model is able to characterize the differentiation from ESCs to the three embryonic layers.

The relative dissociation constant distinguishes DE cells from cells in other germ layers
We investigated the effect of inhibition intensity on the dynamic behavior of the system. Figure 4a shows that a bistable phenomenon exists when p 3 varies in the region [9.65, 23]. According to the nondimensionalization process (Eq 8), p 3 is the ratio of k 4 and k 6 , which are the dissociation constants that KLF8 targets on CDH1 and ZEB1, respectively. Since p 3 is equal to the ratio of the dissociation rate of the ligand-receptor complex, we named it as the relative dissociation constant, and we found that there is a threshold of the relative dissociation constant. When the dissociation constant of the ligand-receptor complex of CDH1 and KLF8 is approximately 23 times greater than that of ZEB1 and KLF8, the system undergoes a bistable switch from a low-expression state to a high-expression state of CDH1. The bifurcation diagram indicates that the change of p 3 might not be a good way to obtain all three different types of cells during ESC differentiation.
However, we suggest that influencing the relative dissociation constant might be a feasible way to obtain DE cells from cells in other germ layers.

Bifurcation analysis reveals the connections of feedback coefficients
To further determine how p 3 affects the whole system, we analyzed the two-parameter bifurcation. According to Figure 4b, when the relative dissociation constant p 3 decreases to a small enough amount (for example, p 3 = 0.3), the tristable phenomenon disappears, and the system undergoes an irreversible bistable switch, which means that it is impossible to obtain mesoblastema and hard to change the fate of cells from differentiated cell into DE cells. In this case, CDH1 maintains a low level of expression, and KLF8 maintains a high level of expression. Interestingly, when the relative dissociation constant becomes greater, the system exhibits only one stable steady state of high concentration of CDH1.
Qualitatively we find that a large dissociation constant of the ligand-receptor complex of CDH1 and KLF8 and a relatively smaller dissociation constant of the ligand-receptor complex of ZEB1 and KLF8 resulted in the activation of CDH1 (the case where p 3 = 35), which is partly due to the situation that the inhibitory effect of CDH1 by KLF8 is weakened by a large dissociation constant. Instead, with a smaller dissociation constant of ligand-receptor complex of CDH1 and KLF8 compared to that of ZEB1 and KLF8, the direct inhibition of CDH1 by KLF8 is enhanced (the case where p 3 = 0.2), which leads to a state of low expression of CDH1, although ZEB1 is promoted by KLF8.
Furthermore we have the following equation according to Eq 8.
Under the circumstance that p 3 and d 1 are constants, the decrease of p 1 results from the the decrease of k 5 , which characterizes the promotion of ZEB1 by KLF8. It reveals that the system may exhibit only one steady state if the relative dissociation constant p 3 is too large or too small, no matter how the production rate of ZEB1 changes. Because p 3 and p 4 play important roles in the ESC fate decision, we start to wonder how they synergistically affect the system. Figure 4c,d shows the varying steady states of the model with the two constants in different domains. From the parameter values of p 3 , p 4 that correspond to different steady states, we suggest that it is difficult to obtain mesodermal cells with the relatively weakened inhibition effect that KLF8 targets on CDH1 or the weakened promotion that KLF8 targets on ZEB1.
Meanwhile, a high relative degradation rate of ZEB1 to CDH1 determines the irreversible steady state of epithelial cell formation, which corresponds to the differentiation of the ectoderm according to the yellow district of Figure 4d.
In conclusion, the relative dissociation constant p 3 should be neither too large nor too small in case of an irreversible situation of the steady states dominated by KLF8 if three types of embryonic cells are needed. Figure 5. The bifurcation diagrams of three components as a function of the inhibition coefficient that CDH1 targets on the expression of KLF8 (p 6 ). The solid lines describe the steady states of CDH1, ZEB1 and KLF8 versus p 4 , respectively. The dashed line between the two circles corresponds to an unstable state. SN represents the saddle node.

KLF8 promotes the formation of DE cells possibly by the activation of ZEB1 and the mutual inhibition of KLF8 and CDH1
Previous experimental and dynamical analyses show that KLF8 plays an important role in modulating cell differentiation from mesendoderm to DE. To systematically analyze the role of KLF8 in the differentiation, we plotted the bifurcation diagram ( Figure 5) with respect to the inhibition coefficient that CDH1 targets on KLF8 (p 6 ). With an increase of the inhibition coefficient p 6 , CDH1 switches from the monostability of low level to tristable states in a narrow region, then to the high level of bistable states. It reveals that if we want to switch the DE state to other states of hESC, we may enhance the inhibition strength of CDH1 on KLF8, or artificially, we inhibit the expression of KLF8 through medication. Combined with the fact that KLF8 inhibit the expression of CDH1, we assume that KLF8 promotes the formation of DE cells possibly by mutual inhibition of KLF8 and CDH1.
The above analysis is based on the assumption that a = 1, r = 1. To further investigate the function of KLF8 in the system, we decreased parameter a to characterize different intensities of CDH1 on KLF8 and plotted Figure 6a to analyze the behavior of KLF8.
With the decrease of a, the tristability phenomenon gradually disappears, and the system undergoes an irreversible trend towards a high expression level of CDH1 (when a reaches the threshold 0.15), which marks the formation of epithelial cells. Similarly, the decrease of r leads to the disappearance of the tristability phenomenon of the system (Figure 6b), which finally resulted in the formation of the ectoderm. The simulation results show similar conclusions (see Figure S2). We reasonably suggest that the promotion of ZEB1 by KLF8 and the inhibition of KLF8 by CDH1 is indispensable for DE cell formation which is consistent with the complex biological process. More specifically, we assume that the promotion of ZEB1 by KLF8 does not exist, which results in changes of the promotion term from a nonlinear function of KLF8 to a linear constant c. The simulation results (Figure 6c) indicate that when the KLF8-related production rate of ZEB1 degenerates into constants c, the expression of CDH either maintains a high level of expression or reduces it directly, which is inconsistent with experimental observations. Figure 6d shows the same conclusion that the production rate of KLF8 is a nonlinear function of CDH1, as was indicated by our model. In fact, the intrinsically complex processes require the coordinated dynamic expression of hundreds of genes and proteins in precise response to external signaling cues. Our work only focuses on a fraction of the complex regulatory network, which raises another question about whether the model really reflects the biological differentiation process. To verify the generality and reliability of the proposed model, we analyzed it from three aspects. First, we tried numbers of model settings, e.g., we set the Hill coefficients to 1 and found that the numerical simulations of the model did not fit the data well. We have also tried the linear functions in the model as a regulatory method between two genes. The fitting results remain poor. Secondly, to testify whether the parameters are sensitive to the singlecell data we used in the parameter estimation, we performed 10-fold cross-validation (Appendix E). The parameters that were estimated by 10 data subsets vary slightly and are similar to the parameters listed in Table S1. Third, since cell differentiation is typically heterogeneous and is spatially disordered, the intrinsic fluctuations and extrinsic signal fluctuations may play important roles in modulating the state switch in the differentiation of hESCs. To explore the influence of intrinsic and extrinsic noise on the transition between multiple steady states of the proposed model, we developed corresponding Master equation and Langevin equation models (Appendix F) and performed stochastic simulations using the Gillespie algorithm and Euler-Maruyama algorithm [30], respectively. According to the simulation results ( Figures S4 and S5), the Master equation and Langevin equation models displayed similar dynamic behavior as that by the deterministic model.
The above evidences proved that the proposed model can characterize the differentiation from hESCs to three germ layers and we suggest that KLF8 promotes the formation of DE cells possibly with the promotion of ZEB1 and the mutual inhibition of KLF8 and CDH1, which partially answers the unsolved question [5].

Conclusions
In this study, based on the single-cell RNA-seq data, we reconstructed a core regulatory network underlying the stem-cell differentiation process, and demonstrated that the core regulatory module shows various behaviors, including the numerical fitting and the three states switch, which is in correspondence with the three types of cell in three germ layers. Thus, we proved that the novel important gene KLF8 [5] affects the differentiation process by up-regulating ZEB1 and down-regulating the expression of CDH1, which forms a coupled feedback loop. In addition, we defined two indexes including the relative degradation rate and the relative disassociation rate which are tightly associated with the complex dynamic behaviors.
We also proposed some possible methods to realize the switch of the cell fate.
• The small relative dissociation constant ensures the formation of DE cells, and the large relative dissociation constant ensures the formation of epithelial cells; • The inhibition of KLF8 contributes to the formation of epithelial cells by blocking EMT; • The high relative degradation rate of ZEB1 to CDH1 determines the steady state of epithelial cell formation.
Altogether, we believe that the combination of scRNA-seq analysis and mathematical modeling can well reveal the molecular mechanism of cell fate decisions.   Table S1.
[CDH1](t + dt) =[CDH1](t) +  1 2 Previous studies have demonstrated that the bistable systems can generate bimodal expression patterns of corresponding genes [34,35]. To explore how the negative feedback strength shapes the KLF8 expression distribution, we simulated the distribution of KLF8 in a collection of 10,000 cells under different values of p 6 for the Master equation. Figure S6 gives the distribution diagram of KLF8 switching from a unimodal distribution of the high state (p 6 = 13) via the bimodal distribution in the tristable region (p 6 = 15), to the bimodal distribution in the bistable region (p 6 = 17) and, then, to the low state with unimodal distribution (p 6 = 22). Since the overexpression of KLF8 could accelerate  Table S1. the transition from the mesendoderm cell to DE cell under differentiation [5], the distribution of KLF8 become biomodal (p 6 = 15) distribution, which may correspond to the undifferentiated state or differentiate from mesendoderm to DE state. Or the cell may directional differentiate from mesendoderm state to endoderm state (p 6 = 13). The green and red lines correspond to the low and high steady states of the deterministic model, respectively. All the other parameters are set to be the same as those in Table S1.