Modeling TGF- β signaling pathway in epithelial-mesenchymal transition

The epithelial-mesenchymal transition (EMT) consists in a morphological change in epithelial cells characterized by the loss of the cell adhesion and the acquisition of mesenchymal phenotype. This process plays a crucial role in the embryonic development and in regulating the tissue homeostasis in the adult, but it proves also fundamental for the development of cancer metastasis. Experimental evidences have shown that the EMT depends on the TGF - β signaling pathway, which in turn regulates the transcriptional cellular activity. In this work, a dynamical model of the TGF - β pathway is proposed and calibrated versus existing experimental data on lung cancer A549 cells. The analysis combines Bayesian Markov Chain Monte Carlo (MCMC) and standard Ordinary Differential Equations (ODEs) techniques to interpolate the gene expression data via an iterative adjustments of the parameters involved. The kinetic of the Smad proteins phosphorylation, as predicted within the model is found in excellent agreement with available experiments, an observation that conﬁrms the adequacy of the proposed mathematical picture. Copyright 2012 Author(s). This ar-ticle is distributed under a Creative Commons Attribution 3.0 Unported License .

The epithelial-mesenchymal transition (EMT) consists in a morphological change in epithelial cells characterized by the loss of the cell adhesion and the acquisition of mesenchymal phenotype. This process plays a crucial role in the embryonic development and in regulating the tissue homeostasis in the adult, but it proves also fundamental for the development of cancer metastasis. Experimental evidences have shown that the EMT depends on the TGF-β signaling pathway, which in turn regulates the transcriptional cellular activity. In this work, a dynamical model of the TGF-β pathway is proposed and calibrated versus existing experimental data on lung cancer A549 cells. The analysis combines Bayesian Markov Chain Monte Carlo (MCMC) and standard Ordinary Differential Equations (ODEs) techniques to interpolate the gene expression data via an iterative adjustments of the parameters involved. The kinetic of the Smad proteins phosphorylation, as predicted within the model is found in excellent agreement with available experiments, an observation that confirms the adequacy of the proposed mathematical picture.

I. INTRODUCTION
The expression "epithelial-mesenchymal transition (EMT)" indicates the morphological transformation of epithelial cells in mesenchymal cells. This is a cellular process that occurs under physiological conditions, as in embryonic development and/or regulation of the tissue homeostasis in adults. EMT is however also implicated in cancer progression and others pathological conditions. 1 Biochemically, the transformation takes place through the down-regulation of specific epithelial proteins and the up-regulation of mesenchymal proteins. Cells which undergo the EMT transformation present a new phenotype, acquiring a pronounced migratory ability that promotes their diffusion towards adjacent districts. This is a fundamental step which seeds the physiological differentiation during embryonic development. Importantly, it is also fueling the spreading of cancer signaling. 2 At the cellular level, no significant difference is currently detected between pathological and physiological EMT in that they both appear to be governed by similar signaling pathways. 1 Starting from this setting, it would be extremely important to identify dedicated EMT features, specifically associated to tumour derives, an ambitious aim that could eventually translate in novel protocols to stratify the patients and predict the outcome of the disease. Several experiments unambiguously established that the EMT depends on the TGF-β signalling, the coupling being mediated by an intricate network of microscopic molecular reactions. 2 To gain insight into the TGF-β signaling pathway, and elucidate the interlaced connection among molecular constituents, a plethora of models were proposed in the past and benchmarked to direct experiments. [3][4][5][6] Among the others, the model proposed by Schmierer 2158-3226/2012/2(1)/011201/11 C Author(s) 2012 2, 011201-1 and colleagues 3 appears particularly interesting for the quality of the experimental input processed and, at the same time, the richness of the biochemical details accommodated for in the hypothesized reactions scheme. Smad intracellular proteins are for instance explicitly included: Smad transduce extracellular signals from TGF-β ligands to the nucleus where they activate downstream the transcription of specific target genes. The mechanism of Smad nucleocytoplasmic shuttling was studied in Schmierer et al. 3 by monitoring the nuclear accumulation of phosphorylated Smad proteins. To this end, a fitting strategy was implement to quantitatively compare the model predictions to time course kinetics experiments, performed on HaCaT cells in response to TGF-β. The experiments covered however a relatively short time window, 150 minutes upon administration of TGF-β. Noteworthy, while constituting a reasonable scheleton for validating the model, the data employed in Schmierer et al. 3 did not allow one to appreciate, and fully resolve, the TGF-β induced dynamics. This latter in fact requires 72 hours for the morphological transformation of the cells to eventually take place. 7 In this work, extending beyond the former analysis, we intend to characterize the dynamics of TGF-β in the EMT, embracing the time frame of interest. To constrain our model, we will make use of gene expression data relative to A549 cells, in response to 5 ng/mL of TGF-β, over a window of observation of 72 hours. These data were produced using single-color microarray technology, which enabled one to simultaneously measure the mRNA concentration of thousands of genes, 8 including those involved in the TGF-β pathway, object of our analysis. As we shall discuss in the following, the model that we will put forward is rather simplified and inspired to a reductionist approach. A limited number of biochemical proteins is solely considered that however organize in a self-consistent dynamical picture to describe the TGF-β pathway. Clearly, a much larger gallery of reactions should be accounted for when aiming at resolving all the details implicated in the examined processes. Our philosophy is however different. We are in fact interested in elucidating a minimalistic subset of microscopic reactions that are supposedly responsible for the observed macroscopic experimental behaviours. It is worth emphasizing that the hypothesized chemical reactions, although simple, follows current understanding of the processes involved, see e.g. Ting Liu et al. 9 Despite its inherent simplicity, our model proves capable of successfully explaining the alteration in the transcription process as induced by TGF-β in EMT. The quality of the fitting and the predictive ability of the model provide in turn an a posteriori validation to the proposed formulation. More specifically, as concerns the model setting, we shall focus our attention on the following molecular species: the TGF-β, the TGF-β receptor, the Smad proteins, the Phospatases and finally the gene hmga2. TGF-β exerts its activity by forming a ligand-receptor complex with the TGF-β receptor, which then switches to its active form and consequently transmits the signal into the cytoplasm by phosphorylation of the Smad proteins. The Smad proteins are transcription factors. They act as effective carriers which transport the TGF-β signal from the cell surface to the nucleus. Inside the nucleus, the Smad proteins can be dephosphorylated by phosphatases or express their transcriptional activity by stimulating the mRNA production of hmga2 gene, which belongs to the non-histone chromosomal high mobility group (HMG) protein family. Within our formulation, schematically summarized in figure 1, the modulation in gene expression of hmga2 is interpreted as a direct transcriptional effect of TGF-β. 2 By invoking a straightforward application of the law of mass action, a closed set of ordinary differential equations for the concentration of the chemical species involved is recovered and numerically benchmarked to available experiments. As we will clarify in the forthcoming sections, the model parameters have been adjusted so to reproduce hmga2 gene expression data. This fitting procedure is customarily adopted in pharmacogenomics 10,11 and yields to reliable estimates for the a priori unknown kinetic rates, providing in turn a viable strategy to reconstruct the underlying genetic networks.
12 Surprisingly, the model constrained to interpolate the time evolution of hmga2 gene expression data, predicts a non trivial behavior for the Smad protein phosphorylation dynamics which perfectly matches the experimental measurements of Keshamouni and colleagues. 13 These latter data, not employed in the calibration phase, and therefore treated as an independent experimental input, were generated using western immunoblotting technique, and quantify the protein expression of phosphorylated Smad in lung cancer A549 cells induced by 5ng/mL of TGF-β for 72 hours. The observation of a quantitative correspondence between (constrained) predictions and (independent) experiments, constitutes the main conclusion of our work and testifies on the validity of the proposed formulation. From a methodological point Cartoon of TGF-β pathway dynamics. For a detailed account of the process involved which inspired our formulation refer to Ting Liu et al. 9 The TGF-β molecule interacts with the receptor (R), which then turns into its active form (R * ). The active receptor (R * ) phosphorylates the cytoplasmatic Smad proteins (S c ), which transform into p S c . The phosphorylated Smad shift to the nucleus. In the nucleus p S n can stimulate the transcription of the target gene hmga2. Otherwise it can be dephosphorylated by the phospatases (PPase). The dephosphorylated nuclear Smad proteins(S n ) are exported back to the cytoplasm. of view, we complement the direct fitting strategy with a dedicated statistical based analysis to assess the model robustness. Particularly relevant, in this context, is the use of Bayesian MCMC (Markov Chain Monte Carlo) techniques, which allows to estimate the posterior data-dependent distribution of the uncertain parameters, accounting for the unavoidable noise which affects the experimental data.
The paper is organized as follows: in the next Section we introduce the chemical model and discuss its mathematical formulation. In Section IV we compare the theoretical results with the experimental data, building on the fitting procedure. Particular emphasis is put on discussing the predictive ability of the model, as concerns the time evolution of the Smad proteins. In section V, we address the problems of sensitivity and MCMC analysis. Finally, in Section VI we sum up and draw our conclusions.

II. THE MODEL
In order to study the dynamics of the TGF-β during the EMT, we propose a simple biochemical model which explicitly incorporates a limited number of key molecular species, assumed to mutually interact. The aim of the analysis is to explain, within a minimalist self-consistent dynamical picture, the modification of the gene expressions, driven by TGF-β, and as seen in the experiments. In the following we shall provide a critical guide through the model formulation, making reference to the schematic outline depicted in figure 1. The model consists in a collection of chemical equations, which encode for the interactions among individual entities. As we shall clarify later on, the chemical model readily translates in a closed set of ordinary differential equations for the coupled evolution of the concentration amounts.
The TGF-β belongs to the family of cytokines. It is produced by the cells in response to a external stimulus. Under normal conditions, it plays a role of paramount importance in several cellular processes as e.g. growth, differentiation and death. 14,15 The first equation of the model, see (1), accounts for the interaction between TGF-β and the inactive membrane receptor (R). Following a successful encounter, the receptor turns into its active form here denoted R * . In formulae: the parameter k TGFβ labeling the associated reaction constant.
In reality, the receptor is a complex constituted by two units, the TGF-β-RII and TGF-β-RI respectively. Here, we drastically simplify the model setting by considering just one receptor type, which can operate in its active (bound to TGF-β) or inactive configuration. The activated receptor is able to phosphorylate the Smad proteins (S c ) in the cytoplasm, resulting in the formation of a new species, the phosphorylated Smad protein, here labelled p S c . This process is mimicked by equation (2): In the above relation we assume that R * gets degraded once the Smad proteins are phosphorylated. In principle, one should foresee the possibility for the TGF-β to leave its targeted receptor upon completion of the deputed task and so free the binding site. In the proposed model, this latter condition is deliberately neglected, a choice which amounts to assume that the release of the TGF-β, spoiled of the phosphors, occurs on longer times scale. As a matter of facts, we also emphasize that two are the independent Smad components that get phosphorylated (respectively, p S 2 and p S 3 ) and that subsequently combine to result in the actual unit p S c . This process is compressed into a single step, as specified by chemical equation (2).
Once phosphorylated, the Smad proteins head towards the nucleus. The translocation of the Smad proteins into the nucleus ( p S n ) is necessary to activate the transcriptional activity. This is a delicate process, possibly organized in nested regulatory cycles. We have nevertheless decided to cast it in the minimalistic form: implying that cytoplasmatic p S c are mutated into nuclear p S n with given rate specified by the control parameter k in . The presence of phosphorilated Smad in the nucleus is in turn associated to an increase of the hmga2 gene expression 2 as governed by the following equation (4): The nucleus is also enriched by phosphates (PPase) targeting phosphorylated Smad, 3 a fact which in our view contributes to the regulation of transcriptional activity of Smad proteins. 9 The dephosphorylation of the nuclear Smad proteins p S n , which results in non-phosphorylated nuclear Smad elements S n , is accounted for by the following reaction: Non-phosphorylated nuclear Smad S n can be then exported back to the cytoplasm via the following equation: The above set of chemical reactions translates straightforwardly into a mathematical model, that we shall benchmark to the existing experimental literature in the forthcoming sections. The model is in particular composed of eight independent variables, respectively TGF-β,R, R * , S c , p S c , p S n , S n and hmga2. The associated concentrations, indicated with the usual notation [ · ], obey to the following set of ordinary differential equations: hmga2 measures the mRNA quota and as such assumes arbitrary units. All other concentrations are instead expressed in nM. For this reason the auxiliary parameter k m has been introduced in last of equations (7): k m = α k s , α being a dimensional quantity that restores the correct dimensional balance in the above equations. Notice that the concentration of PPase remains unchanged all along the dynamical evolution. Asymptotically, the system admits four distinct families of fixed points. These are listed in the following: (i) R = R * = S n = p S c = p S n = 0 and TGF-β, hmga2, S c arbitrary (ii) R = S c = S n = p S c = p S n = 0 and TGF-β, hmga2, R * arbitrary (iii) TGF-β = S c = S n = p S c = p S n = 0 and R, hmga2, R * arbitrary (iv) TGF-β = R * = S n = p S c = p S n = 0 and R, hmga2, S c arbitrary As we shall make clear in the following, our system converges to solution (iv), given the specific initial conditions that are being imposed so to benchmark theory and experiments. Notice that the quantities R, S c and hmga2 converge to stationary values, that are function of the initial condition and of the kinetic parameters involved.
In the following analysis we shall concentrate on the out of equilibrium dynamics of system (7), this latter being perturbed by the injection of a specific TGF-β quota. The time that it is necessary for TGF-β to fade off, i.e. for the system to recover its equilibrium steady state configuration, defines our window of observation. The next section is devoted to discussing the initial condition and the parameters choice. Then, in the forthcoming sections, we shall first test the model performance versus the existing experimental literature and subsequently, upon calibration, evaluate its predictive ability.

III. THE SELECTED INITIAL CONDITION
As already stressed, the proposed model builds on current biological knowledge, 3,9 and results in a simplified, though self-consistent, dynamical formulation. The TGF-β pathway is in particular reduced to a limited set of chemical reactions that are presumably implicated in the transmission of the signal from the cell surface, as triggered by TGF-β, to the cell nucleus. The current analysis is in particular aimed at inspecting the out-of-equilibrium dynamics of the system, as driven by the externally imposed TGF-β. The unperturbed steady state is characterized by specific concentrations of cytoplasmatic Smad proteins [S c ] and unbound receptors [R], which have been carefully evaluated in a study by Schmierer and collaborators, 3 that we shall here assume as a reference. In particular (see Table S1, in the Supporting Information of Schmierer et al. 3 3 and Table S1, in the Supporting Information of Schmierer et al. 3 ). Hence, the model rests on just two undetermined kinetic parameters k p , k s , and on the, presently unknown, scaling coefficient α. The concentration of TGF-β, referred to the initial time when the perturbation is applied, can be assigned based on the particular experimental setting that one aims at reproducing. In the next section, we will consider publicly available gene expression data, that report on the time evolution of hmga2 concentration, as induced by a perturbation of the steady state regime via TGF-β insertion. Initially, [hmga2] is assumed to be zero, so to match the experimental conditions.

IV. THEORY VS. EXPERIMENTS: FITTING THE MODEL TO GENE EXPRESSION DATA
The purpose of this section is to benchmark the model to experimental measurements and so produce a direct estimates of the parameters involved. We here recall that only three parameters are to be assigned, namely k p , k s and α, the other being set to values previously reported in the literature.
The experimental dataset that is employed in the calibration phase is taken from Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/). The dataset GSE17708 reports in particular on the temporal evolution of the level of expression (corresponding to mRNA concentration) associated to the hmga2 gene. The data, organized in 9 successive snapshots ranging from 0 to 72 hours, refer to lung cancer A549 cells exposed to TGF-β. At time t = 0, the cells are treated with 5 ng/mL of TGF-β, a concentration that is equivalent to 0.113 nM. This latter value is assumed as the initial condition for the dynamical variable [TGF-β]. 16 The measures are obtained by Affymetrix one color microarray technology and make use of the platform "U133 Plus 2.0 Array". The gene expression values of hmga2 are detected through 9 probes which match the mRNA of the hmga2 gene. Each probe emits a fluorescence intensity, reflecting the actual mRNA concentration. The temporal behavior is resolved by extracting a limited amount of RNA at successive times, and repeating the microarray analysis. The experimental points are displayed in figure 2 with filled circles: the concentration of mRNA (associated to gene hmga2) grows initially and then saturates to an equilibrium value, after about one day from the time of the initial exposure to TGF-β.
Starting from this setting one can adjust the free parameters of the model to eventually interpolate the experimental profiles. To this end, several strategies have been implemented (the Levenberg-Marquardt algorithm, the Nelder-Mead algorithm and Port algorithm) and shown to consistently return the same solution, which is therefore interpreted as a global minimum of the scrutinized problem. The fitted profile is depicted in figure 2 (solid line) and accurately interpolates the experimental points, as confirmed by direct visual inspection. The best fit parameters are respectively k p = 0.00003s −1 and k s = 0.01s −1 (as concerns the scaling factor, we get the best fit value α = 3.446). The former parameter is definitely small, as compared to the other reaction rates involved. This observation agrees qualitatively with the independent estimate of Schmierer et al., 3  hypothesized. At variance, to the best of our knowledge, this is the first time that the k s parameter is being quantified.
Before turning to elaborate on the robustness of the derived estimates, a task that we will later on accomplish by means of dedicated statistical analysis tools, let us further characterize the predictions of the model. In doing so, we are mainly concerned with testing the predictive ability of the model versus other experimental data, independent from the ones exploited in the calibration procedure. As we will see in the next section, the theory predicts the correct behavior of the phosphorylation kinetic of Smad proteins, without no additional fitting parameters.

A. The model predicts the phosphorylation kinetic of the Smad proteins
As previously anticipated, we are here interested in further evaluating the model, and critically test its predictive performances versus existing experimental observations. It should be emphasized that the model is now fully constrained, no additional parameters modulation being de facto allowed.
Consider for instance the kinetic of Smad proteins phosphorylation kinetics. This is an extremely important indicator in TGF-β signaling: it senses the nuclear import/export processes, and indirectly controls the transcriptional activity/stability. 17 In figure 3, panel A, we report the time evolution dynamics of [ p S c ] and [ p S n ], respectively solid and dashed line. Both curves present a clear peak after about 6 hours from TGF-β injection. Then, the concentrations of phosphorylated (cytoplasmatic and nuclear) Smad decay and eventually fade off in about 48 hours. In figure 3, panel B, the cumulated (cytoplasmatic plus nuclear) concentration of Smad proteins is reported using a different graphical representation that is meant to closely mimic a typical western blotting output. A color code is assigned and the concentration measured at different times plotted accordingly. In particular, we acquire, and hence display, the signal at times 10 minutes and 1, 8 is simply motivated by the need to facilitate the comparison with the experiments of Keshamouni and colleagues 13 where western immunoblotting measurements were performed using the same cell line (A549) as employed in the microarray investigations of Jin J.Y. et al., 10 and in response to the exactly the same quantity of administered TGF-β. The western immunoblotting results reported in Keshamouni et al. 13 are enclosed for convenience in Fig.3, panel C. Indeed, the experiments monitor the concentrations of both p S 2 and p S 3 species (with no distinction between cytoplasmatic and nuclear contribution), the two phosphorylated components that, following successful encounters, result in the complex here object of investigations. Several comments are mandatory at this point. When comparing panel B and C of Fig. 3, one can appreciate a qualitative degree of correspondence between experimental and theoretical profiles. In both cases, the concentrations of the cumulated Smad grow and then disappear in about 48 hours. The experimental peak seems to slightly precede the theoretical one. Indeed, from the experiments, one can solely argue that the maximum in the Smad concentration is probably located between 1 and 8 hours, a fact that agrees, within the accessible accuracy, with the theoretical prediction that positions the peak at 6 hours. As a side remark, we recall that the model looks at the behavior of the complex, formed by the two units p S 2 and p S 3 , and termed p S. The modest shift (if any) between the two maxima can be therefore interpreted as the natural time delay, that has to be allowed for the system to eventually establish the sought complex. Finally, we also reconstructed the Smad dynamics working within the context of the model in Schmierer et al. 3 (data not shown). The qualitative profile is correctly reproduced, but both the location of the peak and the global time of convergence to the asymptotic steady state regime appear to be significantly overestimated.

V. STATISTICAL ANALYSIS
In this section we shall complement the fitting procedure with a statistical based analysis aimed at assessing the parameters sensitivity and model robustness. 18 In other words, we wish to quantitatively evaluate the system response as follows an imposed modulation of key control parameters.

A. Sensitivity analysis
The susceptibility of the mRNA concentration of hmga2 gene to the variation of the model's parameter is here inspected. Define [X] as the mRNA concentration of hmga2 gene and K the scalar control parameter. The sensitivity coefficient is calculated as . The calculation is performed numerically by means of the FME package 19 made available in R. 20 The results reported in fig 4 allow us to appreciate a degree of positive correlation between the mRNA concentration of hmga2 gene on the one side and the transcriptional rate (k s ), the phosphorylation rate (k p ) and nuclear import rate (k in ), on the other. Conversely, a negative interference is observed with the dephosphorylation rate (k dephos ): by increasing the latter, the concentration of [hmga2] gets reduced. These tendencies, clearly consistent with the mathematical scheme, as one can readily qualitatively appreciate, are quantitatively characterized through the coefficients enclosed in the Table of fig 4. We notice in particular that the model is very robust to changes in the export rate (k ex ).

B. Markov chain Monte Carlo analysis
As a final point, we addressed the problem of parameters estimation in a statistical framework by using a Bayesian Markov Chain Monte Carlo technique. 19 This technique enables one to reconstruct a posteriori the parameter distributions (not just the associated mean expected values), accounting for the noise component which certainly affects the experimental data. 21 In our setting, we use the MCMC analysis to calculate the best values of the parameters k p and k s and their associated probability distribution. In Fig 5, the results of the analysis are displayed (in log-lin scale). The average values agree with those obtained by means of conventional optimization algorithms.
The analysis is again performed by using the FME package. 19

VI. DISCUSSION
The EMT is an important biological process which induces the cells to change their phenotype. This process, although implicated in physiological conditions, plays also a key role in cancer pathogenesis. 22 In cancer, in fact, the acquisition of the mesenchymal phenotype makes it possible for the cells to migrate toward nearby organs and so promote metastasis. From a molecular and cellular viewpoint, no specific features have been pinpointed that allow to distinguish physiological from pathological EMT. Several experiments established that, in both cases, the TGF-β pathway is directly involved in the activation of EMT. Starting from this setting, one needs to gain insight into the implicated processes to possibly help developing novel strategies for clinical oncology practice. In this paper we present a mathematical model which describes quantitatively the dynamics of TGFβ pathway in cancer cells during EMT. The model is constrained to fit experimental microarray dataset, a task that it is accomplished by adjusting a limited number of free parameters. The degree of correspondence between the experimental time series and the interpolating theoretical profile is indeed excellent. Importantly, the model predicts the kinetic of phosphorylation of the Smad proteins, with no further adjustments in the parameters involved. We elaborate on the correctness of the prediction, benchmarking our theory to the existing experimental literature. Again, the correspondence is satisfying, a result which in turn provides an a posteriori validation of the proposed mathematical model. The model calibration step is complemented by a dedicated statistical based analysis to assess the sensitivity and robustness of the assigned parameters. A Bayesian Markov Chain Monte Carlo is also performed and shown to return a generalized scenario which agrees with that obtained working with more convential optimization protocols.