A systematic study of the determinants of protein abundance memory in cell lineage

Proteins are essential players in cellular activities. Intracellular protein levels directly affect cellular functions and cell fate. Upon cell division, the proteins in the mother cell are inherited by the daughters. However, which factors affect this epigenetic inheritance of protein abundance, and by how much, remains unclear. Using both computational and experimental approaches, we systematically investigated this problem. We derived an analytical expression for the dependence of protein inheritance on various factors and showed that it agreed with numerical simulations of protein production and with experimental results. Our work provides a framework for quantitative studies of protein inheritance and for potential applications that manipulate protein memory.


Introduction
Proteins are important functional players in cells, and their abundances directly affect the physiological functions and fate of cells [1][2][3][4]. Because proteins are passed directly from parent to progeny cells at each division, the abundance of each type of protein is correlated within a cell lineage [5]. Such correlations may in turn lead to similar cell behavior within the lineage, and so may play an important role in inheritance and environmental adaptation. The progeny cell inherits more than the specific protein molecules of the parent cell. It also inherits many other cellular components, including the transcription and translation machinery, and this complicates protein abundance correlations in a cell lineage. Both the strength and the time scale of protein abundance memory in a cell lineage vary considerably from case to case [6,7]. For example, the abundance memory of cyclins in budding yeast lasts less than one cell cycle [8], whereas the memory of lac gene expression in E. coli cell lineages lasts for many generations [1].
Recently, a growing number of studies have suggested that the abundance memory of certain proteins in cell lineages is likely an active and flexible strategy that helps cells adapt to changing environments, rather than a simple by-product of cell division [1,5,7,9]. For example, a long memory of protein abundance has been suggested to help cells adapt to a fluctuating environment [1]. In addition, some proteins inherited from a parental cell can directly affect the apoptotic probability of the progeny cells [10,11].
Given the importance of protein inheritance in cell lineages, a growing body of theoretical and experimental work has studied how different factors affect protein inheritance and how inheritance may in turn affect cellular behavior [1,5,12,13]. These studies have clarified many issues related to protein inheritance. However, most studies thus far have focused on the inheritance of a particular protein or on the contribution of one or a few specific factors [1,10]. In reality, protein inheritance in a cell lineage is often affected by multiple factors at the same time. Do different factors affect protein inheritance differently? Are their influences independent or synergistic? What is the quantitative relationship between protein inheritance and the various influencing factors? Without answers to these questions, protein inheritance as a general phenomenon remains poorly understood.
In this study, we combined theoretical analysis with quantitative experiments to systematically investigate the contributions of various factors to protein memory along the cell lineage. We constructed a simple model and, for the first time, derived an analytical formula for the dependence of protein memory on several factors. These are protein synthesis, protein degradation, the volume ratio of uneven division, doubling time, intrinsic noise, extrinsic noise, partition noise during cell division, and the time scales of the different noises. We validated the theory with both computer simulations and quantitative experiments.

Stochastic simulations of the model
All simulations were performed using MATLAB (MathWorks). Stochastic simulations used the Gillespie algorithm [14][15][16]. To incorporate extrinsic noise, the fluctuating protein production rate was modeled as a separate "species". The system therefore had two variables: the production rate number $K_p(t)$ and the protein number $S(t)$. Both were updated according to their propensities in the Gillespie algorithm.
The propensities were derived from the deterministic equations in two steps: a system size $\Omega$ was introduced through the conventional van Kampen expansion [17], and a rescaled constant $\alpha_1$ was introduced for the protein production rate number. For the model with extrinsic noise incorporated into the production rate, the propensity vector at time $t$ was $[k_{pp}\Omega,\; K_p(t)/\tau,\; \alpha_1 K_p(t),\; k_d S(t)]$. Here $k_{pp}\Omega$ and $1/\tau$ are the birth rate and the per-molecule death rate of the production rate number $K_p(t)$. Likewise, $\alpha_1 K_p(t)$ and $k_d$ are the birth rate and the per-molecule death rate of the protein number $S(t)$.
At steady state, the average production rate number is $\langle K_p(t)\rangle = k_{pp}\Omega\tau$ and the average protein copy number is $\langle S(t)\rangle = \alpha_1 k_{pp}\Omega\tau/k_d$. The protein production rate is $k_p(t) = K_p(t)/\Omega$ and the protein concentration is $X(t) = S(t)/\Omega$. After a sufficiently long simulation, the protein copy number reached steady state, and protein levels were then recorded once per cell doubling time $T$. The process was simulated 2000 times, and Pearson correlations were calculated from the recorded protein levels.
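The simulation procedure above can be illustrated with a minimal Python sketch (the authors used MATLAB). The sketch rests on several assumptions. The production rate number $K_p$ has first-order death $K_p(t)/\tau$, consistent with $\langle K_p\rangle = k_{pp}\Omega\tau$. Dilution is folded into $k_d$ rather than simulated as explicit division events. Independent replicates are sampled once per doubling time $T$ after a burn-in period. All function names and parameter values are hypothetical.

```python
import math
import random


def simulate_replicate(k_pp, tau, alpha1, k_d, omega, T, n_gen, burn_in, rng):
    """One Gillespie trajectory of the two-species model (illustrative sketch).

    Species: K (production rate number K_p) and S (protein number).
    Assumed reactions and propensities:
      K -> K + 1 : k_pp * omega     (birth of the production rate number)
      K -> K - 1 : K / tau          (first-order death, so <K> = k_pp*omega*tau)
      S -> S + 1 : alpha1 * K       (protein production)
      S -> S - 1 : k_d * S          (degradation plus dilution)
    Returns the concentration S/omega at times burn_in, burn_in + T, ...
    """
    K = round(k_pp * omega * tau)           # start near the steady state
    S = round(alpha1 * K / k_d)
    t, next_sample, samples = 0.0, burn_in, []
    while len(samples) < n_gen:
        a = (k_pp * omega, K / tau, alpha1 * K, k_d * S)
        a0 = sum(a)                         # > 0 because k_pp * omega > 0
        dt = rng.expovariate(a0)
        # The state is constant until the next event, so record every
        # sampling time that falls before that event.
        while len(samples) < n_gen and t + dt >= next_sample:
            samples.append(S / omega)
            next_sample += T
        t += dt
        r = rng.random() * a0               # choose which reaction fires
        if r < a[0]:
            K += 1
        elif r < a[0] + a[1]:
            K -= 1
        elif r < a[0] + a[1] + a[2]:
            S += 1
        else:
            S -= 1
    return samples


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lineage_correlation(n, n_rep=300, seed=1, **params):
    """Correlation between generation 1 and generation n across replicates."""
    rng = random.Random(seed)
    runs = [simulate_replicate(n_gen=n, rng=rng, **params) for _ in range(n_rep)]
    return pearson([r[0] for r in runs], [r[n - 1] for r in runs]), runs


params = dict(k_pp=1.0, tau=1.0, alpha1=1.0, k_d=0.5, omega=10.0, T=1.0, burn_in=10.0)
rho_12, runs = lineage_correlation(2, **params)
```

With these illustrative values the steady-state mean concentration is $\alpha_1 k_{pp}\tau/k_d = 2$. The estimated correlation carries sampling error that shrinks with the number of replicates (the text used 2000).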

Single-cell measurements using time-lapse microscopy
Standard methods were used throughout the study. To prepare cells for time-lapse microscopy, we inoculated congenic W303 (MATa his3-11,15 trp1-1 leu2-3 ura3-1 ade2-1) cells from a colony into liquid SD and grew them for 12 h. We then diluted the cells and cultured them for another 12 h. Exponentially growing cells in SD liquid medium were then loaded into a microfluidic chip in the same medium. In each experiment, stacks of 9 images were acquired every 5 min, with a 30 ms exposure for the bright-field channel and 50 ms for the red and green channels. Microcolonies were tracked through the time series by identifying overlapping areas. Cell segmentation and tracking were performed automatically on bright-field images using cellseg, a custom MATLAB software package we developed previously [18,19]. Fluorescence was quantified using cellseg and ImageJ with the Image5D plugin. Protein intensity was obtained from the maximum-intensity projection of each z-stack.

Quantification of protein half-lives by FACS
W303 yeast strains carrying Adh1Pr-GFP (expressing GFP) or Adh1Pr-GFP-PEST (expressing GFP with a PEST tag) were grown overnight at 30°C with rotation in 5 mL of synthetic medium containing 2% (w/v) glucose. The overnight culture was diluted to an OD600 of 0.1 in 20 mL of fresh medium and incubated until the cells reached mid-logarithmic phase. Cycloheximide, a translation inhibitor [20,21], was added to a final concentration of 200 μg/mL. This concentration is high enough to inhibit protein synthesis without causing a critical growth defect during the experiment. Every 10 min, a 0.5 mL aliquot of the culture was quickly removed and mixed with 4.5 mL of PBS buffer, and protein fluorescence was measured by FACS.
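One way to turn such a cycloheximide chase into a degradation rate and a half-life is a log-linear fit, sketched below. The sketch assumes that the background-corrected fluorescence decays by first-order loss once translation is blocked, $F(t) = F_0 e^{-kt}$. Any residual dilution during the chase would therefore be absorbed into the fitted $k$. The function name, the background term, and the example numbers are hypothetical.

```python
import math


def fit_decay_rate(times_min, fluorescence, background=0.0):
    """Fit F(t) = F0 * exp(-k t) by least squares on ln(F - background).

    Returns (k, half_life), with k in 1/min and the half-life ln(2)/k in minutes.
    Assumes first-order loss after translation is blocked (illustrative sketch).
    """
    xs = [float(t) for t in times_min]
    ys = [math.log(f - background) for f in fluorescence]   # requires F > background
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    k = -slope
    return k, math.log(2) / k


# Hypothetical noise-free chase sampled every 10 min, matching the protocol's sampling interval
times = [0, 10, 20, 30, 40, 50, 60]
signal = [50.0 + 1000.0 * math.exp(-0.02 * t) for t in times]
k_fit, t_half = fit_decay_rate(times, signal, background=50.0)
```

With real FACS data, one would typically fit the per-time-point median fluorescence. In that case the residual scatter around the line indicates how well the first-order assumption holds.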

Memory of protein abundance along the cell lineage with intrinsic noise
To investigate the memory of protein abundance along the cell lineage, we first constructed a simple model in which the protein level is determined by the rates of protein synthesis ($k_p$) and degradation/dilution ($k_d$). The degradation/dilution rate has two parts, $k_d = k_{\mathrm{dil}} + k_{\mathrm{deg}}$. These are the regulated degradation rate ($k_{\mathrm{deg}}$) and the dilution rate ($k_{\mathrm{dil}}$) due to cell growth and division [22]. We considered both symmetric and asymmetric division by introducing the division size ratio $\alpha$. This is defined as the ratio of the corresponding cell volume to the total volume, i.e., $V_M/(V_M+V_D)$ for mother (larger) cells and $V_D/(V_M+V_D)$ for daughter (smaller) cells. As a result, $k_{\mathrm{dil}} = -\ln(\alpha)/T$, where $T$ is the cell doubling time [5]. We assumed that the system was at steady state and, for simplicity, considered only the mother and daughter lineages (Fig. 1).
Fig. 1. Schematic view of the protein inheritance model in a cell lineage. The parameter $k_p$ represents the protein production rate, while $k_{\mathrm{deg}}$ and $k_{\mathrm{dil}}$ represent the degradation rate and the dilution rate, respectively. The cell size ratio is the ratio of the corresponding volume to the total volume (i.e., $V_M/(V_M+V_D)$ for mother cells and $V_D/(V_M+V_D)$ for daughter cells).
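The dilution term can be read as an exponential growth rate. Assume the cell volume grows exponentially from the fraction $\alpha$ it keeps at division back to the full pre-division volume within one doubling time $T$. The specific growth rate, and hence the dilution rate of a stable protein, is then $\ln(1/\alpha)/T$. The worked numbers below use illustrative, hypothetical values ($\alpha = 0.6$, $T = 90$ min):

```latex
k_{\mathrm{dil}} = \frac{\ln(1/\alpha)}{T} = -\frac{\ln \alpha}{T},
\qquad
\alpha = \tfrac{1}{2}:\; k_{\mathrm{dil}} = \frac{\ln 2}{T},
\qquad
\alpha = 0.6,\; T = 90~\mathrm{min}:\; k_{\mathrm{dil}} = \frac{0.511}{90~\mathrm{min}} \approx 5.7\times 10^{-3}~\mathrm{min}^{-1}.
```

For a fixed $T$, a smaller $\alpha$ (the smaller daughter cell) gives a larger dilution rate.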
Because intrinsic and extrinsic noise may affect protein memory in the cell lineage differently [5], we first investigated a system with intrinsic noise only. The dynamics of protein concentration follow the stochastic differential equation (SDE) [23,24]
$$\frac{dX(t)}{dt} = k_p - k_d X(t) + \sqrt{D}\,\xi(t), \tag{1}$$
where $X(t)$ is the protein concentration, $k_p$ the production rate, and $k_d$ the degradation/dilution rate. $\xi(t)$ represents the intrinsic noise. It is a rapidly fluctuating random variable with zero mean, $\langle\xi(t)\rangle = 0$, and $\langle\xi(t)\xi(t')\rangle = \delta(t-t')$ [25,26]. $D$ is the noise strength.
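Sample paths of Eq. (1) can be generated with the Euler-Maruyama scheme, as in the sketch below. This is a standard discretization chosen for illustration, not necessarily the authors' method. It treats $\xi(t)$ as unit-intensity white noise, so each step of length $dt$ adds $\sqrt{D\,dt}$ times a standard normal draw. The function name and parameter values are hypothetical.

```python
import math
import random


def euler_maruyama(k_p, k_d, D, dt, n_steps, x0=None, seed=0):
    """Integrate dX/dt = k_p - k_d*X + sqrt(D)*xi(t) with the Euler-Maruyama scheme.

    xi(t) is unit-intensity Gaussian white noise, so each step of length dt adds
    sqrt(D*dt) times a standard normal draw. Returns the sampled path.
    """
    rng = random.Random(seed)
    x = k_p / k_d if x0 is None else x0     # start at the deterministic steady state
    noise_scale = math.sqrt(D * dt)
    path = [x]
    for _ in range(n_steps):
        x += (k_p - k_d * x) * dt + noise_scale * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


# Illustrative values, with D = 2 * k_p (the chemical-Langevin estimate used later in the text)
k_p, k_d = 2.0, 0.5
path = euler_maruyama(k_p, k_d, D=2 * k_p, dt=0.01, n_steps=200_000, seed=3)
mean = sum(path) / len(path)
var = sum((p - mean) ** 2 for p in path) / len(path)
# For a linear SDE the stationary mean is k_p/k_d = 4 and the variance is D/(2*k_d) = 4.
```

Sampled paths like this are useful for checking analytic correlation formulas against simulation before turning to the full Gillespie model.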
We assumed that the system was at steady state and that the fluctuation of protein abundance around its steady-state value, $x(t) = X(t) - \langle X(t)\rangle$, was small. Our model includes only the production and degradation/dilution rates, with no other regulation, so protein expression has a single steady state. We measured the strength of protein memory by the Pearson correlation coefficient between generations. Consider the Pearson correlation of protein abundance between generation 1 and generation $n$ in the mother/daughter lineage, $\rho_{1n} = \langle x_1 x_n\rangle/\sqrt{\langle x_1^2\rangle\langle x_n^2\rangle}$. It is given in closed form by Eq. (2) (see Supplementary Materials for details). In this expression, $x_i$ ($i = 1, \ldots, n$) is the protein abundance fluctuation of the $i$th-generation cell in the mother/daughter lineage, $\alpha$ is the cell size division ratio, and $T$ is the corresponding mother/daughter cell doubling time. Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.scib.2018.07.010.
Thus, with intrinsic noise only, the protein correlation increases with the cell size division ratio $\alpha$ and decreases exponentially with the degradation rate $k_{\mathrm{deg}}$ and the doubling time $T$.
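A short derivation shows where these trends come from. It makes three assumptions. First, Eq. (1) is linear, so the fluctuation $x(t)$ is a stationary Ornstein-Uhlenbeck process with autocovariance $\frac{D}{2k_d}e^{-k_d|\Delta t|}$. Second, concentration is continuous across divisions, i.e., molecules are partitioned in proportion to volume and partition noise is ignored. Third, each generation is sampled once per doubling time, so the lag is $(n-1)T$. Substituting $k_d = k_{\mathrm{deg}} - \ln(\alpha)/T$ then gives the expression below. It is our reconstruction under these assumptions and need not match Eq. (2) term by term:

```latex
\rho_{1n}
  = \frac{\langle x_1 x_n\rangle}{\sqrt{\langle x_1^2\rangle\langle x_n^2\rangle}}
  = \frac{\frac{D}{2k_d}\, e^{-k_d (n-1) T}}{\frac{D}{2k_d}}
  = e^{-k_d (n-1) T}
  = \alpha^{\,n-1}\, e^{-(n-1)\, k_{\mathrm{deg}} T}.
```

The noise strength $D$ cancels. For a stable protein ($k_{\mathrm{deg}} = 0$), the correlation between consecutive generations reduces to $\alpha$ itself.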

Memory of protein abundance along the cell lineage with both intrinsic noise and extrinsic noise
Next, we added extrinsic noise to the system and investigated the memory of protein abundance in the cell lineage with both intrinsic and extrinsic noise. Extrinsic noise may arise in the production process, the degradation/dilution process, or both, owing to environmental fluctuations and/or cell-to-cell heterogeneity. We studied the impact of extrinsic noise on production and on degradation/dilution separately.
We first incorporated extrinsic noise into the production rate [27]. We assumed that the fluctuation of the production rate $k_p$ is described by Eq. (3) [15,24,28,29]. In that equation, $\zeta(t)$ represents Gaussian white noise, $\sigma^2_{\mathrm{ex}}$ is the variance of the production rate, and $\tau$ is the correlation time scale of the extrinsic noise. The protein correlation in the mother/daughter lineage can then be calculated as Eq. (4) (see Supplementary Materials for details). Eq. (4) is plotted in Fig. 2 as a function of the various factors. The dependence of protein memory on the division size ratio $\alpha$, the degradation rate $k_{\mathrm{deg}}$, and the doubling time $T$ is similar to that in the intrinsic-noise-only case.
Eq. (4) can be simplified in two important limits. In the white-noise limit, $\tau \to 0$ ($1/\tau \to \infty$, i.e., the extrinsic noise fluctuates rapidly), Eq. (4) reduces to Eq. (2), the case with intrinsic noise only. Hence, rapidly fluctuating white extrinsic noise does not change the protein memory behavior. In the adiabatic limit, $\tau \to \infty$ ($1/\tau \to 0$, i.e., the extrinsic noise is almost constant over long times), the protein correlation in the mother/daughter lineage simplifies to Eq. (5) (see Supplementary Materials for details). In Eq. (5), $\eta = \sigma^2_{\mathrm{Total}}/\sigma^2_{\mathrm{Intrinsic}}$ is the ratio of the total variance of protein abundance to the variance caused by intrinsic noise.
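Both limits can be explored numerically with the sketch below, which computes a lineage correlation for the production-noise model. It rests on our own modeling assumptions and is not a transcription of the paper's Eq. (4). The production-rate fluctuation is taken to be an Ornstein-Uhlenbeck process with variance $\sigma^2_{\mathrm{ex}}$ and correlation time $\tau$. The protein fluctuation is its linear response plus intrinsic noise of strength $D$. Partition noise is ignored. The function name `lineage_rho` and its arguments are hypothetical.

```python
import math


def lineage_rho(n, k_deg, alpha, T, D, sigma2_ex, tau):
    """Correlation between generation 1 and generation n (illustrative sketch).

    The protein fluctuation x obeys dx/dt = -k_d x + u(t) + sqrt(D) xi(t), where u
    is an Ornstein-Uhlenbeck production-rate fluctuation with variance sigma2_ex
    and correlation time tau (tau = math.inf freezes u, the adiabatic limit).
    """
    k = k_deg - math.log(alpha) / T          # total loss rate k_d = k_deg + k_dil
    lag = (n - 1) * T
    var_int = D / (2 * k)                    # intrinsic (white-noise) variance
    cov_int = var_int * math.exp(-k * lag)
    if math.isinf(tau):                      # adiabatic limit: frozen production rate
        var_ext = cov_ext = sigma2_ex / k ** 2
    else:
        g = 1.0 / tau
        var_ext = sigma2_ex / (k * (k + g))
        if abs(k - g) < 1e-12:               # degenerate case k == 1/tau
            cov_ext = var_ext * (1 + k * lag) * math.exp(-k * lag)
        else:
            cov_ext = sigma2_ex * (k * math.exp(-g * lag) - g * math.exp(-k * lag)) / (k * (k * k - g * g))
    return (cov_int + cov_ext) / (var_int + var_ext)


# Illustrative values: mother-like alpha = 0.6, T = 90 min
rho_white = lineage_rho(2, 0.01, 0.6, 90.0, 1.0, 5.0, 1e-9)    # tau -> 0
rho_mid = lineage_rho(2, 0.01, 0.6, 90.0, 1.0, 5.0, 50.0)
rho_adiabatic = lineage_rho(2, 0.01, 0.6, 90.0, 1.0, 5.0, math.inf)
```

Setting `tau` very small recovers the intrinsic-only value $e^{-k_d T}$, and `tau=math.inf` gives the frozen-rate case. In this model the extrinsic term can only raise the correlation, which agrees with the statement that extrinsic noise enhances protein memory.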
Note that extrinsic noise can have two different sources: fluctuations of the external environment and cell-to-cell variability. The adiabatic limit ($\tau \to \infty$) can therefore be subdivided into two cases. In case A, the population of cells under consideration descended from a single cell. In this case, intrinsic noise is dominant ($\eta \to 1$), and Eq. (5) reduces to Eq. (2). In case B, the population descended from multiple cells with some initial cell-to-cell variability. When this variability is small, case B behaves like case A. When cell-to-cell variability dominates the system ($\eta \gg 1$), the protein abundance in the cell lineage becomes highly correlated ($\rho \to 1$). These results show that intrinsic and extrinsic noise play different roles in protein memory.
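The two adiabatic cases can be summarized in terms of $\eta$ alone. Assume that each lineage keeps a frozen production rate. A cell's fluctuation is then a static lineage offset plus the intrinsic Ornstein-Uhlenbeck part. The static offset accounts for the fraction $1 - 1/\eta$ of the variance and is fully shared across generations. The expression below is our reconstruction under this assumption, not a transcription of Eq. (5):

```latex
\rho_{1n}
  = \frac{\sigma^2_{\mathrm{Intrinsic}}\, e^{-k_d (n-1)T}
          + \left(\sigma^2_{\mathrm{Total}} - \sigma^2_{\mathrm{Intrinsic}}\right)}
         {\sigma^2_{\mathrm{Total}}}
  = 1 - \frac{1 - e^{-k_d (n-1)T}}{\eta},
\qquad
\eta \to 1:\; \rho_{1n} \to e^{-k_d (n-1)T},
\qquad
\eta \gg 1:\; \rho_{1n} \to 1.
```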
Next, we incorporated extrinsic noise into the protein degradation/dilution process, and then into both the synthesis and degradation/dilution processes. The contributions of the different factors to the correlation were generally similar (Figs. S1, S2, S3; see Supplementary Materials for details). There is, however, an interesting non-monotonic dependence of the correlation on the time scale of extrinsic noise (Fig. S2c and d). It arises from the interaction between the noise in the production rate and the noise in the degradation/dilution rate.

Small copy numbers of protein
To make the protein correlation analytically solvable, we described the stochastic behavior of protein abundance with the Langevin equation (Eq. (1)), in which a free parameter $D$ sets the intrinsic noise amplitude. In real chemical reaction processes, however, $D$ is related to the production and degradation/dilution rates. In addition, our SDE for protein abundance relies on a small-noise assumption. This assumption could break down when protein copy numbers are very small [30], as is the case for many proteins in E. coli and budding yeast [31,32].
To address these concerns, we used the Gillespie algorithm to simulate gene expression and degradation/dilution with both intrinsic and extrinsic noise [14][15][16]. To compare the simulation results with the theoretical prediction (Eq. (4)), we estimated the noise strength $D$ and the production-rate variance $\sigma^2_{\mathrm{ex}}$ from the chemical Langevin equation [33]. In this case, $D = 2k_p$, where $k_p$ is the production rate. Also, $\sigma^2_{\mathrm{ex}} = k_{pp}\tau$, where $k_{pp}$ and $1/\tau$ are the birth and death terms of the production rate (Eq. (3)) [24,33] (see Supplementary Materials for details). Our stochastic simulations using the Gillespie algorithm agree closely with the theoretical prediction (Fig. 3). Note that in adiabatic-limit case B (i.e., $1/\tau \to 0$), each single cell has a constant production rate, but that rate is drawn from a distribution across the population because of cell-to-cell heterogeneity. Consequently, intrinsic noise controls the temporal fluctuation of protein abundance in a single cell. $\sigma^2_{\mathrm{Intrinsic}}$ can therefore be estimated from the variance of single-cell time series, and $\sigma^2_{\mathrm{Total}}$ from the variance across the population (Fig. 3b). In simulations, we directly investigated the influence of the variance ratio $\eta = \sigma^2_{\mathrm{Total}}/\sigma^2_{\mathrm{Intrinsic}}$ by varying the variance of the protein production rate across the cell population as $1/\tau \to 0$ (Fig. 3c).
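The estimation of the two variances in case B can be sketched as follows. The sketch assumes that each single-cell trace is long enough to sample that cell's own fluctuations. It also assumes that all measurements are pooled with equal weight for the population variance, so cells with longer traces count more. The function name `variance_ratio` and the list-of-lists input format are hypothetical.

```python
import statistics


def variance_ratio(traces):
    """Estimate eta = sigma^2_Total / sigma^2_Intrinsic (illustrative sketch).

    traces: list of per-cell time series of protein level (hypothetical format).
    sigma^2_Intrinsic: mean of the within-cell (temporal) variances.
    sigma^2_Total: variance of all measurements pooled over the population.
    """
    intrinsic = statistics.fmean(statistics.pvariance(tr) for tr in traces)
    total = statistics.pvariance([v for tr in traces for v in tr])
    return total / intrinsic


# Two hypothetical cells with the same temporal spread but different mean levels
eta = variance_ratio([[1.0, 3.0], [11.0, 13.0]])
```

Here each cell's temporal variance is 1, while the pooled variance is 26, so $\eta = 26$. This is a strongly case-B-like population in which the lineage correlation should stay close to 1.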
We also examined the effect of the average steady-state protein copy number, varying it from 128 to 32, 8, and 2. The theory agreed very well with the simulations in all cases, even when the average protein copy number was as low as 2 (Fig. 3d).

Comparison with experiments
We further tested our theory with quantitative experiments in budding yeast, which grows fast and is easy to manipulate genetically. More importantly, its asymmetric division served two purposes. It let us examine the influence of the cell size division ratio, and it provided two different protein dilution rates (doubling times), for mother and daughter cells, in the same experiment. We used a microfluidic device to grow the cell population in a very stable environment. This setting corresponds to case B in the adiabatic limit of extrinsic noise ($1/\tau \to 0$) (Fig. 4a). We could therefore estimate the variance ratio directly from the variance of protein abundance within single cells and across the cell population, as in Fig. 3b.
First, we performed protein inheritance experiments with GFP, which was driven by the constitutive promoter Adh1pr and not actively degraded (i.e., $k_{\mathrm{deg}} \approx 0$) (Fig. 4b) [34]. We monitored GFP concentration in individual cells over generations using time-lapse fluorescence microscopy. To define generations, we fused mCherry to the septin ring component Cdc10, whose assembly and disassembly mark cell cycle entry and cytokinesis, respectively (Fig. 4b) [35]. To avoid fluctuations due to different cell cycle phases, we always quantified protein abundance at the end of cell division, defined as the time when the septin ring splits into two (Fig. 4b).
We measured protein abundance in hundreds of cells in each generation and calculated the Pearson correlations of protein levels between the first generation and later generations (Fig. 4g and h). All parameters of the theory were also obtained from the experimental data (Fig. 4c-f), including the cell size division ratio, the doubling time, and the variance ratio $\eta$. Their median values were then used to predict the lineage protein correlation from Eq. (5), without parameter fitting.
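The parameter-free prediction step can be sketched as below. The sketch assumes the adiabatic-limit form $\rho_{1n} = 1 - (1 - e^{-k_d (n-1)T})/\eta$ with $k_d = k_{\mathrm{deg}} - \ln(\alpha)/T$. This is our reconstruction, and it may differ in detail from Eq. (5). The function name, its argument layout, and the example numbers are hypothetical. For each lineage (mother or daughter), one would pass that lineage's measured division ratios and doubling times.

```python
import math
import statistics


def predict_lineage_rho(alphas, doubling_times, eta, k_deg, n_max):
    """Predict rho_{1n} for n = 2..n_max from median measured parameters (sketch).

    Uses rho = 1 - (1 - exp(-k_d (n-1) T)) / eta with k_d = k_deg - ln(alpha) / T.
    """
    a = statistics.median(alphas)
    T = statistics.median(doubling_times)
    k_d = k_deg - math.log(a) / T                 # degradation plus dilution
    return [1 - (1 - math.exp(-k_d * (n - 1) * T)) / eta for n in range(2, n_max + 1)]


# Hypothetical mother-lineage measurements for a stable GFP (k_deg = 0)
pred = predict_lineage_rho([0.55, 0.6, 0.65], [80.0, 90.0, 100.0], eta=1.0, k_deg=0.0, n_max=3)
```

With $\eta = 1$ and a stable protein, the predictions are simply $\alpha^{n-1}$, i.e., 0.6 and 0.36 here. Larger measured $\eta$ pulls every prediction toward 1.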
Theoretical predictions agreed closely with the experimental results for the mother lineage (Fig. 4i). For the daughter lineage, the agreement was very good from the first to the second daughter. From the first to the third daughter, however, the measured correlation was lower than predicted. One possible reason is that, unlike in the mother lineage, protein abundance in the daughter lineage was obtained by monitoring different cells, which may introduce additional noise. In addition, our experimental system made it difficult to trace the daughter lineage beyond three generations, so we had only limited data for third-generation daughters.
Next, we tested the theory with a degradable GFP ($k_{\mathrm{deg}} \neq 0$). We fused a PEST degron to GFP (GFP-PEST), which introduces active degradation of the protein [36,37]. We first monitored the degradation of GFP and GFP-PEST by flow cytometry after adding a protein synthesis inhibitor. GFP-PEST degraded much faster than GFP (Fig. S4). Degradation via the PEST sequence depends on its phosphorylation state, so the degradation rate of GFP-PEST is cell-cycle dependent [38]. Indeed, fluctuations in GFP-PEST abundance correlated with cell cycle progression (Fig. 5a). According to our theory, however, protein inheritance memory depends on the average degradation rate within a cell cycle. The measured average degradation rate of GFP-PEST, $k_{\mathrm{deg}} = 0.0112$, was therefore used in Eq. (5) for prediction. We performed the GFP-PEST lineage experiment twice, with different initial cell-to-cell variability (Fig. 5c and f). The two runs gave significantly different protein abundance correlations along the lineages (Fig. 5d and g). Remarkably, the theoretical predictions agreed very well with both experiments (Fig. 5d and g). The exception was the correlation from the first to the third daughter, for the reason discussed above.
Furthermore, we tested the theory with the endogenous protein Cdc14, an essential cell cycle phosphatase. Multiple factors regulate the localization of Cdc14, although Cdc14 does not appear to undergo regulated degradation [39,40]. During early mitosis, Net1 sequesters Cdc14 in the nucleolus until the metaphase-to-anaphase transition [39]. Because Cdc14 localization changes substantially during the cell cycle, we used the median Cdc14 level during its nucleolar localization in our correlation calculation to minimize measurement error (Fig. 6a). Again, the experimental protein abundance correlation agreed very well with the theory (Fig. 6d).

Summary
Protein abundance memory plays an important role in cellular adaptation to the environment [1]. However, multiple factors can affect protein memory. These include protein production, degradation, random partition noise at division, extrinsic noise, intrinsic noise, and the time scales of these noises. In this study, we combined theoretical, computational and experimental approaches to systematically investigate how different factors contribute to protein memory in cell lineages.
We found that these factors affect protein memory in the cell lineage significantly and in different ways. First, the protein level correlation decreases as the division size ratio ($\alpha$) decreases. This implies that the smaller daughter cell is naturally "fresher" (less correlated with the parent cell) than the mother cell, even without any specific asymmetric aging mechanism [41]. Moreover, daughter cells commonly have a longer doubling time, which makes them even "fresher" (even less correlated with their parent cell). Second, the protein level correlation decreases exponentially with the protein degradation rate. Changing the degradation rate could therefore be a simple way to adjust protein memory. Protein degradation is a common mode of regulation and can also respond to environmental cues, so this observation may provide insights into the regulation of protein degradation. Third, intrinsic noise tends to diminish the protein abundance correlation in a cell lineage, whereas extrinsic noise enhances it (Fig. 2d and e).
Generally, protein inheritance has two layers: the total protein amount and the post-translationally modified protein. Here, we focused mainly on the memory of total protein amount. Post-translational modification mechanisms vary from case to case, which makes it hard to derive a general theoretical expression for that layer of inheritance. However, if post-translational modification can be viewed as a continuous process, the modification rate plays the role of the production rate, and our theory could also be used to predict protein memory.
We chose budding yeast to test our theory experimentally, and the theory agrees remarkably well with our quantitative experiments. In principle, all parameters of our model could be tuned experimentally. For example, the cell size ratio could be varied by comparing symmetrically and asymmetrically dividing cells, and the doubling time by culturing cells in different media. In practice, however, most parameters are correlated with one another. For example, in the GFP and GFP-PEST experiments, adding the PEST degron changed not only protein degradation but also the noise level. This should be kept in mind, especially when trying to tune protein memory. Our theory is based on very general assumptions. Its conclusions should therefore not be limited to yeast and should also apply to other cell types, such as bacterial and mammalian cells. Our work provides a framework and a platform for exploring protein memory in more complex situations, e.g., fluctuating environments with various time scales.
Zongmao Gao is a Ph.D. candidate in the Center for Quantitative Biology, Peking University. His undergraduate background is in physics. His graduate study is interdisciplinary between physics and biology. His current research interests are the robustness of the cell cycle and how cells respond to fluctuating environments.
Xiaojing Yang received her Ph.D. degree from the School of Physics, Peking University. She was a postdoctoral researcher at the University of California, San Francisco, USA. She is currently a research associate in the Center for Quantitative Biology at Peking University, and her research focuses on cell cycle regulation and optogenetics.
Chao Tang received his Ph.D. degree in physics from the University of Chicago. He was a Full Professor at the University of California San Francisco before returning to China full time in 2011. He is now a Chair Professor of Physics and Systems Biology, and the director of the Center for Quantitative Biology at Peking University. His current research interest is at the interface between physics and biology.