Elsevier

Methods

Volume 62, Issue 1, 15 July 2013, Pages 13-25
Stochastic models of transcription: From single molecules to single cells

https://doi.org/10.1016/j.ymeth.2013.03.026

Abstract

Genes in prokaryotic and eukaryotic cells are typically regulated by complex promoters containing multiple binding sites for a variety of transcription factors leading to a specific functional dependence between regulatory inputs and transcriptional outputs. With increasing regularity, the transcriptional outputs from different promoters are being measured in quantitative detail in single-cell experiments thus providing the impetus for the development of quantitative models of transcription. We describe recent progress in developing models of transcriptional regulation that incorporate, to different degrees, the complexity of multi-state promoter dynamics, and its effect on the transcriptional outputs of single cells. The goal of these models is to predict the statistical properties of transcriptional outputs and characterize their variability in time and across a population of cells, as a function of the input concentrations of transcription factors. The interplay between mathematical models of different regulatory mechanisms and quantitative biophysical experiments holds the promise of elucidating the molecular-scale mechanisms of transcriptional regulation in cells, from bacteria to higher eukaryotes.

Introduction

Transcriptional regulation is a complex biochemical process that often involves multiple transcription factors that bind to multiple sites on regulatory DNA in response to intracellular or extracellular signals. When bound to these regulatory DNA sequences, transcription factors either inhibit or enhance transcription through interactions with RNA Polymerase (RNAP) and other transcription factors. Most regulatory sequences, which we refer to as “promoters”, contain several operator sequences (“operators”), each of which can often be recognized with different affinities by more than one type of transcription factor. Even bacterial promoters, which are often considered to be simple in comparison with their eukaryotic counterparts, can exist in a surprisingly large number of regulatory states. For instance, the PRM promoter of phage lambda in Escherichia coli is regulated by two different transcription factors, which bind to two distal sets of three operators that can be brought together by looping out the intervening DNA. As a result, the number of regulatory states (each of which corresponds to a specific combination of occupancies of the different operators) of the PRM promoter is 128 [1]. Eukaryotic promoters can be even more complex, also involving nucleosomes that compete with (and may be removed by) transcription factors [2]. Furthermore, eukaryotic promoters are often epigenetically regulated via histone modifications [3], [4], [5], in addition to the more conventional regulation by transcription factors. This may lead to very complex promoter dynamics, which may also involve a separation of timescales between genetic and epigenetic regulation [3]. 
Given the complexity of most promoters, quantitative mathematical models play an important role in testing molecular-scale mechanisms of transcriptional regulation, helping to connect these biochemical models of regulation, proposed in response to in vitro experiments with purified components, with quantitative gene expression measurements in vivo.

The first generation of such models was developed in response to experiments where the transcriptional response from a population of cells was measured as a function of intracellular concentrations of transcription factors [6], [7], [8], [9]. These models helped connect specific promoter architectures (characterized by the arrangement of transcription factor binding sites in the promoter region) with their input–output functions, i.e., the amount of transcripts produced as a function of the input concentrations of all the transcription factors involved.

More recently, technological developments such as the use of fluorescent proteins as reporter genes have made it possible to extend these studies to single cells. Given that genes are present at very low copy numbers (typically 1–2 copies per cell), as are transcription factors (as few as 5–10 copies per cell), transcription in single cells is an inherently stochastic process. This stochasticity leads to random fluctuations in the transcriptional output of single cells. As a result, the outcome of these single-cell transcription experiments contains more information than the population average of gene expression reported in bulk experiments. The whole distribution rather than just the average mRNA and/or protein number per cell in a population can be measured [10], [11]. In addition, by tracking the number of mRNA and protein molecules as a function of time in single cells, these experiments reveal many aspects of the dynamics of transcription that are obscured by bulk experiments [12], [13]. Notably, direct monitoring of transcriptional dynamics in live cells has demonstrated that transcription may occur in bursts, both in bacteria such as E. coli [12], [14], and in eukaryotic cells such as Dictyostelium [15].

The class of models developed in response to bulk experiments, the so-called “thermodynamic models”, focused on computing the steady state occupancies of the different operators by the transcription factors [6], [7], [8], [9], [16], [17]. For a specific promoter architecture, these models can be used to predict the equilibrium probability of each promoter state and therefore the average transcriptional output. Even though these models have been very useful for computing average gene expression levels in steady state (see for instance [9], [18], [19], [20], [21]), they have nothing to say about the dynamics of gene regulation, i.e., which promoter states are kinetically connected, and how often the promoter makes transitions from one state to another. To address these questions, a new class of stochastic models of gene regulation has been developed during the last decade [22], [23], [24], [25], [26], which are specifically tailored to deal with transcription from arbitrarily complex promoters at the single-cell level. Here we give an overview of these models, the equations that emerge from them and how they can be used to address single-cell experiments, and we discuss their limitations.
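To make the thermodynamic-model calculation concrete, the following is a minimal sketch, not taken from the paper, of how equilibrium state probabilities follow from statistical weights. It assumes a hypothetical three-state "simple repression" promoter in which RNAP and a repressor bind in a mutually exclusive fashion; the function names and the dimensionless inputs `p_over_kp` and `r_over_kr` (concentration divided by dissociation constant) are illustrative assumptions.

```python
def state_probabilities(weights):
    """Normalize statistical weights into equilibrium state probabilities."""
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


def simple_repression_weights(p_over_kp, r_over_kr):
    """Weights for a hypothetical three-state promoter.

    Assumption: RNAP and repressor cannot be bound at the same time, so the
    only states are empty, RNAP-bound, and repressor-bound.
    """
    return {
        "empty": 1.0,
        "rnap_bound": p_over_kp,
        "repressor_bound": r_over_kr,
    }


if __name__ == "__main__":
    # Illustrative values only: scan the repressor input at fixed RNAP input.
    for r in (0.0, 1.0, 10.0, 100.0):
        probs = state_probabilities(simple_repression_weights(0.1, r))
        print(r, probs["rnap_bound"])
```

In a model of this kind the average transcriptional output is usually taken to be proportional to the probability of the RNAP-bound state. The sketch yields only that steady-state average and carries no information about transition rates between states.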

The task of predicting the distribution of mRNA or protein copy number across a population of cells, or the distribution of times between transcription initiation events in a single cell, is significantly more challenging than determining the mean expression level. While the latter involves a straightforward application of equilibrium statistical mechanics, the former requires formulation of stochastic differential equations, or chemical master equations, which are often not tractable analytically, except in certain limits. In spite of this, several theoretical approaches have been developed that provide insights about how promoter dynamics affects stochasticity of gene expression [27], [28], [29] and how promoter architecture affects the transcriptional output from single cells [18], [23], [24], [26], [30].
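As an illustration of what such a chemical master equation looks like, consider the simplest multi-state case: a promoter that switches between an inactive and an active state. The notation here is an assumption made for this sketch and is not taken from the paper: $k_{\mathrm{on}}$ and $k_{\mathrm{off}}$ are the switching rates, $r$ is the rate of mRNA synthesis in the active state, $\gamma$ is the mRNA degradation rate per molecule, and $P_{\mathrm{off}}(m,t)$ and $P_{\mathrm{on}}(m,t)$ are the probabilities of finding $m$ mRNA molecules with the promoter inactive or active.

```latex
\begin{align}
\frac{dP_{\mathrm{off}}(m,t)}{dt} &= -k_{\mathrm{on}} P_{\mathrm{off}}(m,t) + k_{\mathrm{off}} P_{\mathrm{on}}(m,t)
  + \gamma \left[ (m+1) P_{\mathrm{off}}(m+1,t) - m P_{\mathrm{off}}(m,t) \right], \\
\frac{dP_{\mathrm{on}}(m,t)}{dt} &= k_{\mathrm{on}} P_{\mathrm{off}}(m,t) - k_{\mathrm{off}} P_{\mathrm{on}}(m,t)
  + r \left[ P_{\mathrm{on}}(m-1,t) - P_{\mathrm{on}}(m,t) \right]
  + \gamma \left[ (m+1) P_{\mathrm{on}}(m+1,t) - m P_{\mathrm{on}}(m,t) \right].
\end{align}
```

Summing both equations over $m$ after multiplying by $m$ gives a closed equation for the mean, whose steady-state value under these assumptions is $\langle m \rangle = \frac{r}{\gamma}\,\frac{k_{\mathrm{on}}}{k_{\mathrm{on}} + k_{\mathrm{off}}}$. The full distribution is considerably harder to obtain, which is the asymmetry the paragraph above describes.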

While the focus of this paper is the relation between promoter architecture and noise in gene expression, other sources of heterogeneity in gene expression at the single cell level have been analyzed as well, such as fluctuations in transcription factor concentration, diffusion of transcription factors to the promoter, the presence of transcriptional feedback through self-regulation, or fluctuations in the global cellular state [31], [32], [33], [34]. We consider these sources of fluctuations in the transcriptional output of cells, as well as the mathematical methods that have been used to describe them, in Section 4.

This paper is organized as follows: First, we discuss methods for computing the probability distributions of mRNA and protein copy-number from stochastic models of transcription. These distributions can be measured in experiments that count the mRNA or protein content of a single cell in a population (Fig. 1A). We demonstrate that these distributions are significantly affected by the dynamics of transcription and we discuss how information about the dynamics can be extracted from experiments. In the second part of this review we focus on methods to compute the distribution of times between subsequent transcriptional events, another measurable quantity in single-cell transcription experiments; see Fig. 1B. Just as before, our goal is to illustrate how a complex molecular model of promoter dynamics, in which the promoter can exist in any number of states, can be associated with an equation that in turn can be connected to experimental data.
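The second kind of measurable quantity, the time between subsequent transcription events, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions rather than the method of the paper: it uses a hypothetical two-state promoter, assumes the promoter stays active immediately after each initiation event, and uses illustrative rate names (`k_on`, `k_off`, `r`) that do not come from the text.

```python
import random


def sample_waiting_times(k_on, k_off, r, n_events, seed=0):
    """Simulate times between successive initiation events.

    Assumed model: an active promoter initiates at rate r or switches off at
    rate k_off; an inactive promoter switches on at rate k_on. The promoter
    is assumed to remain active right after an initiation event.
    """
    rng = random.Random(seed)
    active = True
    elapsed = 0.0
    waits = []
    while len(waits) < n_events:
        if active:
            total = r + k_off
            elapsed += rng.expovariate(total)
            if rng.random() < r / total:
                waits.append(elapsed)  # initiation event: record and reset
                elapsed = 0.0
            else:
                active = False  # promoter switches off before initiating
        else:
            elapsed += rng.expovariate(k_on)
            active = True
    return waits


def mean_waiting_time(k_on, k_off, r):
    """Analytical mean waiting time for the assumed two-state model."""
    return (k_on + k_off) / (r * k_on)


if __name__ == "__main__":
    waits = sample_waiting_times(1.0, 1.0, 10.0, 20000, seed=1)
    print(sum(waits) / len(waits), mean_waiting_time(1.0, 1.0, 10.0))
```

A histogram of `waits` is not a single exponential when the switching rates are comparable to or slower than `r`. In that sense the shape of the waiting-time distribution carries information about the underlying promoter states.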

We believe that this dialog between experimental data and mathematical modeling, where quantitative data is used to test model predictions, and models are further refined based on comparisons with experimental outcomes, is essential in order to drive progress in quantitative understanding of how gene regulatory function is determined by the sequence of regulatory DNA. However, it is important to be aware of the assumptions and limitations inherent to any equation that is formulated in response to experiments on single cells, which are far more complex than the models. In the third part of this paper, we discuss these limitations, and highlight areas in which further theoretical and experimental developments are needed.

Section snippets

Steady state distributions of mRNA and protein copy number

Recent experimental [10], [12], [35], [36], [37] and theoretical [22], [23], [28], [29], [30] studies have demonstrated that the distribution of mRNA copy number per cell, and its moments, can be dramatically affected by the underlying dynamics of the promoter controlling transcription of the mRNA being measured. Even qualitative features of the distribution, such as whether it is bimodal or unimodal, are determined by the detailed properties of promoter dynamics, such as the values of the

Distribution of times between subsequent transcription events

Experiments that reveal the dynamics of transcription initiation at promoters can also reveal the molecular mechanisms of transcription regulation [59], [60]. Several such experiments, where the synthesis of new mRNA molecules was visualized in live cells with single molecule resolution have been done so far, both in bacteria and in eukaryotic cells [12], [15], [61], [62]. These experiments have demonstrated that transcription can occur in bursts, and a typical output from such an experiment is

Discussion and outlook

As with any quantitative model, especially one attempting to describe processes within a living cell, it is important to understand the limitations of the chemical master equation description of transcription presented here. Particular care has to be taken when using mathematical models in conjunction with data in order to test specific hypotheses about biological mechanisms. Models are most informative when there is a discrepancy between the model predictions and experimental data.

Acknowledgements

We are indebted to Rob Phillips, Hernan Garcia, and Jeff Gelles for numerous discussions which have shaped our understanding of transcriptional regulation, and to the NSF for financial support via grants DMR-0706458 and DMR-1206146.

References (77)

  • A. Halme et al., Cell (2004)
  • L. Weinberger et al., Mol. Cell (2012)
  • M.A. Shea et al., J. Mol. Biol. (1985)
  • L. Bintu et al., Curr. Opin. Genet. Dev. (2005)
  • L. Bintu et al., Curr. Opin. Genet. Dev. (2005)
  • I. Golding et al., Cell (2005)
  • J.R. Chubb et al., Curr. Biol. (2006)
  • H.G. Garcia et al., Trends Cell Biol. (2010)
  • H. Boeger et al., Cell (2008)
  • M.L. Simpson et al., J. Theor. Biol. (2004)
  • J. Peccoud et al., Theor. Popul. Biol. (1995)
  • T.B. Kepler et al., Biophys. J. (2001)
  • J. Rausenberger et al., Biophys. J. (2008)
  • A.M. Walczak et al., Biophys. J. (2009)
  • D. Kennell et al., J. Mol. Biol. (1977)
  • J. Zhang et al., Biophys. J. (2012)
  • L.J. Friedman et al., Cell (2012)
  • J.S. van Zon et al., Biophys. J. (2006)
  • P.S. Swain, J. Mol. Biol. (2004)
  • P.R. Cook, J. Mol. Biol. (2010)
  • H.G. Garcia et al., Cell Rep. (2012)
  • J.M.G. Vilar et al., Bioinformatics (2010)
  • G. Hornung et al., Genome Res. (2012)
  • L.M. Octavio et al., PLoS Genet. (2009)
  • G.K. Ackers et al., Proc. Natl. Acad. Sci. USA (1982)
  • A. Raj et al., PLoS Biol. (2006)
  • G.-W. Li et al., Nature (2011)
  • L. So et al., Nat. Genet. (2011)
  • T.T. Le et al., Proc. Natl. Acad. Sci. USA (2005)
  • L. Saiz et al., Nucleic Acids Res. (2008)
  • T. Kuhlman et al., Proc. Natl. Acad. Sci. USA (2007)
  • H.G. Garcia et al., Proc. Natl. Acad. Sci. USA (2011)
  • J. Gertz et al., Nature (2009)
  • E. Segal et al., Nat. Rev. Genet. (2009)
  • Á. Sánchez et al., Proc. Natl. Acad. Sci. USA (2008)
  • A. Coulon et al., BMC Syst. Biol. (2010)
  • T. Höfer et al., Genome Inform. (2005)
  • J. Paulsson, Nature (2004)
    1

    These authors contributed equally to this work.
