Promoters maintain their relative activity levels under different growth conditions

Most genes change expression levels across conditions, but it is unclear which of these changes represents specific regulation and what determines their quantitative degree. Here, we accurately measured activities of ∼900 S. cerevisiae and ∼1800 E. coli promoters using fluorescent reporters. We show that in both organisms 60–90% of promoters change their expression between conditions by a constant global scaling factor that depends only on the conditions and not on the promoter’s identity. Quantifying such global effects allows precise characterization of specific regulation, that is, of promoters deviating from the global scale line. These are organized into a few functionally related groups that also adhere to scale lines and preserve their relative activities across conditions. Thus, only a few scaling factors suffice to accurately describe genome-wide expression profiles across conditions. We present a parameter-free passive resource allocation model that quantitatively accounts for the global scaling factors. It suggests that many changes in expression across conditions result from global effects and not specific regulation, and provides a means for quantitative interpretation of expression profiles.


Selection of genes for promoter library.
Our study encompasses 1820 (3/4) of all E. coli promoters and 859 (1/6) of all S. cerevisiae promoters. The E. coli dataset was previously described (Zaslaver et al, 2006) and was shown to be highly representative of both genome and proteome. The yeast promoters were chosen to include several complete groups of genes, in addition to representatives of other groups. Thus, our library includes nearly all known ribosomal proteins (110 promoters), transcription factors (157 promoters) and central metabolism enzymes (312 promoters). The remaining 280 promoters were chosen to represent various GO categories: cytoskeleton, cell wall, DNA processing, chromatin, mRNA processing and export, chaperones, cell cycle genes, signaling pathways, secretory pathways, mitochondrial proteins, vacuolar proteins, degradation pathways, etc. In total, our library covers ~1/6 of the S. cerevisiae genome.
We performed several tests to gauge the degree of representation of our library.
First, we examined the distribution of our promoters with respect to dominance of either TFIID or SAGA transcription initiation complexes, as previously determined (Huisinga & Pugh, 2004). We found that our set of promoters is representative of the genomic distribution, with 78% TFIID-dominated promoters and 13% SAGA-dominated promoters (similar to the genomic percentages of 82% and 10%, respectively, Table S1 (Huisinga & Pugh, 2004)). This indicates that our set of promoters is representative of the two main transcription regulation strategies, enriching for neither housekeeping nor stress-related genes.
Second, we examined our promoters with respect to their architecture. We classified the promoters of all yeast non-dubious open reading frames and known noncoding RNAs according to their promoter type. Promoters which support bidirectional transcription (a head to head orientation) were classified as Divergent, and promoters that support unidirectional transcription (a tail to head orientation) were classified as Unique. We found that our set of promoters is representative of the genomic distribution, with 50% divergent promoters and 50% unique promoters.
Third, we examined our promoters with respect to the two characteristic promoter architectures, OPN and DPN (Tirosh & Barkai, 2008). We found that out of our 859 promoters, 162 were designated with prominent promoter architectures. Of these, 65 were designated as DPN and 97 as OPN, indicating that our set of promoters is well representative of promoter architectures, enriching for neither closed nor open forms.
Fourth, we examined protein expression levels of our set of genes in a curated dataset of protein abundances integrated from 5 different datasets.
We found that the combined expression of our selected genes encompasses ~60% of the protein mass expressed in rich media. We note that while this subset is enriched for highly active promoters, and therefore may not accurately represent weak promoters, it is highly representative of the proteome, and accounts for much of the cellular activity in standard growth conditions. Fifth, we repeated our entire analysis, excluding ribosomal promoters overrepresented in our datasets, and our results were hardly affected, both qualitatively and quantitatively (Fig. S10).

Analysis of mRNA and protein datasets.
We explored whether our findings regarding proportional scaling of promoter activities also apply for mRNA and protein abundance, as these include additional layers of regulation. To this end we performed several analyses:

Promoter activity is highly correlated with mRNA and protein levels
We first compared our measurements in synthetic complete media with glucose to a variety of existing datasets that examined genome-wide mRNA and protein levels under rich conditions. We compared our promoter activity values to 3 different DNA microarray studies (Holstege et al, 1998;Shalem et al, 2008;Lipson et al, 2009), 3 different RNA-seq studies (Nagalakshmi et al, 2008;Lipson et al, 2009;Yassour et al, 2009), protein abundance obtained by immuno-tagged proteins (Ghaemmaghami et al, 2003), fluorescently-tagged proteins (Stewart-Ornstein et al, 2012), mass spectrometry (de Godoy et al, 2008) and a curated dataset of protein abundances integrated from 5 different datasets. In all cases our promoter activity data correlated well with mRNA and protein abundance (R=0.72-0.81 and R=0.57-0.74 respectively, Fig. S5), suggesting that promoter activity is a major determinant of these properties.
The comparisons between all these datasets yield several interesting observations. First, we note that the correlation coefficients between our promoter activity and these datasets are similar to the correlation coefficients between these datasets themselves, supporting the validity of our data. Second, as expected, the mRNA datasets better correlate between themselves than they do with the protein datasets. Our data correlates better to mRNA levels than to protein levels, as could be expected for promoter activity data. Finally, the high correlations suggest that global trends we derive for promoter activities should be largely preserved for both mRNA and protein levels. We test this in the following section.

Proportional scaling is largely preserved for both mRNA and protein levels
To test whether our findings regarding proportional scaling of promoter activities also apply for mRNA and protein abundance, we searched the literature for genome-wide studies performed in several growth conditions. We note that these datasets are not ideal for the detection of proportional scaling, for the following reasons:
1. While microarrays and RNA sequencing are very useful at delineating global profiles, they are known to have considerable noise at the single gene level (Marshall, 2004;Frantz, 2005). This noise stems from technical challenges such as gene-specific biases in hybridization, PCR, differences in labeled material, dye-specific biases and location on the array (Oshlack & Wakefield, 2009;Balázsi & Oltvai, 2007). For these reasons, a scatter plot that compares genes between conditions will generally form a 'cloud' rather than a tight line that would indicate a proportional response (Fig. S3).
2. Most genome-wide technologies employ various normalization procedures to counteract the abovementioned non-biological variation, such as analysis of equimolar amounts of cDNA after cell lysis and RNA extraction and various in-silico normalizations (Churchill, 2002;Tang et al, 2007;Bammler et al, 2005;Bakel & Holstege, 2008a;Sun et al, 2012). These contribute to the obliteration of global proportional scaling.
3. Most of these experiments do not include biological replicates, making it impossible for us to separate the experimental noise from the actual change in gene expression and to tease apart global and specific responders, as we did in this study for both the S. cerevisiae and E. coli data (Methods).
For the reasons above, we took a complementary approach. We reasoned that if the genes that display proportional scaling in their promoter activity also display it at other levels of regulation, then the ratio of these genes between conditions should be relatively constant. That is, if proportional scaling is preserved, the expression ratio between two conditions of genes that belong to the same cluster in our analysis should have low variability. However, due to the lack of replicates in these experiments, we have neither an estimate of the experimental noise nor an absolute measure of what 'low variability' is. Even so, we can predict that if our principle holds, this variability should be lower than the variability between genes from different clusters that scale with different scaling factors. We note that this analysis depends on our clustering, which was derived from a defined set of environmental conditions, and therefore it can be performed only on datasets that examined similar conditions.
We thus compared the intra-cluster variability to the inter-cluster variability of genes that were part of this study in 8 datasets, including microarrays (Gasch et al, 2000;Lai et al, 2005;Chechik et al, 2008;O'Rourke & Herskowitz, 2002;Brauer et al, 2008), RNA-seq (Tirosh et al, 2011), fluorescent protein-fusion strains (Breker et al, 2013) and mass spectrometry (Costenoble et al, 2011). We asked to what extent genes within the global cluster preserve proportionality between conditions by examining whether the ratio of these genes between the conditions was less variable than the ratio between the other genes between the same conditions. Indeed, for all examined conditions in all examined datasets, we found that the intra-cluster variability was much smaller than the inter-cluster variability (Fig. S11), indicating that the proportionality we observed for promoter activity is largely preserved for mRNA and protein levels.
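The comparison described above can be illustrated with a minimal sketch. It is not the procedure used to produce Fig. S11: the function names, the toy expression values, and the choice of the standard deviation of log2 ratios as the variability measure are all assumptions made for illustration.

```python
import math
import statistics

def log_ratios(cond_a, cond_b, genes):
    """log2(expression in B / expression in A) for each listed gene."""
    return [math.log2(cond_b[g] / cond_a[g]) for g in genes]

def ratio_spread(cond_a, cond_b, genes):
    """Standard deviation of log2 ratios; small values indicate proportional scaling."""
    return statistics.stdev(log_ratios(cond_a, cond_b, genes))

# Hypothetical toy data: g1-g4 scale roughly 2-fold, g5 and g6 respond specifically.
cond_a = {"g1": 10.0, "g2": 50.0, "g3": 200.0, "g4": 5.0, "g5": 20.0, "g6": 8.0}
cond_b = {"g1": 20.0, "g2": 105.0, "g3": 390.0, "g4": 10.0, "g5": 400.0, "g6": 2.0}
global_cluster = ["g1", "g2", "g3", "g4"]
all_genes = list(cond_a)

intra = ratio_spread(cond_a, cond_b, global_cluster)  # within the global cluster
inter = ratio_spread(cond_a, cond_b, all_genes)       # across genes of different clusters
print(f"intra-cluster spread: {intra:.3f}, inter-cluster spread: {inter:.3f}")
```

Working with log ratios makes a constant fold change appear as a constant offset, so a tight distribution of log ratios corresponds to adherence to a single scale line.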

Discussion of experimental system
Our experimental system is based on fusion of promoters upstream of a fluorescent reporter and enables non-invasive tracking of live cells across time, as described in detail in the main text and experimental procedures. This is an established approach, and promoter-reporter constructs using enzymatic reporters such as beta-galactosidase or luciferase have been used successfully in biological research since the 1980s (Bronstein et al, 1994) and have contributed much to our understanding of promoter architecture and gene regulation. More recently, variants of this approach using fluorescent reporters were successfully applied by our lab and by others to reveal ordered activation of genes in various pathways in bacteria (Kalir et al, 2001;Zaslaver et al, 2004) and to generate libraries of native and synthetic promoters in bacteria (Cox et al, 2007) and yeast (Ligr et al, 2006;Murphy et al, 2007;Gertz et al, 2009;Raveh-Sadka et al, 2012;Sharon et al, 2012;Zeevi et al, 2011;Newman et al, 2006), which provided much insight into the rules that underlie combinatorial cis-regulation. Below we discuss some of the advantages and limitations of our system and provide several reasons and validations for its accuracy and sensitivity.

Fluorescence as a measure for promoter activity
An important feature of our experimental system is that we rely on a fluorescent protein as a reporter for promoter activity. The use of fluorescence has several well-established advantages, such as high sensitivity and measurement resolution (Fig. S1-3) and the ability to perform the measurements in vivo. However, the use of fluorescence may also be regarded as a limitation of the system, as the reported values encompass changes in both transcription and translation. We note that this is true for all reporter assays, including immuno-tagged proteins, luciferase and beta-galactosidase assays, which have nevertheless contributed greatly to our understanding of the regulation of expression (Bronstein et al, 1994). We believe the advantages of fluorescence reporters outweigh their disadvantages. Yet, since we realize that this is a shortcoming of our experimental system, we have performed several validations with respect to this issue:
1. All the tested promoters drive the expression of the same protein (YFP) with the same 3'-UTR. Thus, aside from the 5'-UTR, the transcript produced from all promoters is identical. It is thus reasonable to assume that post-transcriptional and post-translational regulation is highly similar for all of our strains, and consequently, that differences in the measured promoter activity for different genes are attributable to differences driven by the different promoter sequences. Naturally, between conditions all promoters will also be affected by changes in global translational parameters, yet these effects should be similar for all tested promoters.
2. We compared our promoter activities to quantitative real-time PCR measurements of 18 selected strains in two growth conditions. The high correlation obtained between mRNA and YFP levels (R = 0.99 and R = 0.98, Fig. S4A-B) and the recapitulation of our results at the mRNA level (Fig. S4C-D) confirm that YFP protein levels are an accurate proxy for the corresponding mRNA levels.
3. We compared our promoter activity values to three DNA microarray studies (Holstege et al, 1998;Shalem et al, 2008;Lipson et al, 2009), three RNA-seq studies (Nagalakshmi et al, 2008;Lipson et al, 2009;Yassour et al, 2009), protein abundance obtained by immuno-tagged proteins (Ghaemmaghami et al, 2003), fluorescently-tagged proteins (Stewart-Ornstein et al, 2012), mass spectrometry (de Godoy et al, 2008), and a curated dataset of protein abundances integrated from five different datasets. We found that the correlation coefficients between our promoter activity and these datasets are similar to the correlation coefficients between these datasets themselves, supporting the validity of our data.
For all these reasons, we believe that the values we report are highly indicative of the actual promoter activities.

Genomic location
Another feature of our experimental system is that all constructs are integrated at the same locus in the yeast genome. Similar to the use of fluorescence, this trait of our experimental system has both advantages and disadvantages. One concern is that the epigenetic regulation of all constructs is likely to be identical, and therefore the reported values will not be indicative of the promoters in their native contexts. To assess whether this is indeed a major concern, we compared our results to 10 distinct datasets of mRNA and protein abundance that were measured in the native genomic locations, as detailed above. The high correlations obtained (R=0.57-0.81, similar to the correlations between these datasets; Fig. S5), and the recapitulation of our results in several of these datasets (Fig. S11), support our values as adequate proxies for these quantities, despite the non-physiologic location of the promoters in our experimental system. In addition, we note that our results also hold for the prokaryote E. coli, in which genomic location is presumably a less significant factor due to the absence of chromatin.
An advantage of our experimental system is that it isolates the effect of the promoter alone, without effects of genomic context, distinct coding sequences and various post-transcriptional modifications. All the differences in expression between different strains in the library can be attributed to the ~400bp inserted upstream of the YFP. This allows us to obtain a clean signal for promoter activity, which is required for mathematical modeling of the promoter contribution to the response to different conditions. Thus, it appears that the fixed location of the promoters in our experimental system does not have a major effect on the outcome of the study and is beneficial in the attribution of the observed phenomenon to the promoter sequence. Repeating our experiments with the promoters at their native location could be informative of how much of the transcriptional signal is attributed to the promoter versus the genomic location, which is in itself an interesting and unresolved question.

Advantages and assessment of accuracy of the experimental system:
We hereby provide several reasons and validations for the accuracy and sensitivity of this system:

1. The choice of fluorophore: We chose YFP over GFP, which was used in protein-fusion libraries (Newman et al, 2006), since yeast cells autofluoresce much less at this wavelength, thereby increasing our measurement sensitivity (35, Fig. S1).
2. YFP stability: It has been shown that YFP is stable and long-lived. Whereas this property restricts the ability to interrogate dynamically changing systems (Mateus & Avery, 2000;Houser et al, 2012), it is ideal for interrogation of steady-state growth, as the slow turnover rate ensures that the difference in YFP levels across time provides a direct measure of the amount of YFP produced.
3. In addition to the YFP, we inserted a red fluorophore (mCherry), which serves as an internal control for the reliability of the strain construction process and for experimental variability. When constructing a fluorescence-based library, even when it is done robotically, each strain undergoes separate transformation, growth and measurement. It is therefore critical to have a control showing that the different strains in the library did not acquire, in the process, any mutations that generally affect transcription in the cell, which would render the promoter activity calculated from the YFP unreliable. Indeed, all strains in the library exhibit highly similar growth curves and mCherry levels (Fig. 1).
4. The construction of the strains and the experimental procedures were subjected to several tests, including sequencing of the promoter insert, similarity in growth curves and similarity in mCherry expression.
5. Our system requires neither interventions nor normalizations, as we noninvasively track live cells over time with high temporal resolution, enabling the extraction of robust data.
6. We validated that the YFP levels of independent clones of the same promoter sequence are indistinguishable from those of replicate measurements of the same clone, indicating that our library construction procedure does not introduce mutations that have global effects on transcription or translation (35, Fig. S4).
7. For 60% of strains and conditions, measurements were performed in biological replicates (up to 6 measurements per strain), which were carried out on different days and included all stages of the experiment, starting from the frozen stock. This allowed us to reliably assess our detection level and experimental noise. We note that we report low coefficient of variation values at the single gene level, ranging from 0.05 for highly active promoters to 0.36 for promoters with very low activity, lower than those obtained with other methods such as microarrays, RNA-sequencing and mass-spectrometry (Methods, Fig. S1-3).
8. The accuracy of our system is supported by the GO annotations of the genes in our proportional clusters (Table S6). We find that in nearly all cases members of the same pathway or complex cluster together, resulting in near maximal enrichment values.
9. Quantitative PCR, described above, indicates that our measurements of promoter activities correlate well with mRNA levels (R = 0.99 and R = 0.98, Fig. S4A-B).
10. The good correlation of our dataset with existing mRNA and protein datasets, described above (Fig. S5), and the recapitulation of our main finding in these datasets (Fig. S11) reinforce the validity of our data.
11. The repetition of the phenomenon of proportional scaling in the independent dataset from E. coli provides further support for the conclusions extracted from our data (Fig. 7, S12-S17).

Models for global effects on expression across conditions.
In our study, we found that promoter activities change between conditions and that this change is captured by a linear function with a scaling factor that according to our model compensates for the changes in growth rate and magnitude of the specific response. This proportional response agrees with theories of the Copenhagen school, which proposed that the expression of non-regulated genes should scale proportionally (Maaloe, 1969;John L. Ingraham, Ole Maaløe, 1983). Here we discuss plausible alternative modes of regulation that would lead to different findings:
• Static response: Non-regulated promoters could be static between conditions, with no expression change whatsoever. Although this model may seem obsolete, this assumption actually underlies most current forms of analysis of high throughput data, including microarray and sequencing analysis (Bakel & Holstege, 2008b).
• Highly variable response: Promoters, both regulated and non-regulated, could display high variability between conditions, with no preservation of their relative values. This model resonates with the current paradigm that gene expression is the result of a highly complex combinatorial process, whereby each gene is separately controlled by a distinct set of both global and specific transcription factors (Reményi et al, 2004). Across conditions, this model predicts changes in the production of each of these factors, which in turn would lead to differential changes in expression of their targets. The outcome of this would appear as a highly variable response across conditions, with little or no preservation of stoichiometry between different genes. Graphically, when comparing between two conditions, this will manifest as a 'cloud', rather than a tight line.
• Linear response with a different scaling factor: Non-regulated promoters could display proportional changes between conditions with various optional scaling factors. One plausible hypothesis discussed in the main text is that such a global scaling factor will quantitatively compensate for the changes in growth rate. This model requires cells to possess a mechanism for coordinating promoter activity and doubling time. In principle, this could be achieved if cells control their growth rate based on the concentration of some non-specifically regulated 'counter protein'. Higher concentration of that protein would lead to faster growth rates which in turn would decrease the concentration. This model will entail the global scaling factor to be proportional to the growth rate. This model is compelling as it entails that per doubling time, unregulated promoters will preserve their concentration, which could be beneficial for the robustness of the cell.
• Non-linear response: Another model is that global changes in parameters will have a differential effect on different promoters. Thus, gene expression changes between conditions resulting from global regulation may obey a non-linear mathematical function. It is easy to suggest molecular mechanisms that will exert such differential responses of the promoters. For example, if the determining factor was RNA polymerase, then we would have expected that promoters with different architectures and different affinities for the polymerase will be affected differentially.
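The four alternatives listed above can be contrasted compactly in symbols. The notation below is introduced only for this illustration and does not appear in the main text: A_i^{(c)} denotes the activity of a non-regulated promoter i in condition c, \mu_c the growth rate in condition c, and \theta_i promoter-specific properties such as affinity for the polymerase.

```latex
% Assumed notation: A_i^{(c)} = activity of promoter i in condition c,
% \mu_c = growth rate in condition c, \theta_i = promoter-specific properties.
\begin{align*}
\text{Static:} \quad
  & A_i^{(2)} = A_i^{(1)} \\
\text{Highly variable:} \quad
  & A_i^{(2)} = \alpha_i \, A_i^{(1)}, \quad \alpha_i \text{ differs from promoter to promoter} \\
\text{Linear, growth-rate compensated:} \quad
  & A_i^{(2)} = \alpha \, A_i^{(1)}, \quad \alpha = \mu_2 / \mu_1 \\
\text{Non-linear:} \quad
  & A_i^{(2)} = f\!\left(A_i^{(1)}, \theta_i\right), \quad f \text{ non-linear in } A_i^{(1)}
\end{align*}
```

In this notation, the behavior we report corresponds to A_i^{(2)} = \alpha A_i^{(1)} with a single \alpha shared by all non-regulated promoters, where \alpha reflects both the change in growth rate and the magnitude of the specific response rather than the growth rate alone.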

Variance explained by linear scaling
The adherence of promoter activities to scale lines may seem at first somewhat surprising, in light of the prevailing paradigm that gene expression is the result of a highly complex combinatorial process, whereby each gene is separately controlled by a distinct set of both global and specific transcription factors (Reményi et al, 2004). We believe our results do not contradict this paradigm, but rather allow one to assess the relative contribution of such different factors.
We find that across all conditions, partitioning promoters into 6 clusters allowed us to account for 97% of the variability in promoter activities across conditions over the entire dataset. This analysis can be repeated separately for each condition and for each cluster (Methods). We find that for all conditions, the use of only 6 scaling lines explains 72%-98% of the variability in the dataset. The existence of residual variability represents the fact that promoter activities do not adhere perfectly to their respective scale line, but are rather distributed around it (Fig. 3C-E,G, Fig. S7). This distribution indicates that in addition to the linear scaling, shared by different group members, there exist some gene-specific changes in activity. This can be caused by different sensitivities and nonlinearities in the response that different group members have to changes in global and specific transcription factors. However, the fact that such high percentages of the response are quantitatively explained by only six numbers suggests that these gene-specific changes are small compared with the changes shared across functionally related clusters and the global changes across conditions. Altogether, our results imply that despite being highly combinatorial, gene expression changes across conditions may actually be quantitatively structured and simple, since most of this change can be captured by only a few scale lines.
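To make the notion of variance explained by scale lines concrete, the following is a minimal sketch, assuming one least-squares line through the origin per cluster and the usual 1 - SS_res/SS_tot definition. The exact procedure is given in Methods and may differ; the function names and toy values here are hypothetical.

```python
def fit_scale(x, y):
    """Least-squares slope of a line through the origin, y ~ s * x."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def variance_explained(x, y, labels):
    """Fit one scale line per cluster; return the slopes and 1 - SS_res / SS_tot."""
    scale = {}
    for c in set(labels):
        xs = [a for a, l in zip(x, labels) if l == c]
        ys = [b for b, l in zip(y, labels) if l == c]
        scale[c] = fit_scale(xs, ys)
    mean_y = sum(y) / len(y)
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    ss_res = sum((b - scale[l] * a) ** 2 for a, b, l in zip(x, y, labels))
    return scale, 1.0 - ss_res / ss_tot

# Hypothetical activities of six promoters in two conditions.
x = [1.0, 2.0, 4.0, 8.0, 1.0, 2.0]       # condition 1
y = [0.5, 1.1, 1.9, 4.0, 6.0, 11.8]      # condition 2
labels = ["global"] * 4 + ["specific"] * 2

scale, r2 = variance_explained(x, y, labels)
_, r2_single = variance_explained(x, y, ["all"] * 6)  # one line for every promoter
print(scale, round(r2, 4), round(r2_single, 4))
```

In this toy example two scale lines capture nearly all of the variance, whereas a single line fits poorly, which mirrors the reasoning for separating specific responders from the global cluster.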
Adding conditions to our study would help to identify more condition-specific promoters. As more conditions are added, it would be interesting to see how many more numbers are required to fully understand an organism's expression programs. It would be interesting to see if there is an upper limit on the number of scaling factors needed to fully predict the expression program in a new condition. We hypothesize that this number is possibly similar to the number of specific regulators in the organism (on the order of 250 in E. coli and S. cerevisiae).

Enrichment analysis
To assess our clustering, we examined the clusters in terms of biological function. First, we tested their enrichment in functional annotations from Gene Ontology (GO) (Methods). Next, we asked whether our clusters grouped together promoters that are regulated by similar mechanisms. To this end, we examined how promoters of transcription factors (TFs) were distributed across the different clusters, since changes in the expression of many TFs were shown to correlate with changes in the expression of their targets (Segal et al, 2003;Pe'er et al, 2002). Our library allows a global examination of this idea, since it includes promoters for most of the known yeast transcription factors (Badis et al, 2008;Zhu et al, 2009). Finally, we examined the promoters in terms of their promoter architectures (OPN/DPN (Tirosh & Barkai, 2008)), promoter types (divergent/unique, based on (Saccharomyces Genome Database)) and transcription regulation strategies (SAGA-dominated/TFIID-dominated (Huisinga & Pugh, 2004)) (Methods).
Notably, we found significant enrichments across all six clusters, in good correspondence with our understanding of the tested conditions (Fig. 4, Tables S6,S7).
As detailed in the main text, the first cluster contains most genes (77%) and most TFs (85%), from various GO families. It is enriched for constitutive, TATA-less, TFIID-regulated promoters with an open chromatin architecture. While the expression of some of these genes (e.g., ribosomal protein genes) was previously shown to be correlated with the growth rate (Regenberg et al, 2006;Castrillo et al, 2007;Brauer et al, 2008), others (e.g., GAPD and actin) are considered classical housekeeping genes that are constitutively expressed. Our measurements indicate that the activity of all of these promoters scales together across all 10 tested conditions.
The other clusters exhibit condition-specific responses in a subset of the tested conditions and are highly enriched for families of genes that are known to respond to these conditions and their known regulators (Fig. 4, Tables S6,S7, supp. material 1.6).
Clusters 2 and 5 represent two different branches of respiration and are highly upregulated in strictly aerobic conditions (ethanol and glycerol) and mildly upregulated in partially aerobic conditions (galactose and galactose lacking amino acids). Cluster 3 is upregulated in osmotic stress conditions (NaCl) and is enriched for trehalose and glycoside metabolism, known to participate in alleviation of this stress. Cluster 4 is upregulated in conditions lacking amino acids (with either glucose or galactose as carbon sources) and is enriched for genes of amino acid biosynthesis pathways. As expected, clusters 2-4 are also enriched for SAGA-dominated promoters. Finally, cluster 6 is upregulated in conditions in which galactose serves as the carbon source, and it is composed almost solely of promoters of genes from the galactose assimilation pathways. Thus, these clusters may represent complete regulatory units that are coregulated across conditions in a manner that largely preserves their internal stoichiometry.
An important implication of our clustering results is that proportional scaling of promoter activities transcends the usual partition of promoters into housekeeping/condition-specific, open/closed, and TFIID/SAGA-dominated. When examining the matrix of scaling factors of clusters across conditions (Table S5), it is clear that between most conditions, most clusters are not differentially regulated and their scaling factor coincides with the global scaling factor. This is consistent with the observation that cluster 1 contains condition-specific promoters that are not differentially regulated across our tested set of conditions (e.g. ER-stress associated proteins), and therefore scale according to the global scaling factors across the entire dataset. These observations hint that the mechanisms responsible for global proportional scaling are not unique to a limited set of genes, promoter architectures or transcription regulation strategies. We find that both growth-related promoters (e.g. ribosomal) and stress-related promoters (e.g. respiration-related metabolism) exhibit global proportional scaling when not differentially regulated. Accordingly, both TFIID-dominated and SAGA-dominated promoters exhibit global proportional scaling when not differentially regulated. Thus, global proportional scaling of promoter activities is probably the result of a basic mechanism, shared across all promoter classes and architectures.

Accurate prediction of promoter activities from several representative promoters
The clustering described in the main text implies that in any given condition, the activity levels of all promoters can be accurately described by knowing the partition of promoters into clusters, the relative activity levels of promoters within each cluster, and the clusters' scaling factors in the given condition. Both the clustering and relative activity levels of promoters in each cluster can be obtained from promoter activity measurements in some set of conditions. The scaling factors for any new condition can be obtained by measuring the activity level of only a few representative promoters from each cluster in that new condition. Thus, we hypothesized that given promoter activity measurements in some set of conditions, we should be able to accurately predict all of the promoter activities in a new condition by measuring only a few representative promoters. To test this hypothesis, we used a cross-validation scheme, whereby for each tested condition, we clustered the promoters based on all other conditions, and predicted all of the promoter activity levels of the tested condition using the activity levels of only a few predefined representative promoters (Methods). Indeed, using only 10 promoters, we obtained highly accurate activity level predictions, whereby in every condition, the predictions explain over 85% of the variance of the activities of at least 98% of the promoters (Fig. 5, S9).
This ability to derive accurate predictions results from the variety of specific responses represented in our dataset, and from our finding that most promoters preserve their relative activity levels across conditions. Since we can only accurately predict conditions that are similar to conditions that were already measured or to combinations of such conditions, we fail to predict specific responses that were not contained in the original training dataset (Fig. 5C, blue boxes). However, these failures are highly informative, since promoters for which our predictions deviate the most from their measured activity levels suggest the existence of new clusters and regulatory pathways that were not activated in the original dataset.
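The prediction idea described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study (see Methods for that); it assumes that each promoter already has a cluster label and a relative activity level obtained from the training conditions, and that the scaling factor of each cluster in the new condition is estimated by a least-squares fit through the origin over the measured representatives. All function and variable names are hypothetical.

```python
def predict_new_condition(relative_level, cluster_of, measured_reps):
    """Predict activities of all promoters in a new condition.

    relative_level: dict promoter -> relative activity within its cluster
                    (learned from the training conditions; assumed given)
    cluster_of:     dict promoter -> cluster label
    measured_reps:  dict promoter -> activity measured in the new condition
                    (only the few representative promoters)
    """
    numerator, denominator = {}, {}
    for promoter, measured in measured_reps.items():
        c = cluster_of[promoter]
        r = relative_level[promoter]
        numerator[c] = numerator.get(c, 0.0) + measured * r
        denominator[c] = denominator.get(c, 0.0) + r * r

    # Least-squares scaling factor of each cluster (fit through the origin).
    scale = {c: numerator[c] / denominator[c] for c in numerator}

    # Every promoter is predicted as (cluster scaling factor) x (relative level).
    # Clusters without a measured representative cannot be predicted.
    return {
        p: scale[cluster_of[p]] * r
        for p, r in relative_level.items()
        if cluster_of[p] in scale
    }
```

In this sketch a cluster without any measured representative is simply left unpredicted, which mirrors the limitation noted above for responses absent from the training data.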

Expansions on the passive resource allocation model
We suggest a passive resource allocation model to explain the values of the observed global scaling factors. We posit that in every doubling the total number of molecules produced from all promoters together is preserved. Such preservation of total promoter activity is expected without the need for an elaborate regulation mechanism if the total fraction of proteins in the biomass (measured as OD) is close to constant across conditions (Maaloe, 1969;Ingraham & Maaløe, 1983). In line with earlier models discussing differential allocation of resources between conditions (Ehrenberg & Kurland, 1984;Koch, 1988;Scott et al, 2010;Zaslaver et al, 2009;Molenaar et al, 2009), the overall promoter activity can be thought of as a fixed resource available to the cell per doubling time. This resource is differentially partitioned between the condition-specific genes and the globally responding genes, where the exact partition is determined by the varying magnitudes of the specific response required in each condition. Thus, the value of the global scaling factor will accommodate both changes in growth rate and the magnitude of the specific response in each condition.
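The conservation argument can be made explicit with a short worked sketch. The notation here is our own shorthand for illustration and is an assumption, not a formula taken from the main text: $A_i(c)$ is the activity of promoter $i$ in condition $c$ per OD per doubling time, $T$ is the conserved total, $G$ is the set of globally scaling promoters with condition-independent relative levels $a_i$, $s(c)$ is their common scaling factor, $S(c)$ is the summed activity of the specifically regulated promoters, $\tau(c)$ is the doubling time, and $c_0$ is a reference condition.

```latex
\begin{align}
  \sum_i A_i(c) &= T
      && \text{(total activity per doubling is conserved)} \\
  A_i(c) &= s(c)\, a_i \quad (i \in G)
      && \text{(global promoters keep their relative levels)} \\
  s(c) \sum_{i \in G} a_i + S(c) &= T
      \;\Longrightarrow\;
      \frac{s(c)}{s(c_0)} = \frac{T - S(c)}{T - S(c_0)} \\
  \frac{s(c)/\tau(c)}{s(c_0)/\tau(c_0)}
      &= \frac{T - S(c)}{T - S(c_0)} \cdot \frac{\tau(c_0)}{\tau(c)}
      && \text{(same ratio expressed per unit time)}
\end{align}
```

Under these assumptions, a larger specific response leaves a smaller share of the fixed budget for the global promoters, and the doubling time enters only when activities are expressed per unit time rather than per doubling.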
Since all ribosomal subunit promoters in our data belong to the global cluster, this model is consistent with previous studies in which the fraction of ribosomes in the cell was found to be correlated with growth rate (Schaechter, 1958;Maaloe, 1969;Bremer & Dennis, 1987;Zaslaver et al, 2009;Scott et al, 2010). Notably, this model entails that the concentration of most proteins per doubling time per biomass is not preserved across conditions. It is interesting to speculate how cells manage with these changes in concentration.
Despite this ability of the passive resource allocation model to account for a large fraction of the global scaling factors based on the growth rate and magnitude of the specific response to each condition, there remains a considerable fraction of the proportional response that this simplified model does not explain. For the yeast dataset, this may partly stem from inaccuracies in our estimates of the total activity of all promoters, since our library does not include all promoters. Similarly, this simplified model does not directly account for other global properties, such as cell size and macromolecular composition, which are known to vary in different conditions along with the doubling time (Schaechter, 1958;Bremer & Dennis, 1987;Neidhardt, 1999).
Nevertheless, the accuracy with which this model matches the measured global scaling factors based on the doubling times and magnitude of the specific responses suggests that these factors are likely to be major determinants of the global scaling factors.
Finally, it is interesting to consider what mechanisms are responsible for the observed proportionality and the coordination between promoter activity and growth rate.
In bacteria it has been suggested that cAMP (You et al, 2013), ppGpp (Magnusson et al, 2005) and use of alternative sigma factors (Klumpp & Hwa, 2008;Zaslaver et al, 2009) may contribute to the differential allocation of resources to different groups of genes. In yeast, transporters, cAMP, master growth regulators (e.g. TOR) and different ribosomal subunits have been suggested to perform a similar task (Broach, 2012). Importantly, our current observations that proportional scaling underlies both global and specific responses hint that the mechanisms responsible for global proportional scaling are not unique to a limited set of genes, promoter architecture or transcription regulation strategy. Proportional scaling transcends the usual partition of promoters into housekeeping/condition-specific, growth-regulated/stress-regulated, open/closed NFR, and TFIID/SAGA-dominated.
Thus, global proportional scaling of promoter activities is probably the result of a basic mechanism, shared across all promoter classes and architectures. To date, the molecular mechanisms underlying proportional responses, the identity of the limiting resources, and the role that cell size, shape, and composition may play in this process remain unknown.

Library design and construction.
Promoters for this study were chosen to cover a wide variety of cellular functions and processes and to span various cellular compartments (Supp. material 1.1). Promoter sequences were defined as the genomic region located between the translation start site (TrSS) and the end of the upstream neighboring gene. Sequences longer than 1kbp were truncated to 1kbp to facilitate cloning. All strains were constructed as previously described, based on genomic integration of promoter sequences into a common master strain, upstream of a YFP reporter protein (Zeevi et al, 2011). A second reporter, mCherry, driven by the constitutive TEF2 promoter, was integrated into the same genomic location to control for the reliability of the strain construction process and for experimental variability. Briefly, we used a master strain, based on Y8205, containing a construct of ADH1 terminator - mCherry - TEF2 promoter - Venus - ADH1 terminator - Nat1 on chromosome 15. Desired promoters were amplified by PCR from genomic DNA of the BY4741 yeast strain, linked to a URA3 selection marker and integrated into the genome by homologous recombination. Final strains were validated by sequencing, growth curves, and mCherry expression levels, and abnormal strains were removed. In total, 867 strains were designed, and 859 were successfully constructed and measured (for a full list of promoters, primers, and sequences, see Table S1).

Promoter activity measurements.
Cells were inoculated from frozen stocks into synthetic complete dextrose (SCD) (150μl, 96-well plate) and grown at 30°C for 48 hours, reaching complete saturation. Cells were then diluted 1:36 in fresh medium to a total volume of 180μl and were grown at 30°C for at least 16 hours in 96-well plates while being measured. Measurements were carried out every 20 minutes using a robotic system (Tecan Freedom EVO) with a plate reader (Tecan Infinite F500). Each measurement included optical density (OD600), YFP fluorescence, and RFP fluorescence. Measurements of each plate in every growth condition were repeated 1-6 times.

Growth conditions.
The growth media in which all strains were measured are outlined in Table S2.

Computing promoter activity levels.
Basic analysis of measured OD, YFP, and RFP was done as previously described and included removal of strains with abnormal growth curves and RFP expression, subtraction of background levels of OD and auto-fluorescence, and smoothing of outlier measurements for each strain (Zeevi et al, 2011). In order to calculate a promoter activity level that is comparable between conditions, for each measured plate we defined the time window of maximal growth. We identified the 20-minute interval of maximal growth rate, $\mu = \max_t \, d\ln OD(t)/dt$, and extracted its doubling time, $\tau = \ln 2/\mu$. We then model YFP accumulation as $dYFP(t)/dt = \alpha(t)\,OD(t)$, where $\alpha(t)$, the promoter activity, is the fluorescence production rate per unit biomass at time $t$.
Since YFP is stable and long-lived, we can neglect its degradation and attribute its rate of accumulation to production alone: $YFP(T_2) - YFP(T_1) = \int_{T_1}^{T_2} \alpha(t)\,OD(t)\,dt$. We now assume a phase of exponential steady-state growth between $T_1$ and $T_2$ as described above. During this time $\alpha(t) = \alpha$ is constant and thus $YFP(T_2) - YFP(T_1) = \alpha \int_{T_1}^{T_2} OD(t)\,dt$.
We get that $\alpha = \left( YFP(T_2) - YFP(T_1) \right) / \int_{T_1}^{T_2} OD(t)\,dt$ (Fig 1). To obtain promoter activities per OD per doubling time, values were multiplied by the average doubling time.
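As a hedged illustration of this calculation, the following Python sketch computes the promoter activity as the YFP accumulated over a window divided by the integral of OD over the same window. It assumes that background-subtracted OD and YFP time series are given, that the window indices have already been chosen, that the integral is approximated by the trapezoid rule, and that the doubling time is taken from the OD values at the window edges. These choices and all names are assumptions made for the example.

```python
import math


def promoter_activity(times, od, yfp, i1, i2):
    """Return (activity per OD per unit time, activity per OD per doubling).

    times, od, yfp: equally long sequences of measurements
    i1, i2:         indices of the first and last time point of the window
    """
    # Trapezoid approximation of the integral of OD over the window.
    od_integral = sum(
        0.5 * (od[k] + od[k + 1]) * (times[k + 1] - times[k])
        for k in range(i1, i2)
    )
    # YFP is assumed stable, so accumulation reflects production alone.
    alpha = (yfp[i2] - yfp[i1]) / od_integral

    # Growth rate and doubling time from the OD at the window edges.
    growth_rate = (math.log(od[i2]) - math.log(od[i1])) / (times[i2] - times[i1])
    doubling_time = math.log(2) / growth_rate
    return alpha, alpha * doubling_time
```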
To compare promoter activity levels from multiple experiments, we had to account for technical variation between experiments. To this end, we performed a calibration experiment in which we measured the same strain (RPL3) in 12 replicates in all growth conditions. We repeated this experiment twice with freshly prepared media and randomized locations of conditions within the measurement plates. Average promoter activity levels for RPL3 for both experiments deviated by less than 5%. Each measurement plate in the library included four technical replicates of RPL3. We then scaled the promoter activity levels for all strains in each plate such that the median promoter activity of RPL3 for the plate was equal to that calculated in the calibration experiment. For each strain in every condition, we took the final promoter activity levels to be the average of the strain across all measurement plates. If this average was below the detection level, we set the promoter activity to the detection level.
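The plate calibration step can be sketched as follows. This is an illustrative Python example rather than the analysis code used in this study; the data layout (a dictionary from well name to promoter activity) and the function names are hypothetical.

```python
from statistics import mean, median


def calibrate_plate(plate, control_wells, reference_activity):
    """Rescale one plate so that its median control activity matches the
    value obtained in the calibration experiment.

    plate:              dict well -> promoter activity
    control_wells:      wells holding the control strain (RPL3 in the text)
    reference_activity: control activity from the calibration experiment
    """
    factor = reference_activity / median(plate[w] for w in control_wells)
    return {well: value * factor for well, value in plate.items()}


def final_activity(values_across_plates, detection_level):
    """Average a strain over plates; values below detection are set to it."""
    return max(mean(values_across_plates), detection_level)
```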

Estimating the detection level of our experimental system.
To assess the detection level of our system, we examined the distribution of promoter activity levels under all examined conditions for a strain containing an RFP gene but no YFP gene. For each condition, more than 30 biological replicates of the strain were measured and fitted to a normal distribution (Fig. S1), and the 95th percentile of the distribution was taken to be the detection level. All analyses were restricted to strains which are above detection level in at least one condition.
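A minimal Python sketch of this thresholding step is given below, assuming that the normal fit is obtained from the sample mean and sample standard deviation of the replicate measurements (the fitting procedure actually used is not specified here). The function names are hypothetical.

```python
from statistics import NormalDist


def detection_level(blank_activities, percentile=0.95):
    """Fit a normal distribution to the activities of the YFP-less strain
    and return the requested percentile of the fitted distribution."""
    fitted = NormalDist.from_samples(blank_activities)
    return fitted.inv_cdf(percentile)


def is_detected(activity_by_condition, level_by_condition):
    """True if the strain is above detection level in at least one condition."""
    return any(
        activity_by_condition[c] > level_by_condition[c]
        for c in activity_by_condition
    )
```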

Experimental Variability.
To assess the experimental error of our system, we examined differences between biological replicates of promoter strains whose activities were above the detection threshold. The relative error was estimated by the coefficient of variation (CV) of replicate measurements. We then grouped the replicate measurements (42% of the measurements) into 20 equally-spaced bins (in logarithmic scale) according to their mean promoter activity. For each such group of similarly active promoters, we computed their mean CV and smoothed this mean by averaging the values of four neighboring bins. We then used these mean CV values for every bin to estimate the CV of any promoter activity level, by linear interpolation. The CV values ranged from 0.36 for very low promoter activity levels to 0.05 for high promoter activity levels (Fig. S2).
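The error model described above can be sketched in Python as follows. The sketch makes several assumptions that are not stated in the text: bins span the range between the smallest and largest mean activity, empty bins are skipped, "four neighboring bins" is read as two bins on each side of the current bin, and activities outside the binned range receive the CV of the nearest bin. The function names are hypothetical.

```python
import math


def cv_error_model(mean_activities, cvs, n_bins=20):
    """Return a function estimating the CV for any promoter activity level.

    Requires at least two distinct mean activities.
    """
    logs = [math.log10(m) for m in mean_activities]
    lo, hi = min(logs), max(logs)
    width = (hi - lo) / n_bins

    # Mean CV within each logarithmically spaced bin.
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, cv in zip(logs, cvs):
        b = min(int((x - lo) / width), n_bins - 1)
        sums[b] += cv
        counts[b] += 1
    centers = [lo + (b + 0.5) * width for b in range(n_bins) if counts[b]]
    bin_cv = [sums[b] / counts[b] for b in range(n_bins) if counts[b]]

    # Smooth each bin with up to two neighbors on each side (assumption).
    smoothed = []
    for j in range(len(bin_cv)):
        window = bin_cv[max(0, j - 2): j + 3]
        smoothed.append(sum(window) / len(window))

    def estimate(activity):
        x = math.log10(activity)
        if x <= centers[0]:
            return smoothed[0]
        if x >= centers[-1]:
            return smoothed[-1]
        for j in range(1, len(centers)):
            if x <= centers[j]:
                t = (x - centers[j - 1]) / (centers[j] - centers[j - 1])
                return smoothed[j - 1] + t * (smoothed[j] - smoothed[j - 1])

    return estimate
```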

Global scaling factors and error model.
For all conditions the global scaling factor represents the best robust fit to the data and was identified by two separate methods:
A) For each pairwise comparison, we performed a robust linear fit (linear fit ignoring outliers) using Matlab's robustfit function.
B) Promoters were clustered as described below (section 2.10). Global scaling factors were then computed by finding the best linear fit to promoters of cluster
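To illustrate what a robust fit of a scaling factor does in method A, the following Python sketch uses iteratively reweighted least squares with Tukey bisquare weights. It is a simplified stand-in and not a reimplementation of Matlab's robustfit: it fits only a slope through the origin (an assumption motivated by proportional scaling), and it does not apply leverage corrections. The function name and defaults are hypothetical.

```python
from statistics import median


def robust_scaling_factor(x, y, tune=4.685, n_iter=50):
    """Slope of y = s * x that down-weights outlying promoters.

    x, y: activities of the same promoters in two conditions
    """
    # Start from the ordinary least-squares slope through the origin.
    slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    for _ in range(n_iter):
        residuals = [b - slope * a for a, b in zip(x, y)]
        # Robust estimate of the residual spread (median absolute residual).
        scale = median(abs(r) for r in residuals) / 0.6745
        if scale == 0:
            break  # at least half of the points are fitted exactly
        weights = []
        for r in residuals:
            u = r / (tune * scale)
            weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
        new_slope = (
            sum(w * a * b for w, a, b in zip(weights, x, y))
            / sum(w * a * a for w, a in zip(weights, x))
        )
        converged = abs(new_slope - slope) < 1e-12
        slope = new_slope
        if converged:
            break
    return slope
```

Promoters far from the common line receive zero weight, so a handful of specifically regulated promoters does not bias the estimated global scaling factor.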

Functional annotation and enrichment analysis.
Sets of genes were assigned process, function, and cellular components according to the annotations from the Gene Ontology (GO) (Ashburner et al, 2000). The significant representation of GO terms in the set was evaluated by GOrilla GO Term Finder (Eden et al, 2009) with a p-value threshold of $10^{-3}$. For TF analysis we examined the distribution of known TF promoters (Badis et al, 2008;Zhu et al, 2009) across the different clusters.
For enrichment analysis, promoters were classified as previously described according to their properties as: OPN/DPN (Tirosh & Barkai, 2008), SAGA-dominated/TFIID-dominated (Huisinga & Pugh, 2004), divergent/unique (this study, based on the Saccharomyces Genome Database). Enrichment p-values were computed according to the hypergeometric (HG) distribution and corrected for multiple hypothesis testing using false discovery rate correction (Benjamini & Hochberg, 1995).
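For readers unfamiliar with these two steps, a minimal Python sketch of a hypergeometric enrichment p-value and the Benjamini-Hochberg correction is given below. It assumes a one-sided (over-representation) test, which is not stated explicitly in the text, and the function names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(population, annotated, drawn, observed):
    """P(X >= observed) when `drawn` promoters are sampled without
    replacement from `population` promoters, `annotated` of which carry
    the property of interest."""
    upper = min(annotated, drawn)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(observed, upper + 1)
    )
    return favorable / comb(population, drawn)


def benjamini_hochberg(pvalues):
    """Return FDR-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```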

Clustering promoter activities.
To partition the promoters into clusters that preserve proportionality, we used K-means clustering with the cosine metric (defined by $d(x, y) = 1 - \cos(\angle xOy)$, where $x$ and $y$ are vectors of promoter activity levels across conditions and $O$ is the origin).
The cosine metric ensures that two promoters whose activity levels across all conditions are equal up to some scaling factor will have distance zero, and will thus reside in the same cluster. The clustering was repeated 100 times with different random starting points and the clustering that minimized the sum of distances from the centers was chosen. The number of clusters, $k$, was determined as the largest $k$ for which the distance between any two centers is at least a minimal threshold (Fig. S8), thereby ensuring a minimal separation between any two clusters. Promoters that had very low activity levels (mean activity across conditions < 1.5 × mean detection level across all conditions) were excluded from the clustering (127 promoters). Promoters that had missing values in part of the conditions (90 promoters) were clustered separately after the initial clustering.
Each such promoter was assigned to the closest center using the cosine metric reduced to known coordinates.
For generation of figure S10, this analysis was repeated excluding all ribosomal promoters.
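The cosine metric and the assignment of promoters with missing values can be illustrated with a short Python sketch. It assumes that missing measurements are marked with None and that "reduced to known coordinates" means that both the promoter vector and each center are restricted to the conditions in which the promoter was measured. The function names are hypothetical.

```python
import math


def cosine_distance(x, y):
    """One minus the cosine of the angle between two activity vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def assign_to_center(vector, centers):
    """Index of the closest center, using only the measured conditions.

    vector:  activity levels across conditions, None where missing
    centers: list of cluster centers (complete vectors)
    """
    known = [j for j, value in enumerate(vector) if value is not None]
    reduced = [vector[j] for j in known]
    distances = [
        cosine_distance(reduced, [center[j] for j in known])
        for center in centers
    ]
    return min(range(len(centers)), key=distances.__getitem__)
```

Because the distance depends only on the angle, two promoters whose activity vectors differ by a constant factor are at distance zero regardless of their absolute expression levels.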

Variance explained by clustering.
For each promoter $i$, its vector of promoter activity levels across conditions, $v(i)$, was projected to the center of the corresponding cluster. Denoting the difference between the vector and its projection by $d(i)$, the variance explained by the clustering was calculated as $1 - \mathrm{var}(d(i))/\mathrm{var}(v(i))$.
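A minimal Python sketch of this calculation follows. It assumes that the variance explained for a promoter is one minus the ratio between the variance of the difference vector and the variance of the activity vector, that the projection is the orthogonal projection onto the line spanned by the cluster center, and that the variance is the population variance over conditions. These are our reading of the description, and the function name is hypothetical.

```python
from statistics import pvariance


def variance_explained(vector, center):
    """Fraction of a promoter's variance captured by its cluster center."""
    # Orthogonal projection of the promoter vector onto the center direction.
    coefficient = (
        sum(v * c for v, c in zip(vector, center))
        / sum(c * c for c in center)
    )
    projection = [coefficient * c for c in center]
    difference = [v - p for v, p in zip(vector, projection)]
    return 1.0 - pvariance(difference) / pvariance(vector)
```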

Predicting promoter activity levels.
We used the following scheme to predict promoter activity levels under growth condition $c$ from measurements of several other conditions $c_1, \ldots, c_m$. First, the number of clusters $k$ for all promoters under the measured conditions was determined using the above criterion. Then, the promoters were clustered by the k-means algorithm using the cosine metric. Denote the centers of the clusters by $\mu_1, \ldots, \mu_k$. Since the cosine metric is invariant to scaling, we can assume that the norm of each center equals one. Any promoter that was measured only under part of the conditions was assigned to the closest center,

Growth conditions.
All media for bacterial growth were based on a defined M9 minimal medium (42mM Na2HPO4, 22mM KH2PO4, 8.5mM NaCl, 18.5mM NH4Cl, 2mM MgSO4, 0.1mM CaCl2, 16µM thiamine) + 50µg/ml kanamycin. Specific growth conditions and the respective growth rates in each condition are listed in Table S8.

Robotic assay for genome-wide promoter activity data.
The library of reporter strains, each bearing a low-copy plasmid with one of the E. coli promoters controlling fast-folding GFP (Fig. S14A, (Cormack et al, 1996)), was previously described (Zaslaver et al, 2006). This library includes 1820 reporter strains which represent ~75% of E. coli promoters. Reporter strains were inoculated from frozen stocks into high-brim 96-well plates containing M9 minimal medium supplemented with 11mM glucose, 0.05% casamino acids and 50µg/ml kanamycin. The 96-well plates were covered with breathable sealing films (Excel Scientific Inc.) and grown overnight in a shaker at 37°C. All steps from this point were carried out using a programmable robotic system (Freedom Evo, Tecan Inc.). Overnight cultures were first diluted 1:33 into M9 medium, followed by a second 1:15 dilution into 6 flat-bottom microwell plates (Nunc), each containing one of the growth media (Table S8), in a final culture volume of 150µl. Wells were then covered with 100µl of mineral oil (Sigma) to prevent evaporation and transferred into an automated incubator. Bacteria were grown in the incubator with shaking (6 Hz) at 37°C for about 20 hr. Every 8 min the plate was transferred by the robotic arm into a multiwell fluorometer (Infinite F200, Tecan) that reads the OD (600 nm) and GFP (excitation 480 (20), emission 515 (10)). After 5 hours of incubation, NaCl or casamino acids were added to the appropriate plates by automated pipetting below the oil layer. Each plate contained several control strains: a promoterless strain used for the subtraction of auto-fluorescence background (Zaslaver et al, 2006); a Sigma70 synthetic promoter bearing the consensus sigma70 binding site (Kaplan et al, 2008); and the GadB promoter, which was used as a representative of a sigmaS-regulated promoter (Keseler et al, 2011).

Computing promoter activity levels, detection level, experimental variability and error model.
Promoter activity was calculated by the rate of GFP production per OD unit, as described above for yeast (supp. methods 2.4) for the 3-hour window around mid-exponential growth (Fig. S14). For conditions in which a compound was added to the media, promoter activity was calculated for the window of time after its addition.
Background fluorescence was measured using a promoter-less control strain in each plate. Promoter activities lower than 3 STDs above the mean background promoter activity were set to zero. We find that about 300-500 promoters are active above background in each condition, and 100 promoters are active in all conditions. In total, 969 promoters were active above background in at least one condition. Experimental variability was assessed as described above, using three replicate measurements in M9 glucose (Fig. S15) and error model was calculated as for S. cerevisiae.
Identification of representative promoters for predictions was done iteratively. At each iteration, we calculated the best linear combination of the current representative promoters for predicting the experimental data, and then added the additional representative promoter that contributed the most to the prediction of the experimental data.

Supplementary figure 1. Estimating the background level of promoter activity.
Shown is the distribution of YFP promoter activities across all examined growth conditions, for a strain that contains a mCherry reporter gene but no YFP reporter gene.
For each condition (upper right text), shown is a histogram (blue bars) of the fraction of over 30 biological replicate measurements as well as a fit of the histogram to a normal distribution (red line). The 95th percentile of the distribution (red arrow) was set as the level above which YFP-containing strains are considered to be above background.
Supplementary figure 2. Estimating the experimental variability of our system. For each promoter which was measured in replicates (42% of the promoters), shown is the coefficient of variation (CV, standard-deviation divided by the mean, y-axis) of its activity against its mean activity, where the activity level of each promoter was derived from 2-6 replicates. The measurements were grouped by their promoter activity into 20 equally-spaced bins (in logarithmic scale, green X marks). Also shown (green curve) is a linear interpolation of the CV of each promoter activity using the CVs of four neighboring bins. We used this linear interpolation as an estimate for the relative error of promoters that were not measured in replicates.
fluorescence reporters (this study). (D) For the datasets from A-C and an additional mass spectrometry dataset (Costenoble et al, 2011) shown is the coefficient of variation (CV, standard-deviation divided by the mean, y-axis) against mean expression level (x-axis). For each dataset, the measurements were grouped by their mean levels into 20 equally-spaced bins (in logarithmic scale) and mean CV per bin was interpolated using the CVs of four neighboring bins. For all datasets mean expression levels were normalized to be from 0 to 1.
(Holstege et al, 1998;Shalem et al, 2008;Lipson et al, 2009), RNA sequencing (Nagalakshmi et al, 2008;Yassour et al, 2009), immune-tagged proteins (Ghaemmaghami et al, 2003), fluorescently-tagged proteins (Stewart-Ornstein et al, 2012), mass spectrometry (de Godoy et al, 2008) and a curated dataset of protein abundances integrated from 5 different datasets. The correlation coefficients between promoter activity and these datasets are similar to the correlation between these datasets. As expected, the mRNA datasets better correlate between themselves than they do with the protein datasets. Promoter activity correlates better with mRNA levels than with protein levels.
Supplementary figure 6. Most promoters preserve their relative activity levels between every pair of growth conditions. Shown is a comparison of promoter activities between every pair of tested conditions. The slope of the robust linear fit (magenta line) represents the scaling factor between the two conditions. The percent of promoters that deviate less than 3 standard deviations from the global trend is indicated for each pair of growth conditions.
To select the number of clusters, we clustered the activities of all promoters under all measured conditions into 2 to 12 clusters using K-means with the cosine distance function (Methods). Shown is the cosine distance (1 - cosine of angle) between the two closest centers (y-axis), as a function of the number of clusters (x-axis). Since clustering the data into 7 clusters resulted in two clusters that are nearly indistinguishable (the cosine distance between their centers is ~0.01), we used 6 clusters in our subsequent analyses.
Supplementary figure 11. The intra-cluster variability is smaller than the inter-cluster variability in genome-wide mRNA and protein measurements. For over 100 genome-wide mRNA or protein measurements in yeast, derived from 7 different studies, shown is the standard deviation of reported log-ratio expression levels of two sets of genes: (1) genes whose promoter activities were clustered into the first global response cluster in our study (right column, cyan); and (2) genes whose promoter activities were clustered into any of the other five clusters (left column, blue). In all cases we found the intra-cluster variability to be lower than the inter-cluster variability, as expected if proportional scaling is largely preserved for the mRNA and protein levels.
Supplementary figure 13. Correlation between expression/promoter activity and growth rate. Panel titles: (A) Correlation between expression and growth rate; (B) Distribution of slopes. (A) For 110 ribosomal genes we found the slope between their expression (Brauer et al, 2008) or promoter activity and growth rate. Plotted are 3 illustrative examples from each dataset (legend: RP gene 1-3, our data; RP gene 1-3, Brauer et al, 2008; no correlation with growth rate). Slopes for expression data (blue lines) were taken from (Brauer et al, 2008) and slopes for promoter activities (red lines) were obtained by fitting a linear function (using Matlab's polyfit) for each gene between promoter activity values (Table S3) and growth rates (Table S4). Ribosomal proteins were chosen as their expression was shown to correlate with growth rate in many different studies (Regenberg et al, 2006;Castrillo et al, 2007;Brauer et al, 2008;Fazio & Jewett, 2008;Neidhardt, 1999;Bremer & Dennis, 1987;Zaslaver et al, 2009;Klumpp et al, 2009;Levy & Barkai, 2009;Scott et al, 2010;Pedersen et al, 1978). (B) Shown is the distribution of slopes between expression (blue bars) or promoter activity (red bars) and growth rate for 110 ribosomal genes. We note that both we and (Brauer et al, 2008) observe a positive correlation of ribosomal gene expression with growth rate, as indicated by the positive slopes for most ribosomal genes using both methods. However, our data additionally suggests that the magnitude of this correlation (i.e., the slopes) is highly similar for all ribosomal genes.
promoter activities. For 3 measured replicates in glucose, shown is the coefficient of variation (CV, standard-deviation divided by the mean, y-axis) of its activity against its mean activity. The measurements were grouped by their promoter activity into 13 equally-spaced bins (in logarithmic scale, green X marks). Also shown (green curve) is a linear interpolation of the CV of each promoter activity using the CVs of four neighboring bins. We used this linear interpolation as an estimate for the relative error of promoters that were not measured in replicates.
coli are correlated to growth rate. For each growth condition, shown is the growth rate (x-axis) and (A) global scaling factor or (B) total promoter activity (y-axis). The scaling factor for M9+glucose was arbitrarily set to 1.
Using all promoters in a set of conditions, we found weights for each promoter such that its expression is given as accurately as possible by the weighted sum of the representative source genes. We numerically searched for the k promoters that best explain the rest. (A) Cumulative distribution of prediction error for predictions using N=1 to N=5 representative promoters. (B) Mean relative error of prediction (y-axis) using various numbers of promoters (x-axis). The promoters used for prediction are: rpsP, rrnH, asd, trpL, ompC.