Promoter Architecture Differences among Alphaproteobacteria and Other Bacterial Taxa

ABSTRACT Much of our knowledge of bacterial transcription initiation has been derived from studying the promoters of Escherichia coli and Bacillus subtilis. Given the expansive diversity across the bacterial phylogeny, it is unclear how much of this knowledge can be applied to other organisms. Here, we report on bioinformatic analyses of promoter sequences of the primary σ factor (σ70) by leveraging publicly available transcription start site (TSS) sequencing data sets for nine bacterial species spanning five phyla. This analysis identifies previously unreported differences in the −35 and −10 elements of σ70-dependent promoters in several groups of bacteria. We found that Actinobacteria and Betaproteobacteria σ70-dependent promoters lack the TTG triad in their −35 element, which is predicted to be conserved across the bacterial phyla. In addition, the majority of the Alphaproteobacteria σ70-dependent promoters analyzed lacked the thymine at position −7 that is highly conserved in other phyla. Bioinformatic examination of the Alphaproteobacteria σ70-dependent promoters identifies a significant overrepresentation of essential genes and ones encoding proteins with common cellular functions downstream of promoters containing an A, C, or G at position −7. We propose that transcription of many σ70-dependent promoters in Alphaproteobacteria depends on the transcription factor CarD, which is an essential protein in several members of this phylum. Our analysis expands the knowledge of promoter architecture across the bacterial phylogeny and provides new information that can be used to engineer bacteria for use in medical, environmental, agricultural, and biotechnological processes.
IMPORTANCE Transcription of DNA to RNA by RNA polymerase is essential for cells to grow, develop, and respond to stress. Understanding the process and control of transcription is important for health, disease, the environment, and biotechnology. Decades of research on a few bacteria have identified promoter DNA sequences that are recognized by the σ subunit of RNA polymerase. We used bioinformatic analyses to reveal previously unreported differences in promoter DNA sequences across the bacterial phylogeny. We found that many Actinobacteria and Betaproteobacteria promoters lack a sequence in their −35 DNA recognition element that was previously assumed to be conserved and that Alphaproteobacteria lack a thymine residue at position −7, also previously assumed to be conserved. Our work reports important new information about bacterial transcription, illustrates the benefits of studying bacteria across the phylogenetic tree, and proposes new lines of future investigation.

for the initiation sites for bacterial transcription units at the nucleotide level (37,38). We gathered publicly available genome-scale TSS-seq data sets from a variety of bacterial species in order to predict bacterial promoter sequences and to ask if they were similar across the phylogeny. The published bacterial TSS-seq data sets that we analyzed included several thousand experimentally determined TSSs from Actinobacteria (40,41), Alphaproteobacteria (37,38,43), Betaproteobacteria (39), Firmicutes (44), and Gammaproteobacteria (42) (Table 1).
Since σ70 homologs are needed for transcription of most genes (25,27), the majority of the TSSs in these genome-scale data sets are predicted to be derived from σ70-dependent promoters. Thus, we hypothesized that including all identified TSSs in each bacterium in our analysis would allow the discovery of overrepresented DNA sequences that corresponded to the σ70 promoter sequence. Further, given the large number of TSSs analyzed, any contribution of alternative σ factors to the genome-wide signal was predicted to be minimal. Additionally, most of the TSS data sets were generated under environmental conditions where the activity of many alternative σ factors was predicted to be low, thereby limiting the influence of other promoter motifs on our analysis (Table 1). Using MEME as a motif discovery tool (45,46), we were able to identify upstream motifs with DNA sequence similarity to −35 and −10 promoter elements that would be predicted to be recognized by the respective housekeeping σ factor based on what is known about σ70-promoter interactions in other well-studied bacterial species (Fig. 1; Table S1). Upon examining the overrepresented sequences, we found that motifs identified by MEME are generally conserved across this diverse set of bacterial species and have significant sequence identity to −35 and −10 promoter elements that are known to represent binding sites for σ70-containing RNAP in well-studied organisms.
However, the motifs identified by MEME also make predictions about some base-specific differences in promoter elements that are recognized by σ70 RNAP holoenzyme across the bacterial phylogeny. For example, TSS-seq data sets show that over 60% of the −10 elements of the Alphaproteobacteria R. sphaeroides, Zymomonas mobilis (37,96), and Caulobacter crescentus (37,43,96) lack the thymine at position −7 that is found in over 95% of the E. coli and Bacillus subtilis σ70-dependent promoters (Fig. 1). In an extreme case, the MEME motif finder indicates that >80% of the σ70-dependent promoters in Novosphingobium aromaticivorans, another member of the Alphaproteobacteria, lack a conserved thymine at position −7 (Fig. 1). The low frequency of a thymine at position −7 in Alphaproteobacteria σ70-dependent promoters contrasts with the prediction made by the MEME motif finder that >90% of the analogous transcription units contain a thymine at this position in Actinobacteria, Betaproteobacteria, Firmicutes, and Gammaproteobacteria for which genome-scale TSS data sets are publicly available (Fig. 1).
The MEME motif finder also identified overrepresented TTG DNA sequences within potential −35 elements that were well conserved across most of the same bacterial species (Fig. 1; Table S1). However, this DNA sequence was not found to be overrepresented in Betaproteobacteria and Actinobacteria (Fig. 1). To analyze additional features of these putative σ70-dependent promoters across the bacterial phylogeny, we also used the predictions of the MEME motif finder to determine the distance between the −10 and −35 elements and calculate the number of bases between the downstream end of the −10 element and the experimentally determined TSS. This analysis resulted in the same most frequent distance between −35 and −10 elements of the σ70-dependent promoters (17 bp) and between the TSS and the −10 element (6 bp) across the species for which genome-scale TSS data are available, suggesting that these features of promoters are conserved across the bacterial phylogeny (Table S1).
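The spacing measurements described above can be illustrated with a minimal Python sketch. This is not the authors' script: the function names, the 0-based index convention, and the assumption that both elements are treated as hexamers are all hypothetical choices made only for illustration.

```python
from collections import Counter

ELEMENT_LENGTH = 6  # assumption: both the -35 and -10 elements are treated as hexamers


def spacing(minus35_start, minus10_start, tss_index):
    """Return (spacer, gap) in bp for one promoter, using 0-based indices."""
    # Bases between the end of the -35 element and the start of the -10 element.
    spacer = minus10_start - (minus35_start + ELEMENT_LENGTH)
    # Bases between the downstream end of the -10 element and the TSS (+1).
    gap = tss_index - (minus10_start + ELEMENT_LENGTH)
    return spacer, gap


def most_frequent_spacing(promoters):
    """Return the most common spacer and gap lengths across promoters."""
    spacers = Counter()
    gaps = Counter()
    for minus35_start, minus10_start, tss_index in promoters:
        spacer, gap = spacing(minus35_start, minus10_start, tss_index)
        spacers[spacer] += 1
        gaps[gap] += 1
    return spacers.most_common(1)[0][0], gaps.most_common(1)[0][0]


# Hypothetical coordinates: (-35 start, -10 start, TSS index) within each sequence.
example = [(0, 23, 35), (2, 25, 37), (0, 22, 35)]
print(most_frequent_spacing(example))
```

Under these assumptions, a promoter with a −35 hexamer at −35 to −30 and a −10 hexamer at −12 to −7 gives a 17-bp spacer and a 6-bp gap to the TSS, which matches the most frequent distances reported in the text.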
To test if the prediction of sequences of these promoter elements was influenced by the motif-finding algorithm, we analyzed the same genome-wide TSS-seq data sets using Delila-PY (47), a Python-based pipeline interface with the Delila software suite (48) that uses a different motif-finding method than MEME. The use of Delila-PY to identify DNA sequence motifs upstream of the experimentally mapped TSSs predicted that >60% of the −10 elements of Alphaproteobacteria σ70-dependent promoters lacked a T at position −7 in all four species examined (Fig. 2; Table S2). This analysis also revealed that, while there are some species-specific differences in the base distribution at position −7, when averaged across all Alphaproteobacteria, there is a roughly equal percentage of each base at position −7 (Fig. 3A). In agreement with MEME, Delila-PY predicted that >80% of the non-Alphaproteobacteria −10 elements for σ70-dependent promoters contained a thymine at position −7 (Fig. 2; Table S2). Further, Delila-PY predicted a lack of conservation for the −35 element for Mycobacterium smegmatis, Streptomyces coelicolor, and Burkholderia cenocepacia (Fig. 2). In total, there was a 60% to 95% agreement between the predicted σ70 −35 and −10 promoter elements identified by both the MEME motif finder and Delila-PY across the data sets we analyzed. The fact that similar predictions about the DNA sequences of σ70-dependent promoters are made when using either MEME or Delila-PY suggests that the observed differences were not due to a specific algorithm but are likely to be biologically relevant.
In sum, this analysis illustrates that new insights can be obtained from a comparative analysis of genome-scale TSS-seq data across bacterial species. Specifically, it makes a prediction that the −35 and −10 elements of σ70-dependent promoters vary significantly across the bacterial phylogeny. In addition, since the MEME motif finder and Delila-PY identified similar overrepresented sequences proximal to the TSS, we conclude that the nucleotide present at position −7 relative to the TSS of σ70-dependent promoters is a previously unrecognized feature across many Alphaproteobacteria. Below, we focus on predictions about the nature and consequences of the difference in the base at position −7 of the −10 element for predicted σ70-dependent promoters within Alphaproteobacteria.
Functional groups associated with the products of genes that are downstream of −7T σ70-dependent promoters among Alphaproteobacteria. Because a thymine at position −7 (−7T) relative to the TSS of σ70-dependent promoters is present in only a minority of the predicted promoters in Alphaproteobacteria (Fig. 1 and 2), we investigated if this small set of transcription units is enriched for gene products that have specific functions across these bacteria. When we analyzed the genes downstream of −7T σ70-dependent promoters in Alphaproteobacteria, we found that <20% of these encoded homologs of proteins contained in the bacterial Database of Essential Genes (49,50) (Fig. 3B; Table S2). There was also a roughly equal distribution of all 4 bases at position −7 within promoters upstream of σ70-dependent transcription units that encode homologs of these essential proteins (Fig. 3B). We also analyzed the predicted −7T σ70-dependent promoters upstream of R. sphaeroides and C. crescentus genes that have been identified as essential in transposon insertion sequencing (Tn-seq) mutant libraries (51,52) and genes identified as essential in Z. mobilis via microarrays (53).
This analysis predicts that <15% of the genes containing −7T σ70-dependent promoters in these three species are essential, and it found no statistically significant enrichment in these gene products (hypergeometric test, significance threshold of P ≤ 0.05) (Fig. 3C; Table S2). Indeed, there is no significant enrichment for any base at position −7 in the σ70-dependent promoters that are found upstream of these essential genes (Fig. 3C). Taken together, these data suggest that the genomes of Alphaproteobacteria have no significant enrichment for known essential genes downstream of −7T σ70-dependent promoters.

We also tested for functional enrichment of the products transcribed from genes downstream of predicted Alphaproteobacteria −7T σ70-dependent promoters. To do this, we compared the predicted cellular roles of gene products transcribed from Alphaproteobacteria transcription units with a −7T σ70-dependent promoter with functional groups compiled from the KEGG Brite ontology, the KEGG pathway lists, and GO terms from each organism (54,55) using a hypergeometric test (an adjusted P value of ≤0.1 indicates significant enrichment) (Fig. 4; Table 2; Table S3). This analysis revealed that the annotated functions of gene products downstream of predicted −7T σ70-dependent promoters were highly variable among the Alphaproteobacteria species for which genome-scale TSS data were available. Indeed, the only functions enriched in more than one Alphaproteobacteria were annotated as having roles in cell envelope function and protein degradation (Fig. 4; Table 2; Table S3). Moreover, this analysis revealed that Z. mobilis had no statistically significant enrichment for any functional groups for products encoded by genes downstream of predicted −7T σ70-dependent promoters, perhaps reflecting the low number of genes that fall in this category in this bacterium.
In the three other Alphaproteobacteria for which genome-wide TSS data are available, we found evidence for functional enrichments unique to the individual organisms. For example, in N. aromaticivorans, the largest number of enriched functional groups of proteins encoded by genes downstream of −7T σ70-dependent promoters were predicted to function in translation, transport, transcription, protein folding, DNA organization, DNA repair, and DNA replication (Fig. 4; Table 2; Table S3), as well as some proteins being predicted to allow N. aromaticivorans to metabolize aromatic compounds (56-60). In R. sphaeroides, the products encoded by genes downstream of predicted −7T σ70-dependent promoters were enriched for phosphoryl-group transfer, including sensor histidine kinases of two-component regulatory systems (Fig. 4; Table 2; Table S3). In C. crescentus, the products of genes downstream of predicted −7T σ70-dependent promoters were enriched for those involved in the cell cycle changes of this alphaproteobacterium (Fig. 4; Table 2; Table S3) (61). Because of this predicted enrichment, we asked if the genes in C. crescentus downstream of a predicted −7T σ70-dependent promoter exhibited any cell cycle changes in transcription in a published RNA-seq data set (62). This analysis revealed an increase of the average abundance of transcripts derived from the cell cycle genes downstream of predicted −7T σ70-dependent promoters over the cell cycle (Fig. 5A). We did not find a similar pattern in cell cycle-specific transcript abundance when we analyzed the same number of randomly selected genes downstream of predicted −7T σ70-dependent promoters that encode proteins with different functions (Fig. 5A). Taken together, these results indicate that genes downstream of predicted −7T σ70-dependent promoters do not encode common cellular functions across the Alphaproteobacteria.
Instead, they predict that −7T σ70-dependent promoters are found upstream of transcription units that encode proteins responsible for diverse and possibly lifestyle-specific sets of cellular functions.
Alphaproteobacteria proteins transcribed from genes downstream of σ70-dependent −7A/C/G promoters perform essential and core cellular functions. We also examined the predicted functions of Alphaproteobacteria proteins transcribed from genes downstream of predicted σ70-dependent promoters containing an adenine, cytosine, or guanine at position −7 (−7A/C/G promoters). In this case, we found that a high percentage of the proteins encoded by transcription units with predicted −7A/C/G σ70-dependent promoters had homologs in the bacterial Database of Essential Genes (DEG) (49,50) (Fig. 3B; Table S2), with no significant difference in the distribution of bases at this position within these promoters. Further, we found statistically significant enrichment of transposon-identified essential genes located downstream of predicted −7A/C/G σ70-dependent promoters (51-53) (Fig. 3C). These analyses suggest that Alphaproteobacteria have a higher percentage of −7A/C/G σ70-dependent promoters upstream of genes encoding known essential functions than those containing a thymine at this position and that A, C, and G at position −7 are roughly equally distributed in all the Alphaproteobacteria included in our analysis (Fig. 3). We also tested for functional enrichments in the proteins transcribed from genes that are downstream of predicted −7A/C/G σ70-dependent promoters. This analysis identified several functional groups enriched in multiple Alphaproteobacteria species for products encoded by genes that are downstream of predicted −7A/C/G σ70-dependent promoters (Fig. 4; Table 3; Table S3).
The enriched functional groups shared among all 4 Alphaproteobacteria species were gene products involved in translation, central carbon metabolism, and the biosynthesis of secondary metabolites (Fig. 4; Table 3; Table S3). Gene products involved in purine biosynthesis (three species) and amino acid biosynthesis (three species), as well as transporters (two species), were enriched among the products of genes downstream of predicted −7A/C/G σ70-dependent promoters in a subset of the Alphaproteobacteria for which genome-wide TSS data sets were available (Fig. 4; Table 3; Table S3).
In addition, this analysis showed enrichment of several other groups of gene products that are transcribed from predicted −7A/C/G σ70-dependent promoters in single bacterial species (Fig. 4; Table 3; Table S3). One example of this is the enrichment of genes whose products are involved in photosynthesis in R. sphaeroides, the only phototrophic alphaproteobacterium for which genome-wide TSS data sets are available (Fig. 4; Table 3; Table S3). Consistent with the bioinformatically predicted function of the −7A/C/G σ70-dependent promoters that are upstream of genes encoding proteins involved in photosynthesis, there is a significant reduction in abundance of transcripts encoding proteins involved in photosynthesis and only a slight increase in abundance of those encoding translation functions after photosynthetic cells are shifted to nonphotosynthetic conditions (Fig. 5B) (63). We also found that products encoded by genes transcribed from predicted −7A/C/G σ70-dependent promoters that function in lipid biosynthesis and transcription were among those enriched only in R. sphaeroides. The genes transcribed from predicted −7A/C/G σ70-dependent promoters whose products are involved in lipid biosynthesis may play a role in forming the membrane invaginations that house the photosynthetic apparatus of this organism (64), while the genes encoding alternative σ factors in this group (rpoE, rpoH1, and rpoH2) may play a role in the R. sphaeroides response to singlet oxygen and heat or envelope stress (Fig. 4; Table 3; Table S3) (65-70).
In another example, predicted −7A/C/G σ70-dependent promoters in N. aromaticivorans were overrepresented upstream of transcription units that encode iron-sulfur proteins and enzymes involved in cell wall/cell membrane biosynthesis, DNA repair, protein degradation, and protein folding (Fig. 4; Table 3; Table S3). Phenolic compounds metabolized by N. aromaticivorans are known to damage bacterial cell membranes and other macromolecules, suggesting that this set of genes is associated with the lifestyle of this alphaproteobacterium (71,72). Further, several of the iron-sulfur proteins transcribed from genes downstream of predicted −7A/C/G σ70-dependent promoters function in the tricarboxylic acid (TCA) cycle, which assimilates the products of aromatic metabolism into N. aromaticivorans cellular biomass (58,73). Below, we discuss the importance of finding that a majority of Alphaproteobacteria genes that are transcribed from predicted −7A/C/G σ70-dependent promoters include those involved in many cellular functions.

DISCUSSION
The initiation of transcription requires the recognition and binding of RNAP to specific promoter DNA sequences, an event that requires a σ factor and can depend on other proteins and small molecule ligands (3,4). A variety of studies have helped predict the promoter sequence in some well-studied bacterial species, but access to genome-scale maps of TSSs at the nucleotide level provides an opportunity to catalog and compare promoter sequences across the bacterial phylogeny. Here, using MEME and Delila-PY (45-47), we predicted promoter sequences and found differences in the −35 (Actinobacteria and Betaproteobacteria) and −10 (Alphaproteobacteria) promoter elements that are recognized by the housekeeping σ factor (σ70). Below, we discuss the biochemical and functional consequences of the differences in predicted σ70-dependent promoters across these taxa.
Features of σ70-dependent promoter recognition that are conserved across phyla. The −10 and −35 elements of bacterial promoters make specific contacts with separate regions of σ factors. From mechanistic studies, amino acids in σ70 region 2.4 make specific contacts with the −10 element in cognate promoters (27,74). Comparison of the sequence of σ70 region 2 in the phyla we examined shows a high level of amino acid conservation, including Q437, T440, and R441 (using residue numbers of E. coli σ70) (Fig. 6A), which recognize the −10 region of the promoter (10). These residues are also conserved in the Alphaproteobacteria, where the sequences of the −10 elements in the majority of the predicted σ70 promoters lack a thymine at position −7 that is highly conserved in many other promoters that are recognized by this σ factor. Similarly, the −35 promoter elements interact specifically with σ70 region 4 (10). Comparison of σ70 region 4 among the bacteria studied here also shows a high degree of amino acid conservation, including residues that recognize the −35 sequence: R584, E585, and Q589 (using residue numbers of E. coli σ70) (Fig. 6B). Indeed, these σ70 region 4 amino acids are conserved in M. smegmatis, S. coelicolor, and B. cenocepacia, bacteria that lack the −35 TTG sequence that is conserved across the other phyla (Fig. 1 and 2). This suggests that few, if any, differences in σ70 exist to account for the observed differences in −35 elements of M. smegmatis, S. coelicolor, and B. cenocepacia and the −10 elements of Alphaproteobacteria.
Alphaproteobacteria genes for essential and core metabolic functions contain −7A/C/G σ70-dependent promoters. It was previously proposed that there are differences in the −10 promoter elements of Alphaproteobacteria, based on examination of a small number of transcription units (75,76) or organisms (37,96).
In this study, we used published genome-scale TSS data sets to show that −7T is widely conserved across the bacterial phylogeny, except for the Alphaproteobacteria.
We also examined the biological implications of this variance in the sequence of the −10 element of σ70-dependent promoters in Alphaproteobacteria. We found that essential genes were more likely to be transcribed from predicted −7A/C/G σ70-dependent promoters in several Alphaproteobacteria. This finding was unexpected given the conservation of amino acid residues in σ70 region 2 that recognize the −10 element sequence, and it suggests that there are different requirements for transcription initiation in Alphaproteobacteria (see below). We also found that the functions of proteins transcribed from genes downstream of predicted −7A/C/G σ70-dependent promoters were often shared among the Alphaproteobacteria. This so-called core regulon of genes that are downstream of −7A/C/G σ70-dependent promoters includes proteins that function in translation, carbon metabolism, and biosynthesis of amino acids, purines, and secondary metabolites. These findings suggest that there has been a reprogramming of promoter architecture within the Alphaproteobacteria to place −7A/C/G σ70-dependent promoters upstream of both essential genes and ones that encode numerous cellular functions. The presence of a core Alphaproteobacteria regulon that contains predicted −7A/C/G σ70-dependent promoters makes it tempting to propose that this reprogramming occurred after the Alphaproteobacteria diverged. Analysis of TSS data sets from other members of the bacterial phylogeny is needed to test this hypothesis.
However, we also found Alphaproteobacteria −7T σ70-dependent promoters upstream of genes that encode proteins with critical functions, including cell cycle genes in C. crescentus and cell wall/cell membrane biosynthesis genes in multiple members of this phylum. In addition, some Alphaproteobacteria showed enrichment for different bases at position −7 of σ70-dependent promoters upstream of genes that were linked to individual lifestyles. For example, R. sphaeroides showed enrichment for predicted −7A/C/G σ70-dependent promoters upstream of genes encoding proteins involved in photosynthesis, while C. crescentus showed enrichment for predicted −7T σ70-dependent promoters upstream of genes encoding products involved in its cell cycle developmental program. In contrast, N. aromaticivorans contained a large number of enriched functional groups transcribed from genes downstream of both −7T and −7A/C/G σ70-dependent promoters. If the latter finding reflects the acquisition by N. aromaticivorans of transcription units from a variety of bacteria which allow it to metabolize aromatic compounds, then future analysis of other aromatic-metabolizing Alphaproteobacteria (77) might shed light on core or extended regulons for this metabolic capacity. We could not identify functional enrichments for Z. mobilis proteins transcribed from genes downstream of either −7T or −7A/C/G σ70-dependent promoters, but this might reflect the number of genes annotated with unknown function and the lack of metabolic analyses of this alphaproteobacterium. Further studies of Alphaproteobacteria are needed to better understand the roles of proteins encoded by genes downstream of −7T and −7A/C/G σ70-dependent promoters in their lifestyles.
The potential role of CarD at Alphaproteobacteria σ70-dependent promoters. In organisms in which the majority of −10 elements contain a −7T, the presence of other bases at this position often reduces their activity, creates a promoter which requires different base patterns to compensate, or requires another protein to stimulate transcription (78-80,96). The transcription factor CarD may play such a stimulatory role in Alphaproteobacteria, since it increased transcription from several R. sphaeroides σ70-dependent −7A/C/G promoters in vitro (96), possibly by stabilizing open complex formation by RNAP (28,29). CarD is essential in several Alphaproteobacteria, and a residue required for CarD function (W86) (29,30) (red arrow in Fig. 6C) is conserved across the Alphaproteobacteria we studied, suggesting that this protein also activates transcription in these species (Fig. 6C). Together, these data suggest that CarD homologs enhance transcription by σ70-containing RNAP, perhaps by compensating for the lack of a thymine at position −7 in the −10 element of Alphaproteobacteria.
Lateral gene transfer (LGT) is common within Alphaproteobacteria and with other phyla (81,82), and it is proposed to be a key component of proteobacterial evolution (83). This raises the possibility that LGT of CarD and transcription units containing −7A/C/G σ70-dependent promoters played an important role in both the evolution of the Alphaproteobacteria and their branching from other taxa. Additional analysis of alphaproteobacterial species could lead to a better understanding of any evolutionary link between CarD and transcription initiation.
Potential impacts of promoter differences on biotechnology. The unique features of Alphaproteobacteria σ70-dependent promoters described here and elsewhere (37,96) highlight the importance of analyzing multiple phyla to gain a more complete picture of transcription initiation. For example, the paradigms for σ70-dependent sequence motifs developed in other bacteria might not accurately predict the presence of alphaproteobacterial promoters. The ability to control activity of alphaproteobacterial promoters has practical applications, since these bacteria have biochemical and metabolic pathways that would be beneficial to harness for various biotechnology applications. These include the conversion of lignin-derived and other aromatic compounds into valuable products by N. aromaticivorans (56,58-60) and the ability of R. sphaeroides to harvest solar energy, fix atmospheric nitrogen and CO2, and produce hydrogen and other valuable chemicals (84-88). Future efforts to engineer these and other bacteria will be enhanced by a better understanding of promoter architecture and the role of proteins like CarD in transcription initiation.
Conclusion. By analyzing published genome-scale TSS data sets from species across the bacterial phylogeny, we found that Actinobacteria and Betaproteobacteria σ70-dependent promoters lack conserved bases in their predicted −35 elements. We further found that the base at the −7 position of the −10 elements in over 60% of the σ70-dependent promoters in several diverse Alphaproteobacteria differs from that found in many other phyla, and we propose that CarD plays a role in activating transcription from Alphaproteobacteria σ70-dependent promoters. Our findings highlight the importance of studying numerous bacterial species to increase our understanding of transcription and engineer members of the Alphaproteobacteria as well as other bacterial phyla for processes of medical, agricultural, environmental, and biotechnological importance.

MATERIALS AND METHODS
Data sets. We used published TSS data from M. smegmatis and S. coelicolor (Actinobacteria) (40,41), C. crescentus, N. aromaticivorans, R. sphaeroides, and Z. mobilis (Alphaproteobacteria) (37,38,43), B. cenocepacia (Betaproteobacteria) (39), B. subtilis (Firmicutes) (44), and E. coli (Gammaproteobacteria) (42). All identified TSSs were used in our analysis; there was no attempt to remove any TSSs potentially derived from promoters recognized by alternative σ factors. Genome sequence files (e.g., GFF, GenBank, and genome FASTA) for each species were obtained from NCBI (89).
Promoter motif prediction. The MEME motif finder (version 5.1.0) was used to predict σ70-dependent promoter elements upstream of each TSS (45,46). To identify the −10 promoter element, the nucleotide sequence from −19 to −5 relative to each identified TSS was analyzed using the "zoops" method (minimum width of 9 bp, maximum width of 10 bp, no palindromic motifs) in MEME. The motif with the most hits and lowest P value was chosen for each species. The predicted −10 elements all had overrepresented TA nucleotides at positions −12 and −11 relative to the TSS. The percentage of thymine at position −7 was calculated relative to the TA dinucleotide within each predicted −10 promoter element. If identical sequences arose from multiple closely spaced TSSs upstream of the same gene, the duplicates were excluded when the position −7 base percentages were calculated. To identify DNA sequences of potential −35 elements in each data set, the positions −40 to −28 relative to each identified TSS with an identified −10 element were analyzed using MEME and the settings described above. Motifs for publication were constructed using WebLogo (90). Distances between the −35 and −10 elements and between the −10 element and position +1 were determined using custom Python scripts.
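The windowing and position −7 tally described in this section can be sketched in Python as follows. This is an illustrative sketch only, not the custom scripts used in the study: the function names are hypothetical, TSS coordinates are assumed to be 1-based with the TSS at +1 and no position 0, and the handling of duplicate sequences and of windows that run off the ends of a replicon is omitted.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def upstream_window(genome, tss, strand, start=-19, end=-5):
    """Return bases from `start` to `end` (inclusive) relative to a 1-based TSS at +1."""
    if strand == "+":
        # Position -k sits at 1-based coordinate tss - k.
        return genome[tss + start - 1 : tss + end]
    # Minus strand: upstream lies at higher coordinates, so take that region
    # and report its reverse complement in 5'-to-3' promoter orientation.
    region = genome[tss - end - 1 : tss - start]
    return region.translate(COMPLEMENT)[::-1]


def minus7_base(element, ta_index):
    """Base at -7, taken as 5 bp downstream of the T of the TA at -12/-11."""
    return element[ta_index + 5]


def percent_by_base(elements_with_ta):
    """Percentage of each base at -7 over (element, TA index) pairs."""
    counts = Counter(minus7_base(seq, idx) for seq, idx in elements_with_ta)
    total = sum(counts.values())
    return {base: 100.0 * n / total for base, n in counts.items()}


# Toy sequence (hypothetical): TATAAT at -12 to -7, TSS is the single "A".
toy_genome = "G" * 30 + "TATAAT" + "CCCCCC" + "A" + "G" * 10
print(upstream_window(toy_genome, 43, "+"))
print(percent_by_base([("TATAAT", 0), ("TAGCGC", 0), ("TACATA", 0), ("TATGCT", 0)]))
```

Anchoring the tally on the TA dinucleotide, rather than on a fixed offset from the TSS, mirrors the statement that the position −7 percentage was calculated relative to the TA within each predicted −10 element.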
Delila-PY (47) was also used to predict σ70-dependent promoter motifs (48,91). The DNA sequences in predicted −10 elements were identified by searching −15 to −5 relative to each TSS using default parameters for Delila-PY. The motif with the highest information content is, by default, reported by Delila-PY and was used in our analyses. The percentage of a thymine base (T) at position −7 was calculated as described above. To identify the DNA sequences in −35 elements for each species, the positions −37 to −27 relative to each identified TSS with an identified −10 element were analyzed using the default Delila-PY settings, and the predicted logos are reported in our analyses. We used the Delila-PY −10 promoter element predictions for subsequent analysis due to the larger number of predicted σ70-dependent promoters generated by this tool.
Determining gene essentiality. Predicted σ70-dependent promoters identified by Delila-PY were split into two categories based on the identity of the base at position −7: −7T and −7A/C/G. For each group, the genes downstream of each predicted σ70 promoter were searched against the Database of Essential Genes (DEG), version 15 (49). The amino acid sequence of each gene downstream of a predicted σ70-dependent promoter was searched against the protein sequences of all bacterial essential genes in DEG 15 using BLAST (version 2.9.0) (92) with an E value threshold of 1 × 10^−10. Genes with at least one match to a protein sequence within the DEG list in any bacterial species were considered essential for this analysis.
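One way to apply the "at least one match" rule is sketched below in Python. It assumes, purely for illustration, that the BLAST results were written in the standard 12-column tabular format (`-outfmt 6`), in which the query ID is column 1 and the E value is column 11; the source does not state which output format was used, and the function name and the inclusive comparison are hypothetical choices.

```python
def essential_queries(blast_lines, max_evalue=1e-10):
    """Return query IDs with at least one hit at or below `max_evalue`.

    Assumes tab-separated BLAST output with the default -outfmt 6 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


# Hypothetical rows; gene and DEG identifiers are made up for illustration.
rows = [
    "geneA\tDEG_x\t45.0\t200\t100\t5\t1\t200\t1\t200\t1e-50\t180",
    "geneB\tDEG_y\t30.0\t80\t50\t3\t1\t80\t1\t80\t0.001\t40",
]
print(essential_queries(rows))
```

Alternatively, the threshold can be applied at search time so that weaker hits are never reported, in which case every query present in the output would count as a match.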
The genes within the −7T and the −7A/C/G groups were also compared to essential genes identified by transposon insertion sequencing (Tn-seq) for R. sphaeroides and C. crescentus (51,52) or transposon insertion identification via microarrays for Z. mobilis (53). Statistical significance of the number of essential genes within each group and species was determined using a hypergeometric test, with an adjusted P value of ≤0.05 being considered significant.
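A hypergeometric enrichment test of this kind can be written with the Python standard library alone, as in the sketch below. The choice of background is an assumption made for illustration (all genes downstream of predicted σ70-dependent promoters in one species); the source does not specify the population used, and the function and argument names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, observed):
    """Upper-tail hypergeometric P value, P(X >= observed).

    population: background genes (assumed: all genes with a predicted promoter)
    successes:  essential genes in the background
    draws:      genes in the promoter group being tested (e.g., -7A/C/G)
    observed:   essential genes found in that group
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return tail / total


# Toy numbers, not values from the study.
print(hypergeom_enrichment_p(population=10, successes=5, draws=5, observed=5))
```

The resulting raw P values would still need adjustment for multiple testing before being compared with the ≤0.05 threshold.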
Functional enrichment. Functions of gene products downstream of predicted σ70-dependent promoters in the −7T and −7A/C/G groups were obtained from the NCBI GenBank file for each species. These predicted functions were mapped to organized protein functional groups from the KEGG Brite ontology, KEGG Pathways, and GO terms (54,55). The comparison was performed using a hypergeometric test with an adjusted P value of ≤0.1 as a threshold for significant enrichment. Subgroups were combined into supergroups by binning similar cellular functions.
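The source does not name the procedure used to adjust P values. As one common option, and strictly as an assumption, the Benjamini-Hochberg adjustment is sketched below in Python; the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values, one per functional group tested.
raw = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(raw)
enriched = [i for i, p in enumerate(adj) if p <= 0.1]
print(adj, enriched)
```

With the ≤0.1 threshold stated above, a functional group would be called enriched only if its adjusted value, not its raw value, falls at or below the cutoff.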
Protein sequence alignments. Protein sequences of RNA polymerase σ70 and CarD homologs were obtained from the GenBank files mentioned above. Clustal Omega (93-95) was used to align the sequences using default parameters.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.