Regulatory roles of 5′ UTR and ORF-internal RNAs detected by 3′ end mapping

Many bacterial genes are regulated by RNA elements in their 5′ untranslated regions (UTRs). However, the full complement of these elements is not known even in the model bacterium Escherichia coli. Using complementary RNA-sequencing approaches, we detected large numbers of 3′ ends in 5′ UTRs and open reading frames (ORFs), suggesting extensive regulation by premature transcription termination. We documented regulation for multiple transcripts, including spermidine induction involving Rho and translation of an upstream ORF for an mRNA encoding a spermidine efflux pump. In addition to discovering novel sites of regulation, we detected short, stable RNA fragments derived from 5′ UTRs and sequences internal to ORFs. Characterization of three of these transcripts, including an RNA internal to an essential cell division gene, revealed that they have independent functions as sRNA sponges. Thus, these data uncover an abundance of cis- and trans-acting RNA regulators in bacterial 5′ UTRs and internal to ORFs.


Introduction
The expression of many bacterial genes is controlled by elements in the 5´ untranslated regions (UTRs) of mRNAs. Changes in the secondary structures of these cis-acting RNA elements lead to altered expression of the associated gene(s) by modulating accessibility of ribosomes to sites of translation initiation, accessibility of RNases, or premature transcription termination. The RNA secondary structure changes can occur in response to temperature (RNA thermometers), translation of small upstream open reading frames (uORFs), or the binding of trans-acting factors such as metabolites (riboswitches), tRNAs, RNA-binding proteins such as CsrA, or small base-pairing RNAs (sRNAs) (reviewed in (Breaker, 2018;Kreuzer and Henkin, 2018;Loh et al., 2018;Orr et al., 2020;Romeo and Babitzke, 2019;Storz et al., 2011)).
Some of the regulatory events in 5´ UTRs are associated with premature transcription termination, which occurs by one of two mechanisms: intrinsic (Rho-independent) or Rho-dependent (reviewed in (Roberts, 2019)). Intrinsic termination requires only RNA polymerase and an RNA hairpin followed by a U-rich tract in the nascent RNA. Rho-dependent termination requires the loading of the hexameric Rho protein complex onto nascent, untranslated RNA at Rho utilization (Rut) sites that are typically C-rich, G-poor, and unstructured sequences (reviewed in (Mitra et al., 2017)). Rho translocates along the RNA until it catches up with RNA polymerase and promotes transcription termination, typically 100 to 200 nt downstream of the Rut site, leading to 3´ ends that are processed by 3´ to 5´ exonucleases (Dar and Sorek, 2018b;Wang et al., 2019).
Several studies have sought to identify sites of transcription termination across the E. coli genome by sequencing RNA 3´ ends or by mapping the distribution of transcribing RNA polymerase (Dar and Sorek, 2018b;Ju et al., 2019;Peters et al., 2012;Peters et al., 2009;Yan et al., 2018). The vast majority of identified termination sites are in 3´ UTRs. In some studies, termination was compared in cells grown with or without the Rho inhibitor bicyclomycin (BCM), facilitating the identification of Rho termination sites (Dar and Sorek, 2018b;Ju et al., 2019;Peters et al., 2012). These data provided evidence for Rho termination of mRNAs in 3´ UTRs and of spurious transcripts initiated within genes. While termination within 5´ UTRs was noted in some of these studies, premature termination has not been extensively characterized on a global scale. Many of the uncharacterized 3´ ends in 5´ UTRs or open reading frame (ORF)-internal regions are likely to be the result of regulatory events, given that riboswitches (Bastet et al., 2017;Hollands et al., 2012), attenuators (Ben-Zvi et al., 2019;Gall et al., 2016;Herrero del Valle et al., 2020;Konan and Yanofsky, 1997;Kriner and Groisman, 2015), RNA-binding proteins (Baniulyte et al., 2017;Figueroa-Bossi et al., 2014), and sRNAs (Bossi et al., 2012;Sedlyarova et al., 2016) have all been implicated in affecting premature Rho termination events.
In addition to being the product of a regulatory event, RNA fragments generated by premature termination or RNase cleavage themselves can have functions as regulatory sRNAs.
sRNAs commonly base pair with trans-encoded mRNAs, frequently with the assistance of the RNA chaperone protein Hfq, resulting in changes in the stability or translation of the target mRNA (reviewed in (Hör et al., 2020)). Most sRNAs characterized to date are transcribed independently of other genes or are processed from mRNA 3´ UTRs, though a few 5´ UTR-derived sRNAs have been reported (reviewed in ). RNA fragments entirely internal to coding sequences (Dar and Sorek, 2018a) also have been suggested to function as regulators, though this has not been tested. While sRNAs generally base pair with mRNA targets, a few small transcripts have been shown to act as competing endogenous RNAs (ceRNAs), also known as "sponges", which base pair primarily with sRNAs, targeting the sRNAs for degradation or blocking their interactions with mRNA targets (reviewed in (Denham, 2020;Figueroa-Bossi and Bossi, 2018;Grüll and Massé, 2019)).
To systematically identify new regulatory elements in E. coli, we globally mapped RNA 3´ ends, and specifically characterized those ends in 5´ UTRs and ORF-internal regions. We compared this 3´ end dataset with another dataset where BCM treatment was used to identify sites of Rho termination. Using these approaches, we detected hundreds of RNA 3´ ends within 5´ UTRs and internal to ORFs, likely generated by premature transcription termination or RNase processing. We propose the majority of these 3´ ends are the consequence of regulatory events, and we document regulation for multiple examples. For instance, we show 3´ ends are associated with the translation of uORFs, or result from the binding of some sRNAs to mRNA 5´ UTRs.
Furthermore, we demonstrate that RNA fragments generated by premature transcription termination and from within coding sequences function as independent sRNA regulators; one as part of an autoregulatory loop and another that connects cell division to the cell envelope stress response. These findings reveal extensive and diverse regulation through premature transcription termination and RNase processing of mRNAs, which can lead to the generation of RNA byproducts with independent functions.

Global mapping of 3´ ends in E. coli
Two independent cultures of wild-type E. coli MG1655 (WT) were grown to OD600 ~0.4 in rich (LB) medium, OD600 ~2.0 in LB, and OD600 ~0.4 in minimal (M63) glucose medium. Total RNA was isolated and analyzed using modified RNAtag-seq (Shishkin et al., 2015) (total RNA-seq) and 3´ end (Term-seq) protocols (Dar et al., 2016) (Figure 1-figure supplement 1). The replicate total RNA-seq and Term-seq datasets were highly correlated (Figure 1-figure supplement 2A). Using the Term-seq data, we curated a list of dominant RNA 3´ ends (Supplementary file 1). The total numbers of identified 3´ ends were 1,175 and 882 for cells grown in LB to OD600 ~0.4 or 2.0, respectively, and 1,053 for cells grown in M63 glucose to OD600 ~0.4 (Figure 1-figure supplement 2B). The detected 3´ ends were further subclassified (see materials and methods for details) according to their locations relative to annotated genes (Figure 1A). 3´ ends that could not be assigned to one unique category were counted in multiple categories. For cells grown to OD600 ~0.4 in LB, this analysis revealed that, while 23% of 3´ ends mapped <50 bp downstream of an annotated gene (primary 3´ ends), hundreds (58%) were classified as orphan and internal 3´ ends, many mapping upstream of or within ORFs (Figure 1B).
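The positional classification can be illustrated with a simplified sketch. It assumes 1-based, inclusive gene coordinates and applies only the 50-bp window for primary 3´ ends stated above; the full rules behind Supplementary file 1 (see materials and methods) are not reproduced, and the `Gene` type and `classify_3prime_end` function are hypothetical names. As in the text, an end may be counted in more than one category.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive; start <= end on either strand
    end: int
    strand: str  # "+" or "-"


PRIMARY_WINDOW = 50  # primary 3' ends lie <50 bp downstream of a gene end


def classify_3prime_end(pos: int, strand: str, genes: list) -> set:
    """Return every positional category that a 3' end at `pos` falls into."""
    categories = set()
    for g in genes:
        if g.start <= pos <= g.end:
            # Inside a gene: same strand counts as internal, opposite strand as antisense.
            categories.add("internal" if strand == g.strand else "antisense")
        elif strand == g.strand:
            # Distance past the gene's 3' boundary, measured in the direction of transcription.
            dist = pos - g.end if strand == "+" else g.start - pos
            if 0 < dist < PRIMARY_WINDOW:
                categories.add("primary")
    # In this simplified sketch, ends matching no rule are treated as orphans.
    return categories or {"orphan"}


genes = [Gene("geneA", 100, 400, "+"), Gene("geneB", 600, 900, "-")]
print(classify_3prime_end(420, "+", genes))  # {'primary'}
```

With overlapping genes, the same end can be both primary for one gene and internal to the next, which is why category totals can exceed the number of ends.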
We compared the detected 3´ ends to those identified by three other RNA-seq based studies (Dar and Sorek, 2018b;Ju et al., 2019;Yan et al., 2018). Note that the number of overlapping 3´ ends differs depending on the direction of the comparison (is non-commutative) because multiple 3´ ends from one dataset can be close to a single 3´ end from the other dataset.
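The asymmetry of the overlap count can be seen in a short sketch. It assumes positions on a single strand and a hypothetical distance cutoff `window`; the cutoffs actually used for each comparison are described in materials and methods, not here.

```python
from bisect import bisect_left


def count_overlapping(query, reference, window):
    """Count positions in `query` within `window` nt of at least one position in `reference`."""
    ref = sorted(reference)
    hits = 0
    for pos in query:
        # First reference position >= pos - window; it overlaps if it is also <= pos + window.
        i = bisect_left(ref, pos - window)
        if i < len(ref) and ref[i] <= pos + window:
            hits += 1
    return hits


ours = [100, 105, 110]
theirs = [103]
print(count_overlapping(ours, theirs, window=10))  # 3
print(count_overlapping(theirs, ours, window=10))  # 1
```

Three closely spaced ends in one dataset all overlap a single end in the other, but in the reverse direction that single end is counted only once.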
While there was significant overlap between the 3´ ends identified in our work and those identified in each of the other studies (Figure 1-figure supplement 2C,E,F; hypergeometric test 2018b; Wang et al., 2019). Hence, we refer to the identified genomic locations as "Rho termination regions". As for 3´ ends mapped by Term-seq, we classified Rho termination regions by their position relative to annotated genes (Figure 1C). 3´ ends antisense to annotated genes represented the largest category (51.5%), consistent with significant Rho termination of antisense RNAs (Peters et al., 2012). Only a small percentage (4.5%) of the Rho termination regions are <50 nt downstream of an annotated gene (primary), likely because Rho loading and termination typically require >50 nt of untranslated RNA (reviewed in (Mitra et al., 2017)). As for the 3´-end mapping by Term-seq, many Rho termination regions (44%) were classified as orphan and internal, frequently mapping upstream of or within ORF sequences.
The C:G nucleotide ratio was calculated for the putative Rho termination regions (Supplementary file 2), as well as for a control group of randomly selected genomic coordinates. Relative to the control group, there was a higher local C:G ratio within 200 nt of the 3´ ends associated with Rho termination regions (Figure 1D), consistent with enrichment for the C-rich, G-poor sequences attributed to Rut sites (reviewed in (Mitra et al., 2017)).
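A local C:G ratio of this kind can be sketched as follows. The sketch assumes that the window is the 200 nt immediately upstream of the 3´ end, read on the transcribed strand, and that positions are 0-based; both choices, and the function names, are assumptions for illustration rather than the study's exact procedure.

```python
def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def local_cg_ratio(genome: str, end_pos: int, strand: str, window: int = 200) -> float:
    """C:G ratio in the `window` nt upstream of a 3' end, read 5'->3' on the transcript strand.

    `end_pos` is the 0-based genome index of the last transcribed nucleotide.
    """
    if strand == "+":
        region = genome[max(0, end_pos - window + 1):end_pos + 1]
    else:
        # On the minus strand, "upstream" lies at higher genome coordinates.
        region = reverse_complement(genome[end_pos:end_pos + window])
    region = region.upper()
    c, g = region.count("C"), region.count("G")
    return c / g if g else float("inf")
```

A control group would be scored the same way at randomly drawn positions and strands, and the two distributions of ratios compared.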
We also compared the Rho termination regions (Supplementary file 2) to the sites of Rho termination reported in three previous genome-wide studies (Dar and Sorek, 2018b;Ju et al., 2019;Peters et al., 2012) (Figure 1-figure supplement 2G). Again, there was significant overlap with each of the three previous studies (hypergeometric test p < 2.2e-16 in all cases), though many (43%) of the putative Rho termination regions we identified are >500 bp away from any of the previously identified Rho termination regions (Dar and Sorek, 2018b;Ju et al., 2019;Peters et al., 2012). The large sets of Rho termination regions that differ between the studies suggest that much remains to be learned about Rho-dependent termination. The comparison of our Term-seq LB 0.4 dataset (Supplementary file 1) with our DirectRNA-seq LB 0.5 dataset (Supplementary file 2) revealed significant overlap (hypergeometric test p < 2.2e-16), with 34.6% of the Term-seq 3´ ends within 500 nt of a Rho termination region (Figure 1-figure supplement 2H). This suggests that this subset of 3´ ends is associated with Rho-terminated transcripts.
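One way to frame a hypergeometric overlap test is to divide the genome into fixed-size bins and ask whether bins hit by both datasets are over-represented. The binning, the 500-nt default and the function names below are assumptions for illustration, since the exact formulation used in the study is not stated in this section; strand is ignored.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N items, K marked, n drawn), computed exactly."""
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return upper / comb(N, n)


def bin_overlap_pvalue(ends_a, ends_b, genome_length, bin_size=500):
    """Hypothetical binned overlap test for two sets of 0-based genomic positions."""
    bins_a = {p // bin_size for p in ends_a}
    bins_b = {p // bin_size for p in ends_b}
    n_bins = -(-genome_length // bin_size)  # ceiling division
    shared = len(bins_a & bins_b)
    return hypergeom_sf(shared, n_bins, len(bins_a), len(bins_b))
```

Exact integer arithmetic avoids rounding in the combinatorial terms, although extremely small probabilities can still underflow to 0.0 when converted to a float.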
Known regulatory events are associated with 3´ ends and Rho termination regions in 5´ UTRs
Many 3´ ends identified by Term-seq mapped upstream of or internal to annotated mRNAs (Supplementary file 1). We specifically focused on these 3´ ends (see materials and methods), since we hypothesized they could be reflective of regulatory events. Our analysis identified 665 and 507 3´ ends for the LB 0.4 and LB 2.0 samples, respectively, and 580 3´ ends for the M63 0.4 sample (Supplementary file 3). Among the 3´ ends in 5´ UTRs, several correspond to sites of known cis-acting RNA regulation (annotated in Supplementary file 3). These include mRNAs that previously have been shown to be regulated by premature Rho-dependent termination, such as the riboswitch-regulated gene thiM (Bastet et al., 2017) and the translationally-repressed genes ilvX (Lawther and Hatfield, 1980) and topAI (Baniulyte and Wade, 2019). We also noticed that even some 5´ UTRs where regulation has only been reported to be at the level of translation, such as the RNA thermometer upstream of rpoH (Morita et al., 1999a;Morita et al., 1999b) and at ribosomal protein operons (reviewed in (Zengel and Lindahl, 1994)), harbored defined 3´ ends in 5´ UTRs.
For the LB 0.4 Term-seq dataset, for which the growth conditions were most similar to those of the DirectRNA-seq dataset, we determined whether each 3´ end was associated with a detected Rho termination event. Significant Rho scores are listed in Supplementary file 3 (see materials and methods for details). It should be noted that some genes containing 3´ ends with significant Rho scores in Supplementary file 3 are absent from Supplementary file 2, because the genomic position with the highest Rho score in that region is located in a neighboring gene/region, and thus given a different gene designation.
We also used a previously described quantitative model to score putative intrinsic terminators, where a score >3.0 is predictive of intrinsic termination (Chen et al., 2013). Based on these analyses of the 3´ ends in the LB 0.4 dataset in Supplementary file 3, we predict that 20% have secondary structures and sequences consistent with intrinsic termination (intrinsic terminator score ≥ 3.0), 11% have detectable Rho termination, and the remaining 3´ ends are likely the result of RNA processing. However, the number of Rho termination sites could be an underestimate, because inhibition of Rho leads to extensive readthrough transcription from very strongly transcribed genes that can mask overlapping transcripts, and 3´ ends generated by Rho termination are typically unstable (Dar and Sorek, 2018b;Wang et al., 2019). Some 3´ ends also may be generated by a combination of mechanisms. Nonetheless, our data suggest that modulation of premature transcription termination in 5´ UTRs or ORFs is a widespread regulatory mechanism.
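The assignment of 3´ ends to mechanisms can be summarized as a simple decision rule. The intrinsic cutoff of 3.0 follows the text, whereas `RHO_CUTOFF` is hypothetical: the text reports significant Rho scores of 2.3 and 3.7 and a nonsignificant score of 0.7, but the actual significance criterion is given in materials and methods. As in the text, "processing" is assigned by exclusion.

```python
from collections import Counter

INTRINSIC_CUTOFF = 3.0  # intrinsic terminator score cutoff (Chen et al., 2013), as in the text
RHO_CUTOFF = 2.0        # hypothetical Rho-score cutoff, for illustration only


def assign_mechanism(intrinsic_score: float, rho_score: float) -> str:
    intrinsic = intrinsic_score >= INTRINSIC_CUTOFF
    rho = rho_score >= RHO_CUTOFF
    if intrinsic and rho:
        return "intrinsic+Rho"  # a possible combination of mechanisms
    if intrinsic:
        return "intrinsic"
    if rho:
        return "Rho"
    return "processing"  # assigned by exclusion


def percent_by_mechanism(scores):
    """scores: iterable of (intrinsic_score, rho_score) pairs, one per 3' end."""
    counts = Counter(assign_mechanism(i, r) for i, r in scores)
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


print(assign_mechanism(14.1, 0.7))  # intrinsic (the ispU 3' end scores reported below)
```

Because Rho-terminated 3´ ends can be masked or unstable, a rule like this would undercount Rho termination in the same way the text cautions.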

Novel sites of regulation are predicted by 3´ ends and Rho termination regions in 5´ UTRs
Several genes harboring Rho-dependent 3´ ends within the 5´ UTR or ORF have not been previously described as being regulated by Rho but are associated with characterized cis-acting RNA regulators. Examples are the sugE (gdx) and moaA genes, which are preceded by the guanidine II riboswitch (Sherlock et al., 2017) and molybdenum cofactor riboswitch (Regulski et al., 2008), respectively. A browser image of the RNA-seq data for the sugE locus in the LB 0.4 condition documents a predominant 3´ end 76 nt downstream of the sugE transcription start site (TSS) (Figure 2A). This region was associated with significant readthrough in the +BCM DirectRNA-seq sample and a 3´ end Rho score of 3.7 (Supplementary file 3), strongly suggesting that the riboswitch impacts Rho-dependent premature termination, as is the case for other riboswitches.
While the involvement of 5´ UTRs in sugE and moaA regulation was known, most of the genes for which we found 3´ ends in 5´ UTRs or ORFs have not been reported to have RNA-mediated regulation (Supplementary file 3). In some cases, such as the mdtJI mRNA, encoding a spermidine efflux pump, we observed a novel 3´ end that is clearly Rho-dependent (Figure 2B; Rho score of 2.3). In other cases, such as ispU (uppS), encoding the undecaprenyl pyrophosphate synthase, a novel 3´ end was observed with no readthrough upon Rho inhibition (Figure 2C; Rho score of 0.7). Rather, this 3´ end had an intrinsic terminator score (Chen et al., 2013) of 14.1 (Supplementary file 3), and we predicted a 6 bp stem-loop followed by eight U residues, consistent with intrinsic termination.
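The stem-loop plus U-tract signature described for ispU can be illustrated with a naive scan. This is not the quantitative model of Chen et al. (2013): it only looks for a perfectly paired stem (Watson-Crick or G-U pairs) of `stem` bp, a loop of 3 to 9 nt and a U-rich tract immediately downstream, and all default parameters are hypothetical choices.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def find_hairpin_u_tract(rna, stem=6, loop_range=(3, 9), tract_len=8, min_u=7):
    """Naively search for a perfect `stem`-bp hairpin followed directly by a U-rich tract.

    Returns (stem_start, loop_length) for the first hit (0-based), or None.
    """
    rna = rna.upper().replace("T", "U")
    n = len(rna)
    for i in range(n):
        for loop in range(loop_range[0], loop_range[1] + 1):
            hp_end = i + 2 * stem + loop  # exclusive end of the hairpin
            if hp_end + tract_len > n:
                break  # longer loops will not fit either
            left = rna[i:i + stem]
            right = rna[i + stem + loop:hp_end]
            # Base k of the 5' arm pairs with base k counted from the 3' end of the 3' arm.
            if all((a, b) in PAIRS for a, b in zip(left, reversed(right))):
                if rna[hp_end:hp_end + tract_len].count("U") >= min_u:
                    return i, loop
    return None


example = "AAAA" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUUUUUU"
print(find_hairpin_u_tract(example))  # (4, 4)
```

A real terminator model also weighs stem stability and tolerates bulges and mismatches, which this exhaustive perfect-match scan ignores.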
To further test whether genes associated with 3´ ends in 5´ UTRs or ORFs are indeed regulated by premature Rho-dependent transcription termination, we generated lacZ transcriptional reporter fusions using the entire 5´ UTR and ORF for 27 genes (Supplementary files 2 and 3) arbitrarily chosen for a range of calculated Rho scores and possible regulation. All constructs had the same constitutive promoter. β-galactosidase activity was assayed in the context of a WT E. coli background or a mutant strain with an R66S substitution in Rho (rhoR66S), which disrupts the primary RNA-binding site (Baniulyte et al., 2017;Bastet et al., 2017;Martinez et al., 1996). A fusion to thiM, which harbors a 5´ UTR Rho-dependent terminator (Bastet et al., 2017), exhibited significantly higher levels of with Rho termination in the 5´ UTR or ORF for sugE, cfa, cyaA, mdtJ, add, cspB, cspG, moaA, pyrG, yhaM, ydjL, yhiI, ytfL  These assays support the notion that the 3´ end observed in the ispU 5´ UTR (Figure 2C) is generated by intrinsic termination. It is unclear why we did not detect evidence for Rho termination of mnmG, rpsJ, argT, srkA or trmL, despite these genes having significant Rho scores. Interestingly, fusions to the ompA, yebO, glpF and rimP 5´ UTR and ORF had decreased expression in the Rho mutant background (Figure 2-figure supplement 1D). This may be a consequence of additional levels of regulation or indirect effects of Rho inhibition. The ompA and glpF genes were not associated with Rho termination by DirectRNA-seq.
Transcriptional lacZ reporter gene fusions to only the 5´ UTR (i.e. without the ORF) were also generated for seven genes, to distinguish between Rho termination in the ORF and in the 5´ UTR. The effect of rhoR66S was eliminated for the shorter cyaA and eptB fusions ( Figure 2E), suggesting that Rho-dependent termination occurs within the coding sequence of these genes.
Regulation of Rho termination in 5´ UTRs is probably associated with the accessibility of Rut sequences, whereas Rho termination within coding sequences is probably associated with regulated translation initiation (reviewed in (Kriner et al., 2016)), with translational repression indirectly leading to Rho termination.
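The logic of comparing full-length and 5´ UTR-only fusions can be reduced to fold changes for illustration. The cutoff of 2 and the activity values in the example are hypothetical; the conclusions above rest on the measured β-galactosidase assays and their statistics, not on this rule.

```python
def fold_change(wt_activity: float, mutant_activity: float) -> float:
    """Reporter activity in the rhoR66S background relative to WT."""
    return mutant_activity / wt_activity


def locate_rho_termination(full_fold: float, utr_only_fold: float, cutoff: float = 2.0) -> str:
    """Infer where Rho acts from 5' UTR+ORF and 5' UTR-only fusions (hypothetical cutoff)."""
    if full_fold < cutoff:
        return "not detected"
    if utr_only_fold >= cutoff:
        return "5' UTR"  # the rhoR66S effect persists without the ORF
    return "ORF"         # the rhoR66S effect is lost when the ORF is removed


# Hypothetical activities: strong derepression of the full fusion, none for the 5' UTR-only fusion.
print(locate_rho_termination(fold_change(100, 600), fold_change(80, 90)))  # ORF
```

The "ORF" outcome corresponds to the pattern reported for cyaA and eptB, where the rhoR66S effect disappeared once the coding sequence was removed.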
Finally, using RNA extracted from WT and rhoR66S strains grown to OD600 ~0.4 and 2.0 in LB, northern analysis was performed with probes for the 5´ UTRs of sugE, cfa, cyaA, speA, mdtJ, eptB and ispU (Figure 2F) as well as with probes for the coding sequences of cfa, cyaA, speA and mdtJ (Figure 2-figure supplement 1E). For all of the mRNAs, we detected 5´ fragments that likely correspond to the 3´ ends detected by Term-seq (indicated with an asterisk) (Supplementary file 3). For sugE and cfa, however, the dominant band on the northern blot did not necessarily correspond to the most abundant 3´ end sequenced using Term-seq. The enrichment for the 5´ fragments, as well as for longer transcripts likely corresponding to the full-length sugE, cfa, cyaA, speA and mdtJ mRNAs (indicated with an arrow), in the rhoR66S samples reflects transcriptional readthrough in the mutant background. This is consistent with Rho-dependent termination, as seen for the lacZ fusions. The effect of rhoR66S on eptB was intermediate, while no effect was observed for ispU. For all of the Rho-terminated genes, we were surprised to observe a significant increase in the levels of short transcripts, as seen most strikingly for speA. This suggests that the detected 3´ ends can be generated by RNA processing from both Rho-terminated and full-length transcripts, with the increased abundance of longer transcripts in the rhoR66S samples leading to higher levels of the processed product. We also noted that the effects of the rho mutation varied under the different growth conditions tested, as is most apparent for cfa and cyaA. Collectively, these data validate premature termination in 5´ UTRs and, in several cases, suggest complex regulation.

Premature Rho termination of mdtJI is dependent on spermidine and translation of a uORF mdtU
The mdtJI mRNA, encoding a spermidine exporter, has a long 5´ UTR (TSS located 278 nt from the start codon) ( Figure 3A) and a 3´ end that mapped 6 nt into the ORF ( Figure 2B).
Additionally, the transcript is subject to premature Rho termination ( Figure 2D (Yohannes et al., 2005)). Northern analysis of these samples probed for the mdtJI mRNA revealed a ~280 nt transcript, consistent with the 3´ end detected by Term-seq (Supplementary file 3), that was susceptible to readthrough upon the addition of spermidine for cells grown at high pH ( Figure 3B). We therefore hypothesized that spermidine inhibits premature Rho termination of the mdtJI mRNA.
Closer inspection of the mdtJI 5´ UTR revealed a putative upstream small ORF (uORF) of 34 codons with the stop codon 106 nt upstream of the mdtJ AUG (Figure 3A). Translation of this uORF was previously detected by ribosome profiling, with expression of the corresponding small protein (YdgV) documented by western blot analysis (Weaver et al., 2019). To independently verify translation of the uORF, the coding sequence was translationally fused to lacZ together with the upstream sequence and native mdtUJI promoter. Robust β-galactosidase activity was detected for cells carrying the fusion (Figure 3C). By contrast, no β-galactosidase activity was observed for cells carrying an equivalent fusion with the uORF start codon mutated (ATG→ACG), supporting the conclusion that the uORF (ydgV), here renamed mdtU, is indeed translated.
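Locating a uORF such as mdtU in a 5´ UTR amounts to scanning for in-frame ATG-to-stop segments that end before the main start codon. The sketch below uses a short hypothetical sequence rather than the mdtUJI leader, and its conventions (codon count including ATG but excluding the stop, distance measured from the end of the stop codon) are assumptions, since the text does not define how the 34 codons and 106 nt were counted.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr: str, min_codons: int = 2):
    """Return (start, stop_end, n_codons) for each ATG-initiated ORF that ends inside `utr`.

    `utr` runs from the TSS to just before the main ORF start codon (DNA alphabet).
    `start` is 0-based; `stop_end` is the exclusive end of the stop codon;
    `n_codons` counts sense codons (ATG included, stop excluded).
    """
    utr = utr.upper()
    hits = []
    for start in range(len(utr) - 2):
        if utr[start:start + 3] != "ATG":
            continue
        for j in range(start, len(utr) - 2, 3):
            if utr[j:j + 3] in STOP_CODONS:
                n_codons = (j - start) // 3
                if n_codons >= min_codons:
                    hits.append((start, j + 3, n_codons))
                break
    return hits


utr = "GG" + "ATG" + "AAA" * 3 + "TAA" + "CCCCC"  # hypothetical leader
for start, stop_end, n_codons in find_uorfs(utr):
    print(n_codons, len(utr) - stop_end)  # 4 5
```

Changing the start codon to ACG, as in the mutant fusion, removes the uORF from the scan: `find_uorfs(utr.replace("ATG", "ACG", 1))` returns an empty list.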
To investigate the role of mdtU in spermidine-mediated regulation of mdtUJI, the mdtU start codon mutation (ATG→ACG) was introduced on the chromosome of a strain in which a 3XFLAG tag was translationally fused to the C-terminus of MdtJ. Northern and western blot analysis of strains encoding mdtUJ-3XFLAG-I showed that mRNA and protein levels were strongly induced by spermidine, and that this induction was abolished in the strain with the mdtU start codon mutation (Figure 3D). We suggest that translation of MdtU is critical for spermidine-mediated expression of MdtJ. Northern analysis was also carried out to determine if Rho termination in the mdtJI 5´ UTR impacts the induction by spermidine. In the rhoR66S strain, the spermidine-dependent increase in full-length mdtUJI mRNA levels was substantially reduced relative to WT cells (Figure 3E). However, inhibition of Rho did not completely abolish the stimulatory effect of spermidine on mdtUJI mRNA levels, perhaps because growth in spermidine at high pH increases transcription initiation of mdtUJI. Together, these data support the model that spermidine, Rho, and translation of the mdtU uORF affect the levels of MdtJI and hence spermidine transport, though the mechanisms deserve further study. A screen for uORFs upstream of mdtJ orthologs in other gammaproteobacterial species showed that MdtU, particularly the C-terminal region, is conserved in at least 17 genera (Figure 3-figure supplement 1). This suggests that mdtU-dependent regulation of mdtUJI expression is a conserved process that may depend on the sequence of the MdtU C-terminus.
To examine how sRNAs impacted the 5´-derived mRNA fragments of eptB, ompA and chiP, we used northern analysis to examine the consequences of deleting or overexpressing the cognate sRNA gene. Two oligonucleotide probes, one within the coding sequence (downstream of the 3´ end identified using Term-seq) and one within the 5´ fragment, were used to determine the relative levels of the full-length mRNAs (Figure 4B) and 5´ fragments (Figure 4C). As expected, given the known sRNA-mediated down-regulation of eptB, ompA and chiP, the levels of the target mRNAs were elevated in the sRNA deletion background compared to the WT strain and decreased with sRNA overexpression (Figure 4B). Some other RNA species were detected for eptB (~400 nt), ompA (<3,000 nt), and chiP (~200-500 nt), but these did not match the expected sizes for the mRNAs, and may be degradation and/or readthrough products. For the 5´ UTR region of eptB, there was a reciprocal effect of the ∆mgrR background, with a strong decrease in the abundance of a ~140 nt band (Figure 4C). Given that we observed a moderate effect of the Rho mutation on eptB expression (Figure 2D, F), we speculate that MgrR base pairing both promotes Rho termination and protects the resultant RNA from exonucleases. For the 5´ UTR region of ompA, which showed only modest de-repression in the ∆micA strain, there was a decrease in the abundance of an ~120 nt fragment in the deletion strain (Figure 4C), and the levels of this fragment increased upon MicA overexpression. These effects are consistent with MicA sRNA-directed cleavage of the ompA mRNA generating the fragment, or with sRNA base pairing protecting the 3´ end from exonucleolytic processing. The effect of the ∆chiX background on the chiP 5´ fragment was strikingly different. Instead of decreasing, there was a large increase in the levels of an ~90 nt RNA (Figure 4C).
Given the strong signal detected for this transcript, we hypothesized that this RNA might have an independent role as a regulatory RNA.

ChiZ and IspZ sRNA sponges derive from 5´ UTRs
To test the hypothesis that 5´ UTR transcripts with defined bands have independent functions as sRNAs, we carried out further studies on the ~90 nt chiP 5´ UTR transcript ( Figure 4C), which we renamed ChiZ, and the 81 and 60 nt ispU 5´ UTR transcripts (likely expressed from two TSSs with a shared 3´ end, Figure 2C and F), denoted IspZ ( Figure 5A).
To obtain more information about the expression of these putative sRNAs, we performed northern analysis using the same RNA analyzed in the Term-seq experiment ( Figure 5B).
Distinct bands were detected for the two 5´ UTRs, consistent with the generation of stable RNAs with predicted stems protecting the ends (Figure 5-figure supplement 1A-B). While ChiZ was most abundant in cells grown to exponential phase in LB, IspZ was abundant in LB and M63 glucose medium in both exponential and stationary phase. Since many sRNA levels are negatively affected by the lack of the RNA chaperone Hfq (reviewed in (Hör et al., 2020)), we also conducted northern analysis using RNA extracted from WT or Δhfq cells across growth in LB (Figure 5B). Similar to other base-pairing sRNAs, and consistent with Hfq binding, ChiZ abundance was low in the ∆hfq background. IspZ levels were only slightly affected by the absence of Hfq, even though this RNA is bound by Hfq (Melamed et al., 2020). Given that the binding of the sRNA ChiX to the chiP mRNA (containing ChiZ) increases Rho-mediated regulation of chiP (Bossi et al., 2012), we tested the effect of Rho on ChiZ levels (Figure 6A). The effects of the rhoR66S mutant were dependent on growth phase, with a decrease in ChiZ for cells grown in LB to OD600 ~0.4, when ChiZ levels are highest. In contrast, IspZ levels were not affected by the rhoR66S mutant (Figure 2F), consistent with IspZ being generated by intrinsic termination, as noted above.
Given their association with Hfq, we tested the independent functions of ChiZ and IspZ as base-pairing sRNAs. Since the region of chiP encoding ChiZ has been documented to be a target for base pairing with the sRNA ChiX through compensatory mutations (Figueroa-Bossi et al., 2009;Overgaard et al., 2009), we postulated that ChiZ reciprocally regulates ChiX, sponging its base-pairing activity. To test this possibility, we assayed the effects of ChiZ overexpression.
As for chromosomally-encoded ChiZ (Figure 5B), longer transcripts were observed for plasmid-encoded ChiZ, likely due to readthrough, but only the levels of the 90 nt ChiZ band were strongly reduced in the ∆hfq mutant (Figure 5-figure supplement 1C). Upon ChiZ overexpression in the WT background, we observed increased levels of the chiP mRNA, without a change in ChiX levels (Figure 6B). The levels of the chiP mRNA overall were higher in the Δhfq mutant background, but we no longer observed an increase upon ChiZ overexpression, likely due to the instability of ChiX and ChiZ in this background. We also observed upregulation of a PBAD-chiP-lacZ chromosomal translational fusion (Schu et al., 2015) upon ChiZ overexpression in a WT but not a ∆chiX background (Figure 6C). These observations support a novel sRNA regulatory network in which an mRNA (chiP) that is the target of an sRNA (ChiX) produces an RNA fragment (ChiZ) that reciprocally sponges the sRNA (ChiX) (Figure 6D).
We expected IspZ also might function as a base-pairing sRNA, and thus searched for potential targets identified by RIL-seq (Melamed et al., 2020;Melamed et al., 2016), an approach in which RNAs in proximity on an RNA-binding protein are identified by coimmunoprecipitation, ligation, and sequencing of the chimeras. The predominant target for IspZ in these datasets is the oxidative stress-induced sRNA OxyS (Altuvia et al., 1997). This observation led us to test whether IspZ might act as a sponge of OxyS. As for chromosomally-encoded IspZ (Figure 5), and in contrast to ChiZ, little readthrough and no effect of  (Altuvia et al., 1997). High levels of IspZ were associated with slightly lower OxyS levels, in line with sponging activity (Figure 6E). To obtain evidence for direct base pairing between IspZ and OxyS, IspZ was mutated on the overexpression plasmid (IspZ-M1), and compensatory mutations were introduced into the chromosomal copy of OxyS (OxyS-M1). Consistent with the predicted pairing (Figure 6F), IspZ-mediated down-regulation was eliminated with IspZ-M1 or OxyS-M1 alone but was restored when both mutations were present (Figure 6G).
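The logic of the compensatory-mutation experiment can be sketched with a pairing check. The six-nucleotide segments below are hypothetical and are not the actual IspZ, OxyS, IspZ-M1 or OxyS-M1 sequences; the sketch only shows why a single substitution in either partner breaks the predicted duplex while the matched pair of substitutions restores it.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(rna1: str, rna2: str) -> bool:
    """True if two equal-length segments (each written 5'->3') can pair antiparallel."""
    return len(rna1) == len(rna2) and all(
        (a, b) in PAIRS for a, b in zip(rna1, reversed(rna2))
    )


def mutate(rna: str, pos: int, base: str) -> str:
    return rna[:pos] + base + rna[pos + 1:]


ispz_site = "ACUGCA"                 # hypothetical sponge segment
oxys_site = "UGCAGU"                 # hypothetical complementary sRNA segment
ispz_m1 = mutate(ispz_site, 1, "G")  # C -> G breaks one pair
oxys_m1 = mutate(oxys_site, 4, "C")  # compensatory G -> C restores it

for name1, s1 in (("IspZ", ispz_site), ("IspZ-M1", ispz_m1)):
    for name2, s2 in (("OxyS", oxys_site), ("OxyS-M1", oxys_m1)):
        print(name1, name2, can_pair(s1, s2))
```

The expected pattern (pairing for WT/WT and M1/M1 only) mirrors the regulation observed experimentally; real duplexes tolerate some mismatches, which this all-or-nothing check does not model.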

Putative ORF-internal sRNAs
We also noted examples of abundant 3´ ends internal to ORFs (Supplementary file 3), downstream of nearby 5´ ends previously identified by dRNA-seq (Thomason et al., 2015), and associated with a strong signal in total RNA-seq ( Figure 7). A previous study inferred from total RNA-seq data that some sRNAs might be derived from sequences internal to ORFs (Dar and Sorek, 2018a). To test whether we could detect defined transcripts for these internal signals, we selected candidate RNAs derived from the ftsI (renamed FtsO), aceK, rlmD, mglC, and ampG ORFs for further investigation ( Figure 7A). Analysis of dRNA-seq data (Thomason et al., 2015) suggested that the FtsO, aceK int and ampG int 5´ ends likely are generated by RNase processing of the overlapping mRNA, whereas rlmD int and mglC int likely are transcribed from promoters internal to the overlapping ORFs. In nearly all cases, the RNA 3´ ends are not predicted to be due to Rho-dependent transcription termination events (Supplementary file 3), strongly suggesting they are generated by RNase processing or, for aceK int, intrinsic termination (intrinsic terminator score = 5.9, Supplementary file 3).
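Nominating ORF-internal RNA candidates amounts to pairing each ORF-internal 3´ end with a nearby upstream 5´ end in the same ORF. The sketch below assumes 1-based inclusive coordinates, picks the nearest upstream 5´ end, and applies hypothetical length limits (40 to 300 nt); the study's actual selection also weighed total RNA-seq signal, which is not modeled here.

```python
def candidate_internal_rnas(orf_start, orf_end, strand, five_ends, three_ends,
                            min_len=40, max_len=300):
    """Pair each ORF-internal 3' end with the nearest upstream 5' end inside the same ORF.

    Coordinates are 1-based and inclusive; returns (five_end, three_end, length) tuples.
    """
    def inside(p):
        return orf_start <= p <= orf_end

    fives = [f for f in five_ends if inside(f)]
    candidates = []
    for t in three_ends:
        if not inside(t):
            continue
        if strand == "+":
            five = max((f for f in fives if f < t), default=None)
        else:
            # On the minus strand, upstream means higher coordinates.
            five = min((f for f in fives if f > t), default=None)
        if five is None:
            continue
        length = abs(t - five) + 1
        if min_len <= length <= max_len:
            candidates.append((five, t, length))
    return candidates


print(candidate_internal_rnas(1000, 2500, "+", [1100, 1500], [1600, 3000]))
# [(1500, 1600, 101)]
```

Whether a candidate 5´ end reflects a promoter or RNase processing (as distinguished above using dRNA-seq) is outside the scope of this sketch.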
Northern analysis was performed for these RNAs using the same RNA analyzed in the Term-seq experiment (Figure 7B); distinct bands were detected for all the RNAs tested. While FtsO and ampG int were relatively abundant under all growth conditions tested, mglC int and rlmD int were only expressed in cells grown in LB, and aceK int was most abundant in LB at OD600 ~2.0. We also conducted northern analysis using RNA extracted from WT or Δhfq cells across growth (Figure 7B). The aceK int transcript was strongly dependent upon hfq, whereas the other RNAs were unaffected by hfq deletion, though all five transcripts have been reported to coimmunoprecipitate with Hfq (Melamed et al., 2020;Melamed et al., 2016). Additionally, the ORF-internal sRNAs are found in chimeras with other putative mRNA and sRNA targets in the Hfq RIL-seq datasets (Melamed et al., 2020;Melamed et al., 2016): FtsO and RybB, aceK int and ompF, rlmD int and MicA, mglC int and ArcZ, ampG int and CyaR. Significant chimeras also were detected between aceK int and gatY in the RIL-seq data set for the ProQ RNA chaperone (Melamed et al., 2020). Collectively, these data suggest these ORF-internal transcripts have independent regulatory functions.

ORF-internal FtsO is an sRNA sponge
To test for the suggested regulatory function, we focused on FtsO, which is encoded internal to the coding sequence of the essential cell division protein FtsI, and exhibited high levels across growth ( Figure 7B). The predominant target for FtsO in the RIL-seq datasets was the sRNA RybB ( Figure 8A) followed by the sRNA CpxQ (Melamed et al., 2020). The Hfq-mediated FtsO-RybB interaction was also detected in an independent CLASH dataset (Iosub et al., 2020).
We hypothesized that FtsO functions as a sponge for RybB and CpxQ, which are induced by misfolded outer membrane proteins and inner membrane proteins, respectively, and down-regulate the corresponding classes of proteins (reviewed in (Hör et al., 2020)). This model was tested by overexpressing FtsO in WT or Δhfq cells grown to exponential and/or stationary phase (150 and 360 min after subculturing). At the 360 min time point, when the levels of both RybB and CpxQ are highest, a reduction was observed for both sRNAs in cells overexpressing FtsO (Figure 8B). As reported previously, the levels of RybB and CpxQ are significantly lower in the ∆hfq strain, though some FtsO-dependent down-regulation of RybB is still detected.
Interestingly, we also observed that RybB overexpression had the reciprocal effect of decreasing [...]
Finally, we mutated the chromosomal copy of ftsO to introduce the same nucleotide substitutions (Figure 8C), which are silent with respect to the FtsI amino acid sequence. WT cells and cells carrying these mutations were treated with ethanol, which causes outer membrane stress and is known to induce RybB (Peschek et al., 2019). In both strains, a transient increase in RybB levels was observed 5 min after ethanol addition (Figure 8E). In WT cells, RybB levels then remained decreased for 30 min after ethanol treatment. By contrast, in cells with mutant ftsO, RybB levels increased again 20 min after ethanol treatment. This effect was also observed in a second experiment ([...]).
[...] The work demonstrates that stable regulatory sRNAs can be derived from sequences internal to ORFs, such that the same DNA sequence encodes two different functional molecules.

Widespread premature transcription termination
Through our transcriptome-wide mapping of 3´ ends and Rho-dependent termination, we uncovered extensive RNA-mediated regulation and sRNA regulators encoded by 5´ UTRs and internal to ORFs. Other studies have previously identified RNA 3´ ends and regions of Rho-dependent termination in E. coli (Dar and Sorek, 2018b;Ju et al., 2019;Peters et al., 2012;Yan et al., 2018). While there was considerable overlap between our work and these prior studies, there were also substantial differences. For example, reporter gene fusion data and northern analysis supported Rho termination of sugE, cfa, cyaA, speA, and mdtJ, of which only sugE was detected in the previous genome-wide surveys (Dar and Sorek, 2018b;Ju et al., 2019;Yan et al., 2018). Some of the differences between studies can be attributed to differences in the E. coli strains, growth conditions, and methods used. Indeed, we previously found that small methodological differences have a large impact on the identification of mapped TSSs (Thomason et al., 2015). As with any RNA-seq method, a few 3´ ends could also result from RNA degradation during library preparation. Nevertheless, our follow-up experiments confirmed the biological relevance of several 3´ ends in 5´ UTRs and internal to ORFs. Given that the strain and growth conditions used for our Term-seq and total RNA-seq match those of the previous dRNA-seq analysis (Thomason et al., 2015), in which we identified TSSs and 5´ processed ends, the combined datasets represent a valuable resource for examining the E. coli transcriptome (see materials and methods for links to interactive browsers).
The majority of the 3´ ends that we identified were classified as "internal" or "orphan", most of which map within 5´ UTRs or internal to ORFs, and a significant number of which are predicted to be generated by premature transcription termination. This notion of widespread premature transcription termination has been underappreciated in other studies that detected RNA 3´ ends and Rho-mediated termination. It is generally not possible to identify the exact position of Rho termination due to post-transcriptional RNA processing. Nevertheless, our reporter assays showed that in most cases tested, Rho termination could be localized to the 5´ UTR, suggesting that modulation of Rut accessibility in 5´ UTRs could be a common mechanism of regulation.

Multiple levels of regulation at 5´ UTRs
Presuming that many premature termination events are regulatory, we documented and characterized examples of novel, diverse regulatory events for several of the 3´ ends.
Undoubtedly, additional unique regulatory mechanisms exist for many of the other 3´ ends. We propose that the identification of 3´ ends in 5´ UTRs and ORFs is an effective approach to discover novel regulatory elements. Classically, these regulators, such as riboswitches and attenuators, have been identified by serendipity, studies of individual genes, or searches for conserved RNA structures (reviewed in (Breaker, 2018)), but these approaches may miss regulatory RNA elements if the function of the downstream gene is unknown or the region is not broadly conserved. Because Term-seq is a sensitive, relatively unbiased, genome-wide approach, it provides another means of obtaining evidence for regulation in 5´ UTRs. As an example, the Term-seq data showed that transcripts from the 5´ UTR of the E. coli glycerol facilitator glpF have different 3´ ends under LB and M63 growth conditions, which could be due to uncharacterized regulation. Applying 3´ end mapping to E. coli grown under other conditions, or to other bacterial species, should lead to the characterization of many more regulators, particularly in organisms such as the Lyme disease pathogen Borrelia burgdorferi (Adams et al., 2017) that lack any known cis-acting RNA elements. Consistent with broad applicability, Term-seq previously led to the identification of known riboregulators in Bacillus subtilis and Enterococcus faecalis, and of a novel attenuator in Listeria monocytogenes (Dar et al., 2016).
A number of 3´ ends in 5´ UTRs and ORFs were found to be associated with uORFs. Our characterization of the mdtU uORF suggests that regulation of premature mdtJI transcription termination occurs in response to ribosome stalling induced by polyamines. Two recent studies showed that the polyamine ornithine can stall ribosomes immediately upstream of the stop codon of the speFL uORF, affecting Rho binding and the structure of the speFL mRNA (Ben-Zvi et al., 2019;Herrero del Valle et al., 2020). Strikingly, conservation of MdtU is strongest at the C-terminus, and the corresponding coding sequence overlaps a region of the mdtU RNA that is predicted to base pair with the mdtJI ribosome binding site. Thus, a mechanism similar to the one found for speFL may regulate mdtJI induction by spermidine, a hypothesis that deserves further study, together with other examples where 3´ ends are located downstream of uORFs.
We also documented three instances of 3´ ends that localized a short distance downstream of known trans-acting sRNA base-pairing sites. These 3´ ends could be generated by endonuclease processing as a result of sRNA base pairing, or could be due to protection against exonucleases as a result of sRNA pairing. An examination of sRNA base-pairing sites predicted by RIL-seq points toward other instances of this type of regulation. For example, Term-seq identified 3´ ends immediately downstream of the predicted sRNA base-pairing regions for the uncharacterized mRNA-sRNA interactions rbsD-ArcZ, dctA-MgrR, and yebO-CyaR detected by RIL-seq chimeras (Melamed et al., 2020). In all these instances, the 3´ ends could be a result of the sRNA regulatory effect and, in some cases, may result in the formation of a new sRNA, as we observed for ChiZ. In general, our data further illustrate the complex regulation that occurs once transcription has initiated.

Generation of sRNAs from 5´ UTRs and ORF-internal sequences
Previous studies have shown that intergenic regions and mRNA 3´ UTRs are major sources of regulatory sRNAs, with a few characterized examples of sRNAs derived from 5´ UTRs, and no characterized ORF-internal sRNAs (reviewed in ). Our data document that 5´ UTRs and ORFs can indeed encode functional base-pairing sRNAs. However, our work also raises important questions, including the mechanisms by which 5´ UTR-derived and ORF-internal sRNAs are generated.
Given that sRNAs derived from 5´ UTRs only require the generation of a new 3´ RNA end (likely sharing a TSS with their cognate mRNA), and are not usually constrained by codon sequences, they could evolve rapidly. We documented the formation of several 5´ UTR fragments by cis-regulatory events. These by-products of regulation could obtain independent regulatory functions, as has been reported for a few riboswitches and attenuators (DebRoy et al., 2014;Melior et al., 2019;Mellin et al., 2014). The sRNA 3´ end can be formed by intrinsic termination or Rho-dependent termination and/or processing. Importantly, for the downstream mRNA to be expressed, there needs to be some transcriptional readthrough. RNA structure predictions strongly suggest the IspZ 3´ end is generated by intrinsic termination (Supplementary file 3) for which we observed very little readthrough. In contrast, the ChiZ 3´ end is generated by Rho-dependent termination with significant readthrough that would allow chiP expression. The mechanisms that drive formation of either 5´-derived sRNAs or the corresponding full-length mRNAs could be regulated and are an interesting topic for future work.
Less is known about both the 5´ and 3´ ends of the ORF-internal sRNAs. The ends might be generated by ORF-internal promoters, termination, or RNase processing. In cases where one or both sRNA ends are generated by processing, this is presumably coupled with downregulation of the overlapping mRNA. Strikingly, the number of sequencing reads for FtsO is orders of magnitude higher than that for the ftsI mRNA. How and when FtsO is produced are interesting questions for future work. It is possible that the ftsI mRNA is protected from cleavage by ribosomes during cell division such that FtsO is only generated in the absence of mRNA translation. Other coding sequence-derived sRNAs, such as the putative regulator internal to mglC, likely have their own TSS. In some cases, such as rlmD int and ampG int, a transcript originating from a TSS internal to the cognate mRNA could be processed to form the 5´ end of the sRNA. These observations underscore how the interplay of transcription initiation, transcription termination, and RNase processing leads to many short transcripts that have the potential to evolve independent regulatory functions.

Roles of 5´ UTR-derived and ORF-internal sRNAs
We identified and characterized three sRNA sponges that have 3´ ends either in 5´ UTRs or internal to the coding sequence. The first example, ChiZ-ChiX-chiP, represents a novel reciprocal autoregulatory loop. ChiZ is generated from the chiP 5´ UTR and encompasses the site of pairing with the ChiX sRNA. We found that ChiZ is formed by Rho-dependent termination and is also formed in the absence of ChiX base pairing with chiP. When cells utilize chitobiose as a carbon source and ChiX levels are naturally low, there are higher levels of chiP (Overgaard et al., 2009) and likely also higher levels of ChiZ. In this model, when chitooligosaccharides need to be imported, ChiZ prevents ChiX from base pairing but does not promote degradation of this sRNA. When metabolic needs shift, ChiX could be released from ChiZ, allowing ChiX to regulate chiP and other targets. This may work in competition, in conjunction, or at separate times with the chbBC intergenic mRNA sequence, which also sponges ChiX but results in ChiX decay (Overgaard et al., 2009). It will be interesting to see whether other 5´ UTRs and sRNAs form similar autoregulatory loops, since 5´ UTRs are enriched for sRNA pairing sites, and we have shown that sRNA pairing is associated with distinct small transcripts from 5´ UTRs.
In a second example, we characterized IspZ, which is generated from the 5´ UTR of ispU (uppS), encoding the synthase for undecaprenyl pyrophosphate (UPP), a lipid carrier for bacterial cell wall carbohydrates (Apfel et al., 1999). We suggest IspZ may connect cell wall remodeling with the oxidative stress response. Cellular levels of toxic reactive oxygen species are increased when cell wall synthesis is blocked, and oxidative damage impedes ispU-related cell wall growth (Kawai et al., 2015). Thus, IspZ downregulation of the hydrogen peroxide-induced sRNA OxyS may dampen the oxidative response at a time when the response might be detrimental.
While small transcripts from within coding sequences have been noted previously (Dar and Sorek, 2018a;Reppas et al., 2006), and a homolog of FtsO (STnc475) has been detected in Salmonella enterica (Smirnov et al., 2016), we are the first to document a regulatory role for a bacterial ORF-internal sRNA. FtsO was found to base pair with and negatively regulate the membrane stress response sRNA RybB. The ftsI mRNA encodes a low-abundance but essential penicillin-binding protein that is localized to the inner membrane at the division site and cell pole (Weiss et al., 1997). The cell may need to alter its response to membrane stress during the division cycle, when many membrane components are needed, and we suggest that FtsO could facilitate crosstalk between cell division and membrane stress by regulating RybB activity.
Intriguingly, we observed the greatest effect of ethanol addition on RybB levels in the ftsO mutant background at 20 min, which is also the doubling time for E. coli MG1655. While it is reasonable to assume that regulatory sRNAs encoded by intragenic sequences are rare, given the challenge of encoding two functions in one region of DNA, we think it is likely that other ORF-internal sRNAs have functions.
Our work has significantly increased the number of sRNAs documented to modulate the activities of other sRNAs, either by sponging their activities, as found for ChiZ, or by affecting their levels, as shown for IspZ and FtsO. The findings raise other questions, including how many more short transcripts generated by termination or processing have regulatory functions. It is also intriguing that some abundant sRNAs are subject to regulation by multiple sponges, including ChiX, which is regulated by ChiZ and the chbBC intergenic region (Overgaard et al., 2009), and RybB, which is regulated by FtsO, RbsZ (Melamed et al., 2020) and the 3´ETS-leuZ tRNA fragment (Lalaouna et al., 2015). Finally, little is known about how the activities and levels of the sponges themselves are regulated. The levels of some, but not others, are influenced by Hfq binding. A number appear to be constitutively expressed, such that target sRNA levels must increase to overcome the effects of the sponges.
Further identification and characterization of RNA fragments generated by premature termination or processing, detected by mapping the 5´ and 3´ ends of bacterial transcriptomes, will help elucidate the effects of regulatory RNAs. Our datasets point to a plethora of potential cis- and trans-acting regulatory elements in 5´ UTRs and ORF-internal regions, providing a valuable resource for further studies of gene regulation.
[...] (ATG→ACG) by recombineering the PCR-amplified thyA marker (using primers JW10309 + JW10310) into the mdtU gene and then replacing the thyA marker with the mdtU mutation (recombineering a PCR-amplified product made using primers JW10311-JW10314). The native thyA locus was restored as described previously (Stringer et al., 2012). This yielded the wild-type mdtU mdtJ-3XFLAG strain (YY20) and the mdtU start codon mutant (ATG→ACG) mdtJ-3XFLAG strain (AMD742).

Growth conditions
Bacterial strains were routinely grown with shaking at 250 rpm at 37°C in either LB rich medium or M63 minimal medium supplemented with 0.2% glucose and 0.001% vitamin B1.

RNA isolation
E. coli cells corresponding to the equivalent of 10 OD600 units were collected by centrifugation and washed once with 1X PBS (1.54 M NaCl, 10.6 mM KH2PO4, 56.0 mM Na2HPO4, pH 7.4), and the pellets were snap frozen in liquid N2. RNA was isolated using TRIzol (Thermo Fisher Scientific) exactly as described previously (Melamed et al., 2020). RNA was resuspended in 20-50 µl DEPC H2O and quantified using a NanoDrop (Thermo Fisher Scientific). RNA was fragmented by incubating 9 µl of cleaned-up RNA with 1X Fragmentation Reagent (Invitrogen) for 2 min at 72°C, followed by the addition of 1X Stop Solution (Invitrogen).

Term-seq
Samples were stored on ice after each sample was fragmented individually. The fragmented RNA was pooled and cleaned using the RNA Clean and Concentrator-5 kit (Zymo) according to the manufacturer's instructions. Library construction continued according to the RNAtag-seq methodology adapted for bacterial sRNAs, starting at the rRNA removal step (Melamed et al., 2018). Term-seq RNA libraries were analyzed on a Qubit 3 Fluorometer (Thermo Fisher Scientific) and an Agilent 4200 TapeStation System prior to paired-end sequencing using the HiSeq 2500 system (Illumina).
BedGraph files were generated using deepTools (Ramírez et al., 2016) on reads from each strand separately.
An initial set of termination peaks was called per sample on the bedGraph files from uniquely aligned reads using a novel signal processing approach combined with a statistically informed method for combining multiple replicates. Briefly, the scipy.signal Python package was used to call peaks on each replicate in a manner that handles the high, sharp peaks found in Term-seq data, using the scipy.signal.find_peaks function with a width of (1, None), a prominence of (None, None), and a relative height of 0.75. Peaks for each replicate were then combined using the IDR framework (Landt et al., 2012) into a set of peaks that were reproducible (both strong and consistent) across replicates. The code for this can be found at https://github.com/NICHD-BSPC/termseq-peaks and can be used in general for Term-seq peak calling in other bacteria. Termination peaks were subsequently curated according to the following criteria. The single-bp peak coordinate was set to the nucleotide with the strongest signal within the boundary of the initial broader peak, using multiBigWigSummary from deepTools 3.1.3. When several positions were equally strong, the most downstream position, relative to the peak orientation, was chosen. Peaks within up to 100 bp of each other were compared, and only the peak with the highest score in each cluster was kept for further analysis. These curated peaks were used for all analyses herein (Supplementary file 1).
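The per-replicate peak calling and curation steps described above can be sketched as follows. This is a minimal illustration, assuming each strand's per-nucleotide coverage has already been loaded into a NumPy array indexed by genome position. The find_peaks parameters are the ones stated in the text; the IDR merging step is omitted, and the helper names (`call_replicate_peaks`, `peak_summit`, `collapse_clusters`) are hypothetical, not functions from the termseq-peaks repository. Treating "peaks within 100 bp" as chains of neighbouring peaks spaced at most 100 bp apart is likewise an assumption.

```python
import numpy as np
from scipy.signal import find_peaks


def call_replicate_peaks(coverage: np.ndarray) -> np.ndarray:
    """Return indices of candidate 3' end peaks in one strand's coverage track."""
    peaks, _properties = find_peaks(
        coverage,
        width=(1, None),          # peaks must be at least 1 nt wide
        prominence=(None, None),  # no prominence limits, but prominences are computed
        rel_height=0.75,          # widths evaluated 75% of the way down the prominence
    )
    return peaks


def peak_summit(coverage: np.ndarray, start: int, end: int, strand: str) -> int:
    """Strongest position in [start, end); ties go to the most downstream one."""
    window = coverage[start:end]
    tied = np.flatnonzero(window == window.max()) + start
    # Downstream means higher coordinates on '+' and lower coordinates on '-'.
    return int(tied.max() if strand == "+" else tied.min())


def collapse_clusters(peaks, max_gap=100):
    """Keep the highest-scoring peak from each chain of peaks spaced <= max_gap bp.

    peaks: iterable of (position, score) tuples on a single strand.
    """
    kept, cluster = [], []
    for pos, score in sorted(peaks):
        if cluster and pos - cluster[-1][0] > max_gap:
            kept.append(max(cluster, key=lambda p: p[1]))
            cluster = []
        cluster.append((pos, score))
    if cluster:
        kept.append(max(cluster, key=lambda p: p[1]))
    return kept


coverage = np.array([0, 0, 5, 50, 5, 0, 0, 0, 3, 30, 30, 3, 0, 0], dtype=float)
print(call_replicate_peaks(coverage).tolist())  # [3, 9]
print(peak_summit(coverage, 8, 12, "+"), peak_summit(coverage, 8, 12, "-"))  # 10 9
print(collapse_clusters([(100, 5.0), (180, 9.0), (400, 2.0)]))  # [(180, 9.0), (400, 2.0)]
```

The flat top at positions 9-10 shows why the tie-break matters: find_peaks reports the middle of a plateau, whereas the curation rule picks the downstream edge for the strand in question.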

Total RNA-seq
Total RNA-seq was performed using the same RNA that was used for the Term-seq library preparations. Total RNA-seq library construction was carried out based on the RNAtag-seq methodology (Shishkin et al., 2015), which was adapted to capture bacterial sRNAs (Melamed et al., 2018). Total RNA-seq RNA libraries were sequenced as for Term-seq. Total RNA-seq data processing followed the same procedures as Term-seq data analysis for QC, adaptor removal and sequencing read mapping.

BCM Treatment and DirectRNA-seq
One culture of E. coli MG1655 cells (GSO989) was grown in LB to an OD600 of ~0.5. The culture was then split, and half was treated with 100 μg/ml BCM (gift from Max Gottesman) for 15 min. Total RNA was isolated from 1.5 ml of untreated and BCM-treated cultures using the hot-phenol RNA extraction method followed by ethanol precipitation as described previously (Stringer et al., 2014). Genomic DNA was removed by treating 8 μg of total RNA with 4 U of Turbo DNase (Invitrogen) for 45 min at 37°C. DNA-free RNA was purified using phenol:chloroform:isoamyl alcohol and ethanol precipitation as described previously (Stringer et al., 2014). rRNA was removed using a Ribo-Zero (Bacteria) kit (Epicentre) according to the manufacturer's instructions. The RNA libraries were prepared and processed at the Helicos BioSciences facility, where poly-A tails and a 3´-dATP block were added to make the RNA suitable for direct sequencing on the HeliScope™ Single Molecule Sequencer (Ozsolak and Milos, 2011).

Identification of Rho-dependent 3´ ends
CLC Genomics Workbench (v8.5.1) was used to align DirectRNA-sequencing reads from untreated and BCM-treated samples to the sense and reverse-complemented MG1655 NC_000913.3 genome, so that the position of the first mapped 3´ end nucleotide could be properly identified.
Mapping parameters were set to default, except for "Length fraction" and "Similarity fraction", which were set to 0.7 and 0.9, respectively. Quality scores were not generated by the HeliScope™ Sequencer; arbitrary quality scores were added to each read in the FASTA files to meet the import requirements of CLC Genomics Workbench, but they were ignored during mapping. The [...]. We refer to the p-value from the Fisher's exact test as the "significance score", and to the R (BCM/Untreated) ratio as the "Rho score" (Supplementary file 2).
Putative Rho termination regions were defined as genome coordinates with a positive Rho score and a significance score < 1e-4. Only the position with the highest Rho significance score within an 800 nt window upstream and downstream is reported in Supplementary file 2. Note that the Rho scores and significance scores listed in Supplementary file 3 were calculated for the specific positions matching the dominant Term-seq 3´ ends; there may be nearby positions with a lower significance score and/or a higher Rho score.
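Part of the definition of the Rho score is missing from the text above, so the following is only a hedged sketch built on an assumed definition. Here the score is taken as the log2 ratio of downstream-to-upstream read counts in the 800 nt windows with BCM, relative to the same ratio without BCM, so that a positive value indicates more readthrough when Rho is inhibited. Significance comes from a two-sided Fisher's exact test on the 2×2 table of window counts (scipy.stats.fisher_exact). Reading "highest significance score" as "lowest p-value" is also an assumption, and all function names are hypothetical.

```python
import math

from scipy.stats import fisher_exact

WINDOW = 800  # nt upstream and downstream of each position, as in the text


def rho_score(up_bcm, down_bcm, up_unt, down_unt):
    """ASSUMED Rho score: log2[(down/up with BCM) / (down/up without BCM)].

    Returns None when any window count is zero, because the ratio is then
    undefined or infinite (cf. the "undefined" Rho scores in Supplementary file 2).
    """
    if min(up_bcm, down_bcm, up_unt, down_unt) == 0:
        return None
    return math.log2((down_bcm * up_unt) / (up_bcm * down_unt))


def significance_score(up_bcm, down_bcm, up_unt, down_unt):
    """Two-sided Fisher's exact test; rows are +BCM/-BCM, columns downstream/upstream."""
    _odds_ratio, p_value = fisher_exact([[down_bcm, up_bcm], [down_unt, up_unt]])
    return p_value


def putative_rho_regions(records, p_cutoff=1e-4, window=WINDOW):
    """Keep positions with a positive Rho score and p < p_cutoff, then report only
    the most significant (lowest p) position within +/- window nt.

    records: list of (position, rho_score, p_value) tuples on one strand.
    """
    hits = [r for r in records if r[1] is not None and r[1] > 0 and r[2] < p_cutoff]
    kept = []
    for pos, rho, p in hits:
        nearby = [q for qpos, _, q in hits if abs(qpos - pos) <= window]
        if p == min(nearby):  # ties keep every equally significant position
            kept.append((pos, rho, p))
    return kept


print(round(rho_score(100, 100, 100, 10), 3))  # 3.322, i.e. log2(10)
records = [(1000, 2.0, 1e-6), (1500, 1.0, 1e-8), (5000, 0.5, 1e-5)]
print([r[0] for r in putative_rho_regions(records)])  # [1500, 5000]
```

The quadratic neighbour search keeps the sketch short; for a whole genome, sorting positions and scanning a sliding window would scale better.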

Classification of 3´ ends
The intersect function of Bedtools 2.28.0 (Quinlan and Hall, 2010), run via pybedtools v0.8.0 (Dale et al., 2011), was used to assign each peak to one or more classes: Primary (the highest-scoring 3´ peak located on the same strand within 50 bp downstream of the 3´ end of an annotated mRNA ORF, tRNA, rRNA or sRNA), Antisense (3´ peaks located on the opposite strand of an annotated mRNA ORF, tRNA, rRNA or sRNA, within 50 bp of its start and end coordinates), Internal (3´ peaks located on the same strand within the coordinates of an annotated mRNA ORF, tRNA, rRNA or sRNA, excluding the 3´ end coordinate), and Orphan (3´ peaks not falling into any of the previous classes).
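The class definitions above can be expressed as simple interval tests. The sketch below uses plain Python comparisons rather than the Bedtools/pybedtools intersect calls the authors used, and assumes 1-based inclusive coordinates with `start <= end` for every annotated feature. The function name, the tuple layouts, and the tie-break (equal scores resolve to the later peak) are illustrative assumptions.

```python
from collections import defaultdict

FLANK = 50  # bp window used for the Primary and Antisense classes


def classify_peaks(peaks, features):
    """Assign each 3' peak to Primary / Antisense / Internal classes, else Orphan.

    peaks: list of (position, strand, score); features: list of
    (name, start, end, strand) with 1-based inclusive coordinates, start <= end.
    """
    classes = defaultdict(set)
    primary_candidates = defaultdict(list)  # feature name -> [(score, peak index)]
    for i, (pos, strand, score) in enumerate(peaks):
        for name, start, end, f_strand in features:
            if strand == f_strand:
                three_prime = end if f_strand == "+" else start
                # Distance downstream of the feature's 3' end, in transcript orientation.
                offset = pos - three_prime if f_strand == "+" else three_prime - pos
                if 0 <= offset <= FLANK:
                    primary_candidates[name].append((score, i))
                if start <= pos <= end and pos != three_prime:
                    classes[i].add("Internal")
            elif start - FLANK <= pos <= end + FLANK:
                classes[i].add("Antisense")
    # Only the highest-scoring candidate downstream of each feature is Primary.
    for candidates in primary_candidates.values():
        classes[max(candidates)[1]].add("Primary")
    return {i: classes[i] or {"Orphan"} for i in range(len(peaks))}


features = [("geneA", 100, 400, "+"), ("geneB", 1000, 1300, "-")]
peaks = [(420, "+", 50.0), (430, "+", 10.0), (250, "+", 5.0),
         (250, "-", 5.0), (5000, "+", 1.0), (980, "-", 20.0)]
for index, labels in classify_peaks(peaks, features).items():
    print(index, sorted(labels))
```

In this toy example, the weaker peak just downstream of geneA (index 1) falls into no class and ends up Orphan, because only the strongest candidate per feature is called Primary.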
3´ ends were also categorized according to their position relative to mRNA 5´ UTRs and internal mRNA regions (Supplementary file 3). All 3´ ends (Supplementary file 1) located within the region from 200 bp upstream of an annotated start codon to the corresponding stop codon were extracted and further analyzed. To remove any 3´ ends that likely belonged to an upstream gene in the same orientation, TSS data (Thomason et al., 2015) obtained using the same growth conditions and E. coli strain as Term-seq were considered. For each of these 3´ ends, the first upstream feature (either a TSS or an ORF stop codon) was identified. Any 3´ end whose first upstream feature was a stop codon was eliminated, unless there was also a TSS ≤ 200 bp upstream of the 3´ end or that upstream feature was the stop codon of an annotated "leader peptide" in the EcoCyc E. coli database (mgtL, speFL, hisL, ivbL, ilvL, idlP, leuL, pheL, pheM, pyrL, rhoL, rseD, thrL, tnaC, trpL, uof). Any 3´ end with a TSS 20 bp or less upstream was also eliminated. This resulted in the 3´ end coordinates in Supplementary file 3. For the LB 0.4 condition, 3´ ends were given a Rho score from DirectRNA-seq (as described above) and an intrinsic terminator score (calculated with a custom script as defined in (Chen et al., 2013)). uORFs whose synthesis was detected by western analysis and/or translational reporter fusions (Hemm et al., 2008;VanOrsdel et al., 2018;Weaver et al., 2019), sRNAs whose synthesis was detected by northern analysis (this study), and other characterized RNA regulators were noted for the LB 0.4 condition.
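The filtering rules above can be sketched as a single function applied to each 3´ end. This is a hedged illustration, not the authors' script. It assumes each ORF is given as (name, start-codon coordinate, stop-codon coordinate, strand), with the start codon at the larger coordinate on the minus strand, and that each TSS is a (coordinate, strand) pair. It also assumes that "upstream" means strictly upstream and that a TSS equidistant with a stop codon counts as the first feature. The names `keep_end` and `upstream_distance` are hypothetical; the leader-peptide list is the one given in the text.

```python
LEADER_PEPTIDES = {
    "mgtL", "speFL", "hisL", "ivbL", "ilvL", "idlP", "leuL", "pheL",
    "pheM", "pyrL", "rhoL", "rseD", "thrL", "tnaC", "trpL", "uof",
}


def upstream_distance(pos, coord, strand):
    """bp from coord to pos when coord lies strictly upstream of pos, else None."""
    d = pos - coord if strand == "+" else coord - pos
    return d if d > 0 else None


def keep_end(pos, strand, orfs, tsss):
    """Apply the 5' UTR / ORF-internal filters to one 3' end at pos on strand."""
    same = [o for o in orfs if o[3] == strand]

    # 1. The end must lie between 200 bp upstream of a start codon and its stop codon.
    def in_window(start, stop):
        return start - 200 <= pos <= stop if strand == "+" else stop <= pos <= start + 200

    if not any(in_window(start, stop) for _, start, stop, _ in same):
        return False
    tss_dist = [upstream_distance(pos, c, s) for c, s in tsss if s == strand]
    tss_dist = [d for d in tss_dist if d is not None]
    # 2. Drop ends with a TSS 20 bp or less upstream.
    if any(d <= 20 for d in tss_dist):
        return False
    stop_dist = [(upstream_distance(pos, stop, strand), name) for name, _, stop, _ in same]
    stop_dist = [x for x in stop_dist if x[0] is not None]
    nearest_tss = min(tss_dist, default=None)
    nearest_stop = min(stop_dist, default=None)
    # 3. If the first upstream feature is a stop codon, keep the end only when that
    #    stop belongs to a leader peptide or a TSS lies <= 200 bp upstream.
    if nearest_stop is not None and (nearest_tss is None or nearest_stop[0] < nearest_tss):
        return nearest_stop[1] in LEADER_PEPTIDES or (
            nearest_tss is not None and nearest_tss <= 200
        )
    return True


orfs = [("geneA", 1000, 1300, "+"), ("geneZ", 500, 600, "+")]
print(keep_end(900, "+", orfs, []))            # False: first upstream feature is geneZ's stop
print(keep_end(900, "+", orfs, [(850, "+")]))  # True: a TSS 50 bp upstream comes first
```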

Comparison of Term-seq 3´ ends and Rho termination regions to equivalent genomic positions identified in other studies
As detailed in Figure [...]
[...] galactopyranoside and stopped by adding Na2CO3. All assays were done at room temperature.
The OD600 and A420 of the cultures were measured using a Jenway 6305 spectrophotometer. The translational chiP-lacZ fusions (DJS2979 and DJS2991) were assayed as above, with the following changes. Three separate colonies were grown overnight in LB with 100 μg/ml ampicillin, diluted to an OD600 of 0.05 in the same medium supplemented with 0.2% arabinose and 1 mM IPTG, and grown for 150 min (OD600 ~1.5) at 37°C. Reactions were performed at 28°C, and the OD600 and A420 of the cultures were measured using an Ultrospec 3300 pro spectrophotometer (Amersham Biosciences). For all experiments, β-galactosidase activity was calculated as (1000 × A420)/(OD600 × V × t), where V is the volume in ml and t is the time in min.
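The activity formula above is straightforward to apply. The sketch below also averages the three colonies per strain. The readings are made-up numbers chosen only to exercise the calculation, and reporting the mean ± standard deviation is an assumption about presentation, not something stated in the text.

```python
from statistics import mean, stdev


def beta_gal_activity(a420, od600, volume_ml, time_min):
    """beta-galactosidase activity = (1000 x A420) / (OD600 x V(ml) x t(min))."""
    return (1000 * a420) / (od600 * volume_ml * time_min)


# Hypothetical readings for three colonies: (A420, OD600, volume in ml, time in min)
readings = [(0.50, 1.5, 0.1, 20), (0.55, 1.5, 0.1, 20), (0.47, 1.4, 0.1, 20)]
activities = [beta_gal_activity(*r) for r in readings]
print(f"{mean(activities):.1f} +/- {stdev(activities):.1f}")
```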

Northern blot analysis
Northern blots were performed using total RNA exactly as described previously (Melamed et al., 2020). For small RNAs, 5 μg of RNA were fractionated on 8% polyacrylamide urea gels [...] Hybridization Buffer (Ambion) and hybridized with 5´ 32P-end-labeled oligonucleotide probes (listed in Supplementary file 4). After an overnight incubation, the membranes were rinsed with 2X SSC/0.1% SDS and 0.2X SSC/0.1% SDS prior to exposure on film. Blots were stripped by two 7-min incubations in boiling 0.2% SDS followed by two 7-min incubations in boiling water.

Immunoblot analysis
Immunoblot analysis was performed as described previously, with minor changes (Zhang et al., 2002). Samples were separated on a Mini-PROTEAN TGX 5%-20% Tris-Glycine gel (Bio-Rad) and transferred to a nitrocellulose membrane (Thermo Fisher Scientific). Membranes were blocked in 1X TBST containing 5% milk, probed with a 1:2,000 dilution of monoclonal α-FLAG-HRP (Sigma), and developed with SuperSignal West Pico PLUS Chemiluminescent Substrate (Thermo Fisher Scientific) on a Bio-Rad ChemiDoc MP Imaging System.

Data and Code Availability
The raw sequencing data reported in this paper have been deposited in SRA under accession number PRJNA640168. Code for calling 3´ ends in Term-seq sequencing reads can be found at https://github.com/NICHD-BSPC/termseq-peaks. Code for calling Rho termination regions can be found at https://github.com/gbaniulyte/rhoterm-peaks.
The processed RNA-seq data from this study are available online via the UCSC genome browser at the following links: [...]
[...] Protein extracts were separated on a Tris-Glycine gel, transferred to a membrane, stained with Ponceau S, and probed using α-FLAG antibodies. We do not know the identity of the higher molecular weight bands observed for the WT sample in the western analysis. They could be due to multimeric MdtJ or to MdtJ association with the membrane.
[...] (Shishkin et al., 2015)), Term-seq (modified from (Dar et al., 2016)) and DirectRNA-seq (Ozsolak and Milos, 2011). RNA 3´ end adapter (red line), cDNA adapter (green line) and stranded sequencing (asterisks) are indicated.
[...] Table S2, columns B and R: "TTS_position" and "TTS_type_rho_dependent"); Dar and Sorek, which used Total RNA-seq and Term-seq of E. coli BW25113 grown to exponential phase in LB ± BCM ((Dar and Sorek, 2018b), Table S2, columns D and L: "primary 3´ end position" and "termination mechanism"). If 3´ ends were called within a 500 nt window on the same DNA strand in both datasets, they were considered shared. The number of unique or shared Rho-dependent termini in each study are [...]
[...] β-galactosidase activity for additional 5´ UTR + ORF-lacZ fusions in WT (AMD054) and rhoR66S mutant (GB4) strains. The assay was carried out and the data displayed as described for (B). With the exception of ispU, ompA and glpF, all loci were associated with significant Rho termination regions (Supplementary file 2), but some loci had no detectable 3´ end in the Term-seq LB 0.4 dataset (add, cspG, ydjL, ytfL, rimP, mnmG and srkA).
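The criterion for calling 3´ ends shared between two datasets (same DNA strand, within a 500 nt window) can be sketched with a sorted search per strand. This is a minimal illustration, assuming each dataset is a list of (position, strand) pairs. It also assumes that "within a 500 nt window" means at most 500 nt apart in either direction, which is one possible reading of the text; the function names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def _positions_by_strand(ends):
    """Group end positions by strand and sort them for binary search."""
    by_strand = defaultdict(list)
    for pos, strand in ends:
        by_strand[strand].append(pos)
    for positions in by_strand.values():
        positions.sort()
    return by_strand


def _has_partner(pos, sorted_positions, window):
    """True if any position lies within +/- window nt of pos."""
    i = bisect_left(sorted_positions, pos - window)
    return i < len(sorted_positions) and sorted_positions[i] <= pos + window


def compare_ends(ends_a, ends_b, window=500):
    """Count ends unique to A, shared (A ends with a partner in B), and unique to B."""
    a, b = _positions_by_strand(ends_a), _positions_by_strand(ends_b)
    shared_a = sum(_has_partner(p, b[s], window) for p, s in ends_a)
    unique_b = sum(not _has_partner(p, a[s], window) for p, s in ends_b)
    return len(ends_a) - shared_a, shared_a, unique_b


ends_a = [(1000, "+"), (5000, "+"), (2000, "-")]
ends_b = [(1400, "+"), (2100, "+"), (9000, "-")]
print(compare_ends(ends_a, ends_b))  # (2, 1, 2)
```

Because one end can pair with several ends in the other dataset, the shared count is reported from dataset A's side; counting from B's side could give a slightly different number.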
All identified Rho termination regions are represented, defined as regions with at least one genomic coordinate with a significance score < 1e-4. Rho scores were calculated for each genomic position by comparing DirectRNA-seq coverage in windows 800 nt upstream and downstream in the treated (+BCM) and untreated (-BCM) samples (see materials and methods). The columns of the table are: the 3´ genomic coordinate with the highest Rho score, DNA strand, 3´ end classification (see materials and methods), the gene annotation for the classification (details), the read coverage ±BCM in the 800 nt windows upstream and downstream of the 3´ end position, the Rho score, and the p-value from the Fisher's exact test (significance score). An "undefined" Rho score indicates one that could not be calculated because there were zero reads in the -BCM downstream region. A significance score of "N/A" indicates that the significance score was too low to report accurately.