Robust regulation of transcription pausing in Escherichia coli by the ubiquitous elongation factor NusG

Significance
Pausing by RNA polymerase is essential for transcription in all domains of life. Little is known about how pausing is regulated by trans-acting factors and cis-acting signals in DNA/nascent RNA in vivo. Employing RNET-seq, an advanced version of nascent elongating transcript sequencing (NET-seq), we found that the universally conserved transcription elongation factor NusG is a robust suppressor of backtracked and nonbacktracked pauses in Escherichia coli. Our in vivo work revealed a similarity of NusG to Spt5, its archaeal and eukaryotic homolog, which regulates the elongation rate of many eukaryotic genes.


Supporting Information Text
Depletion of an essential protein in E. coli. In contrast to B. subtilis, NusG is an essential protein in most E. coli strains (1), and our attempts to knock out nusG in several E. coli strains failed. Thus, we instead developed a NusG depletion strategy by repressing transcription of nusG in its original locus with an inducible Para-dCas9 protein roadblock guided to the translation start codon of nusG by nusG-sgRNA (2). Induction of the dCas9 roadblock with arabinose resulted in a rapid reduction in nusG transcription and a time-dependent depletion of NusG protein, with almost complete loss within 4 hours, when the cells ceased to grow (SI Appendix, Fig. S1). Even without arabinose-dependent induction of dCas9, the NusG protein level in our depletion strain was lower than in the isogenic WT strain that lacked the nusG-targeting sgRNA, indicating incomplete repression of the dCas9/nusG-sgRNA system in the absence of arabinose (SI Appendix, Fig. S1A). Therefore, we used RNET-seq data generated from a strain lacking the nusG-sgRNA as a surrogate for a WT control. We generated RNET-seq data from three biological replicates of WT and NusG-depleted (dNusG) cells.
Bioinformatic pipelines for analysis of RNET-seq data. All applications for RNET-seq were deployed on the DNANexus platform (3). The Aligner 2.1 pipeline was used to align sequence reads to the reference NC_000913.2 E. coli genome and to remove reads that mapped to more than a single genomic location. The Aligner 4.1 pipeline was used to align all sequence reads to the genome including reads that mapped to multiple genomic locations such as rRNA genes. Finder 2.2 was used to identify pause sites according to minimum threshold values for pause score and count (3). The Differential_Pauses pipeline was developed in this work for analyzing differential pause site strength. Following our previously developed bioinformatic definition of pause peaks in RNET-seq data, we calculated the pause score values for each position in the genome as a ratio of the count of a unique 3' end to the median count in the 100 bp region centered at the 3' end (3). We considered an RNET-seq peak possessing a score above a threshold value as an authentic transcription pause site. The minimum read counts for pauses that qualified as pause sites were automatically calculated as 597 for data derived from WT cells and 533 for data derived from dNusG cells and the minimum score value was set to 50. This stringent cutoff for genome-wide analysis resulted in a total of 1613 pauses in WT cells and 5091 pauses in dNusG cells (SI Appendix, Dataset S1). Adjacent pauses within a 10 nt window may be considered as a single pause site to account for 1-5-nt heterogeneity in position of the 3' RNA ends observed at some pause sites (SI Appendix, Dataset S2).
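As an illustration only, the pause score definition above can be sketched in Python as follows. This is not the Finder 2.2 code: the function names, the half-open window of 100 positions around each 3' end, and the flooring of the window median at 1 to avoid division by zero are all assumptions of this sketch and are not stated in the text. The default cutoffs are the WT values quoted above.

```python
from statistics import median


def pause_scores(counts, window=100, median_floor=1.0):
    """Score each position as its 3'-end count divided by the median
    count in a window of `window` positions around it.

    `median_floor` is an assumption of this sketch (not from the text);
    it only prevents division by zero in sparsely covered regions.
    """
    half = window // 2
    scores = []
    for i, c in enumerate(counts):
        lo = max(0, i - half)
        hi = min(len(counts), i + half)
        local_median = max(median(counts[lo:hi]), median_floor)
        scores.append(c / local_median)
    return scores


def call_pauses(counts, min_score=50.0, min_count=597, window=100):
    """Return (position, count, score) for positions passing both cutoffs."""
    scores = pause_scores(counts, window)
    return [
        (i, counts[i], s)
        for i, s in enumerate(scores)
        if s >= min_score and counts[i] >= min_count
    ]
```

For example, a single 3' end with 1000 reads on a flat background of 2 reads per position would receive a score of 500 under these assumptions and would pass both cutoffs.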
All instances of false unique pauses derived from single-nucleotide polymorphisms in rRNA and tRNA genes were removed manually from the Finder-generated lists of strong pauses (score values ≥50). Next, we categorized the strong pauses as unique to WT, dNusG, and shared by both WT and dNusG cells. We identified 4408 pauses unique to dNusG cells and 444 unique to WT cells (SI Appendix, Dataset S3). Among the 444 pauses unique to WT cells, 361 had a single 3' end and 83 had multiple 3' ends clustered within a 10-bp window. According to our definition, such clustered pauses likely derived from the same local sequence elements in the DNA and nascent RNA and we further discuss their origin in SI Appendix, Fig. S18 (3). We also identified 509 pauses that were shared by WT and dNusG cells (SI Appendix, Dataset S3). Only a fraction of these pauses was affected by NusG depletion.
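The two bookkeeping steps described above, merging 3' ends that fall within a 10-nt window and sorting pauses into WT-only, dNusG-only, and shared classes, can be sketched as follows. The function names are hypothetical, and anchoring each window at the first 3' end of a cluster is an assumption of this sketch; the text does not specify how the window was anchored.

```python
def merge_adjacent(positions, window=10):
    """Group 3'-end positions into clusters spanning fewer than `window` nt.

    Each cluster is anchored at its first (lowest) position, which is an
    assumption of this sketch.
    """
    clusters = []
    for p in sorted(positions):
        if clusters and p - clusters[-1][0] < window:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return clusters


def categorize(wt_pauses, dnusg_pauses):
    """Split pauses, keyed for example by (strand, position), into three classes."""
    wt, dnusg = set(wt_pauses), set(dnusg_pauses)
    return {
        "wt_only": wt - dnusg,
        "dnusg_only": dnusg - wt,
        "shared": wt & dnusg,
    }
```

Keying each pause by strand and position keeps pauses at the same coordinate on opposite strands distinct.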
Setting minimum values for pause score and read count to 12 (including weak pause sites) and 200 (including less abundant transcripts), respectively, resulted in a total of 17243 pauses in WT cells and 27449 in dNusG cells (SI Appendix, Dataset S4). This less stringent cutoff identifying weaker pauses was used to populate short genomic regions such as 5' UTRs and translation initiation regions with pauses, which otherwise had an insufficient number of pauses for reliable data analyses.
To determine how NusG depletion affected the pause strength, we generated differential pause tables comparing the pause scores at each pause in WT and dNusG cells. As the majority of the pauses were unique to either WT or dNusG cells, the impact of NusG depletion was determined as a ratio of score values in dNusG versus WT cells even if the pause at the corresponding genomic position was observed only in one type of cell. The stringent cutoff of score values ≥50 resulted in 4301 differential pauses (SI Appendix, Dataset S5). The relaxed cutoff of score values ≥12 resulted in 28171 differential pauses (SI Appendix, Dataset S6). A summary of information from Datasets S1 to S6 is presented in SI Appendix, Table S1. The strength of the hisL pause site, which has been extensively characterized in vitro (4), follows the time course of NusG depletion (SI Appendix, Fig. S17).
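A minimal sketch of the dNusG versus WT score ratio is given below. It is not the Differential_Pauses pipeline: the function name and the pseudocount added to both scores are assumptions of this sketch, and the TPM normalization used in the datasets is omitted. A position is retained if its score reaches the cutoff in either cell type, and the result is expressed as a log2 ratio so that positive values correspond to NusG-suppressed pauses.

```python
import math


def differential_pauses(wt_scores, dnusg_scores, cutoff=50.0, pseudo=1.0):
    """Return {position: log2(dNusG score / WT score)} for positions whose
    score reaches `cutoff` in at least one cell type.

    `pseudo` is an assumption of this sketch; it keeps the ratio finite
    when a position has a score of zero in one cell type.
    """
    result = {}
    for pos in set(wt_scores) | set(dnusg_scores):
        wt = wt_scores.get(pos, 0.0)
        dn = dnusg_scores.get(pos, 0.0)
        if max(wt, dn) < cutoff:
            continue
        result[pos] = math.log2((dn + pseudo) / (wt + pseudo))
    return result
```

With this sign convention, a pause scored 127 in dNusG cells and 3 in WT cells yields a log2 ratio of 5, whereas a pause scored 7 in dNusG cells and 63 in WT cells yields a log2 ratio of -3.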
In general, the Differential_Pauses pipeline indicated that pauses unique to WT cells corresponded to NusG-stimulated pauses and pauses unique to dNusG cells corresponded to NusG-suppressed pauses (SI Appendix, Datasets S3 and S5). Data generated by Differential_Pauses generally agreed with the data generated by Finder.
NusG suppresses pausing in rRNA operons. NusG is a component of a multiprotein transcription antitermination complex that forms during transcription of the seven rRNA operons in E. coli, whose sequence similarity makes it impossible to assign RNET-seq reads to a specific rRNA operon (5). However, RNET-seq read coverage of all seven rRNA operons was reduced 2- to 3-fold (2.7-fold average) in dNusG cells, consistent with a role of NusG in antitermination (SI Appendix, Fig. S15A). NusG depletion also increased the strength of several pauses in the rRNA operons, but only the single strongest pause in dNusG cells passed the stringent cutoff score value of ≥50. This pause, which was not observed in WT cells, was located at G1025 in a bulge between helix 41 and helix 42 within domain II of mature 23S rRNA (6). Except for possessing a G residue at the 3' end, the sequence of the G1025 pause was similar to the -10G/-9G…-1Y +1G/T logo of NusG-suppressed pauses from protein-coding regions (SI Appendix, Fig. S15B). Helix 41 of 23S rRNA is formed by base pairing between nts 991-1018 and nts 1144-1163 (SI Appendix, Fig. S15C). We conclude that the overall effect of NusG depletion on pausing in rRNA operons was moderate compared to the effect on pausing within protein-coding genes.

Impact of transcript cleavage factors GreA and GreB on pausing in vivo. E. coli Gre factors rescue backtracked transcription complexes by stimulating the intrinsic endonucleolytic cleavage activity of RNAP such that the newly formed 3' end of the nascent transcript is properly aligned in the catalytic center of RNAP (7). We reasoned that the effect of NusG on backtracked pauses could be underestimated in vivo due to the robust rescue of these pauses by Gre factors, as was shown previously by RNET-seq comparing WT and ∆greA ∆greB strains (8,9). This concern was derived from sequence bias that we observed at position +1 in the sequence logo of pause sites (Fig. 3C-D). The +1T residue was almost equally frequent in the logo as the reported +1G, but +1T was not a part of the previously identified consensus pause motif -10G -1Y +1G. This potential bias reinforced our concern about the impact of 3' RNA end processing by Gre factors at these sites (SI Appendix, Table S1) (8,10,11). To further investigate the possible impact of Gre factors, we generated separate sequence logos of NusG-suppressed pauses containing +1A, +1C, +1G or +1T. This position within the transcriptional bubble is located adjacent to the downstream edge of the RNA-DNA hybrid in the transcription elongation complex. This analysis revealed that a +1 pyrimidine (T or C) correlated with the presence of the -9G residue in the logo, whereas a +1 purine (G or A) correlated with the presence of -10G and -1 pyrimidine residues in the logo (SI Appendix, Fig. S10A-D). Note that -10G and -9G are at the upstream edge of the RNA-DNA hybrid in RNAP (Fig. 4B-D). The pauses with -9G +1T residues likely corresponded to backtracked elongation complexes that were subsequently cleaved upstream of a U residue in RNA by Gre factors. This hypothesis was confirmed by analyzing sequence logos of pauses enriched in different read lengths in our libraries. Backtracked pauses result in RNET-seq read lengths of >18 nt (3,8).
Gre factor-stimulated trimming of nascent RNA at the 3' end in vivo or in crude cell lysates of WT and dNusG cells was expected to reduce the read length. The finding that >18 nt reads (backtracked but not processed by Gre factors) were enriched with -10G, -1Y, and +1G sequences is consistent with our hypothesis. In contrast, 17 nt reads, which were likely products of the 3' RNA cleavage, were enriched in -9G and +1T residues (Fig. 4B-D). A similar but less pronounced tendency was observed for >18 nt and 17 nt reads of NusG-suppressed pauses (SI Appendix, Fig. S10E-F). The observed patterns strongly suggest that reads >18 nt belonged to backtracked complexes, and that 17 nt reads, at least in part, were the products of preferential Gre factor-mediated cleavage 5' of the U residue.
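The read-length comparison above rests on the fraction of reads in each length class at a pause. A minimal sketch of that tabulation is shown below, using the ≤15, 16, 17, 18, and ≥19 nt classes reported in SI Appendix, Table S1; the function names are hypothetical and the input is assumed to be a list of the lengths of the reads that share one 3' end.

```python
from collections import Counter

BINS = ("<=15", "16", "17", "18", ">=19")


def length_bin(n):
    """Assign a read length (nt) to one of the five length classes."""
    if n <= 15:
        return "<=15"
    if n >= 19:
        return ">=19"
    return str(n)


def read_length_fractions(lengths):
    """Return the fraction of reads in each length class at one pause."""
    if not lengths:
        raise ValueError("no reads at this pause")
    counts = Counter(length_bin(n) for n in lengths)
    return {b: counts[b] / len(lengths) for b in BINS}
```

Under the interpretation given in the text, the ">=19" fraction approximates the share of reads from backtracked complexes, and the "17" fraction includes products of Gre factor-mediated cleavage.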
Among pairs of two adjacent pauses separated by 1-2 nts, the downstream pauses generally possessed a larger fraction of backtracked (>18 nt) reads and a smaller fraction of 17 nt reads compared to the upstream pauses (SI Appendix, Table S6). Comparison of 3' ends of several selected NusG-suppressed pauses in the WT strain (this work) and the greA greB knockout strain (9) confirmed that pausing at the -1Y +1G sequence context of the consensus motif remained intact in ∆greA ∆greB cells, whereas subsequent Gre factor-stimulated cleavage in WT cells enriched the elongation complex register with +1T (SI Appendix, Fig. S18). In vitro transcription confirmed that pausing occurred at the position observed in vivo for several pause sites that we tested (SI Appendix, Table S7). For some pauses, in vitro transcription revealed pausing 1 to 4 nt downstream of the position detected in vivo, providing compelling evidence of the involvement of Gre factors at these sites in vivo. In these cases, the sequence of in vitro pauses fits well with the consensus motif -10G -1Y +1P, consistent with this position being the initial pause followed by backtracking and cleavage in vivo (SI Appendix, Table S7). Based on the occurrence of +1T and +1G residues in the sequence logos, Gre factors impacted NusG-suppressed and NusG-independent pause sites almost equally (Fig. 3C-D). Note that backtracked complexes of RNAP are the target of Gre factors (7). Finally, our data showed no correlation between the influence of NusG on pause strength and backtracking (SI Appendix, Fig. S14). Collectively, Gre-stimulated cleavage of nascent transcripts participated in the generation of nascent RNA 3' ends at some pause sites but did not overshadow the general suppression of pausing by NusG in vivo. Further validation of this hypothesis would require engineering of the dNusG E. coli strain lacking the greA and greB genes, which is beyond the scope of this study.

Materials and Methods
Strains, plasmids, and oligonucleotides. E. coli strains used in this study are listed in SI Appendix, Table S8. Plasmids used in this study are described in SI Appendix, Table S9. Sequences of DNA and RNA oligonucleotides used in this study are described in SI Appendix, Table S10.
Construction of the nusG knockdown strain. E. coli strain SJ_XTL219 is a tCRISPRi derivative of MG1655 for blocking expression (knockdown) of chromosomal genes by induction of a chromosomally integrated dcas9, which encodes endonuclease-deficient Cas9, and recombineering of sgRNA. dcas9 was inserted into the ara operon under control of the arabinose-inducible PBAD promoter. Linked to a tet-sacB counter-selectable marker, the sgRNA sequence contained the Cas9-binding and transcription terminator modules that were constitutively expressed from the galM region of the chromosome (2). rpoC carrying a 3'-terminal 6His-tag sequence was introduced from strain NB854 (8) into strain SJ_XTL219 using P1 transduction and selection for a linked kanamycin resistance (KmR) marker. The resulting strain (NB1246) was used as the NusG depletion strain for RNET-seq.
The PAM-following sequence AAAAAGCGCTGGTACGTCGT in nusG was selected for targeting by dCas9. This sequence was embedded into the 90-mer targeting oligonucleotide sgRNA-nusG adjacent to the dCas9 handle and flanked by 35-nt chromosome homology regions (SI Appendix, Table S10). sgRNA-nusG was created by replacement of the tet-sacB cassette in strain NB1246 with the targeting oligonucleotide by recombineering using the lambda RED functions from plasmid pSIM18 (15,16). Colonies were selected on LB agar plates without NaCl but containing 6% sucrose and additionally screened for Tet-sensitivity. Replacement of the tet-sacB cassette with the nusG-specific sgRNA was confirmed by PCR. Removal of plasmid pSIM18 by several passages of cells on LB plates at 37°C was confirmed by hygromycin sensitivity on LB plates supplemented with 200 µg/ml hygromycin. Induction of dcas9 in the resulting strain (NB1247) with arabinose results in a transcriptional roadblock that represses transcription of nusG.
DNA templates and proteins used for in vitro transcription. Selected pause sites for analysis by in vitro transcription were fused with a strong engineered promoter followed by a C-less cassette. The resulting constructs were tested for pausing by single-round in vitro transcription. Templates for in vitro transcription were generated via PCR by merging two overlapping fragments as described previously (3). The first fragment, containing the pause site of interest surrounded by its flanking regions, was amplified using E. coli chromosomal DNA as the template and a pair of site-specific primers. The second fragment contained a consensus promoter with an extended -10 region, followed by a 29 nt C-less cassette and a sequence that forms a strong RNA hairpin when transcribed, which insulated the promoter-proximal region of the transcript from the downstream sequence. The promoter/insulator sequence was a derivative of the B. subtilis trp leader amplified from plasmid pAY196 and common to all tested pause sites (17,18). PCR fragments were fractionated on precast 8% TBE polyacrylamide gels for 45 min at 200 V. Gels were stained with SYBR Gold (Life Technologies) and bands of the correct length were excised from the gel. DNA was extracted with 60 µl of buffer containing 3 mM Tris-HCl, pH 8.9, 0.3 mM EDTA and 50 mM betaine by shaking at 1250 rpm for 1 hr at 60°C. Ten µl of the pause site-containing fragment and 1.2 µl of the promoter-containing fragment were combined in a 48 µl PCR reaction to merge the overlapping fragments using the promoter forward primer and the pause site reverse primer. The resulting DNA templates were recovered using the QIAquick PCR purification kit (Qiagen) and eluted with 50 µl of 0.5x TE buffer. E. coli RNAP holoenzyme was purified from strain NB959 carrying a 3'-terminal 6His-tag sequence at the rpoC gene (8).
Previously described recombinant NusG protein (19) was overproduced and purified by the Protein Purification Core (Frederick National Lab Laboratory Services).

Fig. S1. NusG depletion after dCas9 induction with arabinose. (A) Top panel, Western blot analysis indicates that the level of NusG was substantially higher in the WT strain compared with the depletion strain without dCas9 induction, indicating partial leakage of the PBAD promoter in the absence of arabinose induction. Four hours of dCas9 induction with arabinose resulted in NusG depletion below detectable levels in all samples used for RNET-seq. The image displays 3 biological replicates for the induced strain after probing for NusG. The WT parental strain was used as a control. Bottom panel, Western blot for the β subunit of RNAP (RpoB) as a loading control. (B) RNET-seq read coverage displayed in IGV browser showed a dramatic inhibition of transcription of nusG upon dCas9 induction. RNET-seq detected the nusG-targeting sgRNA (antisense to the coding sequence), as well as strong pausing of RNAP near the translation start codon of nusG and immediately upstream from the dCas9 roadblock. (C) RNA-seq data displayed in IGV showed the complete absence of the full-length nusG mRNA and a substantial repression of the upstream secE mRNA upon NusG depletion. The latter phenomenon might be derived from delayed release of the secE transcript from dCas9-blocked RNAP and reduced

WT and dNusG data, respectively. Log2 values of fold-change in the pause score upon NusG depletion (log2FC) are indicated except for the waaQ (ops) site that was not detected by our Differential_Pauses pipeline (ND). Values are positive for NusG-suppressed sites and negative for NusG-stimulated sites. Sequence around the pause sites is indicated. The bottom of each panel shows the results of a single-round in vitro transcription pause assay of the indicated templates in the absence and the presence of NusG (±NusG). Time points of the reaction are indicated above each lane. Ch, chase reactions. P, pause band; R, run-off transcript; *, minor pauses. The pause half-life (T ½) and efficiency (Eff) are indicated below each set of lanes. Efficiency exceeding 100% is a consequence of data fitting. Pausing at two residues (T 3806206 and C 3806208 in the E. coli reference genome NC_000913.2) is observed in vitro at the waaQ (ops) pause site (20); the pause half-lives are indicated for both pauses.

Fig. S8. NusG depletion affects expression of numerous genes. Affected genes were identified by RNA-seq using differential gene expression and Gene Ontology analysis (biological process in Escherichia coli at http://geneontology.org/). The RNA-seq read coverage data of WT cells (blue) and dNusG cells (red) are shown as they appear in the IGV browser. Expression of genes for primary alcohol catabolism such as ethanolamine metabolic process (A) and glycolate metabolism (B), as well as some genes involved in the SOS response to DNA damage such as lexA (C) and recA (D), was increased after NusG depletion. Conversely, expression of genes involved in maltodextrin/maltose transport (E), arginine transmembrane transport (F), flagellum-dependent motility (G), and aggregation/biofilm formation (H) was decreased after NusG depletion.

Fig. S12. Distribution of the hairpin to 3' end distances at NusG-suppressed pauses. 458 NusG-suppressed pauses that possess dNusG Score values above 300 and log2FC-Pause-TPM values above 3 were selected from SI Appendix, Dataset S5. These sequences were scrambled to make a negative control. Prediction of RNA folding was performed using CLC Genomics Workbench software (Qiagen, https://resources.qiagenbioinformatics.com/manuals/clcgenomicsworkbench/current/index.php?manual=Annotate_with_Flanking_Sequence.html). The structure closest to the 3' end was selected if several alternative hairpins were predicted for a single pause site.

Exponential fits of the data are marked by solid lines and the corresponding equations are indicated. Upon NusG depletion, the decline in 3' end coverage toward the end of genes is increased for rho but decreased for prfB because of a strong pause site at the beginning of prfB. (E) NusG-suppressed pauses in prfB in WT and dNusG cells. Genome-aligned reads are in gray while mapped 3' ends corresponding to paused RNAP are in blue and red in WT and dNusG cells, respectively. The values of log2 fold change in pause score after NusG depletion (log2FC) and the sequence around the pause site are indicated above and below the IGV panel, respectively. An in-frame TGA stop codon that is a target for feedback regulation by the prfB gene product (termination factor RF2) is boxed and highlighted in gray. (F) Genome-wide normalized RNAP coverage along all annotated E. coli ORFs in WT and dNusG cells. The density of RNET-seq reads is plotted using a five-period moving average. NusG depletion does not impact transcription polarity.

Fig. S15. NusG depletion inhibits transcription of rRNA genes identified by RNET-seq. (A) Transcription traffic in the rrnB operon is reduced 2-fold after NusG depletion. Arrow marks the position of the strongest NusG-suppressed pause in the 23S rRNA gene. Positions of the promoters (P1 and P2) and terminators (T1 and T2) are shown. (B) Comparison of the sequence logos of NusG-suppressed and NusG-independent pauses in protein-coding genes (Fig. 3C,D). The sequence of the strongest pause in the 23S rRNA gene of the rrnB operon is shown at the bottom. Position -1 corresponds to the 3' end of paused RNA and is marked with an arrow. (C) Secondary structure of a fragment of E. coli 23S rRNA that participates in binding incoming aa-tRNAs. 23S rRNA helices are marked by blue numbers (6). The 3' end of nascent RNA at the strongest NusG-suppressed pause site is indicated by an arrow.

Fig. S16. Model of NusG-bridged RNAP-ribosome complexes at NusG-suppressed pause sites. RNAP pausing provides time for association of a NusG-bound 30S ribosomal subunit. The delivered NusG releases this pause at the translation initiation codon. Ribosome-assisted recruitment of NusG to RNAP results in the formation of a coupled transcription-translation complex containing RNAP and a 30S ribosomal subunit that is bridged by NusG. Binding of NusG-30S complex to RNAP signals escape from pausing at ATG start codons.

Backtracking of elongation complexes is followed by cleavage of the nascent RNA by Gre factors upstream of a U residue in WT and dNusG cells (Fig. 4D). In turn, cleavage shifts the paused 3' ends 1 to 4 bases upstream relative to -1Y 3' ends in non-cleaving ∆greA ∆greB cells. Cleavage also results in the appearance of the +1T residue in the sequence logo of pauses in WT and dNusG cells.

Dataset  Pauses              a    Total  ORF b  UTR b              +1 nucleotide c                  Read length (nt) d
                                                                                     ≤15     16     17     18    ≥19
                                            65     35     10      9     43     38     10      7     23     27     33
3        dNusG shared        #      509    333    176     51     44    219    195     70     43    102    117    177
                             %      100     65     35     10      9     43     38     14      9     20     23     35
3        WT single 3'        #      361    202    159     38     85    102    136     51     31     88    100     92
                             %      100     56     44     11     24     28     38     14      8     24     28     25
4        WT_12 h             #    17243  12281   4962   1960   3249   4986   7048   1801   1402   4212   4858   4966
                             %      100     71     29     11     19     29     41     10      8     24     28     29
4        dNusG_12 h          #    27449  22952   4497   3461   3100  10429  10459   2634   2120   5391   7051  10249
                             %      100     84     16     13     11     38     38     10      8     20     26     37
5        Differential_50 i   #     4301   3541    760    491    406   1761   1643    7 k      6     22     29     36
                             %      100     82     18     11      9     41     38   10 l      8     20     25     38
6        Differential_12 j   #    28171  23602   4569   3380   4076   9835  10880    8 k      7     23     29     34
                             %      100     84     16     12     14     35     39   11 l      8     20     25     35

a Number (#) or percent of total (%) of each kind of pause found in each dataset.
b Pauses within protein coding sequences (ORF) or intergenic (UTR).
c Count of each nucleotide at the +1 position located immediately downstream of the pause.
d Count of sequence reads that were ≤15, 16, 17, 18, or ≥19 nt in length.
e Pauses in wild-type cells (WT) and NusG-depleted cells (dNusG) identified using a stringent cutoff score ≥50.
f Pauses within a 10 bp window were merged into a single pause site and reported in SI Appendix, Dataset S2.
g Nucleotide at the +1 position and length of sequence reads are reported for the highest score pause at pause sites with multiple 3' ends in Dataset S2.
h Pauses in WT and dNusG cells identified using less stringent cut-off score values ≥12.
i Differential pause strength identified using stringent cut-off of score values ≥50 in either WT or dNusG cells.
j Differential pause strength identified using less stringent cut-off of score values ≥12 in either WT or dNusG cells.
k Percent of total (%) of sequence reads that were ≤15, 16, 17, 18, or ≥19 nt in length in WT cells, highlighted in gray.
l Percent of total (%) of sequence reads that were ≤15, 16, 17, 18, or ≥19 nt in length in dNusG cells, highlighted in yellow.

Table S2. List of the strongest pauses. Less than a single pause site from this Table is predicted to occur per genome according to the negative exponential dependence between the number and the score of pauses (Fig. 1C and SI Appendix, Dataset S3).
a Relative position of the pause within an ORF for the sites in coding regions (between 0.01 and 0.99), a negative integer indicates the distance in base pairs to the nearest downstream coding sequence for pauses in a 5' UTR, or a positive integer indicates distance to the nearest upstream coding sequence for pauses in a 3' UTR.
b Pause in sense or antisense strand.
c Expression in transcripts per kilobase million (TPM).
d Sequence surrounding the pause (3' RNA end at pause is capitalized).
e Nucleotide at the +1 position of the non-template strand immediately downstream of the pause.
f Fraction of long sequence reads (>18 nt in length) originated from backtracked elongation complexes.
g Highlighted in yellow are the pauses found only in dNusG cells. All other pauses are shared between WT and dNusG cells.

Relative position of the pause within an ORF (between 0.01 and 0.99). A negative integer indicates the distance to the nearest downstream protein-coding sequence for pauses in a 5' UTR. A positive integer indicates distance to the nearest upstream protein-coding sequence for pauses in a 3' UTR.
c The log2(Fold Change) of ratio of pause scores normalized to transcripts per kilobase million (TPM) values in dNusG vs. WT cells. nd, indicates the pauses that were not identified using differential pause strength analysis (SI Appendix, Datasets S5 and S6).
d Difference between Log2FC values at the downstream pause (n+1 rows) and the upstream pause (n rows). Dominating positive values in this column indicate that the downstream pauses generally possess a larger fraction of backtracked reads than the upstream pauses among pairs of two adjacent pauses.
e Sequences are aligned relative to the 3' end of the paused RNA (marked with a capital letter in bold font). -9G +1T sequence at upstream pauses (n rows) are highlighted in cyan. -10G -1Y +1G sequence at downstream pauses (n+1 rows) are highlighted in yellow.
f Fraction of sequence reads that were <16, 16, 17, 18, or >18 nt in length.
g Fraction of long reads (>18 nt) at downstream pauses (n+1 rows) minus fraction of long reads at upstream pauses (n rows).
h nd, not determined by our bioinformatic pipeline.
i Sum of the differences between the fraction at each downstream pause and each upstream pause (n+1 row) - (n row) for each read length.