Transcription and translation of the rpsJ, rplN and rRNA operons of the tubercle bacillus

Several species of the genus Mycobacterium are human pathogens, notably the tubercle bacillus (Mycobacterium tuberculosis). The rate of proliferation of a bacterium is reflected in the rate of ribosome synthesis. This report describes a quantitative analysis of the early stages of the synthesis of ribosomes of M. tuberculosis. Specifically, it examines the roles of three large operons: the rrn operon (1.7 microns), encoding rrs (16S rRNA), rrl (23S rRNA) and rrf (5S rRNA); the rpsJ operon (1.93 microns), which encodes 11 ribosomal proteins; and the rplN operon (1.45 microns), which encodes 10 ribosomal proteins. A mathematical framework based on properties of population-average cells was developed to identify the number of transcripts of the rpsJ and rplN operons needed to maintain exponential growth. The values obtained were supported by RNaseq data. The motif 5′-gcagac-3′ was found close to the 5′ end of transcripts of mycobacterial rplN operons, suggesting that it may form part of the RpsH feedback binding site, because the same motif is present in the ribosome within the region of rrs that forms the binding site for RpsH.


INTRODUCTION
The genus Mycobacterium is of interest because several species are pathogenic to man and the diseases they cause are difficult to treat. For example, in humans tuberculosis is caused by Mycobacterium tuberculosis, leprosy by Mycobacterium leprae, and Mycobacterium abscessus causes a lung disease among patients suffering from cystic fibrosis (Ripoll et al., 2009). Furthermore, tuberculosis in cattle is caused by Mycobacterium bovis, and Mycobacterium marinum is pathogenic to fish (Tobin & Ramakrishnan, 2008). M. bovis BCG is non-pathogenic and is often used to provide a frame of reference for the pathogens, as is Mycobacterium smegmatis. It is desirable to understand how mycobacteria proliferate in order to improve the medical treatments available and to counteract the threat of drug resistance.
The foundations for studying the properties of bacterial cell cultures were laid in the period 1958-1976. First, Schaechter et al. (1958) studied the relative masses of DNA, RNA and protein (DNA : RNA : protein) of bacterial cultures grown in different media. They showed that bacterial cells increased in size with increasing growth rate and that the number of ribosomes increased to meet the demand for a faster rate of protein synthesis. They also showed that the notion of an average cell is conceptual because it represents cells of all ages from the newly born (age a = 0) to cells about to divide (age a = 1). Because the average cell reflects the entire age distribution it is more accurately termed a population-average cell, which we refer to here as a 'cell' for brevity.
Second, a number of studies (Byrne et al., 1964; Miller et al., 1970; Stent, 1964) showed that gene transcripts were translated as they were transcribed (transcription/translation coupling). Third, the exponential growth equation provides the basis for a mathematical approach to bacterial cell growth. Fourth, knowledge of both the ratios DNA : RNA : protein and the size of the genome allows the macromolecular composition of an average cell to be estimated (Cox, 2004). The work of Miller et al. (1970) provided the first view of a large functional operon (Fig. 1) and also revealed that large operons are uncommon.
Ribosome synthesis is essential to cell proliferation. The importance of ribosomes is reflected in the way that the required genes are organized within the genome: a high proportion are located within three large operons, namely the rRNA operon (5167 bp, 1.76 microns), encoding the three components rrs, rrl and rrf, which are 16S rRNA, 23S rRNA and 5S rRNA, respectively; the rpsJ operon (5679 bp, 1.93 microns), which comprises 11 ribosomal protein genes; and the rplN operon (4257 bp, 1.45 microns), which comprises 10 ribosomal protein genes. The study of these operons is facilitated by the wealth of information available for the structure and function of ribosomes and ribosomal components (Bashan & Yonath, 2008; Williamson, 2009).
The structure and expression of the rRNA (rrn) operon of M. tuberculosis have been studied (Kempsell et al., 1992) and compared with rrn operons of other mycobacteria (Gonzalez-y-Merchand et al., 1996, 1997; Ji et al., 1994a, b, c). In contrast, no studies of the two mycobacterial operons rpsJ and rplN, which together encode 21 ribosomal proteins, have been reported. For this reason, we have studied the properties of these two operons with the aim of defining key steps in the proliferation of the tubercle bacillus.
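The contour lengths quoted for the three operons are consistent with their base-pair counts if one assumes the standard rise of about 0.34 nm per base pair for B-form DNA. The rise value is an assumption introduced here for illustration; the text does not state which conversion was used. A minimal sketch:

```python
# Rise per base pair of B-form DNA in nanometres.
# Assumed value for illustration; not stated in the text.
RISE_NM_PER_BP = 0.34


def contour_length_microns(length_bp: int) -> float:
    """Convert an operon length in base pairs to a contour length in microns."""
    return length_bp * RISE_NM_PER_BP / 1000.0


# Operon lengths in bp as quoted in the text.
OPERONS_BP = {"rrn": 5167, "rpsJ": 5679, "rplN": 4257}

if __name__ == "__main__":
    for name, length_bp in OPERONS_BP.items():
        microns = contour_length_microns(length_bp)
        print(f"{name}: {length_bp} bp is about {microns:.2f} microns")
```

Under this assumption the three lengths round to the values given in the text (1.76, 1.93 and 1.45 microns).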
Although the rpsJ and rplN operons of mycobacteria resemble those of Escherichia coli in structure, the ways by which they are regulated may differ because of the large difference in the maximum growth rates of the two species.
Electron micrographs of these operons undergoing transcription and translation were obtained by Miller et al. (1970) after lysis of cells of E. coli grown with a generation time of 1 h (Fig. 1). The number of transcripts reveals the activity of the operon. In principle, this figure provides a reference point for the study of the mycobacterial homologues. The required mathematical framework was developed and used to quantify the expression of the mycobacterial operons. Both the sequence and the protein co-factors needed for function were identified.

METHODS
Culture conditions and RNA isolation. M. tuberculosis H37Rv was grown in Middlebrook 7H9 medium supplemented with 0.2 % w/v glycerol and 10 % Albumin Dextrose Catalase (ADC) supplement in roller bottle culture. RNA was isolated from mid-exponentially growing bacteria as described by Arnvig et al. (2011). RNA was treated with Turbo DNase (Ambion) until DNA free. The quality of RNA was assessed using a Nanodrop spectrophotometer (ND-1000; Labtech) and an Agilent bioanalyser.
RNA sequencing. A single RNA sample was used by vertis Biotechnologie to construct a cDNA library for whole transcriptome analysis. RNA was fragmented with ultrasound (four pulses of 30 s at 4 °C), then treated with Antarctic phosphatase and rephosphorylated with polynucleotide kinase (PNK). Afterwards, the fragmented RNA was poly(A)-tailed using poly(A) polymerase and the Illumina 5′ TruSeq adaptor was ligated to the 5′-phosphate of the RNA. First-strand cDNA synthesis was performed using an oligo(dT)-adaptor primer and M-MLV reverse transcriptase. The resulting cDNA was amplified by PCR. The cDNA library was sequenced as single-end reads on an Illumina HiSeq 2000 system.
Read mapping. FastQC (Babraham Institute) was used for quality control of the Illumina-produced FASTQ file. Poor-quality read bases were trimmed using the SolexaQA package (Cox et al., 2010); default parameters were used, trimming bases with confidence P > 0.05 and removing reads < 25 bases. Good-quality reads were mapped to the reference sequence of M. tuberculosis H37Rv (GenBank/EMBL/DDBJ accession no. AL123456) as single-end data using Burrows-Wheeler Aligner (BWA) (Li & Durbin, 2009). Transcriptome coverage, defined as the number of reads mapped per bp of the H37Rv genome, was calculated using BEDTools (Quinlan & Hall, 2010) and was found to be 432-fold. The number of reads mapped to each annotated ORF was calculated using BEDTools and ranged from 1 to 19 953. Calculation of pairwise correlation coefficients demonstrated a high degree of reproducibility between this dataset (Spearman r ranging from 0.80 to 0.85; see Fig. S1, available in the online Supplementary Material) and previously published transcriptome analyses of exponentially growing M. tuberculosis by RNaseq (Arnvig et al., 2011; Cortes et al., 2013).
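The two trimming thresholds quoted above (error probability P > 0.05 and a minimum length of 25 bases) can be illustrated with a short sketch. This is not the SolexaQA code. It assumes that quality scores have already been decoded to integer Phred values, and that trimming means keeping the longest contiguous run of bases with P at or below the threshold. All function names are hypothetical.

```python
def error_probability(phred_quality: int) -> float:
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-phred_quality / 10)


def trim_read(sequence: str, qualities: list[int], max_error: float = 0.05) -> str:
    """Return the longest contiguous stretch whose bases all have P <= max_error."""
    best_start, best_len = 0, 0
    run_start = 0
    for i, quality in enumerate(qualities):
        if error_probability(quality) > max_error:
            # A poor base ends the current run; the next run starts after it.
            run_start = i + 1
        else:
            run_len = i + 1 - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
    return sequence[best_start:best_start + best_len]


def filter_reads(reads, min_length: int = 25, max_error: float = 0.05) -> list[str]:
    """Trim each (sequence, qualities) pair and drop reads shorter than min_length."""
    kept = []
    for sequence, qualities in reads:
        trimmed = trim_read(sequence, qualities, max_error)
        if len(trimmed) >= min_length:
            kept.append(trimmed)
    return kept
```

With these thresholds a base is trimmed when its Phred score is 13 or lower, because P = 10^(-1.3) is just above 0.05.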
Mathematical equations needed for quantitative analysis of transcription and translation. During exponential growth a cell component $x$, such as RNA or protein, will increase with time according to equation (A1), where $x$ is the amount at time $t$, $x_0$ is the amount at the reference time $t_0$ and $\mu$ is the specific growth rate (h⁻¹).

$$x = x_0\, e^{\mu (t - t_0)} \tag{A1}$$

The specific synthesis rate, $\omega_x$, at which $x$ increases is given by the equalities in equation (A2).

$$\omega_x = \frac{dx}{dt} = \mu x \tag{A2}$$

The rate $\omega_{\text{c-p}(i)}$ at which the number $n_{\text{c-p}(i)}$ of copies of protein p(i) increases is given in equation (A3).

$$\omega_{\text{c-p}(i)} = \mu\, n_{\text{c-p}(i)} \tag{A3}$$

The synthesis rate is also determined by the product of the number $n_{R(i)}$ of ribosomes synthesizing p(i) and the rate $\varepsilon_{\text{aa}(i)}$ (amino acids h⁻¹) at which amino acids are incorporated into the nascent polypeptide chain. Equating this product with the right-hand side of equation (A3), expressed in amino acids, leads to equation (A4),

$$\mu\, n_{\text{c-p}(i)}\, l_{\text{aa}(i)} = n_{R(i)}\, \varepsilon_{\text{aa}(i)} \tag{A4}$$

where $l_{\text{aa}(i)}$ (amino acids) is the length of p(i). Rearranging equation (A4) to make $n_{R(i)}$ the subject leads to equation (A5).

$$n_{R(i)} = \frac{\mu\, n_{\text{c-p}(i)}\, l_{\text{aa}(i)}}{\varepsilon_{\text{aa}(i)}} \tag{A5}$$

A conversion factor $n_{R(i)/\text{tr}(i)}$ is needed to relate the number of ribosomes to a transcript of ORF(i) encoding p(i) [see equation (A6)].

$$n_{\text{tr}(i)} = \frac{n_{R(i)}}{n_{R(i)/\text{tr}(i)}} = \frac{\mu\, n_{\text{c-p}(i)}\, l_{\text{aa}(i)}}{\varepsilon_{\text{aa}(i)}\, n_{R(i)/\text{tr}(i)}} \tag{A6}$$

As a consequence of transcription/translation coupling it is thought that the moving ribosome controls the rate of transcription by preventing RNA polymerase (RNAP) from spontaneously backtracking (Burmann et al., 2010; Proshkin et al., 2010). Consequently, both transcription and translation are determined by codon usage and the availability of nutrients. We consider, as a first approximation, that the peptide chain elongation rate, $\varepsilon_{\text{aa(av)}}$, derived from the ratios DNA : RNA : protein is representative of the rate, $\varepsilon_{\text{aa}(i)}$, for ORF(i).
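The use of equation (A5) and of the conversion factor can be illustrated numerically. The sketch below assumes that equation (A5) has the form n_R(i) = μ · n_c-p(i) · l_aa(i) / ε_aa(i), which is what the definitions above imply, and that the number of transcripts is n_R(i) divided by the number of ribosomes per transcript. The ribosome number (3730 per cell), the length of the rpsJ operon (5680 bp, 1803 aa) and the spacing of one ribosome per 100 nt are taken from the Results. The doubling time (23 h) and the elongation rate (1.07 amino acids per second) are illustrative assumptions, not values taken from Table S1.

```python
import math


def ribosomes_translating(mu_per_h, copies_per_cell, length_aa, elongation_aa_per_h):
    """Assumed form of equation (A5): n_R = mu * n_c-p * l_aa / e_aa."""
    return mu_per_h * copies_per_cell * length_aa / elongation_aa_per_h


def transcripts_per_cell(n_ribosomes, ribosomes_per_transcript):
    """Number of transcripts, given the ribosomes-per-transcript conversion factor."""
    return n_ribosomes / ribosomes_per_transcript


# Illustrative inputs (assumptions, see the lead-in above).
DOUBLING_TIME_H = 23.0                   # assumed doubling time
MU_PER_H = math.log(2) / DOUBLING_TIME_H
ELONGATION_AA_PER_H = 1.07 * 3600        # assumed elongation rate

# Values quoted in the text for the rpsJ operon.
RIBOSOMES_PER_CELL = 3730                # one copy of each protein per ribosome
RPSJ_LENGTH_AA = 1803
RPSJ_LENGTH_NT = 5680
NT_PER_RIBOSOME = 100

n_ribosomes = ribosomes_translating(
    MU_PER_H, RIBOSOMES_PER_CELL, RPSJ_LENGTH_AA, ELONGATION_AA_PER_H
)
n_transcripts = transcripts_per_cell(n_ribosomes, RPSJ_LENGTH_NT / NT_PER_RIBOSOME)

if __name__ == "__main__":
    print(f"ribosomes translating rpsJ: {n_ribosomes:.1f}")
    print(f"rpsJ transcripts per cell: {n_transcripts:.2f}")
```

With these assumed inputs the estimate is of the order of one rpsJ transcript per cell; different values for the growth rate or the elongation rate would change the result proportionally.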

RESULTS AND DISCUSSION
Inspection of the relevant mycobacterial genomic data reveals the presence of three large operons that encode proteins: one encoding components of ATP synthase and two operons, namely rpsJ and rplN, that encode ribosomal proteins. The rpsJ and rplN operons of E. coli have been studied (Mattheakis et al., 1989; Stelzl et al., 2003). In general, the rpsJ and rplN operons are found among a cluster of ribosomal protein genes (Coenye & Vandamme, 2005) modified by deletions and by the introduction of non-ribosomal protein genes through horizontal gene transfer. There are no reports of detailed studies of these operons in mycobacteria.
The principal features of the above-mentioned operons are evident from Fig. 1: namely, the length of the DNA segment, the number of transcripts and the number of ribosomes per transcript. The number of transcripts per operon would be expected to depend on growth rate. Mycobacteria grow slowly compared with E. coli and so we would expect fewer transcripts per operon.

Number of transcripts of a gene present per cell
Number of transcripts calculated from the growth equation. The mathematical treatment (see Methods) is based on the exponential growth equation that defines the relation between the specific growth rate, $\mu$, the number of copies $n_{\text{c-p}(i)}$ of a protein p(i) of length $l_{\text{aa}(i)}$ amino acids encoded by ORF(i) and the number of transcripts $n_{\text{tr}(i)}$ needed to maintain the rate of synthesis that is required: see equation (1), where $\varepsilon_{\text{aa}(i)}$ is the peptide chain elongation rate and $n_{R(i)/\text{tr}(i)}$ is the number of ribosomes per transcript [see also equation (A6) of Methods].

$$n_{\text{tr}(i)} = \frac{\mu\, n_{\text{c-p}(i)}\, l_{\text{aa}(i)}}{\varepsilon_{\text{aa}(i)}\, n_{R(i)/\text{tr}(i)}} \tag{1}$$

The number of transcripts of a gene was obtained by evaluating the parameters of the right-hand side of equation (1), which is the required form of the growth equation. Three assumptions were made in applying this equation. First, it was assumed that, except for rplL, ribosomal proteins are present as one copy per ribosome. Second, ribosomal proteins were assumed to be located mainly in ribosomes; in other words, the number of copies of a ribosomal protein was assumed to be approximately equal to the product of the number of ribosomes and the number of copies of that protein per ribosome. Third, it was assumed that the peptide chain elongation rate was approximately equal to the polypeptide chain elongation rate of the protein fraction of the population-average cell. However, the rate of synthesis of individual proteins may vary according to codon usage.
Evaluation of the number of transcripts per gene from RNaseq data. The first step was to relate the numbers of partially sequenced gene transcripts revealed by the RNaseq platform to the numbers of gene transcripts of ORFs encoding both RNA and protein components of the ribosome. The principal steps in the RNaseq procedure, starting from a sample ($n_{\text{cells}}$) of a bacterial culture and ending with the number [$n_{\text{p-tr}(i)}$] of partially sequenced transcripts assigned to an ORF [ORF(i)], are shown in Methods. The crucial factor is that only a section of 25-70 terminal nucleotides of each mRNA fragment is measured. The limitation of this approach is shown (Fig. 2) by the data for each of the three rRNA components rrs, rrl and rrf. The profiles found for the number of gene transcripts at each point from the 5′ to the 3′ end of the gene are shown for the mature species in Fig. 2(a) and the profiles revealed by the RNaseq platform are shown in Fig. 2(b). If the sequences of the RNA fragments produced by the RNaseq procedure had been complete the two profiles would have been identical within experimental error. The partial sequencing step raises the question 'How are the profiles shown in Fig. 2(a) and (b) related to each other?'.
In response, we sought to establish a frame of reference that forms the basis for a quantitative approach. The information required is the macromolecular composition of the cells under scrutiny, which can be derived from the ratios DNA : RNA : protein. The data required are not available for M. tuberculosis H37Rv, but are available for the very closely related family member M. bovis Pasteur (Beste et al., 2005), which is frequently used as a model for the tubercle bacillus. These data are shown in Table S1.
The first step in our analysis was to explore the relation between $n_{\text{reads}(i)}$, the number (millions) of 'reads' (partially sequenced transcripts), and $l_{\text{tr}(i)}$ (nucleotides), the length of the entire transcript. The subscript (i) denotes a particular ORF or operon. The plot of log $n_{\text{reads}(i)}$ versus log $l_{\text{tr}(i)}$ was found to be linear with a slope of 1.48, which led to the empirical result shown in equation (2). The observed numbers of reads were found to agree with the calculated values to within 10 % or better (Table 1). The plot of the number of reads (millions) versus the length of the ORF (nucleotides) raised to the power 1.48 was found to be linear (Fig. 2c).

$$n_{\text{reads}(i)} = 169\, l_{\text{tr}(i)}^{1.48} \tag{2}$$

The slope of equation (2) is equal to the product of three components: namely, the number $n_{\text{c-p}(i)}$ of copies of the product p(i) of ORF(i), the number $n_{\text{cells}}$ of cells providing the RNA sample, and a constant $k$ that is characteristic of the RNaseq platform [see equation (3)].

$$169 = n_{\text{c-p}(i)}\, n_{\text{cells}}\, k \tag{3}$$

We suppose that there are 3730 ribosomes per cell and $n_{\text{c-p}(i)} = 3730$ copies for each of the three rRNA components. Hence, equation (2) may be written as equation (4), after dividing each side by 3730, where $n_{\text{reads}(i)/p(i)}$ is the number of reads per copy of the gene product.

$$n_{\text{reads}(i)/p(i)} = \frac{169}{3730}\, l_{\text{tr}(i)}^{1.48} \tag{4}$$

The number $n_{\text{tr}(i)}$ of transcripts of ORF(i) is then given by equation (5).

$$n_{\text{tr}(i)} = \frac{n_{\text{reads}(i)}}{n_{\text{reads}(i)/p(i)}} \tag{5}$$

These equations may now be applied to the data available for both the rpsJ and rplN operons.
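The log-log analysis described above can be sketched as an ordinary least-squares fit of log(reads) against log(length). The sketch assumes that the empirical relation has the form n_reads(i) = 169 · l_tr(i)^1.48 (as given in the caption to Fig. 2), that the number of reads per copy of product is obtained by dividing by 3730, and that the number of transcripts is the observed number of reads divided by the reads per copy. The rRNA lengths used here are approximate illustrative values, and the read counts are synthetic numbers generated from the relation itself; they are not the data of Table 1.

```python
import math


def fit_power_law(lengths_nt, read_counts):
    """Least-squares fit of log(n) = log(a) + b * log(l); returns (a, b)."""
    xs = [math.log(length) for length in lengths_nt]
    ys = [math.log(count) for count in read_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    coefficient = math.exp(mean_y - slope * mean_x)
    return coefficient, slope


def reads_per_copy(length_nt, coefficient=169.0, exponent=1.48, copies_per_cell=3730):
    """Reads expected per copy of gene product for a transcript of this length."""
    return coefficient * length_nt ** exponent / copies_per_cell


def transcripts_from_reads(n_reads, length_nt):
    """Observed reads divided by the reads expected per copy of product."""
    return n_reads / reads_per_copy(length_nt)


# Approximate lengths (nt) of rrf, rrs and rrl; illustrative assumptions.
LENGTHS_NT = [115, 1537, 3138]
# Synthetic read counts generated from the assumed relation (not measured data).
SYNTHETIC_READS = [169.0 * length ** 1.48 for length in LENGTHS_NT]

coefficient, slope = fit_power_law(LENGTHS_NT, SYNTHETIC_READS)

if __name__ == "__main__":
    print(f"fitted coefficient {coefficient:.1f}, fitted exponent {slope:.2f}")
```

Because the synthetic counts were generated from the relation, the fit recovers the coefficient and exponent exactly; with measured data the scatter about the fitted line indicates how well the power law describes the platform.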
The number of transcripts of ORF(i) can be related to the number $n_{R(i)}$ of ribosomes synthesizing protein p(i).
Transcription/translation coupling requires that each transcript is protected from degradation by nucleases. This protection is provided by the bound ribosomes and their associated elongation factors, which leave too little space on the mRNA for degradosomes to bind.
Provisionally, we assign one ribosome per approximately 100 nt, which is equivalent to the diameter of a ribosome (22 nm, equivalent to 65 nt) flanked by 17 or so nucleotides on either side.
This conversion factor of one ribosome per 100 nt also allows us to obtain the number of transcripts per cell directly from the equation for exponential growth (see Methods); for example, see equation (1) above. Please note that equations derived in Methods are referred to as (A1) and so on. The application of equation (A5) requires the data for the macromolecular properties of cells listed in Table S1. The values obtained using equation (A5) for the numbers of transcripts per cell provide an independent test of the accuracy of the RNaseq data.
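The arithmetic behind the figure of one ribosome per 100 nt can be checked with a few lines. The sketch assumes an axial spacing of about 0.34 nm per nucleotide, which is the value implied by the equivalence of 22 nm and 65 nt given above; the function names are hypothetical.

```python
# Axial length per nucleotide in nanometres.
# Assumed value, implied by the text's equivalence of 22 nm and 65 nt.
NM_PER_NT = 0.34


def footprint_nt(ribosome_diameter_nm: float = 22.0, flank_nt: int = 17) -> int:
    """Nucleotides occupied by one ribosome plus a flank on either side."""
    core_nt = round(ribosome_diameter_nm / NM_PER_NT)
    return core_nt + 2 * flank_nt


def ribosomes_per_transcript(transcript_length_nt: float, spacing_nt: float = 100.0) -> float:
    """Number of ribosomes on a fully loaded transcript at the given spacing."""
    return transcript_length_nt / spacing_nt


if __name__ == "__main__":
    print(f"footprint: {footprint_nt()} nt")
    print(f"rpsJ transcript (5680 nt): {ribosomes_per_transcript(5680):.1f} ribosomes")
    print(f"rplN transcript (4500 nt): {ribosomes_per_transcript(4500):.1f} ribosomes")
```

The footprint comes to 99 nt, which the text rounds to 100 nt; at that spacing a 4500 nt transcript carries 45 ribosomes.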

Operons rpsJ and rplN
M. tuberculosis possesses three large operons with mRNA transcripts that are comparable in length to precursor-rRNA. One operon encodes components of ATP synthase and two, the rpsJ and rplN operons, encode ribosomal proteins (Fig. S2). Properties of the 11 proteins of the rpsJ operon are shown in Table S2. The component proteins of the rplN operon are shown in Table S3. Our knowledge of rpsJ and rplN operons was gained mainly from studies of E. coli and other fast-growing bacteria. The rpsJ and rplN operons of M. tuberculosis and E. coli are homologous in structure, and we infer that they are also homologous in function.
The coding region of the rpsJ operon extends over 5680 bp and encodes 1803 aa. These data allow an estimate of the number of rpsJ transcripts per cell to be calculated by means of equation (A6). The macromolecular composition of the cells (see Table S1) allows the number of ribosomes actively translating the rpsJ operon at any instant to be evaluated. The same considerations apply to the rplN operon, which extends over 4500 bp and encodes 1349 aa (see Table S3). The numbers of transcripts per cell calculated for the two operons from RNaseq data using equation (A6) and the numbers of ribosomes estimated to be translating each of the operons using equation (A5) are presented in Table 2. The two sets of data agree to within 25 % or better. The data derived from the use of the growth equation are comparative. These data provide the basis for the schematic views of transcription (Fig. 3a) and coupled transcription/translation (Fig. 3b) of the rpsJ and rplN operons.
Protein synthesis may be viewed from several different perspectives, for example: (i) the rate (amino acids h⁻¹) of peptide chain elongation; (ii) the time taken to synthesize protein p(i); and (iii) the rate (see Methods) at which completed copies of p(i) are produced (the 'run off' rate). It is estimated (based on amino acids h⁻¹) that a ribosome takes 28 min to translate the rpsJ operon and that one operon is completed every 32 s. Similarly, 30 min are needed for a ribosome to translate the rplN operon and one operon is 'run off' every 32 s. These properties are evident from the schematic views of transcription/translation of these operons shown in Fig. 3. Allowing for the slower growth rate of the mycobacterium, Fig. 3(a, b) is consistent with the electron micrographs (Fig. 1) of active operons visualized by Miller et al. (1970). The electron micrograph obtained for E. coli shows eight nascent transcripts compared with the estimate of approximately a single transcript for the tubercle bacillus. The number of ribosomes per rplN operon was calculated by means of equation (4). The values obtained were 45 ribosomes for M. bovis BCG (Fig. 3b) and 218 for E. coli, which is in accord with the historic electron micrograph shown in Fig. 1.
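The second and third perspectives can be made concrete with a short calculation for the rpsJ operon. The sketch treats the translation time as the encoded length divided by the elongation rate, and the 'run off' interval as the reciprocal of the rate at which complete sets of proteins are required (specific growth rate multiplied by the number of ribosomes per cell). The elongation rate (1.07 amino acids per second) and the doubling time (23 h) are assumptions chosen for illustration because they come close to the figures quoted above; they are not taken from Table S1.

```python
import math


def translation_time_min(length_aa: float, elongation_aa_per_s: float) -> float:
    """Time for one ribosome to translate a message encoding length_aa residues."""
    return length_aa / elongation_aa_per_s / 60.0


def run_off_interval_s(mu_per_h: float, copies_per_cell: float) -> float:
    """Seconds between completed products when mu * copies are needed per hour."""
    return 3600.0 / (mu_per_h * copies_per_cell)


# Illustrative inputs (assumptions, see the lead-in above).
ELONGATION_AA_PER_S = 1.07
MU_PER_H = math.log(2) / 23.0

# Values quoted in the text.
RPSJ_LENGTH_AA = 1803
RIBOSOMES_PER_CELL = 3730

rpsj_time_min = translation_time_min(RPSJ_LENGTH_AA, ELONGATION_AA_PER_S)
rpsj_interval_s = run_off_interval_s(MU_PER_H, RIBOSOMES_PER_CELL)

if __name__ == "__main__":
    print(f"time to translate rpsJ: {rpsj_time_min:.0f} min")
    print(f"interval between completed rpsJ products: {rpsj_interval_s:.0f} s")
```

Under these assumptions the translation time is about 28 min and the interval is about 32 s, in line with the estimates given for the rpsJ operon.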

Transcriptional control elements of the rpsJ and rplN operons
The rpsJ, rplN and rrn operons of selected mycobacteria share features of transcriptional control that include −35 and −10 promoter elements, transcription start sites and stringent elements (Table 3), which leads us to infer that the transcriptional control elements of the three operons are likely to be of comparable strengths. The ribosome binding motifs of transcripts of the rpsJ and rplN operons were found to differ (Table 3). Two strong motifs were found to be present for transcripts of rpsJ and only one was found in transcripts of rplN.
The wasteful production of an excess of the proteins encoded by the rpsJ and rplN operons is prevented by feedback control mechanisms. Inspection of the sequences gave no insight into how the rpsJ operon is controlled. However, a possible mechanism for the control of the rplN operon was discerned (Table 3). The key observation is that the motif 5′-gcagac-3′ present near the 5′ ends of transcripts of the operon matches the identical sequence in mycobacterial 16S rRNA which, by analogy with E. coli 16S rRNA, is believed to form part of the binding site for RpsH in the 30S ribosomal subunit. Hence, we propose that when RpsH is in excess it has the capacity to bind to the initiating RNAP complex to prevent transcription of the rplN operon. Our scrutiny of a wide range of homologous mycobacterial sequences led us to propose a switch mechanism, which is outlined in Fig. 4. Direct experimental support for our proposal is needed.
Inspection of the genomic nucleotide sequence showed that the rplN operon of M. abscessus ATCC 19977 is anomalous because the genes encoding ribosomal proteins are interrupted by five genes encoding non-ribosomal proteins. The inserted ORFs are found between rpsN1 and rpsH. It is not clear how this location affects the role of RpsH in the control of the expression of the operon. The effects of the inserted genes need further study to establish the mechanisms involved in the transcription of the ribosomal protein genes and to ascertain the extent to which the rates of synthesis of the ribosomal proteins are affected.
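The search for the motif near the 5′ end of a transcript can be sketched as a simple string scan. The function below is a hypothetical helper, not the procedure used to compile Table 3; the window size is an arbitrary assumption and the example sequence is invented for illustration.

```python
def find_motif_near_5prime(sequence: str, motif: str = "gcagac", window: int = 50) -> list[int]:
    """Return 0-based start positions of motif lying wholly within the first `window` nt."""
    # Normalise case and treat RNA (u) and DNA (t) spellings alike.
    seq = sequence.lower().replace("u", "t")
    target = motif.lower().replace("u", "t")
    region = seq[:window]
    positions = []
    start = region.find(target)
    while start != -1:
        positions.append(start)
        start = region.find(target, start + 1)
    return positions


if __name__ == "__main__":
    # Invented example sequence; not a real transcript.
    example = "aaagcagacttt"
    print(find_motif_near_5prime(example))
```

Occurrences that extend beyond the window are ignored, so the choice of window determines what counts as 'near' the 5′ end.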

Role of NusG in transcription/translation coupling
Nus factors (N utilizing substances) NusA, NusB, NusE (RpsJ) and NusG were first identified in E. coli infected with bacteriophage lambda. These host factors were found to play essential roles in the expression of protein N of the bacteriophage and they are also required for the expression of each of the three operons under scrutiny (Burmann et al., 2010). For example, NusG is a factor that is essential for transcription/translation coupling. This factor has three separate domains and the functions of two of them are known. The NusG N-terminal domain (NusG-NTD) has the capacity to bind to RNAP, whereas the C-terminal domain (NusG-CTD) can combine with the NusE (RpsJ) component of ribosomes. These two functions of NusG enable transcription to be coupled with translation. NusG-CTD can also bind to Rho to terminate transcription (Burmann et al., 2010; McGary & Nudler, 2013; Proshkin et al., 2010). The rate of transcription depends on the rate of translation because the ribosome moves through a progressive, unilateral translocation (Fig. 5). In contrast, in the absence of NusG, RNAP is capable of backtracking within the transcription bubble. When transcription and translation are coupled, the movement of the ribosome modulates the rate of transcription by preventing backtracking and thereby increasing the efficiency of transcription (Proshkin et al., 2010).
Two factors, NusB and NusE (RpsJ), carry out an essential role in the transcription of rrn operons by preventing premature termination of the transcript. The crystal structure of NusB from M. tuberculosis is known (Gopal et al., 2000a) and its interaction with NusE (RpsJ) has been investigated (Gopal et al., 2000b).

Concluding remarks
The electron micrographs of Miller et al. (1970) showed the presence of RNA transcripts longer than a micron transcribed from a section of DNA corresponding in size to the rplN operon, and also showed that transcription and translation were coupled. As described above, NusG is the factor required for this coupling to take place. We have presented a quantitative view of factors affecting the expression of two operons, rplN and rpsJ, encoding a total of 21 ribosomal proteins. Quantitative analysis was based on two approaches, namely the mathematical framework developed from the equation for exponential growth and RNaseq measurements. Ribosomal proteins RplD, RpsH and RpsJ were found to participate in regulating the expression of the rpsJ, rplN and rRNA operons, respectively. The unusual structure of the rplN operon of M. abscessus ATCC 19977 requires further investigation. Transcription start sites of the rpsJ and rplN operons of M. tuberculosis H37Rv were first identified by an application of the RNaseq procedures (Cortes et al., 2013) and homologous sequences were then found in four other representative mycobacterial species, namely the pathogens M. marinum and M. leprae Br4923, and the opportunistic pathogens M. smegmatis mc²155 (Brown-Elliott & Wallace, 2002) and M. abscessus ATCC 19977. The comparison aids the identification of functional motifs based on the notion that sequences diverge during evolution unless they are constrained by functional requirements. The results contribute to our understanding of the early stages of ribosome synthesis during mycobacterial growth.

Fig. 1. Historic electron micrograph of the transcription/translation of an operon of Escherichia coli (μ = 0.69 h⁻¹). The section of DNA being transcribed corresponds to the rplN operon in length. Image reproduced from Miller & Hamkalo (1972) with permission, licence number 3525270508092.
The term $n_{\text{reads}(i)/p(i)}$ is the number (millions) of reads of ORF(i) per copy of the gene product. The rRNA species provide examples of the general case. The number $n_{\text{tr}(i)}$ of transcripts per ORF is given by the ratio of the total number of reads to the number of reads per copy of the product of ORF(i) (see equation 5).

Fig. 2. Comparison of the profiles of the number of copies of rrs, rrl and rrf components of rRNA versus 'gene location' with the corresponding profiles of reads identified by the RNaseq method. (a) Profiles of rRNA components. (b) Profiles of the corresponding numbers of reads. (c) The relation between the number of reads (millions) and the length $l_{\text{tr}(i)}$ of the transcript. As discussed in the text, the plot is described by the equation $n_{\text{reads}(i)} = 169\, l_{\text{tr}(i)}^{1.48}$.

Fig. 4. Possible effects of RpsH binding on transcription of rplN operons. The motif 5′-gcagac-3′ (highlighted) forms part of the 16S rRNA (rrs) binding site for RpsH. Hence, we infer that the presence of this motif near to the 5′ end of transcripts of the rplN operon is also part of a potential binding site for RpsH. We propose that transcription proceeds freely when concentrations of unbound RpsH are low (i) and that increasing concentrations of free RpsH may lead to recruitment of RpsH by the RNAP complex leading to termination of transcription (ii). The proposed switch is common to Mycobacterium since similar structures were found to be present at the 5′ ends of the rplN operons of different mycobacteria (see Table 3).

Fig. 5. The roles of NusG in transcription/translation coupling. (a) Composition of an active RNAP complex. RNAP is shown in dark grey, DNA in blue and nascent RNA in red. The ribosome is shown in green with the nascent polypeptide chain in light grey; the bulge in the small subunit denotes the location of NusE (RpsJ). NusG is shown in orange: its shape denotes two functional sections. The larger section denotes the N-terminal domain, which binds to RNAP. The smaller section denotes the C-terminal domain, which interacts with NusE in situ. Rho is shown in purple. (b) After translation is completed NusG remains bound to RNAP and may also bind to Rho through the C-terminal domain leading to termination of transcription.

Table 1. Comparisons of the numbers of reads observed and calculated for rRNA species

Table 2. Comparison of data based on RNaseq with data derived from macromolecular composition
*The numbers of 'active ribosomes' are inferred values derived from RNaseq data using the approximation that there is one ribosome per 100 nt of mRNA (see text).
†Denotes comparative values calculated from data for the macromolecular composition of cells (see Table S1) by using equation (A5).

Table 3. Motifs of promoters and Shine-Dalgarno sequences of the operons studied
The motif shown in italics is a potential binding site for RpsH.
†The hash sign denotes the 12 base motif (5′-gagaactcaata-3′) that forms part of the RNase III binding site.
‡The observed motifs are shown in bold and are underlined. The termination codon is shown in bold upper-case letters.