A chemical kinetic basis for measuring translation initiation and elongation rates from ribosome profiling data

Analysis methods based on simulations and optimization have been previously developed to estimate relative translation rates from next-generation sequencing data. Translation involves molecules and chemical reactions, hence bioinformatics methods consistent with the laws of chemistry and physics are more likely to produce accurate results. Here, we derive simple equations based on chemical kinetic principles to measure the translation-initiation rate, transcriptome-wide elongation rate, and individual codon translation rates from ribosome profiling experiments. Our methods reproduce the known rates from ribosome profiles generated from detailed simulations of translation. By applying our methods to data from S. cerevisiae and mouse embryonic stem cells, we find that the extracted rates reproduce expected correlations with various molecular properties, and we also find that mouse embryonic stem cells have a global translation speed of 5.2 AA/s, in agreement with previous reports that used other approaches. Our analysis further reveals that a codon can exhibit up to 26-fold variability in its translation rate depending upon its context within a transcript. This broad distribution means that the average translation rate of a codon is not representative of the rate at which most instances of that codon are translated, and it suggests that translational regulation might be used by cells to a greater degree than previously thought.

Introduction Translation-associated rates influence in vivo protein abundance, structure and function. It is therefore crucial to be able to accurately measure these rates. The ribosome synthesizes a protein in three steps namely initiation, elongation, and termination [1][2][3]. Translation is initiated at the start codon of the mRNA transcript by the ribosomal subunits that form a stable translation-initiation complex [4,5]. During the elongation step, the ribosome moves along the mRNA transcript decoding individual codons and adding residues to the growing nascent chain. Translation is terminated when the stop codon is in the ribosome's A-site resulting in release of the synthesized protein. The initiation and elongation phases of translation contribute to protein levels inside a cell; indeed, alteration of their rates can cause protein abundance to vary by up to five orders of magnitude [6][7][8], and alter protein structure and function [9]. Termination does not influence the cellular concentration of proteins as it is not a rate limiting step [10]. Therefore, knowledge of translation initiation and codon translation rates are important to understand the regulation of gene expression.
Significant efforts have been made to extract these rates from data generated from ribosome profiling experiments [11][12][13][14], a technique that measures the relative ribosome density across transcripts [15]. In a number of methods, the actual rates are not measured but instead a ratio of rates, or other relevant quantities have been reported [16][17][18][19][20]. Estimates of translation-initiation and codon translation rates have helped identify the molecular determinants of these rates. For example, estimated initiation rates correlate with the stability of mRNA structure near the start codon and in the 5' untranslated region [10,14,16,21] indicating mRNA structure can influence initiation. Similarly, codon translation rates have been found to positively correlate with their cognate tRNA abundance [16,22], and anti-correlate with the presence of downstream mRNA secondary structure [23,24] and positively charged nascent-chain residues inside the ribosome exit tunnel [20,25]. Some of these findings are controversial as different analysis methods and data have led to contradictory results concerning the role of tRNA concentration [16,19,26,27], positively charged residues [17,20] and coding sequence (CDS) length [10,12,14,16]. Moreover, the accuracy of these methods is unknown because orthogonal, high-throughput experimental measurements of translation rates do not exist.
In the absence of data that could differentiate the accuracy of different methods, we argue that the methods most likely to be accurate will be those that are constrained by and account for the chemistry and physics of the translating system. Here, we present three such methods, derived from chemical kinetic principles that permit the extraction of translation-initiation rates, transcriptome-wide average elongation rate and individual codon translation rates from Next-Generation Sequencing (NGS) data. These methods are verified against artificial ribosome profiling data generated from detailed simulations of the translation process where the translation rates are known a priori. We then apply these methods to in vivo ribosome profiling data and extract the transcriptome-wide translation-initiation and codon translation rates in S. cerevisiae and transcriptome-wide average elongation rate in mouse stem cells. We show that the translation rate parameters correlate with factors known to modulate these rates, and assign absolute numbers to these rates.

Theory
Measuring translation-initiation rates. To derive an analytical expression relating the translation-initiation rate to the experimental observable of ribosome density along a transcript we assume steady state conditions, meaning that the ribosome flux at each codon position is equal to the rate at which fully synthesized protein molecules are released. Thus, for any transcript i, we have: where F(1, i) is equal to the flux of ribosomes initiating translation at the start codon, F(j, i) is the flux of ribosomes moving from codon position j to (j + 1) on copies of transcript i, N c (i) is the number of codons in the transcript, and F(N c (i), i) is the flux of ribosomes terminating translation from the stop codon. In our modeling, the position of a ribosome on a transcript is defined by its A-site location.
Let ' be the number of codon positions that a ribosome covers on a transcript. A ribosome typically covers 10 codon positions and hence ℓ = 10 with the A-site of the ribosome located at the sixth codon position in this segment [15]. A ribosome initiating translation will, therefore, cover codon positions 1 through 6 in the coding sequence, with the other portion covering part of the transcript's 5 0 UTR. It follows then that a ribosome translating any of the first ' þ 1 codons will prevent a new ribosome from initiating translation by physically blocking the first six codon positions of the transcript. Thus, the ribosome flux at the start codon is where α(i) is the initiation rate in the absence of any ribosome at the first ' þ 1 codon positions, and ρ(k, i) is the average ribosome occupancy at the k th codon position of the transcript [28]. Note well, Eq (2) assumes that the ρ(k, i)s for the first ten codon positions are independent of each other, and is therefore a first-order approximation of an exact solution and thus Eq (2) ignores higher order terms of α(i) [28]. This assumption may introduce inaccuracies in the calculation of F(1, i) for large α(i)s. Polysome profiling experiments [29] and computer simulations of protein synthesis [21], however, suggest that most S. cerevisiae transcripts are translated in the regime of low average ribosome density. As a consequence, ignoring higher order terms of α(i) in Eq (2) provides a reasonable estimate of F (1, i). Under a mean-field assumption [30], the ribosome flux at the j th codon position for copies of transcript i can be written as In Eq (3), ω(j, i) is the intrinsic translation rate of the j th codon position in the transcript (i.e., the rate that would be observed in the absence of any other ribosomes), and f ðj; j þ '; iÞ is the conditional probability that given that a ribosome is at the j th codon position there is no ribosome at the ðj þ 'Þ th codon position.
We used Eqs (1), (2) and (3) to derive (see S1 Text) the following expression for translationinitiation rate where hrðiÞi ¼ is the average ribosome density per codon on the i th transcript, and hT(i)i is the average time a ribosome takes to synthesize a full-length protein from the transcript. Eq (4) can measure the translation-initiation rate provided hT(i)i, hρ(i)i and the ρ(j, i)s are known. We calculated hρ(i)i and ρ(j, i) from ribosome profiling, RNA-Seq and polysome profiling data, and hT(i)i using a scaling relationship [31] between protein synthesis time and CDS length. Full details for determining these parameters are provided in S1 Text.
Measuring the average translation elongation rate across a transcriptome. In ribosome run-off experiments performed on eukaryotic cells translation-initiation is inhibited at time point t = 0 using the drug harringtonine [26]. This causes ribosomes to accumulate at start codons and thereby block new translation-initiation events. At time t = Δt, the cells are treated with cycloheximide, which stops translation elongation by the ribosomes. The experiment is repeated at different Δt values. A meta-gene analysis of ribosome profiles [32] is then obtained from each run-off experiment and used to quantify the depletion of ribosome density across the transcripts as a function of time, allowing the transcriptome-wide average elongation rate to be measured [26].
To model this non-steady state data, we assume that ribosome movement along transcripts is a form of mass flow along one dimension-a reasonable assumption given that ribosomes can move only in one direction along a transcript. In this case, a natural way to describe this phenomenon is the continuity equation which equates the decrease in density of a substance (ρ, the ribosome occupancy in our case) over time, t, with the outward flux J of that substance from a particular region of space. The equality with zero in Eq (5) is a manifestation of conservation of mass. In the ribosome run-off experiment this corresponds to a depletion of ribosome density along a segment of a transcript as ribosomes move out of that region with rate J, and are not replenished by new ribosomes since initiation was halted at time t = 0. According to the definition of divergence, r � J is the rate at which ρ exits from a given closed space [33]. For a system composed of L codons along a transcript that are being translated by ribosomes, r � J ¼ J L (see S1 Text, Eq. (S14)). Substituting this value of r � J into Eq (5) and then integrating over t from 0 to Δt yields rðt¼Dt; LÞ rðt¼0; LÞ where ρ(t = Δt, L) is the average ribosome density of the transcriptome within the first L codon positions of a sample at run-off time Δt.  (6) is calculated using the ribosome run-off data. Explicit details of this calculation are provided in the S1 Text.
We inserted the relative average ribosome density ρ(t = Δt, L)/ρ(t = 0, L) obtained from Eq (S16) into Eq (6) and plotted this against Δt. By fitting this data to a straight line we determined τ(L), the time at which ρ(t = Δt, L)/ρ(t = 0, L) equals zero. τ(L) is, therefore, the time at which the last translating ribosomes have crossed the L th codon position, on average. We calculated τ(L) in this way for different L values up to the minimum L value at which ribosome density depletion no longer occurs even at the longest run off time in the experiment. The average transcriptome-wide elongation rate hωi is therefore equal to Measuring individual codon translation rates. To derive a mathematical expression for extracting codon translation rates from ribosome profiling data we assumed steady state conditions in which the flux of ribosomes at each codon position is equal to the rate of protein synthesis In Eq (8), N ribo j;i and τ(j, i) are, respectively, the steady-state number of ribosomes and the average translation time of the j th codon position within copies of transcript i in a given experimental sample. The mean total time of synthesis hT(i)i of transcript i is, by definition, equal to Solving Eqs (8) and (9) for τ(j, i) (see derivation in S1 Text) yields As is the convention in the field [34], we assume that ribosome profiling reads at the j th codon position of transcript i, c(j, i), are directly proportional to N ribo ij . This relationship can be expressed as where a j,i is a constant of proportionality. a j,i values have not been experimentally measured, but they are commonly assumed to be constant for all codon positions in a single experiment [34]. That is, a j,i = a i for all i and j. Using Eq (11) with a j,i = a i in Eq (10) yields Eq (12) indicates that we can determine the individual codon translation rates from ribosome profiling reads provided we know the average total synthesis time of the transcript. Eq (12) can be connected to the expression for normalized ribosome density, derived in the SI of Ref. [16], where tðj;iÞ tðiÞ N c i ð Þ is the normalized ribosome density and is expressed as a function of c(j, i)s. Eq (12) is also related to a metric used in the simulations of Ref. [14] to estimate the codon translation rates. It is important to note that τ(j, i) is the actual codon translation time, which includes the time delay caused by ribosome-ribosome interactions and is distinct from the intrinsic translation rate of a codon ω(j, i). τ(j, i) is equal to the inverse of oðj; iÞf ðj; j þ '; iÞ [35].

Application
In silico validation of our methods. As a first step to test the accuracy of the measured translation rates from Eqs (4), (7) and (12), we applied them to artificial S. cerevisiae ribosome profiles generated by kinetic Monte Carlo [36] and Gillespie simulations [37] in which all of the underlying rates are known (see Methods). If our analysis methods are accurate then a necessary condition is that they reproduce these rates from the simulated profiles. We applied Eq (4) to the simulated, steady-state ribosome profiling data and find that it quantitatively reproduces the initiation rates used in the simulations (Fig 1A, slope = 0.97, R 2 = 0.84 and pvalue < 10 −60 ). We applied Eq (7) to the non-steady state ribosome run off profiles of S. cerevisiae (Fig 2A and 2B) and find that it accurately measures the transcriptome-wide average translation rate. Specifically, our method measures the transcriptome-wide average elongation rate at 3.8 AA/s against the real average elongation rate of 4.2 AA/s. Note well, that as predicted by Eq (6), we also observe a linear decrease in rðt¼Dt;LÞ rðt¼0;LÞ as a function of time (S1A Fig). Finally, we applied Eq (12) to the steady-state ribosome profiles and find that the individual codon translation times are accurately measured by our method (Fig 3, median R 2 = 0.96 and median slope = 1.00). Thus, the analysis methods we have created can in principle accurately capture the translation rate parameters.
There are several points worth noting concerning these tests. First, the rates used in the simulation model are realistic, having been taken from literature values [10,38]. Second, the depth of coverage in the simulated ribosome profiles is in the same range as experiments, e.g., having 26 million reads arising from 1,388 different coding sequences [16]. Third, Eqs (4) and (12) require knowledge of the average synthesis time of a protein, which is experimentally difficult to measure. Therefore, in the above analyses we used the approximation that the average synthesis time of a protein is proportional to the number of codons in its transcript, multiplied by the transcriptome-wide average codon translation time (Eq. (S10)) [26,31]. However, when we Chemical kinetic methods to measure translation rates from ribosome profiling data use the actual synthesis time of each transcript we achieve even better agreement between the estimated and true translation initiation rates (Fig 1B, slope = 0.99, R 2 = 0.95 and p-value < 10 −60 ). Similarly, when we increase our read coverage from 7.1 million to 35.5 billion reads and use the exact synthesis time of a protein, R 2 between the estimated and true codon translation rates goes to > 0.99. Thus, our model is reasonably accurate when approximate protein synthesis times are used (Eq. (S10)) and the coverage is similar to typical experiments, and highly accurate when the exact synthesis time is used and coverage is high.
Ribosome profiling data tend to exhibit a higher ribosome density near the start codon as compared to the rest of the transcript [16]. This feature might bias the in vivo measurement of initiation rate using our method, as Eq (4) is a function of the ribosome density of the first 10 codon positions. To assess the effect of this feature on the accuracy of our method we created in silico ribosome profiles by running synthesis simulations in which the translation rates of the first 100 codons were 50% slower for all S. cerevisiae 1,388 transcripts. This artificial decrease in codon translation rates introduced more ribosome density near the start codon. Using these in silico ribosome profiles, we find that Eq (4) still recapitulates the initiation rates with a similar level of accuracy as before (S2 Fig, R 2 � 0.85). Thus, higher ribosome density near the start codon does not substantially affect the accuracy of our method. Our simulations do not account for other possible biases associated with ribosome profiling experiments, including sequencing amplification biases [39]. Thus, the accuracy of our model as determined from these in silico ribosome profiles might represent an upper bound to the accuracy on actual in vivo profiles with similar sequencing depth.
Measuring the initiation rate from experimental data. We next applied Eq (4) to in vivo ribosome profiling and RNA-Seq data from S. cerevisiae reported in Ref. [16]. Calculation of the translation-initiation rate for a transcript requires knowledge of the average time a ribosome takes to synthesize the protein (hT(i)i) and the average occupancies of the first ten codon positions (ρ(j, i)). The average protein synthesis time was calculated using the scaling relationship (Eq. (S10)) where the transcriptome-wide average codon translation time was set to 200 ms as reported in the literature [40,41] (see S1 Text). ρ(j, i)s were calculated from Eq. (S9) using the ribosome profiling and RNA-Seq data from Ref. [16] and polysome profiling data from Ref. [42]. We then inserted these arguments into Eq (4) to calculate the translation-initiation rates for 1,287 S. cerevisiae transcripts that meet the filtering criteria (see Methods) and for which polysome profiling data is available. We find the translation-initiation rates vary from as low as 5.8 × 10 −2 s −1 (5 th percentile) to as high as 0.24 s −1 (95 th percentile) with the most probable rate being 0.1 s −1 (Fig 1C and S1 File). These translation-initiation rates are The true codon translation times in the simulations are plotted as green boxes, blue and black data points are the translation times measured using Eq (12). Blue data points were calculated using the average protein synthesis time measured from the simulations and relative ribosome density calculated using a large number of in silico ribosome profiling reads. Black data points were calculated using the average protein synthesis time estimated from the scaling relationship (Eq. (S10)) and the relative ribosome density calculated from the in silico reads which were equal to the reads aligned to the same transcript in the experiment [16]. Chemical kinetic methods to measure translation rates from ribosome profiling data significantly lower than the average elongation rate in S. cerevisiae which is consistent with previous studies indicating that initiation is the rate limiting step in translation [21]. The distribution of translation-initiation rates measured using Eq (4) is very similar to the distribution reported in Ref. [10], which was obtained using polysome profiling data.
To test the reproducibility of these results we calculated in vivo translation-initiation rates using the ribosome profiling and RNA-seq data measured by Nissley et al. [43] and polysome profiling data in Refs. [29,42]. These in vivo initiation rates are reported in the S1 File. We have found a statistically significant correlation between the initiation rates measured from the Weinberg et al. [16] and Nissley et al. [43] data sets (S3 Fig). The Pearson correlation between them was 0.53 (P = 10 −46 ) and 0.31 (P = 4 × 10 −15 ) when polysome profiling data of Mackay et al. [42] and Arava et al. [29] were used, respectively. We have also compared these in vivo initiation rates with the ones measured in the study of Dao Duc and Song [14] and found a statistically significant correlation between them with Pearson r varying from 0.51 to 0.80 (P < 10 −24 ), depending upon the data sets used (S4 Fig).
Reproducing known correlations with initiation rates. Next, we tested whether our measurements reproduce previously reported correlations between initiation rates and CDS length, sequence context upstream and around the start codon, folding energy near the 5 0 cap of mRNA molecule, and protein copy number. Initiation rates can depend on CDS length because the shorter a transcript, the more probable the termini will be in close proximity, allowing more efficient re-initiation of terminating ribosomes [44]. We used the initiation rates extracted from ribosome profiling and RNA-Seq data reported in Ref. [16], and polysome profiling data reported in Ref. [42] and find a moderate but statistically significant correlation (Pearson r = 0.51, p-value = 4 × 10 −59 ) between the translation-initiation rate and the inverse of CDS length (Fig 4A), as has been observed previously [10,14,16].
Folded mRNA structure near the 5 0 cap can disrupt the binding of initiation factor eIF4F and the scanning of 40S ribosomes in eukaryotes, causing a decrease in a gene's translationinitiation rate [14,16,45,46]. We calculated the mRNA folding energy near the 5 0 mRNA cap (see Methods) and find a statistically significant correlation between them and our translationinitiation rates (Fig 4B, Pearson r = 0.30 and p-value = 2 × 10 −9 ).
Sequence-based features also determine the rate of translation-initiation in eukaryotes. For example, an upstream open reading frame can potentially interfere with translation initiation at the canonical start codon by initiating translation prematurely resulting in a different protein product [47,48]. We tested whether upstream open reading frames affect initiation rates by comparing the median initiation rate in transcripts with at least one upstream AUG site against transcripts that do not contain any upstream AUG. We find that the transcripts with at least one upstream AUG codon have a median initiation rate that is 15% slower (0.095 s −1 versus 0.112 s −1 , Mann-Whitney U test, p-value = 0.006). Weinberg et al. [16] also demonstrated this effect with their measure of initiation efficiency. Additionally, the 12-nt Kozak sequence [49] is enriched in highly expressed genes and mutations in the Kozak sequence have been shown to drastically effect protein abundance levels [50]. We find that initiation rates are 16% faster with a median value of 0.116 s −1 in transcripts with Kozak-like sequences around the start codon as compared to those that do not contain these mRNA sequence motifs (Mann-Whitney U test, p-value = 0.005). Hence, the presence of Kozak or Kozak-like sequences around the start codon facilitate translation initiation. This is consistent with results of Pop et al. [13] who have shown a strong correlation between the presence of the Kozak sequence and translation efficiency.
Initiation is typically the rate limiting step of translation [21]. Therefore, a faster initiation rate should increase the rate of protein synthesis and thus the steady-state level of proteins in a cell. Indeed, we find a statistically significant correlation between the rate of translation-initiation and protein copy number in a cell (Fig 4C, Pearson r = 0.29 and p-value = 1 × 10 −5 ). Next, we tested whether the missing variation in Fig 4C is explained by the variation in mRNA copy number. We did this by measuring the correlation between the protein copy number and the multiplicative product of a transcript's copy number and its initiation rate. We find a stronger correlation between this quantity and protein copy number (Fig 4D, Pearson r = 0.70 and p-value = 5 × 10 −33 ). These results suggest, as has been found previously [21,51], that the translation-initiation rate is a major determinant of protein abundance inside a cell.
We further measured the correlation between the mRNA copy number and initiation rates and found a moderate level of correlation between them (Fig 4E, Pearson r = 0.35 and pvalue = 1 × 10 −7 ). This suggests that higher copy number transcripts have a faster translationinitiation rate, consistent with previous studies [52]. We also find statistically significant correlations between the aforementioned quantities when we calculated the translation-initiation rate from other published data, using all possible combinations of ribosome profiling, RNA-Seq [16,43] and polysome profiling datasets [29,42] (S5-S7 Figs, S1 and S2 Tables).
In summary, we find that the initiation rates we measured correlate with factors that have been established to influence translation speed. This suggests our initiation rates are reasonable.
Measurement of the average codon translation rate in a cell. Next, we applied Eq (7) to measure the average codon translation rate inside mouse stem cells from ribosome run-off experiments [26]. We calculated the average normalized reads R T ðDt; jÞ from three different samples prepared by allowing the ribosomes to continue their elongation for 90, 120, and 150 seconds after the inhibition of initiation, and plotted R T ðDt; jÞ as a function of codon  (Fig 2C). By following the analysis procedure described in the Theory Section, we fitted rðt¼Dt;LÞ rðt¼0;LÞ against Δt curves (S1B Fig) with a line and calculated the depletion time τ(L), where rðt¼Dt;LÞ rðt¼0;LÞ equals zero for L varying between codon positions 800 and 1000. Plotting τ(L) against L yields a straight line (Fig 2D). The gradient of this line is the average elongation rate, which we find equals 5.2 AA/s.

Measurement of individual codon translation rates.
To extract individual codon translation times along a coding sequence we applied Eq (12) to 117 and 364 high-coverage transcripts from ribosome profiling data reported, respectively, in Refs. [43] and [53] (see Methods). The number of transcripts in both of these datasets are small as compared to the size of S. cerevisiae transcriptome. Therefore, to determine whether these subset of transcripts are representative of the whole transcriptome we compared the distributions of different physicochemical properties in these two sets to the total transcriptome. We find that the subset of transcripts from Nissley et al. [43] have 6.6% higher mean GC content but a very similar mode of length distribution and codon usage relative to the total transcriptome (S8A- S8C Fig). For Williams et al.'s ribosome profiling dataset [53] we again find that the mode of the length distribution and codon usage is similar to the S. cerevisiae transcriptome, with 5.3% higher mean GC content (S8D- S8F Fig). This indicates that the set of transcripts we analyze are largely representative of the properties of the transcriptome, but have a bit higher GC content.
Upon extracting individual codon translation times from these ribosome profiling data, we first characterized the distribution of translation times for the 61 sense codons (S1 File). We find around three-fold difference between the median translation times of the fastest and slowest codons in the Nissley dataset [43]. The fastest and slowest codons are AUU and CCG codons that are translated in 127±2 and 344±37 ms (median ± standard error), respectively. The variability in translation times for a given codon type is even larger, as illustrated by wide distributions of their translation times in the Nissley dataset (Fig 5A, S9A Fig). Fig 5B shows an example Chemical kinetic methods to measure translation rates from ribosome profiling data where the AAG codon is translated with translation times ranging from 59 ms at codon position 413 to 363 ms at codon position 196 in YAL038W S. cerevisiae transcript. We find a 16-fold variability in codon translation times across the transcriptome even if we ignore the extremities of the distributions by only considering the translation times between the 5 th and 95 th percentiles of all codon types. Similar ranges are found in the Williams dataset where there is a 26-fold variability in translation times and 3.9 fold-difference in median translation times of the fastest (AUC) and slowest (CGA) codons, which are translated with median time of 128±2 and 496±61 ms, respectively (S1 File, S9B Fig). The translation time distributions are well correlated between the above two datasets (S10A and S10B Fig). The study of Dao Duc and Song [14] also infers the individual codon translation rates and a very high correlation is observed between the rates obtained using our method and the rates found in that study (S10C Fig). Molecular factors flanking the A-site shape the variability of individual codon translation rates. A number of molecular factors have been shown or suggested to influence the translation rate of a codon in the A-site, including tRNA concentration, mRNA structure, wobble-base pairing, and proline residues at or near the ribosome P-site [16,[18][19][20][22][23][24]39]. Here, we test whether the presence or absence of these factors correlate with changes in translation speed that we measure. We first examined whether the cognate tRNA concentration correlates with our translation times. We find that the median codon translation times negatively correlates with the abundance of cognate tRNA (Fig 6A and 6B, ρ = −0.51 (pvalue = 0.0006) and ρ = −0.50 (p-value = 0.0009), respectively), indicating that codons with lower cognate tRNA concentrations typically are translated more slowly.
The presence of a proline amino acid at the ribosome's P-site can slowdown translation due to its stereochemistry [55]. We tested whether such an effect was present in our data set by calculating the percentage difference in median translation time at the A-site when proline is present at the P-site versus when it is not present at the P-site. We find a 19% increase in median translation time when proline is present (Fig 6C, Mann-Whitney U test, pvalue = 2.2 × 10 −32 ) indicating that proline does systematically slowdown translation in vivo.
It has been found that the presence of downstream mRNA secondary structure can slow down the translation at the A-site [23,24,56,57]. To test for this effect, we plotted the difference in the median translation time at the A-site when mRNA secondary structure is present versus when it is not present at a given number of codon positions downstream of the ribosome Asite. Structured versus unstructured nucleotides were identified using DMS data [58]. We find that when secondary structure is present 4 codons downstream of the A-site, placing that structure directly at the leading edge of the ribosome, there is on average a 6.7% increase in codon translation time at the A-site (Fig 6D, Mann-Whitney U test, p-value = 2.7 × 10 −14 ). A slowdown is also found when we cross-reference our codon translation times with mRNA structure data from PARS [59], which measures the presence of mRNA structure in vitro ( Fig  6E, Mann-Whitney U test, p-value = 5.6 × 10 −9 ).
Wobble base pairing between the codon and anti-codon tRNA stem-loop has been found to slowdown translation speed as compared to Watson-Crick base pairing in bacteria [60] and metazoans [61]. For each pair of codon types that are decoded by the same tRNA molecule, by Watson-Crick base pairing in one instance and wobble base pairing in the other, we tested whether two codon types are translated with different rates. We find that there is no systematic difference in median translation times between codons that are decoded by either mechanism (Fig 6F, Wilcoxon signed-rank test, p-value = 0.46), indicating that, at least in S. cerevisiae, wobble base pairing does not slowdown in vivo translation elongation.

Discussion
We have presented three methods for measuring initiation rates and elongation rates from ribosome profiling data. What distinguishes our approach from many others is that it uses simple equations derived from chemical kinetic principles, it does not require simulations or a large number of parameters, and it yields absolute rather than relative rates. We demonstrated that our approach provides accurate results when applied to test data sets (Figs 1, 2 and 3), and Chemical kinetic methods to measure translation rates from ribosome profiling data reproduced previously reported correlations between translation speed and various molecular factors (Figs 4 and 6, S5-S7 and S11 Figs), suggesting the rates obtained by these methods are reasonable.
A novel finding concerning elongation rates is that in S. cerevisiae the translation time of a codon depends dramatically on its context within a transcript. In S. cerevisiae, the range of individual codon translation time spans up to 26-fold, from 45 to 1,194 ms, even after discarding the top and bottom 5% of this distribution as possible outliers. The codon AAG in gene YAL038W, for example, occurs 36 times along this gene's transcript. At the 196 th codon position AAG is translated in 363 ms, and at the 413 th position AAG is translated in 59 ms. Thus, the same codon in different contexts can be translated at very different speeds. Characterizing the distribution of mean times of translation of different occurrences of the same codon reveals a broad distribution (Fig 5A), whose coefficient of variation is often close to 0.5 (S1 File). This means that the standard deviation is half of the average translation time of a codon. This leads to the important finding that the average translation rate of a codon type is not representative of the rate at which most instances of that codon type are translated. These results are consistent with the findings that a large number of molecular factors determine codon translation rates in vivo [62], thus giving rise to a broad distribution of rates ( Fig 5A, S9 Fig). These factors have been shown to cause a bias towards slower translation in the first 200 codons of many transcripts [23].
A molecular factor that has not been quantified in this study is ribosome queuing. Currently, the conventional Ribosome profiling protocol isolates only monosomes and the monosome-protected fragments are extracted and sequenced. However, should ribosomes queue along a transcript, disomes and trisomes are likely to be produced that are not accounted for in current datasets. Recent studies [14,63] have attempted to quantify the extent of ribosomal queuing but several challenges remain. One of the central challenges is to correctly identify the location of A-sites of ribosomes translating disome-and trisome-protected mRNA fragments. Current ribosome profiling datasets that include disomes have very sparse coverage, which limits the application of our method but more importantly suggests that the occurrence of disomes, and hence of queuing, may be rather rare under normal growth conditions. However, under stress conditions, ribosome queuing has the potential to become frequent for some genes and potentially decrease the accuracy of our method unless the disomes and trisomes fragments are included. As advances in ribosome profiling experiments are made to generate high coverage data and improve the A-site identification on disomes and trisomes, our method will be able to more accurately quantify the rates of translation elongation under non-standard growth conditions.
To determine the transcriptome-wide average elongation rate, Eq (7) was derived from the principles of mass flow and the continuity equation. Eq (7) accurately determines the average elongation rate in simulated data sets (Fig 2B). Applying Eq (7) to ribosome run-off experiments revealed that in mouse stem cells the average codon translation rate is 5.2 AA/s which is similar to the elongation rate measured in a previous study [26].
Measuring initiation rates using Eq (4) reproduced previously reported correlations ( Fig  4A-4C), and also revealed a statistically significant correlation between the rate of translationinitiation and mRNA copy number (Fig 4E). This correlation indicates that genes with a higher mRNA copy number tend to have a higher translation efficiency, suggesting that transcriptional and translational regulation of gene expression can act synergistically to maximize the protein copy number of highly expressed genes.
Other methods have determined initiation rates by varying the initiation rate in simulations until the average ribosome density of a transcript matched the experimentally measured value [10,12,14,63]. These methods require knowledge of individual codon translation rates. Thus, small errors in these rates have the potential to accumulate and lead to large errors in the estimated initiation rates. This might contribute to conflicting conclusions reported in the literature. For example, three [10,14,16] out of four published methods, as well as our method, found a negative correlation between initiation rate and CDS length, whereas Gritsenko et al. [12] found no such correlation. In contrast to these methods, our method is based on a simple chemical kinetic equation that is easy to implement and does not require any detailed assumptions about codon translation rates. Apart from these simulation-based approaches, other methods [19,21,64,65] measure the protein synthesis rate, which represents a lower bound to the rate of translation-initiation (Eq (2)). One of these methods uses chemical kinetic modeling [21], but unlike Eq (4), this method requires a number of biophysical parameters, including the diffusion constant of tRNAs and ribosomes, and cell volume. The difference between the initiation rate and protein synthesis rate increases with increasing initiation rate [66]. Therefore, such methods could exhibit larger errors for transcripts that have higher translation-initiation rates, which are often found in highly expressed transcripts (Fig 4C and 4E).
A number of approaches have been developed to measure codon translation times including simulation based approaches [12,14] that extract rates by comparing the local distribution of ribosome profiling reads with simulated ribosome densities, others that optimize an objective function [13] or fit a normalized-footprint-count distribution of a codon to an empirical function [11], and yet others that measure relative codon translation times by quantifying the enrichment of ribosome read density using a variety of procedures [18,19]. In contrast, Eq (12) allows individual, absolute codon translation rates to be calculated directly from the ribosome profile along the transcript. Another distinction is that a number of these methods [11][12][13]19] assume that all occurrences of a codon across the transcriptome must be translated at the same rate. This assumption cannot be correct as it is known that non-local aspects of translation (such as mRNA structure) can influence the translation speed of individual codons. Eq (12) does not make this assumption, and therefore its extracted rates can better reflect the naturally occurring variation of codon translation times across a transcript.
The codon usage in a transcript, and associated translation rates, can affect various co-and post-translational processes involving nascent proteins [9]. Therefore, the accurate knowledge of codon translation times measured using Eq (12) will help provide a better quantitative understanding of how translation speed can impact the efficiency of co-translational processes, such as protein folding, chaperone binding, and numerous other processes involving the nascent protein. Coupled with molecular biology techniques that can knock out various genes and their functions in cells, Eq (12) provides the opportunity to quantitatively examine whether co-translational processes can cause translation speed changes.
Ribosome profiles have ill-quantified sequencing biases [27] that can potentially produce reads that are not proportional to the underlying number of ribosomes at a particular codon position. This could lead to errors in the extraction of translation rate parameters using our methods [67]. It has been demonstrated that using translational inhibitors like cycloheximide leads to distortion of ribosome profiles due to inefficient arrest of translation [39,68]. This was one of the primary reasons why initial studies using cycloheximide did not observe a correlation of codon translation rates and cognate tRNA concentration. While there is often a strong correlation between the total number of mapped reads per transcript between datasets from different studies, the correlation is often poor at the individual nucleotide level [69]. This "noise" at this resolution has been attributed to sparse read coverage [69], choice of ribonuclease for digestion [70], and the methods used to halt elongation in the ribosome profiling protocol [39,68]. Restricting our analyses to transcripts with high coverage contributes to more reproducible results, as can be seen by the high correlation between the two datasets used in this study (S10 Fig). Experimental improvements that minimize bias have been developed [16,34,70,71], such as using flash-freezing for arresting translation and utilizing short micro-RNA library generation techniques [72], but sequence-dependent biases can still exist, for example due to varying efficiencies of linker ligation [73]. As experimental techniques are improved to minimize bias, the accuracy of the rates extracted using our methods will also increase.
The absence of accurate translation rate parameters is an impediment to quantitatively modeling the process of translation. By measuring translation rate parameters using a chemical-kinetic framework, our methods can contribute to ongoing efforts [65,74] to understand how the sequence features of an mRNA molecule can regulate gene expression. More broadly, the approach we have taken in this study is to utilize ideas from chemistry and physics to analyze next-generation sequencing data; a branch of bioinformatics we refer to as physical bioinformatics. We expect that this physical-science-based approach will prove useful in understanding other large biological data sets concerning translation and compliment conventional computer science approaches to bioinformatics.

Simulated steady state and non-steady state ribosome profiling data
We carried out protein synthesis simulations using the inhomogeneous '-TASEP model [36,63,66,74,75]. In this model, with ℓ = 10 and the A-site of the ribosome located at the 6th codon within the ribosome-protected mRNA fragment, a new translation-initiation event stochastically occurs on transcript i with rate α(i) when the first six codons of the transcript are not occupied by another ribosome [15]. The ribosome then stochastically moves along the transcript from codon position j to j + 1 with rate ω(j, i) if no ribosome is present at the ðj þ 'Þ th codon position. A ribosome stochastically terminates the translation process with rate β when its A-site encounters the stop codon. Note that our simulation model does not account for other processes such as ribosome recycling [44] and drop-off [76]. 1,388 S. cerevisiae mRNA transcripts were selected to test our translation-initiation rate measurement method. They were chosen based on the filtering criteria that at least 95% of their codons have non-zero read density in the ribosome profiling data reported in Ref. [16]. The list of these genes are provided in S1 File. We used the translation-initiation rates reported in Ref. [10] in our simulations for 1,236 of the transcripts. The initiation rates for the other 152 transcripts were not reported in Ref. [10]. Therefore, we randomly assigned (with replacement) the initiation rate to those 152 transcripts from the same database. We used codon translation rates from Fluitt and Viljoen's model for all 61 sense codons [38] and set the translation-termination rate to 35 s −1 [10]. We set ' ¼ 10 codons in our simulations because it is the canonical mRNA fragment length that is protected by ribosomes against nucleotide digestion in ribosome profiling experiments [15].
We simulated the translation of these 1,388 S. cerevisiae mRNA sequences using the Gillespie's algorithm [37] to generate the in silico ribosome profiling data. During the simulations, we recorded ribosome locations across the transcript every 100 steps, which we found minimized the time-correlation between successive saved snapshots. The codon positions of the ribosome's A-site in each of these snapshots, summed over all snapshots, constituted the in silico generated ribosome profile for the transcript. We ran the simulations until the total number of in silico ribosome profiling reads were equal to the total number of reads aligned to the same transcript measured from experimental ribosome profiling data reported in Ref [16]. This allowed us to create a realistic level of statistical sampling in our in silico ribosome profiles. Each of the uncorrelated snapshots can be thought as a separate copy of the mRNA transcript. Thus, the total number of these snapshots were equal to the mRNA copy number in our in silico experiment which we used to calculate ρ(j, i)s.
Non-steady-state ribosome run-off experiments were simulated for 4,617 S. cerevisiae transcripts by using a Monte Carlo simulation method whose procedure is described in Refs. [77][78][79]. Like Gillespie's algorithm [37], Monte Carlo simulation method also use exponentially distributed first passage times for each step in our simulation. We used Fluitt-Viljoen codon translation rates [38] and in vivo initiation rates reported in Ref. [10] to perform these nonsteady-state simulations. These transcripts were chosen based on the filtering criteria reported in Ref. [26]. To perform the in silico run-off experiment described in Ref. [26], first we waited until the translation system achieved a steady-state and then set α = 0, which stopped translation-initiation in our simulations. However, we allowed ribosomes to continue elongation for a time Δt, after which elongation was halted. Next, we recorded the ribosome A-site positions across the transcript that were defined as in silico ribosome profiling reads. We repeated this in silico experiment for the run-off times of Δt = 5, 10, 15, 20, 25 and 30 seconds until the in silico average read per codon for each transcript became equal to the ribosome profiling reads reported in Ref. [16].
To test our method for measuring codon translation rates using Eq (12) we selected 85 transcripts that meet the filtering criteria (see sub-section Selection of genes for codon translation rates for details) from the experimental ribosome profiling data reported in Ref. [16]. We simulated protein synthesis on those 85 transcripts to generate in silico ribosome profiles. The number of in silico ribosome profiling reads for each of those 85 transcripts were equal to the experimental ribosome profiling reads reported in Ref. [16].

In silico measurement of average protein synthesis and codon translation times
To calculate the translation-initiation rate (Eq (4)) and codon translation times (Eq (12)) from in silico ribosome profiles we need the average time a ribosome takes to synthesize a protein from a given transcript. We measured this quantity from our simulations by recording the time it takes a ribosome to traverse from the start codon to the stop codon in the transcript. The average synthesis time of a protein was then calculated from 10,000 individual ribosomes.
We also calculated the average synthesis time of a protein using a scaling relationship that uses the transcriptome-wide average codon translation time (S1 Text, Eq. (S10)). To calculate this quantity, we first computed the average codon translation time for each transcript by dividing the average protein synthesis time of a transcript by its CDS length. We then calculated the transcriptome-wide average codon translation time using the average codon translation time of each transcript.
Testing the accuracy of Eq (12) requires the real codon translation times which we measured by setting a separate clock at each codon position of a transcript in our simulations. These clocks measured the time difference between successive arrival and departure of a ribosome at each codon position. To calculate the average codon translation time at each codon position at least 10,000 such measurements were made.

Calculation of the folding energy near the 5 0 mRNA cap and estimation of other relevant parameters
We calculated the folding energy near the 5 0 mRNA cap of S. cerevisiae transcripts to measure its correlation with in vivo translation-initiation rates calculated using Eq (4). The folding energy of the first 70 nucleotides after the 5 0 mRNA cap correlates the most with the translation-initiation rate [16]. To identify the 5 0 UTR sequences included in those 70 nucleotides, we used the database reported in Ref [80]. We calculated the folding energy of each segment by using the software RNAfold 2.0 [81].
We used the mRNA and protein copy numbers reported in Ref. [82] to measure their correlation with translation-initiation rates. The mRNA and protein copy number reported in this paper are the average copy numbers, which were each averaged using three independent studies [83][84][85][86][87][88].

Analysis of ribosome profiling and RNA-Seq data
Datasets. To calculate the translation-initiation rates we applied our methods (Eq (4)) to in vivo ribosome profiling data published in Refs. [16] and [43] with NCBI accession numbers GSM1289257 and GSM1949551, respectively. The RNA-Seq data we used for Ref. [16] has NCBI accession number GSM1969535. For Ref. [43], we make public the RNA-Seq sample at GSM3242263 which we performed simultaneously with the ribosome profiling experiment. We chose these two data sets because they contain both ribosome profiling and RNA-Seq data of a sample, allowing us to calculate the average ribosome density of a transcript (S1 Text, Eq. (S9)). To calculate the codon translation rates, we apply our method to high-coverage ribosome profiling datasets of wild type S. cerevisiae reported in Refs. [43] and [53] with NCBI accession numbers GSM1949551 and GSM1495503, respectively. In our analysis, reads were preprocessed and mapped to sacCer3 reference genome as described in Ref. [43]. To maintain the accuracy of read assignment, transcripts in which multiple mapped reads constitute more than 0.1% of total reads mapping to the CDS region were not considered in the analysis. A-site positions in ribosome profiling reads were assigned according to the offset table generated using an Integer Programming algorithm which maximizes the reads between the second and stop codon of transcripts [89]. The offset table for S. cerevisiae is taken from Table 1 of ref. [89]. RPKM values were calculated for transcripts in RNA-Seq data by counting the reads whose 5 0 ends were within the coding region of the transcript.
Selection of genes for translation-initiation rates. To apply our method (Eq (4)) for extracting translation initiation rates, we restrict our analysis to genes in which 95% of codon positions of a transcript have non-zero read density. 1,388 transcripts meet this criterion. This threshold reduces the statistical uncertainty in the estimation of codon position dependent ribosome density used in Eq (4).
Selection of genes for codon translation rates. To extract individual codon translation rates, we restrict our analysis to genes that have at least 3 reads at every codon position of the transcript. We find that 117 and 364 genes meet this criterion in the data set of Nissley et al. [43] and Williams et al. [53], respectively. This stringent requirement is necessary since Eq (12) would predict codons with zero reads to be translated in zero time. Reads at the start codon and the second codon have contributions from the translation initiation process; therefore, we ignored these codon positions in our calculations of translation time distributions and correlation with molecular factors. As stated before, transcripts containing multiple mapped reads greater than 0.1% of the total reads mapped to the transcript were discarded. Genes with overlap of coding sequence regions as well as those containing introns (which is less than 6% of S. cerevisiae genome) were not considered in the analysis to avoid overlap of ribosome profiles.
Miscellaneous. (a) Since experimental measurements of hT(i)is are not available for S. cerevisiae we use Eq. (S10) to estimate hT(i)i with hτ A i = 200 ms, as reported in the literature [40,41]. (b) In vivo ribosome profiling data for the mouse stem cells were processed using the method described in the original paper [26]. (c) The measures for tRNA abundance based on gene copy number and RNA-Seq measurements were obtained from Table S2 of Ref. [16]. (d) Transcript leader (or 5 0 UTR) sequences were obtained from Ref. [80] and upstream AUG were identified in these sequences for the transcripts for which we calculate the translation initiation rates. (e) For Kozak sequence analysis, 12 nt sequence was identified around the start codon starting from -6 with respect to Adenine of start codon (position +1) to +6. The consensus Kozak sequence for S. cerevisiae is (A/T)A(A/C)A(A/C)AATGTC(T/C) [49]. For the 12 nt sequence for every transcript, a similarity score is calculated based on its match with the Kozak sequence. The score ranges from 1 to 10 in the order of its increasing similarity with Kozak sequence. If all the 9 nt around the start codon are same as the Kozak sequence and the start codon (scored as 1), the score will be 9+1 = 10. If only start codon is same while all other 9 nt are different, the score is just 1. If start codon is same and only 4 nt positions are same, then the score will be 4+1 = 5. To determine the effect of Kozak sequence, two subsets of transcripts were created, the first with transcripts having context around start codon closer to Kozak sequence (score > 7) and the second subset for transcripts with context farther away from Kozak sequence (score < 5). Mann-Whitney U test was used to determine the statistical significance of the difference between the translation-initiation rate distributions of the two subsets of genes.

Assignment of mRNA secondary structure
Both DMS and PARS data provide information about base-paired nucleotides within an mRNA molecule. We considered a codon to be structured if at least two of its three nucleotides were base-paired or one nucleotide was base-paired and the structure information for the other two nucleotides was not available.
DMS data for S. cerevisiae were downloaded from GEO database with accession number GSE45803 [58]. The reads from all in vivo replicates were pooled together and then aligned to the ribosomal RNA sequences using Bowtie 2 (v2.2.3) [91]. The reads which did not align to the ribosomal RNA sequences were then aligned to the Saccharomyces cerevisiae assembly R64-2-1 (UCSC: sacCer3) using Tophat (v2.0.13) [92]. In our analysis, A and C nucleotides were considered base-paired when the DMS signal was below the threshold of 0.2 and considered unstructured if the DMS signal was greater than 0.5. A and C nucleotides with DMS signal between 0.2 and 0.5 are considered ambiguous and classified together with U and G nucleotides, which do not react with DMS. Codons involving such nucleotides were not considered in our analysis.
PARS data were downloaded from genie.weizmann.ac.il/pubs/PARS10 with PARS scores available for all transcripts, except YDR461W and YNL145W, which were excluded from our analysis. Nucleotides with a PARS score greater than 0 were considered base-paired [59]. -initiation rates measured from two independent data sets. α 1 and α 2 are the translation-initiation rate calculated by Eq (4) using the ribosome profiling and RNA-Seq data reported in Nissley et al. [43] and Weinberg et al. [16], respectively. Polysome profiling data reported in Mackay et al. [42] and Arava et al. [29] were used in (A) and (B), respectively. (TIFF) S4 Fig. Comparison between translation-initiation rates measured from Eq (4) with those of Dao Duc and Song. [14]. In vivo initiation rates calculated by Eq (4) using the ribosome profiling and RNA-seq data from Weinberg et al. [16] were compared with the ones reported in Dao Duc and Song [14] in (A) and (C); In vivo initiation rates calculated by Eq (4) using the ribosome profiling and RNA-seq data from Nissley et al. [43] were compared with the ones reported in Dao Duc and Song [14] in (B) and (D). Polysome profiling data reported in Mackay et al. [42] were used to calculate in vivo initiation rates in (A) and (B); Polysome profiling data reported in Arava et al. [29] were used to calculate in vivo nitiation rates in (C) and  [43] and Williams et al. [53]. (B) The standard deviations of these translation time distributions are also highly correlated for the two datasets indicating that the variability of translation times is reproducible across datasets. (C) The codon translation rates obtained using Eq (12) for the dataset from Weinberg et al. [16] is correlated with codon translation rates inferred in the study of Dao Duc and Song [14] on the same dataset. when a proline is present in the P-site (green) or when a proline is not present in the P-site (blue). (D-E) Percentage difference in median translation times when mRNA structure is present relative to when it is not present as a function of codon position after the A-site. Grey bars indicate results that are not statistically significant. Error bars are the 95% C.I. calculated using 10 4 bootstrap cycles; significance is assessed using the Mann-Whitney U test corrected with the Benjamini Hochberg FDR method for multiple-hypothesis correction. mRNA structure information used in (D) and (E) were taken from in vivo DMS and in vitro PARS data, respectively. (F) Scatter plot of the median translation times of pairs of codon types that are decoded by the same tRNA molecule. The red line is the identity line. The list of tRNA molecules and which codon they decode were taken from Ref. [54]. Error bars are standard error about the median calculated with 10 4 bootstrap cycles. (TIFF) S12 Fig. Average ribosome density on a transcript as a function of translation efficiency. Translation efficiency in (A) and (B) are calculated using the ribosome profiling and RNA-Seq data reported in Ref. [16]; Translation efficiency in (C) and (D) are calculated using ribosome profiling and RNA-Seq data reported in Ref. [43]. Ribosome density used in (A) and (C) are from the polysome profiling data reported in Ref. [42] whereas the ribosome density in (B) and (D) are provided by Ref. [29]. The solid line in all these figures represent the best fit of y = ξx line. (TIFF) S1 Table. Transcripts containing at least one upstream AUG (uAUG) have lower median translation-initiation rates. For all possible combinations of ribosome profiling and RNA-Seq data [16,43] and polysome profiling data [29,42], we see a consistent result that the median translation-initiation rate is lower for transcripts with at least one uAUG. The result is not statistically significant for combination of Refs. [43] and [42] (p-value = 0.052). (PDF) S2 Table. Transcripts with sequence context around start codon similar to Kozak sequence have higher translation-initiation rates. For all possible combinations of ribosome profiling and RNA-Seq data [16,43] and polysome profiling data [29,42], we see a consistent result that the median translation-initiation rate is higher for transcripts with sequence context similar to Kozak sequence (see Methods for details). The result for combination of Refs [43] and [29] is however not statistically significant (p-value = 0.065). (PDF) S1 File. Initiation rates for genes discussed in the study as well as statistics for codon translation times for 64 codon types. (XLSX)