mRNA structure regulates protein expression through changes in functional half-life

Significance Despite widespread recognition that RNA is inherently structured, the interplay between local and global mRNA secondary structure (particularly in the coding region) and overall protein expression has not been thoroughly explored. Our work uses 2 approaches to disentangle the regulatory roles of mRNA primary sequence and secondary structure: global substitution with modified nucleotides and computational sequence design. By fitting detailed kinetic expression data to mathematical models, we show that secondary structure can increase mRNA half-life independent of codon usage. These findings have significant implications for both translational regulation of endogenous mRNAs and the emerging field of mRNA therapeutics.

For all primary mouse hepatocytes in vitro assays, cryopreserved primary mouse hepatocytes (ThermoFisher cat. no. HMCPIS) were thawed and immediately plated for use in CHRM (ThermoFisher cat. no. CM7000), Williams Medium E supplemented with Hepatocyte Plating Supplement Pack (Serum-Containing). Plates were incubated at 37 °C in a humidified incubator at 5% CO2 atmosphere for 5 hours before changing to serum free media (William's E Maintenance Media -Without Serum). Plates were incubated at 37 °C in a humidified incubator at 5% CO2 atmosphere for all periods between active use.
Mice models: 3 In vivo protein expression experiments for hEpo and Luc mRNAs were performed using CD-1 and BALB/C mouse models.

Method Details.
Sequence Design eGFP variants (G1-G4) were stochastically generated using only frequently used codons. In all cases, the coding sequence was flanked by identical 5¢ and 3¢ untranslated regions (UTRs) capable of supporting high levels of protein expression ( Figure 1B). Thus, total protein expression from these exogenous RNAs is determined by the combined impact of the primary coding sequence and the nucleotides used. For simplicity and ease of analysis, we designed firefly luciferase mRNA sequences based on simple one-to-one codon sets (i.e. each amino acid is encoded by the same codon at every instance of the amino acid (Table S3) that disfavored the use of rare codons). Regions of increased rare codon frequency have been shown to decrease protein expression and mRNA stability (1, 2). The hEpo protein contains a 9-amino acid (27 nucleotide) signal peptide sequence that is removed from the mature protein after targeting the protein to the endoplasmic reticulum (ER) for secretion. To evaluate whether codon choice had different effects in 4 the signal peptide region, we tested sequence designs for hEpo in which a leader region of 30 amino acids was encoded using two distinct codon sets: L1 (an AU-rich codon set) and L2 (a GC-rich codon set) ( Figure 1C). eGFP mRNAs were computationally designed by an algorithm written to stochastically sample the sequence space of a given peptide sequence.
mRNA Preparation: mRNA was produced by in vitro transcription (IVT) using T7 RNA polymerase using a linearized DNA template encoding the RNA polymerase promotor followed by 5' UTR, ORF, 3'UTR, and poly(A) tail. Cap 1 was installed to improve translation efficiency. The following combinations of nucleotides were used: all unmodified nucleotides; unmodified adenosine, cytidine, and guanosine with pseudouridine (Y), N 1 -methylpseudouridine (m 1 Y), or 5-methoxy-uridine (mo 5 U); or unmodified adenosine and guanosine with pseudouridine and 5-methyl-cytidine (Y/m 5 C). After purification, the mRNA was buffer exchanged into sodium citrate buffer and stored at -20 °C until use. Single-endpoint eGFP expression assays were conducted 24 hours post transfection, unless otherwise specified. Fluorescence was measured at an excitation wavelength of 488nm and emission wavelength of 509nm on a Synergy H1 plate reader.
Single endpoint interferon-beta (IFN-β) expression assays were conducted on cell supernatant 48 hours post transfection. The Human IFN-Beta ELISA kit (R&D Systems cat. no. 41410) was used as per manufacturer's suggested protocol.
Luc mRNAs with m 1 Y, mo 5 U, and a negative control mRNA lacking a poly(A) tail were electroporated into AML12 cells and both protein expression and RNA abundance was assayed at 1, 2, 3, 5, 18, and 24 hours. Luciferase expression was also determined for electroporated RNA at every hour from 1 to 6 hours post electroporation in order to ensure that delivery did not dramatically change the expression phenotype.

6
In vivo studies: We measured reporter protein expression from exogenous mRNA in CD-1 and BALB/C mouse models.
Lipid nanoparticle formulation of mRNA was performed through ethanol drop nanoprecipitation by mixing acidified RNA and lipids dissolved in ethanol at a 3:1 ratio (aqueous:ethanol) at a lipid molar ratio of 50:10:38.5:1.5 (ionizable : fusogenic : structural : PEG). After pH adjustment, the mRNA-loaded lipid nanoparticles were buffer exchanged into 1x PBS and stored at 4 °C until use. Final particle size and encapsulation were <100nm and >80%, respectively, with endotoxin below 10 EU/mL. Luc mRNAs were formulated in lipid nanoparticles at a concentration of 0.03mg/mL, administered intravenously to CD-1 mice at a dose of 0.15mg/kg of body mass and measured for expression by whole body Bioluminescence Imaging (BLI) at 6 hours post injection.
hEpo mRNAs were formulated in lipid nanoparticles at a concentration of 0.01mg/mL, administered intravenously to BALB/C mice at a dose of (0.05mg/kg of body weight and measured for serum hEpo concentration using Human Erythropoietin Quantikine IVD ELISA kit (R&D Systems cat. no. DEP00) at 6 hours post injection.

UV Melting
Absorbance was measured at 260nm on the Cary100 UV Vis Spectrometer as RNA, in 2mM Sodium citrate buffer (pH=6.5), was heated from 25˚C to 80˚C at a rate of 7 1˚C/minute, and then cooled from 80˚C to 25˚C at a rate of 1˚C/minute. This cycle was repeated three times in total. First derivative of absorbance values were then analyzed as a function of temperature.

SHAPE-MaP
All purified mRNAs were denatured at 80˚C for 3 minutes prior to analysis. After denaturation, RNAs were folded in 100mM HEPES, pH 8.0, 100mM NaCl and 10mM MgCl2 for 15 minutes at 37˚C. All RNAs were then selectively modified with 10mM 1methyl-6-nitroisatoic anhydride (1M6) (Sigma-Aldrich cat no. S888079-250MG) for 5 minutes at 37˚C. Background (no SHAPE reagent) and denatured (SHAPE modified fully denatured RNA) controls were prepared in parallel.
After SHAPE modification, RNA was purified and fragmented using 15mM MgCl2 at 94˚C for 4 minutes. Purified fragments were then randomly primed with N9 primer at 70˚C for 5 minutes. Primer extension was carried out in 50mM Tris-HCl, pH 8.0, 75mM KCl, 1mM dNTPs, 5mM DTT and 6.25mM MnCl2 with Superscript II reverse transcriptase (ThermoFisher cat. no. 10864014) for 3 hours at 45˚C. RNA-seq library prep was done with the NEBNext Ultra Directional RNA Library Prep Kit for Illumina (New England Biolabs cat. no. E7420) per the manufacturer's standard protocol.
RNA-seq libraries were sequenced on the Illumina MiSeq. Raw sequencing data was analyzed using the publicly available ShapeMapper software (3). The resulting reactivity data were analyzed using a sliding window (median SHAPE) approach to quantify the degree of structure at each position in the RNA, as has previously been described (4).

8
Quantification and Statistical Analysis.

Comparison of codon effects on translation
Luc expression values from 39 Luc variants were used in 865 pairwise comparisons between synonymous codons to yield p-value testing whether inclusion of specific codons impacted protein expression by ANOVA. Graph Pad software was used to determine p-values and p-values < 0.05 were considered significant.

Determination of Nearest-Neighbor Thermodynamic Parameters
UV-melting experiments were performed on 39 synthetic RNA duplexes with Y, m 1 Y, and mo 5 U instead of uridine. The duplex sequences were designed to enable the full thermodynamic parameters for the nearest neighbor free energy contributions for each modified nucleotide to be determined.
Raw data were collected through absorbance versus duplex melting temperature profiles over six different synthetic oligonucleotide concentrations in 1M NaCl, 10mM Na2HPO4, and 0.5mM Na2EDTA, pH 6.98 salt buffer. These data were then processed using Meltwin v.3.5 to obtain a full thermodynamic parameter set through two different methods, those methods being the LnCt/4 vs. Tm -1 method and the Marquardt non-linear curve fit method.
Determination of structure function relationship in SHAPE data 9 The log normalized values for sliding window average of SHAPE reactivites from every position within the RNA were compared to the expression levels determined in HeLa cells. Linear regression was used to determine the degree of correlation between SHAPE and protein expression. A bootstrapping approach was used to determine the limits of statistical significance at each position.
Computational modeling of eGFP expression data Time-course data was collected from HeLa and AML12 cells transfected with the computationally designed eGFP-degron mRNAs. The eGFP-degron construct reduces protein half-life to under 1 hour, so that changes in eGFP fluorescence over time directly correlate to the kinetics of translation and mRNA decay (5). Fluorescence from the transfected cells was monitored over a 20-hour time course, and total active protein levels were calculated by integrating the area under the curve. These data were used to fit a the computational model of active protein production and degradation ( Figure 6C) in which rate terms for protein maturation and degradation were held constant and the translation efficiency and rate of RNA degradation were allowed to vary to find the best fit to the experimental data. Data fitting was done in python using the Scikit learn module.

Data availability.
Raw sequencing files (.fastq) and processed reactivities from the SHAPE-MaP structure probing experiment are deposited into GEO under the ID codes XXXXXXXXXX.

Contact for Reagent and Resource Sharing.
Further information and requests for resources and reagents should be directed to    Values are the same as in Figure 2A.

Pearson correlations for percent U (red) and A (blue) vs protein expression in HeLa cells
(y-axis) across 30 nucleotide regions (x-axis) for computationally designed eGFP sequence variants. Values are the same as in Figure 6B. mRNA half-lives (t1/2 RNA) were calculated from the observed mRNA decay rates.
(C) Correlation between the computationally determined eGFP-degron mRNA half-lives determined by computational models that excluded ( Figure 6C) or include ( Figure   S11B) a term for delivery of the RNA. Table S1. Nearest neighbor base pairing energies for modified nucleotides, related to Figure 3 Nearest-neighbor thermodynamic parameters along with the experimentally determined error for Watson-crick base pairs containing unmodified uridine (values from (7)), Y, m 1 Y, or mo 5 U. The modified nucleotide(s) for each nearest neighbor pair is highlighted in red. Parameters were derived by linear regression of UV-melting data from X short oligonucleotides containing global substitutions, as described in (7). Table S2. Table of     C.

Figure S5
A.