Integrated workflow for discovery of microprotein-coding small open reading frames

Summary
Small open reading frame (smORF)-encoded microproteins, proteins of fewer than 100–150 amino acids, are an emerging class of functional biomolecules. Here, we present a protocol for identifying translated smORFs genome-wide in mammalian systems. We describe steps for generating ribosome profiling (Ribo-seq) data, translating a transcriptome assembly in silico to create an ORF database, and analyzing Ribo-seq data computationally to score individual smORFs for translation. Identifying translated smORFs is the first step toward studying microprotein function. For complete details on the use and execution of this protocol, please refer to Martinez et al.¹

Note: Longer incubations with CHX in the PBS wash can cause ribosomes to accumulate near ORF start sites, an artifact of CHX treatment rather than a reflection of native translation. For suspension cell cultures, we recommend minimizing the CHX incubation by pelleting the cells in PBS with CHX for as short a time as possible, e.g., 3 min.
2. Add 400 μL of ice-cold lysis buffer to the plate, scrape the cells off into the lysis buffer, and collect the buffer and cells in a fresh tube.
Note: Prepare lysis buffer on the day of sample harvest. The lysis buffer volume can be adjusted to achieve the optimal lysate RNA concentration for digestion; it depends on the cell type and number of cells (troubleshooting 1).
3. Vortex and/or pass the sample through a 26-gauge needle to homogenize the cells, then incubate on ice for 10 min. Use of a probe sonicator is not recommended.
4. Clarify the lysate by centrifuging at 15,000 × g for 10 min at 4 °C.
5. Transfer the supernatant to a new RNase-free tube.
CRITICAL: RNA in cell lysate is vulnerable to degradation, and ribosomes can dissociate. Handle samples on ice and, if possible, proceed with digestion on the day of lysis. Samples that must be stored should be flash-frozen in liquid nitrogen and kept at −80 °C for up to one week.

Timing: 1 h
Prior to column purification of monosomes, the column resin is equilibrated with wash buffer. Start this step just before the RNase I digest.
6. Prepare one S-400 HR column per 100 μL of lysate. For 60 μg of input RNA in 300 μL of lysis buffer for RNase I digestion, use three columns for monosome isolation.
7. Remove the cap and seal from each column and wash with 500 μL of S-400 column wash buffer.
a. Discard the flow-through after each wash.
b. Perform six washes per column, for a total of 3 mL of wash buffer per column.
Note: Prepare column wash buffer on the day of the digest. The wash buffer passes through the column very slowly by gravity. To speed up flow-through, push air through the column by pressing a gloved finger over its top. Ensure that there are no bubbles in the column resin.
CRITICAL: On the last wash, allow enough buffer to pass through the column such that there is about 15% volume of buffer sitting on top of the resin bed.
8. Spin the column at 600 × g for 4 min.
CRITICAL: Do not let the resin bed dry out during the wash steps; doing so will result in low RPF yields. Ideally, the column washes will be complete just before the RNase I digest step.
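The column arithmetic in steps 6 and 7 can be checked with a short calculation. The following is a minimal sketch; the function name `column_plan` is hypothetical, and it only rounds the digest volume up to whole columns using the per-column values stated above, without adding extra buffer for pipetting losses.

```python
import math

# Values taken from steps 6 and 7 of this protocol.
LYSATE_PER_COLUMN_UL = 100  # one S-400 HR column per 100 uL of digest
WASH_VOLUME_UL = 500        # volume of each column wash
WASHES_PER_COLUMN = 6       # number of washes per column


def column_plan(digest_volume_ul: float) -> dict:
    """Return the number of columns and the wash buffer volumes needed.

    Hypothetical helper: rounds up to whole columns and does not add any
    excess buffer to cover pipetting losses.
    """
    if digest_volume_ul <= 0:
        raise ValueError("digest volume must be positive")
    n_columns = math.ceil(digest_volume_ul / LYSATE_PER_COLUMN_UL)
    wash_per_column_ul = WASH_VOLUME_UL * WASHES_PER_COLUMN
    return {
        "columns": n_columns,
        "wash_per_column_ul": wash_per_column_ul,
        "total_wash_ul": n_columns * wash_per_column_ul,
    }


if __name__ == "__main__":
    # Example from step 6: 60 ug of RNA digested in 300 uL of lysis buffer.
    print(column_plan(300))
```

For the 300 μL example this gives three columns and 3 mL of wash buffer per column, matching steps 6 and 7.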

Protocol RNase I digest
Timing: 1 h and 15 min
Polysomes are digested with RNase I, which degrades unprotected RNA to yield monosomes containing RNA footprints that span only the width of a single ribosome.
Note: The optimal RNase I concentration needs to be determined for each cell or tissue type studied (troubleshooting 2).
12. After completing the digest, add 10 μL (200 U) of SuperaseIn RNase Inhibitor and keep the samples on ice.

Monosome isolation and acid Phenol:Chloroform extraction
Timing: 2 h and 30 min
Digested monosomes are separated from free digested RNA and unprotected RNA using a size-exclusion spin column.
Monosomes are separated from free digested RNA fragments in the cell lysate using MicroSpin S-400 HR columns. The small RPF RNAs are then extracted with acid phenol:chloroform (troubleshooting 3).
13. Add 100 μL of digest dropwise directly onto the resin of each spin column. Do not pipette the digest onto the column walls.
14. Centrifuge the columns at 600 × g for 2 min.
15. Collect ~125 μL of monosomes from each column and combine the eluates from all columns for each sample.

Note:
The extra ~25 μL collected comes from wash buffer remaining in the column.
16. Add an equal volume of acid phenol:chloroform to each sample, vortex, and incubate for 5 min at 20 °C–25 °C.
17. Centrifuge the samples at >15,000 × g for 10 min at 4 °C. Collect the aqueous phase from each sample, taking care not to pipette the interphase or organic phase (Figure 1).
18. To each sample, add 1:10 volume of 3 M NaOAc, pH 5.2, and 2 μL of glycogen (15 mg/mL). Next, add 1 volume of ice-cold isopropanol. Precipitate the RPF RNA at −20 °C for >2 h.
Note: Glycogen acts as a carrier that co-precipitates with nucleic acids, producing a more visible pellet.
19. Centrifuge the RPF RNA samples at >15,000 × g for 15 min at 4 °C to pellet the RNA. Decant the supernatant without disturbing the pellet.
20. Wash the pellet once with ice-cold 80% ethanol. Centrifuge the samples again at >15,000 × g for 10 min at 4 °C.
21. Decant the supernatant and air-dry the pellet for about 5–10 min. Ensure that the ethanol evaporates completely, but do not over-dry the pellet.
22. Resuspend the pellet in 15 μL of nuclease-free H2O. Use 1 μL of the resuspended RNA to measure RNA concentration on a NanoDrop UV-Vis spectrophotometer.
Pause point: Purified RNA can be stored for up to a week at −20 °C or long-term at −80 °C.

rRNA depletion with siTOOLs Biotech riboPOOLs kit
Timing: 1 h
Contaminating ribosomal RNAs (rRNAs) are removed from the sample to increase the sequencing depth of RPFs. Although many rRNA depletion kits are commercially available, we recommend the siTOOLs riboPOOLs rRNA depletion kit for ribosome profiling.
rRNA probe hybridization
23. Using water, adjust the sample to a final volume of 14 μL containing 1–5 μg of RNA.
24. Add 1 μL of rRNA probe to the sample.
Note: The concentration of rRNA probes in siTOOLs riboPOOLs rRNA depletion kits has been observed to vary between lots, so we recommend measuring the probe concentration before use. We have found that a 1.2×–1.5× ratio of probe to RPF RNA is optimal for depleting rRNA (troubleshooting 4). Even with this step, the majority of sequenced reads are likely to derive from rRNA sequences.
25. Add 5 μL of hybridization buffer (HB) and 1 μL of Promega RNasin Plus. Pipette up and down to mix well.
26. Incubate the sample(s) for 10 min at 68 °C in a thermocycler to denature the RNA, then cool at a rate of 3 °C/min to 23 °C.
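The probe-to-RNA ratio in the note to step 24 translates into a probe volume once the probe concentration has been measured. The sketch below assumes the ratio is expressed by mass (the protocol does not state the basis), and `probe_volume_ul` is a hypothetical helper, not part of the siTOOLs kit.

```python
def probe_volume_ul(rna_ng: float, probe_ng_per_ul: float, ratio: float = 1.2) -> float:
    """Volume of riboPOOL probe (uL) that gives `ratio` x the RPF RNA mass.

    Assumption: the 1.2x-1.5x probe:RNA ratio is a mass ratio.
    """
    if rna_ng <= 0 or probe_ng_per_ul <= 0:
        raise ValueError("RNA mass and probe concentration must be positive")
    if not 1.2 <= ratio <= 1.5:
        raise ValueError("ratio is outside the 1.2x-1.5x range reported as optimal")
    probe_ng = ratio * rna_ng  # target probe mass
    return probe_ng / probe_ng_per_ul


if __name__ == "__main__":
    # Hypothetical numbers: 1 ug of RPF RNA and a probe measured at 1,000 ng/uL.
    for r in (1.2, 1.5):
        print(f"ratio {r}x -> {probe_volume_ul(1000, 1000, r):.2f} uL probe")
```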

Bead preparation
27. While the sample(s) cool, resuspend the streptavidin-coated magnetic beads by vortexing at medium speed.
Note: For multiple samples, magnetic beads may be prepared by batch processing as in the manufacturer's protocol.
28. Transfer 90 μL of bead suspension per sample into a fresh microcentrifuge tube and place the tube on a magnetic rack for 1 min.
29. Aspirate and discard the supernatant. Remove the tube from the magnetic rack and resuspend the beads in 80 μL of depletion buffer (DB). Place the tube back on the magnetic rack for 1 min, then aspirate and discard the supernatant.
30. Add 80 μL of DB to the beads and mix by pipetting. Do not place the tube back on the magnetic rack.

rRNA depletion
31. Once the sample has cooled, add the 20 μL of sample to the bead suspension and pipette up and down several times to mix. Let sit at 20 °C–25 °C for 5 min, then place on a thermomixer set to 37 °C for 5 min. Skip the 50 °C incubation step.
32. Spin down droplets and place the tube on a magnetic rack for 2 min. Transfer the supernatant (~85–90 μL) to a new tube.
33. Adjust the volume of each sample to 100 μL with nuclease-free water and clean up the RPF RNAs with the Zymo RNA Clean and Concentrator kit.
a. Add 200 μL of RNA binding buffer and 450 μL of 100% ethanol to each RPF sample.
b. Load each column with the RPF sample, spin for 1 min at >10,000 × g, and discard the flow-through.
c. Wash each column with 400 μL of RNA prep buffer, spin at >10,000 × g for 1 min, and discard the flow-through.
d. Wash each column with 700 μL of RNA wash buffer, spin at >10,000 × g for 1 min, and discard the flow-through.
e. Wash each column with 400 μL of RNA wash buffer, spin at >10,000 × g for 2 min, and discard the flow-through. Transfer each column to a fresh, nuclease-free centrifuge tube.
f. Elute the RPF RNA with 11 μL of nuclease-free water and keep the samples on ice or store at −20 °C for up to a week. Expect to recover ~10 μL.

PAGE purification of ribosome-protected fragments (RPFs)
Timing: 3 h and 30 min
RPFs 27–30 nt in length are extracted from a denaturing polyacrylamide gel.
36. Prepare 800 mL of fresh 1× TBE buffer to run the gel.
37. Prepare a 15% TBE-urea polyacrylamide gel by rinsing the wells with 1× TBE buffer and prerunning the gel for 5 min at 180 V before loading the samples.
38. Load 10 μL of the 20/100 oligo ladder, 10 μL of the marker, and 20 μL of each sample across two lanes (10 μL per lane).
Note: It is recommended to load markers next to each sample to help with cutting the gel.
39. Run the gel at 180 V for 65 min or until the bromophenol blue band reaches the bottom of the gel.
40. Stain each gel with 3 μL of SYBR Gold in 30 mL of 1× TBE buffer on a shaker at 20 °C–25 °C for 20 min.
41. Pierce the bottom of a 0.5 mL tube with an 18-gauge needle and place it inside a 1.5 mL tube. This assembly is used to shred the gel.
42. Using the 27 and 30 nt marker bands as a reference, cut out the 27–30 nt region for each sample and place the excised bands in the 0.5 mL tube prepared in the previous step (Figure 2).
43. Centrifuge the tube(s) at >15,000 × g for 2 min to shred the gel slices. All gel pieces should end up in the 1.5 mL microcentrifuge tube.

Note:
The gel slice may be cut up further into smaller pieces before centrifuging if you find that it does not pass through the 18-gauge hole in the 0.5 mL tube easily.
44. Remove and discard the 0.5 mL tube.
45. Add the following components to each 1.5 mL microcentrifuge tube containing gel pieces, and rotate the samples for 2 h at 20 °C–25 °C to extract the RPFs.
a. 400 μL nuclease-free water.
b. 40 μL 5 M ammonium acetate.
c. 2 μL 10% sodium dodecyl sulfate (SDS).
d. 1 μL RNase inhibitor (SuperaseIn).
46. Trim the end of a 1 mL pipette tip with a fresh razor blade and use it to transfer the gel slurry to a centrifugal filter. Centrifuge the tube(s) at 2,000 × g for 3 min.
47. Transfer the RNA solution to a fresh 1.5 mL microcentrifuge tube and add the following reagents to each tube.
a. 2 μL GlycoBlue.
b. 700 μL 100% isopropanol.
48. Mix well and incubate for >2 h at −20 °C.
49. Centrifuge the tube(s) at 15,000 × g at 4 °C for 20 min to pellet the RPF RNA.
50. Decant the supernatant without disturbing the pellet.
51. Wash the pellet with 500 μL of ice-cold 80% ethanol and centrifuge at 15,000 × g at 4 °C for 10 min.
52. Decant the supernatant without disturbing the pellet and air-dry for about 10 min.
53. Resuspend the pellet in 3.5 μL of nuclease-free water.
Pause point: Sample(s) can be stored at −20 °C for up to a week.

Timing: 1 h and 30 min
Before the adapter is ligated to the RPFs, the 5′ end of the 3′ adapter is adenylated to allow ligation by RNA ligase.

Note:
In this protocol, we use the 3′ adapter sequence from McGlincy et al.¹⁰ but do not include the barcode sequence.
54. Combine the following components in a tube.
55. Incubate the tube at 65 °C for 1 h, then heat-inactivate the enzyme at 85 °C for 5 min.
56. Adjust the volume of each sample to 50 μL by adding 30 μL of nuclease-free water. Purify the sample(s) with the Zymo RNA Clean and Concentrator kit.
a. Add 100 μL of RNA binding buffer and 150 μL of 100% ethanol.
b. Load the column with the sample, spin for 1 min at >10,000 × g, and discard the flow-through.
c. Wash the column with 400 μL of RNA prep buffer, spin at >10,000 × g for 1 min, and discard the flow-through.
d. Wash the column with 700 μL of RNA wash buffer, spin at >10,000 × g for 1 min, and discard the flow-through.
e. Wash the column with 400 μL of RNA wash buffer, spin at >10,000 × g for 2 min, and discard the flow-through. Transfer the column to a fresh, nuclease-free centrifuge tube.
f. Elute with 7 μL of nuclease-free water and keep the sample on ice or store at −20 °C for up to a month. Expect to recover ~6 μL.
57. Store the adenylated adapter at −20 °C.
Note: Avoid repeated freeze-thaw cycles of the adenylated adapter; make aliquots before storing at −20 °C. Prepare fresh adapter after one month of storage at −20 °C.

End repair of RPFs
Timing: 1 h
The 3′ phosphate at the end of each RPF is removed to allow ligation to the adenylated 3′ adapter.
58. Add the following to the 3.5 μL RPF samples. Do not make a master mix; add each component individually.
59. Mix well and incubate the samples at 37 °C for 1 h.
Pause point: Sample(s) can be stored at −20 °C for up to a week.

Timing: 5 h and 15 min
The end-repaired RPFs are ligated to the adenylated 3′ adapter. Leftover adapters are then removed using deadenylase and exonuclease.
b. Load each column with the sample(s), spin for 1 min at >10,000 × g, and discard the flow-through.
c. Wash each column with 400 μL of RNA prep buffer, spin at >10,000 × g for 1 min, and discard the flow-through.
d. Wash each column with 700 μL of RNA wash buffer, spin at >10,000 × g for 1 min, and discard the flow-through.
e. Wash each column with 400 μL of RNA wash buffer, spin at >10,000 × g for 2 min, and discard the flow-through. Transfer each column to a fresh, nuclease-free centrifuge tube.

Reagent Amount
Promega RNasin Plus inhibitor 0.5 μL immediately after.
74. Prepare 800 mL of fresh 1× TBE buffer to run the gel.
75. Prepare a 10% TBE-urea polyacrylamide gel by rinsing the wells with 1× TBE buffer and prerunning the gel for 5 min at 180 V before loading the sample(s).
76. Load 10 μL of the 20/100 oligo ladder or 20 μL of sample per lane.
77. Run the gel at 180 V for 65 min or until the bromophenol blue band reaches the bottom of the gel.
78. Stain the gel with 3 μL of SYBR Gold in 30 mL of 1× TBE running buffer on a shaker at 20 °C–25 °C for 20 min.
79. Pierce the bottom of a 0.5 mL tube with an 18-gauge needle and place it inside a 1.5 mL tube. This assembly is used to shred the gel.
80. Cut out the 85–105 nt band using a fresh razor blade (Figure 3).
Note: A band in this size range may not be visible because of the low amount of cDNA, but successful libraries can sometimes still be made in these cases.
81. Transfer the slice(s) to the 0.5 mL tube(s) and centrifuge at 14,000 × g for 2 min.
82. Remove and discard the 0.5 mL tube(s).
83. Add the following components to each sample:
a. 400 μL nuclease-free water.
b. 40 μL 5 M ammonium acetate.
c. 2 μL 10% SDS.
84. Incubate the sample(s) in a thermomixer at 37 °C and 900 rpm for 2 h.
85. Trim the end of a 1 mL pipette tip with a fresh razor blade and use it to transfer the gel slurry to a 1.5 mL centrifugal filter. Centrifuge the tube(s) at 2,000 × g for 3 min.
86. Transfer the cDNA solution to a fresh 1.5 mL microcentrifuge tube and add the following to each tube:
a. 2 μL GlycoBlue.
b. 700 μL 100% isopropanol.
87. Mix well and incubate the sample(s) for >2 h at −20 °C.
88. Centrifuge the tube(s) at >12,000 × g (4 °C) for 20 min to pellet the cDNA.
89. Decant the supernatant without disturbing the pellet.
90. Wash the pellet with 500 μL of ice-cold 80% ethanol and centrifuge (4 °C) at >12,000 × g for 10 min.
91. Decant the supernatant without disturbing the pellet and air-dry for about 10 min.
92. Resuspend the pellet in 10 μL of nuclease-free water.

The final dsDNA RPF library is extracted from a non-denaturing polyacrylamide gel to separate it from unwanted side products such as adapter dimers (troubleshooting 5). If the library is already a clean single peak of the correct size, skip this step and proceed to sequencing.
113. Prepare the sample(s) and ladder:
a. Add 6 μL of 5× native TBE loading dye to each 25 μL of PCR product.
b. Use 6 μL of the 100 bp–1 kb ladder.
114. Prepare 800 mL of 1× TBE running buffer to run the gel.
115. Prepare an 8% native TBE polyacrylamide gel by rinsing the wells with 1× TBE buffer and prerunning the gel for 5 min at 180 V before loading the sample(s).
116. Load 30 μL of each sample across two lanes (15 μL per lane), and load 6 μL of ladder in a separate lane.
117. Run the gel at 200 V for 40 min.
118. Stain the gel with 3 μL of SYBR Gold in 10 mL of 1× TBE running buffer on a shaker at 20 °C–25 °C for 20 min.
119. Pierce the bottom of a 0.5 mL tube with an 18-gauge needle and place it inside a 1.5 mL tube. This assembly is used to shred the gel.
120. Using a fresh razor blade, cut out the band at 140–160 bp and transfer it to the 0.5 mL tube(s) (Figure 4).
121. Centrifuge the tube(s) at 14,000 × g for 2 min. Discard the 0.5 mL tube(s) used to shred the gel.
122. Add the following components to each 1.5 mL tube:
a. 400 μL nuclease-free water.
b. 40 μL 5 M ammonium acetate.
c. 2 μL 10% SDS.
123. Incubate the sample(s) on a thermomixer at 37 °C (900 rpm) for 2 h.
124. Trim the end of a 1 mL pipette tip with a fresh razor blade and use it to transfer the gel slurry to a 1.5 mL centrifugal filter. Centrifuge the tube(s) at 2,000 × g for 3 min.
125. Transfer the PCR product solution to a fresh 1.5 mL microcentrifuge tube and add the following to each tube:
a. 2 μL glycogen.
b. 700 μL 100% isopropanol.
126. Mix each tube well and incubate for >2 h at −20 °C.
127. Centrifuge the tube(s) at >12,000 × g (4 °C) for 20 min to pellet the PCR products.
128. Decant the supernatant without disturbing the pellet.
129. Wash the pellet with 500 μL of ice-cold 80% ethanol and centrifuge (4 °C) at >12,000 × g for 10 min.
130. Decant the supernatant without disturbing the pellet and air-dry for about 10 min.
131. Resuspend the pellet in 25 μL of nuclease-free water.
132. Measure the concentration of PCR products with the Qubit dsDNA high-sensitivity kit.
133. Submit samples for Bioanalyzer quality control and sequencing (Figure 5).
Pause point: Sample(s) can be stored at −20 °C.

Processing ribosome profiling sequencing data

Timing: 4-5 h
We use an in-house integrated pipeline that includes RibORF 0.1, a software package that scores ORFs for translation based on Ribo-seq coverage. Newer ORF prediction tools may be used in addition to, or instead of, RibORF 0.1.
Note: We recommend creating a conda environment to manage the specific dependencies and tools needed to run the workflow. The pipeline runs best on a high-performance computing cluster. Users can create a conda environment after installing Anaconda or Miniconda.
134. Obtain FASTQ files from the sequencer.
Note: If the library was sequenced paired-end, process only the read 1 FASTQ file. Processing time depends on the size of the FASTQ files.
135. Install the required software tools of the pipeline and load the conda environment (see materials list above).
136. Download the FASTA files for your reference genome. We use the mouse mm10 reference genome as an example (https://hgdownload.soe.ucsc.edu/goldenPath/mm10/chromosomes/).

Note:
We only include primary chromosomes in our alignments.
137. Download the GTF file of the reference annotation for your species. We use the mouse GENCODE annotation in our example, which can be downloaded from the GENCODE website (https://www.gencodegenes.org/mouse/).
138. Download FASTA files of the rRNA and tRNA sequences, using the UCSC Genome Browser's Table Browser tool to pull these sequences from the RepeatMasker tracks.
139. Download the RefSeq gene refFlat file and create a new file that includes only the coding genes, whose accessions begin with "NM". We used the mouse mm10 version (http://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/).
Note: Newer reference annotations may be used as available.
140. Generate STAR indexes for both the annotated reference genome and the tRNA and rRNA files.

Note: A de novo transcriptome assembly can be used as an option in place of a reference annotation if there are RNA-seq samples sequenced in parallel with Ribo-seq data.
141. For Ribo-seq FASTQ processing, first trim the 3′ adapter sequence with the FASTX-Toolkit. Untrimmed reads and reads shorter than 20 nt after trimming are discarded.
142. The trimmed reads are then aligned to rRNA and tRNA genes, which are unwanted contaminants in Ribo-seq samples. Reads that align to these sequences are discarded, and the remaining reads are aligned to the whole genome. In these alignments we allow multimapping reads for user flexibility, but later we filter out multimappers and use only uniquely aligned reads in our analyses.
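The trimming rule in step 141 can be illustrated in plain Python. This is a simplified sketch, not the FASTX-Toolkit algorithm: it requires exact matches, also accepts a partial adapter of at least 5 nt running off the read's 3′ end, and assumes well-formed four-line FASTQ records. The function names are hypothetical, and the adapter sequence in the usage example is an assumption; substitute the 3′ adapter actually used.

```python
from typing import Iterable, Iterator, Optional, Tuple

MIN_INSERT_NT = 20  # step 141: reads shorter than 20 nt after trimming are discarded


def find_adapter(seq: str, adapter: str, min_overlap: int = 5) -> int:
    """Return the start index of the 3' adapter in `seq`, or -1 if absent.

    Looks for a full exact match first, then for an adapter prefix of at
    least `min_overlap` nt that runs off the 3' end of the read.
    """
    idx = seq.find(adapter)
    if idx != -1:
        return idx
    longest = min(len(adapter) - 1, len(seq))
    for k in range(longest, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return -1


def trim_read(seq: str, adapter: str, min_len: int = MIN_INSERT_NT) -> Optional[str]:
    """Return the insert upstream of the adapter, or None if the read is discarded."""
    pos = find_adapter(seq, adapter)
    if pos == -1:
        return None  # untrimmed reads are discarded
    insert = seq[:pos]
    return insert if len(insert) >= min_len else None


def trim_fastq(lines: Iterable[str], adapter: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, trimmed_seq, trimmed_qual) for FASTQ records that pass."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank trailing lines
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        insert = trim_read(seq, adapter)
        if insert is not None:
            yield header.rstrip("\n"), insert, qual[: len(insert)]


if __name__ == "__main__":
    adapter = "CTGTAGGCACCATCAAT"  # assumed adapter, for illustration only
    record = ["@read1\n", "A" * 28 + adapter + "\n", "+\n", "I" * 45 + "\n"]
    print(list(trim_fastq(record, adapter)))
```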
Note: STAR is our preferred splice-aware read alignment tool, but other similar tools can be substituted.
143. The resulting BAM file, Sample_trimmed_filteredAligned.sortedByCoord.out.bam, is filtered with samtools to keep only primary alignments, i.e., the best-scoring alignment of each multimapping read.
144. The resulting BAM file is used to generate bedGraph files with HOMER, which can be uploaded to and visualized in the UCSC Genome Browser or the Integrative Genomics Viewer to inspect Ribo-seq coverage. Two bedGraph files are generated. The Sample_trimmed_filtered_noSecondaryAlign.bedGraph file includes all reads from the BAM file generated in step 143, including multimappers. The Sample_trimmed_filtered_noMulitMap.bedgraph file includes only uniquely aligned reads.
145. After the bedGraph files are generated, samtools is used to remove multimappers from the alignment file generated in step 143. The resulting file contains only uniquely aligned reads and is used in subsequent steps. Users can instead choose to use the alignment file that retains multimappers.
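The multimapper filtering in steps 143 and 145 is done with samtools in the pipeline. To make the logic concrete, the sketch below applies equivalent rules to SAM text lines in plain Python: it drops unmapped (flag 0x4), secondary (0x100), and supplementary (0x800) records and keeps reads whose NH tag equals 1, falling back to STAR's MAPQ of 255 for unique mappers when NH is absent. It is an illustration, not the pipeline's command, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional

FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800
STAR_UNIQUE_MAPQ = 255  # STAR assigns MAPQ 255 to uniquely mapped reads by default


def hit_count(fields: List[str]) -> Optional[int]:
    """Return the NH:i tag value (number of reported alignments), if present."""
    for tag in fields[11:]:  # optional tags follow the 11 mandatory SAM columns
        if tag.startswith("NH:i:"):
            return int(tag[5:])
    return None


def is_unique_primary(line: str) -> bool:
    """True for mapped, primary alignments of uniquely mapped reads."""
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
        return False
    nh = hit_count(fields)
    if nh is not None:
        return nh == 1
    return int(fields[4]) == STAR_UNIQUE_MAPQ  # fall back to MAPQ


def filter_sam(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged and only unique, primary alignment lines."""
    for line in lines:
        if line.startswith("@") or is_unique_primary(line):
            yield line
```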
146. Determine the RPF read length distribution (Figure 6). One million reads are randomly sampled from the BAM file and tallied by read length.
Note: In high-quality Ribo-seq data, most reads should be 28–29 nt long. Plot the length frequencies in a graphing tool such as R, Excel, or Prism to visualize the distribution.
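Step 146 samples one million reads and tallies their lengths. A minimal sketch of that calculation, assuming the read lengths have already been extracted from the BAM file (for example, from the SEQ column of `samtools view` output), uses reservoir sampling so the reads need not fit in memory. The function names are hypothetical, and this is not the pipeline's own implementation.

```python
import random
from collections import Counter
from typing import Dict, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Uniformly sample up to k items from a stream in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # inclusive bounds
            if j < k:
                sample[j] = item
    return sample


def length_distribution(read_lengths: Iterable[int], k: int = 1_000_000,
                        seed: int = 0) -> Dict[int, float]:
    """Fraction of sampled reads at each length, ordered by length."""
    sampled = reservoir_sample(read_lengths, k, seed)
    if not sampled:
        return {}
    counts = Counter(sampled)
    total = len(sampled)
    return {length: counts[length] / total for length in sorted(counts)}


if __name__ == "__main__":
    toy_lengths = [28] * 60 + [29] * 30 + [30] * 10  # made-up lengths for illustration
    for length, fraction in length_distribution(toy_lengths).items():
        print(length, round(fraction, 3))
```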
147. Visualize aligned RPFs of each length relative to annotated CDS regions to choose the best offset correction, such that the 5′ ends of reads align to ribosomal A-sites within coding regions.
Note: A-site reads are defined by the first position of the codon located in the A-site of the RPF. For this example, we use the RefSeq coding gene refFlat file for annotated coding regions. First, the BAM file from step 145 is converted to a SAM file with samtools. The pipeline takes the SAM file and runs the readDist.pl script from the RibORF 0.1 software package to generate metagene plots showing the average read distribution from 30 nt upstream to 50 nt downstream of start sites, and from 50 nt upstream to 30 nt downstream of stop sites. This step generates plots for a single read length selected by the user. In our example, an uncorrected metagene plot is generated for 30 nt reads. Repeat this step for each read length in the distribution.
148. Choose the read lengths to include in offset correction based on the metagene plots generated in step 147. Include only read lengths with good 3-nt periodicity. These form the final set of reads used in smORF translation scoring.
Note: During offset correction, the number of bases by which each read must be shifted so that its 5′ end aligns with the first position of the A-site codon is determined. For example, in high-resolution datasets, 28 nt footprints usually require a 15 nt offset to align with the A-site. An offset parameters text file must be created as input to the offsetCorrect.pl script. This file has two tab-separated columns: the first lists the read lengths to be offset-corrected, and the second lists the corresponding offset distances.
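The offset parameters file and the shift it encodes can be sketched as follows. The file layout (read length, tab, offset) comes from the note above, and so does the 28 nt to 15 nt pair; the 29 nt value in the usage example is a placeholder. How offsetCorrect.pl handles spliced reads is not described here, so the position helper assumes ungapped alignments, and all function names are hypothetical.

```python
from typing import Dict


def format_offsets(offsets: Dict[int, int]) -> str:
    """Serialize {read_length: offset} as two tab-separated columns."""
    return "".join(f"{length}\t{offset}\n" for length, offset in sorted(offsets.items()))


def parse_offsets(text: str) -> Dict[int, int]:
    """Parse a two-column, tab-separated offset parameters file."""
    offsets: Dict[int, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        length, offset = line.split("\t")
        offsets[int(length)] = int(offset)
    return offsets


def a_site_position(pos: int, read_length: int, strand: str, offset: int) -> int:
    """1-based genomic coordinate of the first A-site nucleotide.

    `pos` is the SAM POS (leftmost aligned base, 1-based). Assumes an
    ungapped alignment; spliced reads would need CIGAR-aware handling.
    """
    if strand == "+":
        return pos + offset  # the 5' end is the leftmost base
    if strand == "-":
        five_prime_end = pos + read_length - 1  # the 5' end is the rightmost base
        return five_prime_end - offset
    raise ValueError(f"unknown strand: {strand!r}")


if __name__ == "__main__":
    text = format_offsets({28: 15, 29: 16})  # the 29 -> 16 entry is hypothetical
    print(text, end="")
    print(a_site_position(100, 28, "+", parse_offsets(text)[28]))
```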
149. Run the readDist.pl script again on the corrected SAM file generated in step 148 to plot the corrected read locations for all lengths chosen by the user (Figure 7). Set the read length to 1, because each read now represents the first position of the ribosomal A-site.
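The 3-nt periodicity that Figure 7 shows graphically can also be summarized as the fraction of A-site positions falling in each frame. The sketch below is an illustrative summary, not output of readDist.pl; the input format (distances from the first nucleotide of the start codon) and the function name are assumptions. In data with good periodicity, frame 1 is expected to dominate.

```python
from collections import Counter
from typing import Iterable, Tuple


def frame_fractions(positions: Iterable[int]) -> Tuple[float, float, float]:
    """Fraction of corrected read positions in frames 1, 2, and 3.

    `positions` are distances (nt) from the first nucleotide of the start
    codon to each A-site position, on the transcript strand. Frame 1 is
    the annotated coding frame. Positions upstream of the start (< 0) are
    ignored.
    """
    counts = Counter(p % 3 for p in positions if p >= 0)
    total = sum(counts.values())
    if total == 0:
        return (0.0, 0.0, 0.0)
    return (counts[0] / total, counts[1] / total, counts[2] / total)


if __name__ == "__main__":
    print(frame_fractions([0, 3, 6, 9, 1, 5]))  # toy positions for illustration
```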
150. Translate the reference transcriptome in all three reading frames with the GTFtoFasta Java program.
Note: This program parses all possible open reading frames, finding the most upstream canonical start codon and an in-frame stop codon. If there is no canonical start codon, the ORF is defined from stop codon to stop codon. The output contains a GTF file of all possible ORFs as well as peptide and nucleotide sequences for each ORF. We run GTFtoFasta on the GENCODE reference GTF.
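The ORF-definition rules in the note above can be illustrated on a single transcript sequence. This is a simplified re-implementation for illustration, not GTFtoFasta itself: it scans the three forward frames of an already-stranded transcript, reports each ORF with its stop codon included, ignores trailing segments that lack an in-frame stop, and returns coordinates rather than GTF records or peptides. The function name and tuple layout are hypothetical.

```python
from typing import List, Tuple

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}

# (frame, start, end, kind): 0-based, end-exclusive transcript coordinates.
Orf = Tuple[int, int, int, str]


def find_orfs(transcript: str) -> List[Orf]:
    """Find ORFs in the three forward frames of a transcript sequence.

    Within each stop-delimited segment, the ORF starts at the most upstream
    in-frame ATG; if the segment has no ATG, it runs from the previous stop
    (or the frame start) to the next stop ("stop-to-stop").
    """
    seq = transcript.upper().replace("U", "T")
    orfs: List[Orf] = []
    for frame in range(3):
        segment_start = frame
        first_atg = -1
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon == START_CODON and first_atg == -1:
                first_atg = i
            elif codon in STOP_CODONS:
                start = first_atg if first_atg != -1 else segment_start
                kind = "ATG" if first_atg != -1 else "stop-to-stop"
                if i + 3 - start > 3:  # skip a stop codon with nothing before it
                    orfs.append((frame, start, i + 3, kind))
                segment_start = i + 3
                first_atg = -1
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGCC"))  # toy sequence for illustration
```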
151. Use the gtfToGenePred utility from the UCSC tool collection to convert the ORF GTF file generated in the previous step to genePred format. A custom Python script then converts the resulting genePred file to refFlat format.
152. Run the RibORF.pl script to score each ORF for translation, using the offset-corrected SAM file from step 148 and the ORF refFlat file from the previous step as inputs.
Note: RibORF 0.1 requires the R package e1071, which provides the support vector machine classifier; R with e1071 installed must be available on the user's PATH to run RibORF.pl. The resulting output file of predicted ORFs is pred.pvalue.parameters.txt. We use an ORF length cutoff of 12 nt and a minimum read coverage cutoff of 10 reads.
Note: The output pred.pvalue.parameters.filtered.txt file from step 153 is the list of smORFs predicted by RibORF to be translated. Additional filtering steps can be applied to remove smORFs that are already annotated or that overlap known coding genes; we recommend bedtools intersect and NCBI BLAST for this. bedtools intersect can filter out smORFs that overlap annotated coding regions, and NCBI BLAST can filter out smORFs based on amino acid sequence similarity to annotated proteins.
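The custom conversion in step 151 is not shown in the source. A minimal sketch, assuming standard genePred input (the first 10 columns, as written by gtfToGenePred) and the refFlat layout of a gene name followed by those same 10 columns, could look like this. The `gene_names` mapping is hypothetical; without it, the ORF ID is reused as the gene name.

```python
from typing import Dict, Iterable, List, Optional

GENEPRED_COLUMNS = 10  # name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd,
                       # exonCount, exonStarts, exonEnds


def genepred_to_refflat(lines: Iterable[str],
                        gene_names: Optional[Dict[str, str]] = None) -> List[str]:
    """Prepend a gene-name column to each genePred record (refFlat layout).

    Extra columns (e.g., from extended genePred output) are dropped.
    """
    gene_names = gene_names or {}
    out: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < GENEPRED_COLUMNS:
            raise ValueError(f"expected >= {GENEPRED_COLUMNS} columns: {line!r}")
        core = fields[:GENEPRED_COLUMNS]
        orf_id = core[0]
        out.append("\t".join([gene_names.get(orf_id, orf_id)] + core))
    return out
```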

EXPECTED OUTCOMES
A successful Ribo-seq experiment should yield a library of ~155 bp. The Bioanalyzer trace should show a single peak without broad shoulders (Figure 5). However, as with other sequencing library preparations, adapter dimers can form, yielding a PCR product without an RPF insert. Adapter dimers appear as a smaller peak at ~120 bp and can bind to the flow cell and be sequenced, generating unusable data (troubleshooting 5). Before submitting samples for sequencing, measure library concentrations with the Qubit high-sensitivity dsDNA assay kit. Yields after PCR amplification typically range from 2 ng to ~30 ng.
Although a properly prepared Ribo-seq library peaks at around 155 bp on the Bioanalyzer, this does not always indicate successful RPF enrichment. Poor rRNA depletion can result in a low percentage of usable reads. Typically, about 20% of the total sequenced reads are usable, but this number may vary with the cell line or tissue type.
During processing of the sequencing data, QC plots reveal the quality of the ribosome profiling data. The RPF read length distribution of high-quality data should peak at 28–29 nt (Figure 6).
In addition to an optimal read length distribution, the combined metagene plot of A-site offset corrected RPF reads should show 3-nt periodicity with enriched peaks at the translation initiation and termination sites (Figure 7), which indicate translating ribosomes within a CDS.

LIMITATIONS
Ribo-seq sensitively detects smORF translation. However, detection of smORF translation by Ribo-seq does not report on the stability of the translated proteins and therefore cannot guarantee that smORF-encoded microproteins are stable, functional biomolecules that regulate biology. In addition, detection of lowly translated smORFs can be noisy, so collecting biological replicates of Ribo-seq datasets is recommended.⁸ Lastly, the integrated analysis pipeline has some limitations compared with more recently developed tools that predict smORF translation. RibORF v0.1 requires candidate ORFs to be predefined, whereas more recent versions of RibORF and other newer Ribo-seq analysis tools can parse transcript databases for open reading frames and identify translated ORFs de novo from Ribo-seq data.

Figure 1. Acid phenol:chloroform extraction of RPFs. Acid phenol:chloroform extraction separates RPFs and other RNA molecules from proteins, lipids, and DNA. Retrieve the RPFs from the aqueous phase, which forms the top layer.

Figure 2. 15% TBE-urea gel. (A) Example RPF gel showing digested RNA after rRNA depletion; 20 μL of sample is loaded across two lanes. (B) RPF gel with the 27–30 nt regions cut out for Ribo-seq samples.
Pause point: Sample(s) can be stored at −20 °C.
70. Add 1 μL of Hybridase to each sample, incubate at 55 °C for 5 min, and then hold at 4 °C or place the samples immediately on ice.
71. The volume should now be 31 μL. Adjust the volume of each sample to 50 μL by adding 19 μL of nuclease-free water. Purify the samples with the Zymo RNA Clean and Concentrator kit.
a. Add 100 μL of RNA binding buffer and 150 μL of 100% ethanol.
cDNA samples with 11 μL of nuclease-free water and keep the samples on ice or store them at −20 °C. Expect to recover ~10 μL.
Pause point: Sample(s) can be stored at −20 °C.
PAGE purification of cDNA
Timing: 3 h and 30 min
cDNA fragments are extracted from a denaturing polyacrylamide gel to separate them from potential side-reaction products.
72. Prepare the 20/100 oligo ladder and sample(s):
a. Add 10 μL of 2× TBE-urea dye to each sample.
b. 20/100 oligo ladder: 1 μL of 20/100 oligo ladder (0.1 mg/mL), 19 μL of nuclease-free water, and 20 μL of 2× TBE-urea dye.
73. Denature the samples and ladder by heating at 95 °C for 5 min. Place the sample(s) on ice

Figure 3. 10% TBE-urea gel. (A) Example cDNA gel; 20 μL of sample is loaded into one lane. Proceed with gel extraction regardless of whether a band is visible between 85 and 105 nt. (B) cDNA gel after extraction, showing the region cut. Note: The band near 70 nt corresponds to reverse transcription products with no insert.

Timing: 2 h
Pause point: Store the sample(s) at −20 °C.
cDNA circularization
Circularize the cDNA so that the RPF cDNA is flanked on both sides by primer binding sites for PCR amplification.
93. Prepare the following CircLigase mix for each sample:
Note: Do not create a master mix. Prepare each mix individually per sample.
94. Add the 10 μL CircLigase mix to each sample and mix by pipetting. Centrifuge the sample(s) briefly.
95. Incubate the sample(s) at 60 °C for 2 h, then hold at 4 °C or place immediately on ice.
Pause point: Sample(s) can be stored at −20 °C.
PCR amplify the circularized cDNA products to yield a dsDNA RPF library that is ready for Illumina sequencing.
Note: The reverse primer sequences used at this step are from the now-discontinued Illumina TruSeq Ribo Profile kit.
96. Pick a unique index primer for each sample, taking into account the recommended combinations for index color balancing.
97. Prepare the following PCR mix for each sample.
Note: Do not create a master mix; prepare each mix individually per sample.
98. Add 5 μL of the cDNA to the PCR mix and mix thoroughly.
99. Place the tubes in a thermocycler and use the following program:

Figure 5. Bioanalyzer profile of the RPF library after PCR amplification and gel-extraction cleanup. The final RPF library is 150–160 bp and shows a clean single peak, as expected.

Figure 6. Length distribution of RPF reads. RPF lengths peak around 28–29 nt, as expected for fully digested ribosome footprints.

Figure 7. Combined metagene plot of A-site offset-corrected RPF reads. The metagene plot shows good coverage and 3-nt periodicity between the translation initiation and termination sites. The colored bars represent the three possible reading frames: red is frame 1, the coding frame of the metagene; blue is frame 2; and green is frame 3. RPM, reads per million.
Convert RNA samples into single-stranded cDNA using a reverse transcription (RT) primer complementary to the adapter ligated to the RPFs in the previous step. After reverse transcription, the RNA template is degraded.
Note: We use the reverse transcription primer from McGlincy et al.¹⁰ but do not include the first two random nucleotides.
60. Add 1 μL of adenylated adapter to each end-repaired RPF sample and heat-denature the sample(s) at 65 °C for 2 min.
Note: Hold the temperature at 4 °C immediately after heat denaturation.
61. Add the following components to each heat-denatured sample. Do not make a master mix; add each component individually.
63. Incubate the sample(s) at 23 °C for 3 h.
Pause point: Sample(s) can be stored at −20 °C.
64. Add 1 μL of RecJ exonuclease and 0.5 μL of deadenylase to each sample and mix thoroughly by pipetting.
66. Prepare the following mix for each sample:
Note: Do not create a master mix. Prepare each mix individually per sample.
67. Add the 13 μL reverse transcription mix to each sample and mix well by pipetting.
68. Incubate the sample(s) at 50 °C for 30 min.
69. Add 1 μL of Exonuclease I to each sample and incubate at 37 °C for 30 min, followed by 15 min at 80 °C. Hold the samples at 4 °C or place them on ice.
153. Filter the ORF list for predicted translated smORFs by selecting candidates with a minimum p-value cutoff of 0.7 (based on the RibORF authors' suggestion), a maximum nucleotide length cutoff of 450 (ORFs of 150 codons or fewer), and a minimum read coverage cutoff of 10.
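The three cutoffs in step 153 can be applied to the tab-delimited RibORF output with the standard library. This is a sketch: the column names used below (`pred.pvalue`, `length`, `readNum`) are assumptions and must be checked against the header of your pred.pvalue.parameters.txt file, and `filter_smorfs` and `to_tsv` are hypothetical helpers.

```python
import csv
import io
from typing import Dict, List


def filter_smorfs(tsv_text: str,
                  score_col: str = "pred.pvalue",  # assumed column name
                  length_col: str = "length",      # assumed column name (nt)
                  reads_col: str = "readNum",      # assumed column name
                  min_score: float = 0.7,
                  max_length_nt: int = 450,
                  min_reads: int = 10) -> List[Dict[str, str]]:
    """Keep ORFs with score >= 0.7, length <= 450 nt, and >= 10 reads (step 153)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        if (float(row[score_col]) >= min_score
                and int(float(row[length_col])) <= max_length_nt
                and int(float(row[reads_col])) >= min_reads):
            kept.append(row)
    return kept


def to_tsv(rows: List[Dict[str, str]]) -> str:
    """Write filtered rows back out as tab-separated text with a header."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Because the thresholds are keyword arguments, the same helper can be rerun with stricter cutoffs without editing the code.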