A Small RNA Isolation and Sequencing Protocol and Its Application to Assay CRISPR RNA Biogenesis in Bacteria

Next-generation high-throughput sequencing has enabled sensitive and unambiguous analysis of RNA populations in cells. Here, we describe a method for isolation and strand-specific sequencing of small RNA pools from bacteria that can be multiplexed to accommodate multiple biological samples in a single experiment. Small RNAs are isolated by polyacrylamide gel electrophoresis and treated with T4 polynucleotide kinase. This treatment allows 3' adapter ligation to CRISPR RNAs, which lack pre-existing 3'-OH ends. Pre-adenylated adapters are then ligated using T4 RNA ligase 1 in the absence of ATP and with a high concentration of polyethylene glycol (PEG). This 3' capture step enables precise determination of the 3' ends of diverse RNA molecules. Additionally, a random hexamer in the ligated adapter helps control for potential downstream amplification bias. Following reverse transcription, the cDNA product is circularized and libraries are prepared by PCR. We show that the amplified library need not be visible by gel electrophoresis for efficient sequencing of the desired product. Using this method, we routinely prepare RNA sequencing libraries from minute amounts of purified small RNA. This protocol is tailored to assay CRISPR RNA biogenesis in bacteria through sequencing of mature CRISPR RNAs, but it can be used to sequence diverse classes of small RNAs. We also provide a fully worked example of our data processing pipeline, with instructions for running the provided scripts.


Background
Genetic modules associated with Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) confer adaptive immunity in diverse prokaryotic hosts (Barrangou et al., 2007). Memories of invasive elements (such as viruses, plasmids, and other mobile elements) are stored between the direct repeats of CRISPR arrays in the host genome, in the form of 'spacers' comprising nucleic acid sequence from the molecular parasite (Brouns et al., 2008; Jackson et al., 2017). To recognize subsequent infections by the same invader, the information contained in CRISPR spacers must be communicated to CRISPR-associated (Cas) endonucleases (Plagens et al., 2015). For the vast majority of CRISPR-Cas systems (phylogenetically grouped as 'type I' and 'type III'; Makarova et al., 2011), this occurs through the activity of a family of CRISPR-associated endoribonucleases known as Cas6 (Hochstrasser and Doudna, 2015). The entire CRISPR array is transcribed from the genome as a precursor CRISPR RNA (pre-crRNA), and the Cas6 protein domain helps to process this transcript into a collection of mature CRISPR RNAs (crRNAs), each consisting of one CRISPR spacer flanked by portions of the CRISPR repeat sequence (Carte et al., 2008; Haurwitz et al., 2010). This process is known as crRNA biogenesis. Cas6 endoribonucleases promote crRNA biogenesis through site-specific cleavage within the CRISPR repeat sequence, which generates 5'-OH and 2',3'-cyclic phosphate termini (Hochstrasser and Doudna, 2015). Because cleavage occurs at the same site in every CRISPR repeat, the pre-crRNA is cut at regular intervals into crRNAs of nearly equal length, each with a different spacer sequence (Hochstrasser and Doudna, 2015). Mature crRNAs are then loaded onto Cas effector complexes and serve as molecular guides that direct Cas enzymes to target DNA or RNA parasites based on sequence complementarity (Deveau et al., 2008; Marraffini and Sontheimer, 2008).
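The toy model below illustrates how cutting at one fixed site in every repeat yields crRNAs of nearly equal length. It builds a pre-crRNA and cuts each repeat at a single position. Every product then carries the 3' part of one repeat, one spacer, and the 5' part of the next repeat. The leader, repeat and spacer sequences are hypothetical. The cut position, 8 nt before the repeat's 3' end, is also an assumption chosen for illustration. Real cleavage sites and sequences depend on the CRISPR system.

```python
# Toy model of Cas6-style pre-crRNA processing (illustrative sketch only).
# All sequences and the cut position are hypothetical, not from M. mediterranea.
LEADER = "AAACCCAAACCC"
REPEAT = "GUUCCAAUCGCAGCUUAGCGAUUGAAAC"  # 28 nt
SPACERS = ["ACCAACAACCACAACCAACAACACCACAAC",
           "CAACCACAACAACCACCAACAACACCAACA"]
HANDLE_LEN = 8  # assumed: cut 8 nt upstream of each repeat's 3' end


def build_pre_crrna(leader, repeat, spacers):
    """Leader, then repeat + spacer for each spacer, then a terminal repeat."""
    parts = [leader]
    for spacer in spacers:
        parts.extend([repeat, spacer])
    parts.append(repeat)
    return "".join(parts)


def cas6_process(pre_crrna, repeat, cut_offset):
    """Cut cut_offset nt into every repeat copy; return the inter-cut fragments.

    Each fragment is repeat[cut_offset:] + spacer + repeat[:cut_offset].
    """
    cuts = []
    start = pre_crrna.find(repeat)
    while start != -1:
        cuts.append(start + cut_offset)
        start = pre_crrna.find(repeat, start + len(repeat))
    return [pre_crrna[a:b] for a, b in zip(cuts, cuts[1:])]


pre = build_pre_crrna(LEADER, REPEAT, SPACERS)
for crrna in cas6_process(pre, REPEAT, len(REPEAT) - HANDLE_LEN):
    print(len(crrna), crrna)
```

The 3' end of each product falls at a fixed position inside the downstream repeat. The sequencing analysis in this protocol measures that position.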
The presence or absence of mature crRNAs isolated from bacterial cell populations can be used as a proxy for Cas6 activity. While biochemical methods have been developed to detect crRNAs (Carte et al., 2008; Haurwitz et al., 2010), high-throughput RNA sequencing can be used to assay Cas6 activity unambiguously (Heidrich et al., 2015).
Whole-transcriptome sequencing is expensive and, depending on the library preparation method, can be biased against particular classes of RNA. Therefore, various small RNA sequencing protocols have been developed to preferentially detect mature crRNAs (Juranek et al., 2012; Richter et al., 2012; Heidrich et al., 2015).
Here, we present a multiplexed small RNA sequencing method that enables facile and reproducible comparisons of crRNA maturation across many biological conditions at once, such as mutations in the Cas6 protein to assess the mechanism of Cas6 activity. This protocol builds on previous work on small RNA sequencing and ribosome profiling (Lau et al., 2001; Ingolia et al., 2009; Guo et al., 2010; Kwon, 2011; Kivioja et al., 2011). The assay offers high sensitivity and dynamic range without spending much sequencing capacity on other cellular RNAs, with the caveat that the full-length precursor transcript is not observed by small RNA sequencing.

Note: More indexing primers may be added as needed.

Procedure Duration
The protocol can be performed comfortably in 4 days (including RNA isolation from bacteria) as follows: Steps A1-A16 on day 1, A17-C5 on day 2, D1-E12 on day 3, and E13 onwards on day 4. The flowchart in Figure 1 summarizes the major steps in the protocol.

RNA isolation from bacteria
RNA extraction methods will depend on the bacteria under study. The extraction method must avoid any column-based or size-dependent purification steps that could lead to preferential loss of small RNAs. For our model system, Marinomonas mediterranea, a gamma-proteobacterium (like E. coli), we follow the manufacturer's instructions provided with TRIzol reagent. We use no more than 200-500 μl of saturated M. mediterranea culture in Marine Broth 2216 for RNA isolation.

A.
Small RNA isolation by denaturing polyacrylamide gel electrophoresis (PAGE)
Note: Remember to remove the gel comb and the green tape at the bottom of the gel cassette before assembling the electrophoresis cell.

2.
Fill the inside and outside chambers with 1x TBE running buffer, and pre-run the gel at 180 V for at least 30 min.

3.
Prepare samples containing at least 5-10 μg of intact total RNA, and 0.1 μg of Ultra Low Range DNA ladder, in 2x formamide gel loading dye diluted to a final concentration of 1x. We suggest keeping the total volume of each sample below 15 μl.

4.
Denature the samples and ladder by heating in a thermocycler with a pre-heated lid at 94 °C for 5 min, then immediately place them in an ice-water slurry.

5.
While the samples are denaturing, thoroughly flush urea out of each gel well by pipetting running buffer from the inner chamber into the well several times with a 100 μl pipette.

6.
Load samples carefully with gel-loading pipette tips.
Note: Leave 1-2 lanes between different RNA samples to reduce the amount of cross-contamination between experiments. We recommend including no more than 4-5 RNA samples (and one lane for the approximate sizing ladder) in a 10-lane gel.

7.
Run at 180 V until the bromophenol blue dye front (bottom band ~25 nt) reaches close to the end of the gel (about 35 min).

8.
While the gel is running, prepare gel elution tubes: make a small cross-shaped incision at the bottom of a 0.6 ml tube with a clean razor blade and place it inside a 1.5 ml siliconized centrifuge tube (Figure 2). Do not remove the caps of either the 0.6 ml or 1.5 ml tubes.
Note: Use of siliconized tubes is critical to avoid the loss of RNA due to non-specific binding to tube walls.

9.
Carefully disassemble the cassette and remove the gel. Stain with SYBR Gold diluted 1:5,000 in 1x TBE running buffer.
Note: We typically use 3 μl of SYBR Gold in 15 ml running buffer and stain on a slowly rocking nutator in a small plastic dish for about 5 min at room temperature. Wear appropriate protective equipment to prevent exposure to SYBR Gold, and also to prevent contamination of samples with extraneous biological material.

10.
Transfer the gel onto a clear plastic film and place on a UV transilluminator (set at 365 nm wavelength).

11.
Carefully excise gel fragments for each sample from the 25-nt marker up to the 75-nt marker, which should be just below a bright band corresponding to cellular tRNAs (Figure 3).
Note: For a non-degraded RNA sample, there will most likely be no visible RNA in the excised gel fragment. We often include a small portion of the lowest visible tRNA band to serve as a carrier in subsequent steps. Intact tRNAs typically do not reverse transcribe efficiently and should not result in overwhelming contamination in the final dataset.

12.
Place each gel fragment in a separate elution tube.

13.
Centrifuge each elution tube at 20,000 × g at room temperature in a tabletop microcentrifuge for 1-3 min to force the gel fragment through the incision in the 0.6 ml tube and into the 1.5 ml siliconized tube. Carefully remove any leftover gel pieces in the 0.6 ml tube with a clean pipette tip, and place in the corresponding 1.5 ml siliconized tube. Discard the 0.6 ml tube.

14.
Add 300 μl of polyacrylamide gel elution buffer (see Recipes) into each 1.5 ml siliconized tube containing pulverized gel fragments, and vortex vigorously to make a uniform slurry.

15.
Place the tubes at −80 °C to freeze, then in a 37 °C water bath for 2 min to thaw. Vortex vigorously, and then repeat this step 2-3 times.

16.
Place the samples on ice for 1 min to cool, then incubate at 4 °C with shaking in a rotisserie tube rotator overnight to elute RNA from gel fragments.

17.
Centrifuge briefly to collect gel slurry at the bottom of the tube.

18.
Prepare filtration tubes by placing 0.45 μm sterile cellulose acetate filters in new 1.5 ml siliconized tubes.

19.
Widen the bore of 1,000 μl pipette tips using clean scissors, and transfer gel slurry to the filtration tubes.

20.
Collect the RNA eluate by centrifugation at 16,000 × g for 2 min at room temperature, discard the filters, and add (in order) 1 μl of glycogen (20 mg/ml) and 1 ml of 100% ethanol to each sample.

21.
Precipitate nucleic acids by placing the tubes at −80 °C for 30 min.

22.
Centrifuge at 20,000 × g at 4 °C for 30 min, and discard the ethanol while taking care not to dislodge the pellet.

23.
Wash with 1 ml freshly prepared 70% ethanol, taking care to flush out the cap by inverting several times.
Note: Aggressive vortexing is not necessary at this step. Vortexing can help dislodge the pellet, but excessive agitation, or washing with more concentrated ethanol, will fragment the pellet and reduce yield.

24.
Centrifuge at 20,000 × g for 2 min at room temperature to collect the pellet, pour off the ethanol and repeat the wash.

25.
Centrifuge briefly at room temperature to collect residual ethanol after the second wash step is complete. Remove remaining ethanol using a 10 μl pipette, taking care not to touch the pellet. Air dry for 3 min.
Note: After removing residual ethanol with a pipette, 3 min is sufficient to dry the pellet. We typically dry under a flame to prevent dust from accidentally settling in the tubes.
Note: The added glycogen from Step A20 should result in a clearly visible pellet that may become translucent upon drying. The pellet will be easy to resuspend provided it has not been over-dried.

1.
Denature RNA at 90 °C for 1 min in a heated-lid thermocycler, then plunge in ice for 1 min.
Note: We use the entire RNA sample from the previous step, and do not attempt to measure its concentration since the amount of RNA is often below the detection limit of commercial assay kits.

2.
To 17 μl of the RNA sample, add (in order) 2 μl 10x PNK buffer and 1 μl PNK enzyme, and mix well by pipetting.

2.
Transfer RNA samples to 0.2 ml PCR tubes, and add 4 μl of the mixture to each RNA sample. Mix well by pipetting.
Note: PEG 8000 is viscous and pre-mixing with 5x adenylation buffer helps to reduce viscosity and make dispensing to sample tubes easier. Mix by pipetting for as long as necessary until the solution appears uniform.

3.
Heat sample to 98 °C for 1 min, plunge in ice for 1 min, then place at room temperature for the next step.
Note: We keep the pre-adenylated 3' adapter oligo at −80 °C and thaw on ice before use.

5.
Incubate in a thermocycler at 22 °C for 6 h. The reaction can be stored at 4 °C if performing this step overnight.

D.
Excess adapter digestion

2.
Incubate RNA samples at 95 °C for 1 min, allow to cool and then add 88 μl of buffer mixture.
Note: The 5' deadenylase removes the /5rApp/ group from the free 5' ends of unligated pre-adenylated adapters, thereby exposing the excess adapter molecules to digestion by the single-stranded-DNA-specific 5' → 3' exonuclease RecJf.

5.
During the digestion step, pre-spin a Phase Lock Gel Heavy tube for each sample at 16,000 × g for 2 min at room temperature.

6.
Add 100 μl RNase-free water to each RNA sample, mix well, and transfer to a pre-spun Phase Lock Gel tube.

7.
Add 200 μl acid-phenol:chloroform to each sample and mix by shaking vigorously by hand.

8.
Centrifuge at 16,000 × g at room temperature for 5 min.

9.
Add 200 μl chloroform to each sample in the same tube and mix gently by inversion.

10.
Centrifuge at 16,000 × g at room temperature for 5 min.
Note: The added glycogen from Step D11 should result in a clearly visible pellet that may become translucent upon drying. The pellet will be easy to resuspend provided it has not been over-dried.

3.
Heat samples to 90 °C for 1 min, then plunge into ice for 1 min.

4.
Add 3.5 μl of reverse-transcription master mix to each sample.
Note: Also maintain a 'no-template' control, which will allow for visualization of the reverse-transcription primer during the subsequent gel purification step.

5.
Add 0.5 μl SuperScript II reverse transcriptase to each reaction and mix well by pipetting.

6.
Incubate at 42 °C for 30 min in a heated-lid thermocycler to synthesize complementary DNA (cDNA).

7.
During this incubation step, set up and pre-run a Novex 10% TBE-Urea denaturing polyacrylamide gel in the XCell SureLock Mini-Cell Electrophoresis System at 180 V for at least 30 min as described in Steps A1-A2.

9.
Incubate at 70 °C for 15 min in a heated-lid thermocycler to hydrolyze RNA.

11.
Prepare 0.1 μg of Ultra Low Range DNA ladder in 2x formamide gel loading dye at a final concentration of 1x.

12.
Denature and run cDNA on pre-run gels as in Steps A4-A25, with the following modifications:

a.
In Step A7, run the gel until the xylene cyanol dye front (top band ~55 nt) reaches close to the bottom of the gel (about 45-60 min).

b.
In Step A11, excise gel fragments in the 100- to 160-nt range (processed CRISPR RNAs are generally in the ~50-100-nt range, and the reverse transcription primer adds ~65 nt to the size of the desired small RNAs). Use the no-template control as a visual guide during gel excision, and avoid the bright bands formed in this lane (typically no higher than 90 nt) (Figure 4).

c.
In Step A16, carry out cDNA elution at room temperature.
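The excision window in modification b follows from simple arithmetic. Each cDNA is roughly the RNA insert plus the ~65 nt contributed by the reverse transcription primer. The helper functions below are a hypothetical illustration that treats the 65-nt offset as exact. Band positions read off a gel are only approximate.

```python
# Assumption: the RT primer adds a fixed ~65 nt to every RNA insert.
RT_PRIMER_ADDED_NT = 65


def cdna_window(rna_min_nt, rna_max_nt, added_nt=RT_PRIMER_ADDED_NT):
    """Expected cDNA size range (nt) for RNA inserts of the given size range."""
    return rna_min_nt + added_nt, rna_max_nt + added_nt


def insert_window(cdna_min_nt, cdna_max_nt, added_nt=RT_PRIMER_ADDED_NT):
    """RNA insert sizes (nt) captured by a cDNA gel slice of the given range."""
    return cdna_min_nt - added_nt, cdna_max_nt - added_nt


print(insert_window(100, 160))  # (35, 95)
```

Under this assumption, a 100- to 160-nt cDNA slice captures inserts of roughly 35-95 nt. The bright no-template bands, typically no higher than 90 nt, fall below the slice.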

13.
Resuspend the cDNA pellet in 17 μl water. Reserve half the sample and …
60 °C for 10 sec
72 °C for 10 sec
c.
Hold at 10 °C
Note: We typically perform titrations with N = 12, 15, 18, 21, and 24 cycles for each ccDNA sample.

5.
Add 8 μl of 6x DNA gel loading dye to each reaction.

6.
Load all 5 titrations for each sample side-by-side on the agarose gel. We suggest using the Ultra Low Range DNA ladder (~0.5 μg/lane) to demarcate sets of titrations of different ccDNA samples.

8.
Place the gels on a UV transilluminator (set at 365 nm wavelength). Choose the appropriate number of PCR cycles for each ccDNA sample by visually assessing the PCR titration (see Note below) and excise a gel slice containing the PCR amplicon corresponding to the size of the desired product using a 100-1,000 μl pipette fitted with gel excision tips. Expel each gel slice into a separate 1.5 ml centrifuge tube.
Note: A bright band corresponding to the 'empty' circularized cDNA product (i.e., without a small RNA insert) should be visible in each lane. This band may appear as a doublet as the number of PCR cycles (N) is increased. For most small RNA sequencing applications, the desired product will be ~50 bp above this bright band/doublet. For CRISPR RNA sequencing, we rarely see a visible smear at this size range, and we cut 'blindly', using the DNA ladder and the location of the bright band/doublet (~125 bp) as a visual guide. We typically aim for the highest number of PCR cycles for each ccDNA sample that still safely avoids the upward smear from the bright ~125 bp band/doublet (Figure 5).

9.
Extract DNA from the gel slices using the QIAGEN MinElute Gel Extraction kit, according to the manufacturer's instructions.

10.
Quantify each purified DNA sample using the high-sensitivity double-stranded DNA quantification kit for the Qubit fluorometer, according to the manufacturer's instructions.

11.
Calculate the approximate concentration of each sample according to the following formula:

12.
Pool the samples in equimolar amounts. The pooled library can be sequenced according to the specifications of your Illumina high-throughput sequencing service provider. We typically use the single-read configuration with 80 cycles for small RNA sequencing applications. For assessing pre-crRNA processing in Marinomonas mediterranea, we sequence no more than 1 million reads per sample, but the required depth will depend on the level of pre-crRNA expression in the species of interest.
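The concentration formula for Step 11 is not reproduced in this text. A widely used conversion from a Qubit mass concentration to molarity assumes an average of ~660 g/mol per base pair of double-stranded DNA:

nM = (ng/μl × 10^6) / (660 × length in bp)

The sketch below applies that common approximation and computes equimolar pooling volumes. It is not necessarily the authors' formula. The function names are hypothetical, and so is the amplicon length used in the test. That length is ~175 bp, i.e., the ~125 bp empty product plus the ~50 bp insert noted in Step 8.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # common approximation for one dsDNA base pair


def molarity_nm(conc_ng_per_ul, length_bp):
    """Approximate molar concentration (nM) of a dsDNA library.

    nM = (ng/ul * 1e6) / (660 g/mol per bp * length in bp)
    """
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * length_bp)


def pooling_volumes_ul(libraries, fmol_per_library):
    """Volume (ul) of each library that contributes fmol_per_library femtomoles.

    libraries maps sample name -> (Qubit ng/ul, amplicon length in bp).
    Because 1 nM equals 1 fmol/ul, volume = fmol / nM.
    """
    return {name: fmol_per_library / molarity_nm(conc, length)
            for name, (conc, length) in libraries.items()}
```

Because each library contributes the same number of femtomoles, the pool is equimolar by construction. The total amount to pool should follow your sequencing provider's requirements.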

Data analysis
We include a worked example with sample data, which requires the following programs to be installed:
cutadapt (tested on v1.14; likely compatible with most other versions)
Python 2.7 (with numpy and matplotlib for plotting)
For each provided Python script, the general usage format is given first, followed by the specific command for the worked example with sample data. Start by downloading the worked example and navigating to the worked_example/ directory in a Unix terminal.

a.
Sample and index reads will be in the files Undetermined_S0_L001_R1_001.fastq and Undetermined_S0_L001_I1_001.fastq, respectively. Records in the two files are in the same order: the first sample read corresponds to the first index read, the second to the second, and so on. A sample dataset is provided in the sample_data directory.

b.
To segregate the reads corresponding to each index, prepare a demultiplexing 'key': a tab-separated text file with the desired sample name in the first column and the reverse complement of the corresponding TruSeq LT index (AD001-8) in the second.
A sample file deMultiplexKey_sample.dat is provided.
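The demultiplexing logic can be sketched in Python. R1 and I1 records are paired in order, and each index read is looked up in the key. Reads with no match go to an 'unassigned' file. This is a hypothetical illustration, not the script distributed with the worked example. It assumes that the key's second column holds the sequence exactly as it appears at the start of the index read, and it allows no mismatches.

```python
import csv
import os
from itertools import islice


def read_fastq(path):
    """Yield (header, sequence, plus, quality) tuples from an unwrapped FASTQ."""
    with open(path) as handle:
        while True:
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if not record:
                return
            if len(record) < 4:
                raise ValueError("truncated FASTQ record in " + path)
            yield tuple(record)


def load_key(path):
    """Map index sequence -> sample name from the tab-separated key file."""
    key = {}
    with open(path) as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) >= 2 and row[1].strip():
                key[row[1].strip().upper()] = row[0].strip()
    return key


def assign_sample(index_seq, key):
    """Return the sample whose key sequence starts the index read (exact match)."""
    index_seq = index_seq.upper()
    for idx, sample in key.items():
        if index_seq.startswith(idx):
            return sample
    return None


def demultiplex(read_path, index_path, key_path, out_dir="."):
    """Write one FASTQ per sample (plus 'unassigned'); return read counts."""
    key = load_key(key_path)
    handles, counts = {}, {}
    try:
        # R1 and I1 records are in the same order, so zip pairs them up.
        for read, index in zip(read_fastq(read_path), read_fastq(index_path)):
            sample = assign_sample(index[1], key) or "unassigned"
            if sample not in handles:
                path = os.path.join(out_dir, sample + ".fastq")
                handles[sample] = open(path, "w")
            handles[sample].write("\n".join(read) + "\n")
            counts[sample] = counts.get(sample, 0) + 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

For the sample data, one might call `demultiplex()` with the two Undetermined_S0_L001 files and deMultiplexKey_sample.dat. This assumes the input paths are correct and that the output directory already exists.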

2.
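Trim adapters: The high-throughput sequencing data will contain Illumina adapter sequences. These are parts of the library molecule that were necessary for sequencing on the Illumina flowcell, not part of the original RNA. Many small RNA inserts are shorter than the read length, so reads often continue through the insert into the 3' adapter, which must be removed before analysis.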
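Adapter trimming is performed with cutadapt in this pipeline. The core idea is to remove everything from the 3' adapter onward. If a read ends partway into the adapter, the partial adapter at its 3' end is removed instead. The sketch below is a simplified illustration that allows exact matches only; cutadapt, by contrast, tolerates sequencing errors. The adapter sequence in the test is a placeholder, so substitute the adapter used in your libraries.

```python
def trim_adapter(read, adapter, min_overlap=3):
    """Remove a full or 3'-terminal partial adapter from a read (exact match)."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]  # full adapter present: keep only the insert
    # The read may end partway into the adapter: find the longest adapter
    # prefix of at least min_overlap nt that matches the end of the read.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read  # no adapter detected
```

The `min_overlap` threshold is a trade-off. Short partial matches (3 nt here) occasionally trim genuine insert bases that happen to resemble the adapter start. Longer thresholds leave more adapter fragments untrimmed.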

3.
Collapse reads to control for amplification bias: The assay design includes a random hexamer (NNNNNN) in the 3' adapter sequence, which is ligated to every RNA molecule before reverse transcription. Reads that share both the insert sequence and the hexamer are most likely PCR copies of a single ligated molecule. Collapsing them therefore helps ensure that every remaining read corresponds to a distinct RNA molecule in the biological sample.
Both Steps 2 and 3 (trimming and collapsing) are performed in the provided example by the dirRNAseqAnalyse.py script, which uses the readCollapser2.py module (this file must be in the same directory as the script):
python dirRNAseqAnalyse.py <path_to_directory> <maximum_read_length>
e.g.,
python dirRNAseqAnalyse.py sample_demultiplexed/ 80
The program produces a log file, dirRNAseqAnalyseLog.txt, which contains details of the adapter trimming step.
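Collapsing with the random hexamer can be sketched as follows. Reads that are identical over both the insert and the hexamer count once. Reads with the same insert but different hexamers count as separate molecules. This hypothetical sketch makes two assumptions. First, the constant adapter sequence has already been trimmed. Second, the hexamer remains as the last six bases of each read, i.e., it sits at the 5' end of the 3' adapter. The behavior of readCollapser2.py is not described here and may differ.

```python
UMI_LEN = 6  # random hexamer (NNNNNN) carried by the 3' adapter


def collapse_reads(trimmed_reads, umi_len=UMI_LEN):
    """Collapse PCR duplicates using the random hexamer as a molecular tag.

    Assumes each trimmed read = RNA insert + umi_len-nt hexamer. Returns one
    insert sequence (hexamer removed) per inferred ligated molecule, in the
    order first encountered.
    """
    seen = set()
    molecules = []
    for read in trimmed_reads:
        if len(read) <= umi_len:
            continue  # nothing left once the hexamer is removed
        if read not in seen:  # identical insert AND hexamer = PCR copy
            seen.add(read)
            molecules.append(read[:-umi_len])
    return molecules
```

An identical insert may legitimately appear more than once in the output, because independent RNA molecules usually receive different hexamers during ligation.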

4.
Convert to fasta: The fastq2fasta.sh script was written for the files generated in Step 3 from the sample data. Each line in this script converts one input fastq file to one output fasta file. Modify this file with your own input files (with .trimmed.collapsed.fastq extensions) and output files (with .fasta extensions) as desired. Convert using the following commands:
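For readers who prefer not to edit fastq2fasta.sh, the conversion can also be sketched in Python. Each four-line FASTQ record (header, sequence, '+', quality) becomes a two-line FASTA record, and the quality line is discarded. The function and the command-line wrapper are hypothetical illustrations, not part of the provided scripts. They assume standard, unwrapped FASTQ.

```python
import sys

SUFFIX = ".trimmed.collapsed.fastq"


def fastq_to_fasta(fastq_path, fasta_path):
    """Rewrite each 4-line FASTQ record as a 2-line FASTA record.

    Returns the number of records written.
    """
    written = 0
    with open(fastq_path) as src, open(fasta_path, "w") as dst:
        while True:
            header = src.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate blank lines between records
            if not header.startswith("@"):
                raise ValueError("malformed FASTQ header: " + header.strip())
            sequence = src.readline().strip()
            src.readline()  # '+' separator line
            src.readline()  # quality line, not needed in FASTA
            dst.write(">" + header[1:].strip() + "\n" + sequence + "\n")
            written += 1
    return written


if __name__ == "__main__":
    # Hypothetical usage: python fastq_to_fasta_sketch.py a.trimmed.collapsed.fastq ...
    for path in sys.argv[1:]:
        if path.endswith(SUFFIX):
            fastq_to_fasta(path, path[: -len(SUFFIX)] + ".fasta")
```

Replacing the suffix keeps sample names intact. For example, a file named sample8.trimmed.collapsed.fastq becomes sample8.fasta, which matches the file names used in the worked example.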

a.
Identify CRISPR-derived reads: First, identify sequencing reads containing the 5' end of the CRISPR direct repeat sequence. We require at least 5 contiguous bases in the sequencing read to match the first five bases of the CRISPR repeat. The CRISPR repeat of interest is supplied on the 1st line of the parameters file crRNAfigureMaker_params.txt.
Any sequence upstream of the start of the CRISPR repeat is removed.

b.
Remove short matches: If the resulting processed repeat is shorter than 12 bases, also check whether the 5 bases preceding the CRISPR repeat in the original read match one of the possible spacer endings from the CRISPR arrays in the bacterial genome. In this way, we require at least 12 bases from the CRISPR repeat, or 10 bases across the spacer-repeat junction (5 spacer bases plus at least 5 repeat bases), for any read to qualify for downstream analysis. A dictionary of all possible native spacer endings from the type III-B CRISPR locus in the Marinomonas mediterranea MMB-1 genome is provided in spacerEnds.dict.
Note: The spacerEnds.dict file can be modified in any text editor, but its formatting must be preserved to prevent parsing errors in python.

c.
Assess match fidelity: If the read passes initial filtering, the processed repeat is then matched to the expected CRISPR repeat sequence. We require the repeat to be a left-anchored substring of the CRISPR repeat (i.e., the processed repeat may be shorter than the CRISPR repeat, but it must match at the 5' end and cannot contain mismatches).
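The three filtering rules (a-c) can be expressed as one function that either returns the processed repeat or rejects the read. This is a hypothetical sketch of the rules as described, not the code in crRNAfigureMaker.py. Several details are assumptions: using the first occurrence of the 5-nt seed, passing spacer endings as a Python set rather than parsing spacerEnds.dict, and the example repeat in the test.

```python
def process_read(read, repeat, spacer_ends, seed_len=5, junction_len=5,
                 min_repeat_len=12):
    """Return the processed repeat for one trimmed read, or None if rejected.

    a. Find the first occurrence (assumed) of the repeat's first seed_len
       bases and discard everything upstream of it.
    b. If fewer than min_repeat_len repeat bases remain, keep the read only
       if the junction_len bases just upstream match a known spacer ending.
    c. Keep the read only if the processed repeat is an exact, 5'-anchored
       prefix of the repeat (shorter is allowed, mismatches are not).
    """
    pos = read.find(repeat[:seed_len])
    if pos == -1:
        return None  # (a) no repeat seed in this read
    processed = read[pos:]
    if len(processed) < min_repeat_len:
        upstream = read[pos - junction_len:pos] if pos >= junction_len else ""
        if upstream not in spacer_ends:
            return None  # (b) short match without a spacer-repeat junction
    if not repeat.startswith(processed):
        return None  # (c) mismatch, or read extends past the repeat
    return processed
```

The length of the returned string gives the position of the RNA's 3' end within the repeat, which is the quantity summarized in the final plot.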

6.
Measure levels of a reference gene: Next, count reads containing 25-nt substrings of a reference gene that is highly expressed and does not vary with the biological conditions under study. We use the isoleucine tRNA sequence as a reference in M. mediterranea datasets, but a suitable reference may need to be determined empirically from the RNA-seq data for your model organism. This sequence must be provided on the 2nd line of the crRNAfigureMaker_params.txt file.
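Counting reference reads reduces to a k-mer lookup. All 25-nt substrings of the reference are stored in a set. A read is counted if any of its own 25-nt windows is in that set. The sketch below is a hypothetical illustration that assumes the reference is supplied as DNA in the same orientation as the reads. The reference sequence in the test is made up.

```python
def kmer_set(sequence, k=25):
    """All length-k substrings of a sequence (empty if shorter than k)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def count_reference_reads(reads, reference, k=25):
    """Count reads containing at least one k-nt substring of the reference."""
    kmers = kmer_set(reference, k)
    count = 0
    for read in reads:
        if any(read[i:i + k] in kmers for i in range(len(read) - k + 1)):
            count += 1
    return count
```

Requiring a 25-nt exact match keeps spurious hits to the reference rare, while still counting partial fragments of the reference RNA.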

7.
Plot a histogram of lengths of trimmed reads: Finally, plot a histogram of the lengths of the processed CRISPR repeats, normalized to the reference gene. The 3rd line of the crRNAfigureMaker_params.txt file is an arbitrary scaling parameter that controls the height of the Y-axis in the plot; it can be changed to accommodate the levels of processed crRNAs relative to the reference gene in your dataset.
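The normalized histogram can be computed before any plotting. For each position p along the repeat, count the processed repeats of length p, i.e., RNAs whose 3' end lies p nt into the repeat. Then divide by the number of reference reads. This is a hypothetical Python 3 sketch; under Python 2.7, `/` between integers truncates. The resulting list could be passed to matplotlib's `bar()` for display. The Y-axis scaling parameter is not applied here.

```python
from collections import Counter


def normalized_length_profile(processed_repeats, reference_count, repeat_len):
    """Per-position crRNA 3'-end counts divided by reference-RNA read count.

    Element p-1 of the result is (number of processed repeats of length p)
    / reference_count, for p = 1 .. repeat_len.
    """
    if reference_count <= 0:
        raise ValueError("reference_count must be positive")
    lengths = Counter(len(r) for r in processed_repeats)
    return [lengths.get(p, 0) / reference_count for p in range(1, repeat_len + 1)]
```

Normalizing to the reference RNA, rather than to total reads, makes bar heights comparable between samples sequenced to different depths.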
Steps 5, 6, and 7 are performed by the crRNAfigureMaker.py script as follows:
python crRNAfigureMaker.py <path_to_fasta_files> <keyword>
e.g.,
python crRNAfigureMaker.py sample_fasta/ 8
The keyword option specifies which files are included in the analysis and can be any part of the file name. For instance, the keyword '8' will process only sample8.fasta in the worked example, the keyword 'mpl' will include all 4 sample files, and the keyword 'sem' will match no files.
The crRNAfigureMaker_params.txt file must be in the same directory as the code. Running the above command (i.e., processing only sample8.fasta) should generate the plot shown in Figure 6.

Figure 2. A cross-shaped incision is made at the bottom of a 0.6 ml centrifuge tube using a clean razor blade. The polyacrylamide gel fragment is pulverized as it is forced through the incision and into a 1.5 ml siliconized centrifuge tube by centrifugation.

Figure 3. A. Four intact total RNA samples (6% denaturing TBE-Urea PAGE). Two biological replicates for each experiment were run side-by-side, with approximate DNA sizing ladders. Images cropped and brightness/contrast adjusted in Microsoft Word. B. Size selection of 25- to 75-nt RNAs, including the lowest tRNA band. The Invitrogen 10 bp DNA ladder was used in this gel but has since been discontinued by the manufacturer.

Figure 6. Processed crRNA levels assayed by high-throughput small RNA sequencing. This dataset has been artificially supplemented with sequences matching expected CRISPR-derived RNAs. The CRISPR repeat sequence from the 1st line of the parameters file crRNAfigureMaker_params.txt is on the X-axis. The height of the bar at each base along the X-axis represents the relative proportion of crRNAs with 3' ends at that base, normalized to the levels of the reference RNA (isoleucine tRNA, consistently the most abundant species in our M. mediterranea datasets). A distinct 3' end within the population of RNAs containing the CRISPR repeat indicates site-specific cleavage and processing of pre-crRNA.