RATA: A method for high-throughput identification of RNA bound transcription factors

Long non-coding RNAs (lncRNAs) regulate critical cellular processes and their dysregulation contributes to multiple diseases. Although only a few lncRNAs have defined mechanisms, many of these characterized lncRNAs interact with transcription factors to regulate gene expression, suggesting a common mechanism of action. Identifying RNA-bound transcription factors is especially challenging due to inefficient RNA immunoprecipitation and low abundance of many transcription factors. Here we describe a highly sensitive, user-friendly, and inexpensive technique called RATA (RNA-associated transcription factor array), which utilizes a MS2-aptamer pulldown strategy coupled with transcription factor activation arrays for identification of transcription factors associated with a nuclear RNA of interest. RATA requires only ~5 million cells and standard molecular biology reagents for multiplexed identification of up to 96 transcription factors in 2–3 d. Thus, RATA offers significant advantages over other technologies for analysis of RNA-transcription factor interactions.


BACKGROUND
Dysregulation of lncRNAs has recently been linked to multiple human diseases, especially cancer [1]. While the exact mechanism of many lncRNAs remains unknown, it is becoming increasingly clear that many lncRNAs regulate gene expression through a variety of mechanisms, including direct association with transcription factors to regulate transcription factor activity and/or sub-cellular localization [2]. Therefore, technologies that interrogate the fundamental biology of lncRNAs, especially interactions with transcription factors, could accelerate biological discovery across many disciplines.
Unbiased identification of RNA-bound transcription factors following RNA immunoprecipitation (RNA-IP) is challenging for several reasons: (1) RNA-IPs are generally inefficient, requiring large numbers of cells to generate sufficient eluate for downstream analysis; (2) many transcription factors are expressed at very low levels and (3) the standard technique for unbiased identification of associated proteins is mass spectrometry, which requires a minimum threshold level of protein for detection. For example, the recent methods of ChIRP-MS [3] and RAP-MS [4] comprehensively identify proteins that interact with endogenous RNAs in cells through crosslinking of protein-RNA interactions and precipitation of endogenous RNAs by capture of biotinylated antisense DNA oligos. Due to low purification efficiency, these protocols require 100-800 million cells for each condition tested in order to generate sufficient eluate for mass spectrometry analysis. Thus, these protocols are costly and labor-intensive and may prohibit many scientists from performing them. Alternatively, other methods increase protein detection by increasing RNA purification efficiency. For example, incubation of in vitro transcribed, biotinylated RNA with cell lysate significantly increases efficiency of RNA purification [5]. However, because this method bypasses normal processing of endogenous RNAs, it may capture non-physiological protein-RNA interactions resulting from addition of RNAs post-lysis.
To overcome these obstacles, we developed a sensitive and unbi-ased technique for identifying RNA bound transcription factors called RATA [6]. This technique couples an RNA-IP with a high-throughput transcription factor activation array, enabling identification of up to 96 transcription factors in a single assay. Briefly, a mammalian cell line is co-transfected with plasmids encoding a nuclear localized FLAG-tagged MS2 protein and an RNA of interest tagged with the MS2 RNA aptamer. Following IP of the MS2-tagged RNA with anti-FLAG antibody, the immunoprecipitate is gently and efficiently eluted with FLAG peptide, retaining the native confirmation of any associated proteins. Finally, the eluted immunoprecipitate is directly input into a commercially available transcription factor activation array. Importantly, the IP is performed in parallel using cells transfected with MS2-tagged RNA or the untagged RNA control, enabling identification of transcription factors that are enriched in the RNA precipitate compared to the control precipitate. Because RATA amplifies the transcription factor signature through streptavidin-horseradish peroxidase, the amount of transcription factor required for detection is significantly reduced. Indeed, RATA sensitively and reproducibly enriches transcription factors from ~5 million cells (1/2 of a 10 cm dish). Furthermore, RATA requires only a plate reader for analysis, which is widely available to many scientists. Following cell harvest, only 2 to 3 days are required for analysis of up to 96 transcription factors. While multiple methods exist for RNA-IP from cells, we have developed this technique using the well-established MS2-based system. The bacteriophage MS2 coat protein binds with high affinity (K d of 7 nM) in a sequence-specific manner to a viral RNA stem-loop sequence (MS2 stem-loop) [7]. The V75E;A81G MS2 protein variant, selected for its lack of capsid formation in E. coli, has been widely adapted for localization and/or purification of tagged RNAs. In these approaches, a plasmid is generated to express the MS2 protein fused to a FLAG and/or GFP tag(s). The MS2 fusion expression plasmid, along with a second plasmid expressing an RNA engineered to contain multiple copies of the MS2 stem-loop structure, are transiently transfected into cultured cells. Highly-specific, in vivo binding of the MS2 fusion protein to the MS2-tagged RNA enables GFP-based localization or anti-FLAG IP of the RNA of interest [8,9]. The IP conditions used in this protocol are based on the MS2-IP protocol developed by Gong and Maquat, with several notable modifications [10]. First, we have designed an exclusively nuclear version of the MS2 fusion protein, enabling purification of nuclear RNA-associated transcription factors. Second, the IP is performed in the absence of any cross-linking. Third, to ensure protein compatibility with downstream transcription factor activation arrays, associated proteins and RNAs are gently eluted from beads using FLAG peptide.
Following FLAG peptide elution, the immunoprecipitate is directly input into a transcription factor activation array capable of monitoring multiple transcription factors simultaneously. This technology relies on a series of biotin-labeled DNA probes corresponding to consensus sequences of target transcription factors. When cellular lysate, or RATA immunoprecipitate, is incubated with this mixture of labeled probes, specific transcription factor/probe complexes will form. These protein/DNA complexes can be separated from unbound proteins and probes through spin-column separation. The retained probes are then released, hybridized to a plate in which each well is pre-coated with DNA complementary to a specific probe, and detected upon addition of horseradish peroxidase-conjugated streptavidin. While these arrays typically quantify activation (i.e. nuclear localization or expression) of transcription factors before and after a particular treatment, RATA uses the arrays to quantity the amount of transcription factors present in a control or RNA-specific IP.
High sensitivity, low cost, fast turnaround time, and the relative ease of use of RATA will allow scientists from a wide-range of disciplines to study RNA-transcription factor interactions. Although we have used RATA to identify transcription factors bound to lncRNAs, this protocol is compatible with other types of coding and non-coding RNAs. This is especially important as transcription factors have been shown to associate with a variety of RNAs implicated in many cellular processes and diseases, including non-coding and viral RNAs [11,12]. In summary, RATA offers significant advantages over existing technologies and is critical for scientists otherwise hindered from studying RNA-transcription factor interactions.

PROCEDURE
This protocol was optimized for the adherent A375 melanoma cell line. Expression, lysis and elution conditions may need to be optimized for each cell line, and steps requiring optimization are highlighted throughout this protocol and shown schematically in Figure 1. MS2-based IP of the RNA should be verified before completing the RATA protocol for an RNA of interest. To verify IP, complete the protocol through the washes at step 6.6.2, and split the eluate for western blot analysis of MS2 IP, and RT-qPCR analysis of RNA-IP. For all RNA isolations, lysate or eluate can be added directly to TRIzol ® /TRIzol ® LS or QIAGEN's buffer RLT, and RNA isolation should be completed according to manufacturer's instructions.

Plasmid generation
1. Design MS2-tagged and control expression vectors for RNA of interest

1.1.
Clone your RNA of interest into an appropriate mammalian expression vector (such as pcDNA3.1, Invitrogen) and confirm expression in your cell line of choice via RT-qPCR. This will serve as the untagged control vector.

1.2.
Insert 12 copies of the MS2 stem-loop RNA sequence at the 5' or 3' end of your RNA in the control vector. The MS2 stem-loop sequences may be cloned and inserted from preexisting vectors, such as Addgene plasmid # 27119 [8]. If possible, confirm that the insertion of the MS2 stem-loop sequences do not interfere with normal functionality of the RNA. For example, differential gene expression or phenotypic changes may be quantified following transfection of either the untagged or tagged RNA.

CAUTION:
Many lncRNA annotations include an AT-rich transcription termination sequence. Inclusion of this sequence upstream of the MS2 stem-loops may prematurely terminate transcription and exclude the MS2 stemloop tag from the transcript. The RNA sequence of interest should be examined and, if present, the transcription termination sequence should be excluded from the MS2 expression vector. To ensure similar expression levels between the tagged and untagged transcripts, we recommend cloning the same sequence for expression of both the

POL Scientific
Protocol control and MS2-tagged RNA.

NOTE:
As with any screening methodology, the possibility of false positive hits should be considered. In RATA, false positive signals may result from non-specific interactions of transcription factors with the MS2 stem-loops.
To account for this possibility, we recommend performing a control RATA experiment using cells transfected with the MS2 fusion expression plasmid and either an empty or MS2 stem-loop expression plasmid, both lacking the RNA of interest. This will identify any transcription factor signals resulting from non-specific interactions with the MS2 stem-loop RNA tag. False positive signals may be reduced by optimizing the RATA protocol (for example, decreasing the amount of plasmid used for cell transfection). Additionally, signals from the control RATA experiment may be subtracted from RATA results using MS2 stem-loop expressing plasmid containing the RNA of interest.

Prepare plasmids
2.1. Culture E. coli containing RNA or MS2 expression plasmids overnight in a shaking incubator at 37°C in LB with ampicillin (100 µg/ml) or correct antibiotic for selection.

2.2.
Purify plasmids using QIAGEN's plasmid maxi kit according to manufacturer's instructions.

2.3.
Measure DNA concentration and purity using DeNovix.

Preparation of mammalian cell extract
All reagents and materials used during tissue culture manipulation should be used in a sterile tissue culture hood equipped with ultra-violet light for decontamination. Nuclear localization of the MS2 protein should be confirmed for each new cell line via fractionation and western blot analysis, or via microscopy.

3.1.
Seed cells at ~40% confluency (cell number depends on cell type) in 10 cm dishes in appropriate media (such as DMEM + 10% FBS).

Detach cells by scraping into 5 ml of ice cold 1 × PBS.
NOTE: Cells may also be detached by trypsinization (incubate cells with ~3 ml of trypsin for ~5 min or until cells detach, transfer to 15 ml conical tube with ~7 ml of media, and collect cells by centrifugation at 300 × g for 3 min) followed by an additional rinse in 5 ml of 1 × PBS.

4.3.
Collect cells by spinning at 300 × g for 3 min, remove PBS.
TIP: Cells may also be snap-frozen (using liquid nitrogen or crushed dry ice) and stored at −80°C if not proceeding directly to cell lysis and IP.

Cell lysis and IP
To prevent loss of RNA, we recommend the use of low-adhesion/retention tubes and pipette tips for all steps of IP.

5.1.
Add 500 µl of cold lysis buffer. Pipette up and down to resuspend cells, and transfer to a 1.5 ml tube.

5.2.
Rotate at 4°C for 10 min. TIP: (optimization): These lysis conditions do not completely disrupt the nucleus, but rather allow for 'leaking' of nuclear proteins. NaCl concentrations may need to be optimized for each cell line. To optimize NaCl concentration, divide equal volumes of cell suspensions into 1.5 ml tubes, add varying amounts of NaCl to each tube that covers a range between 150 mM to 300 mM final NaCl concentrations, and complete protein lysis. Western blot analysis and probing with an anti-FLAG antibody will confirm the concentration of NaCl required for maximal isolation of the FLAG-NLS-MS2-eGFP protein.

5.4.
Incubate on ice for 10 min, with occasional and gentle swirling.

5.5.
Centrifuge at 17000 × g at 4°C for 10 min, and transfer supernatant to new tube.

8.2.
Follow manufacturer's instructions to complete the transcription factor array.

CRITICAL STEP:
The transcription factor activation array must be completed for both the tagged RNA-IP and the untagged RNA-IP negative control simultaneously in order to allow determine IP-specific transcription factor enrichments. The use of the untagged RNA control will account for any changes in expression (and therefore background signal) of any transcription factors affected by expression of the RNA.

8.3.
Following completion of the protocol and quantification of the signal from each well on a microplate reader, analyze the data as follows (and as exemplified in Table 1):

8.3.1.
Normalize raw values by calculating the ratio of each transcription factor specific signal over the signal of a non-specific probe (i.e. the signal corresponding to a transcription factor that is not bound by the RNA of interest). Transcription factor II D (TFIID) can be used in most cases.

8.3.2.
Calculate individual transcription factor enrichment by taking the ratio of the normalized signal of the specific (tagged) RNA-IP compared to the normalized signal of the control (untagged) RNA-IP.

NOTE:
To account for fluctuations in transcription factor levels between samples, we recommend repeating the assay for a total of 2-3 biological replicates to confirm specific signals.

ANTICIPATED RESULTS
The protocol proved here offers low-cost, highly-sensitive identification of transcription factors associated with an RNA of interest. Our method uses a slightly modified MS2-based IP, which has successfully been used to purify RNAs, including lncRNAs [9,10]. We have designed a plasmid expressing an exclusively nuclear version of FLAG-tagged MS2-eGFP fusion protein (Fig. 2). While fractionation and western blot analysis of A375 cells transfected with this plasmid confirms nuclear localization of the FLAG-NLS-MS2-eGFP fusion protein (Fig. 3), nuclear localization of the fusion protein should be confirmed for each new cell line prior to RATA. As an alternative to cellular fractionation, localization may be verified through microscopy-based visualization of the eGFP tag.
Recently, we used RATA to identify transcription factors associated with the lncRNA SLNCR1 (LINC00673) [6]. As shown in Figure 3, lysis conditions used in this protocol efficiently recover the nuclear fusion protein, and the associated MS2-tagged SLNCR1 RNA is routinely enriched ~100-fold over an untagged control ( Fig. 4A and 4B). As opposed to recovery of bound proteins and RNAs through boiling in 2 × Laemmli buffer (Fig. 4A and 4B), elution using FLAG peptide recovers slightly lower amounts of protein and RNA, usually between 5-and 40-fold enrichment of the tagged RNA ( Fig. 5A and 5B). Despite this lower recovery, RATA is sensitive enough to detect transcription factors present in eluate from FLAG peptide elution ( Table 1) even when the transcription factor (in this example, the androgen receptor (AR)) is not detectable via western blot analysis of the MS2-tagged SLNCR1 immunoprecipitate (Fig. 4A). In addition to being very sensitive, enrichments detected via RATA are highly-reproducible between biological replicates. For example, RATA of SLNCR1 shows reproducible enrichment of Brn3a (25-fold in replicate 1, and 12-fold in replicate 2), AR (10-fold in replicate 1, and 8-fold in replicate 2), and PAX5 (8-fold in replicate 1, and 11-fold in replicate 2) ( Table 1 and data not shown).
Thus, because levels of transcription factors may naturally fluctuate between samples, IP-specific signals may be confirmed through 2-3 biological replicates of RATA.    Because our methodology requires overexpression of the tagged RNA of interest, there are important potential caveats to consider when using this approach. First, inclusion of an RNA tag may affect secondary structure of the RNA, and therefore may also affect interactions with partner proteins. If possible, phenotypic or gene expression changes should be quantified following overexpression of either the untagged or tagged RNA to determine if the tag interferes with normal functionality of the RNA (Step 1.2.1). Ideal placement of the RNA tag may need to be determined empirically. Second, overexpression of RNA may increase/ induce non-specific binding with proteins that do not interact with the RNA under physiological conditions. Thus, candidate RNA-transcription factor interactions identified through RATA should be validated through orthogonal assays. For example, RIP assays, in which an RNA binding protein of interest is immunoprecipitated and associated RNAs are identified through RT-qPCR, may be used to confirm if a candidate transcription factor interactor identified through RATA maintains its interaction with endogenous RNAs [13]. Indeed, RIP assays confirmed the two top SLNCR1 interactors (Brn3a and AR) [6].
While RATA is limited to the identification of transcription factors included in the arrays, there are multiple versions of commercially available transcription factor activation arrays compatible with this technique. Signosis offers 12 different arrays, each including various combinations of over 150 unique transcription factors. Panomics offers 5 different transcription factor activation arrays (TranSignal™ Protein/ DNA Array Systems), with their largest array (Combo Array) able to quantify 345 transcription factors in a single assay. Collectively, the Panomics and Signosis Arrays cover over 400 unique transcription factor in varying combinations. Importantly, these transcription factors represent the most biologically relevant and well-characterized transcription factors, covering approximately 25% of the 1672 currently identified human transcription factors [14]. Additionally, because the TranSignal™ Arrays rely on membrane-based chemiluminescent detection, with DNA probes spotted on a membrane rather than a 96-well plate, these arrays may be ideal for scientists who do not have access to a plate reader. For scientists with access to a Luminex machine, Panomics also offers two Luminex ® based assays capable of quantifying 40 transcription factors in up to 96 samples (such as controls, technical-and/or biologicalreplicates) in a single plate. The wide variety of transcription factor arrays available enables great flexibility in assay design. The selection of a transcription factor array depends on the RNA being studied and the type of information desired. For example, if a scientist is interested in testing the hypothesis that a lncRNA binds to transcription factors, Signosis' Array I or II, which contain 48 or 96 of the most well-studied transcription factor, respectively (many of which have been shown to bind to lncRNAs), may be suitable. Alternatively, if the lncRNA in question is expressed exclusively in cancer and implicated in tumorigenesis, Signosis' Cancer Stem Cell Array (containing 4 sets of 23 transcription factor specific probes) may be appropriate. For a more comprehensive analysis of associated transcription factors, scientists may choose to screen multiple arrays, or use the 345 probe Panomics Combo Array. It may also be helpful to consider which transcription factors are expressed in the cell line being tested; however, the arrays are very sensitive and are capable of detecting transcription factors even when they are not detected via western blotting.
This report details a novel protocol for the efficient and low cost identification of transcription factors associated with an RNA of interest, including lncRNAs. We have designed an exclusively nuclear POL Scientific Protocol MS2 fusion protein and optimized conditions to enable highly efficient isolation of ribonucleoprotein complexes. This easily implemented protocol will enable scientists from a range of fields to more efficiently and comprehensively investigate the role that RNAs play in modulating transcription factor activity and regulating gene expression.  refers to the first elution, E2 refers to a second elution from the same IP. Total protein input or eluted proteins were subjected to western blot analysis, and the blot was probed with 1:5000 ANTI-FLAG M2 (Sigma) and 1:5000 α-GAPDH (Cell Signaling). (B) Relative enrichment of the indicated transcripts from E2 as measured by RT-qPCR compared to RNA enriched from cells expressing SLNCR1 without MS2 stem-loops.

POL Scientific
Protocol TROUBLESHOOTING Potential problems and troubleshooting suggestions are listed in Table 2.