A highly sensitive trap vector system for isolating reporter cells and identification of responsive genes

Abstract We devised a versatile vector system for efficient isolation of reporter cells responding to a certain condition of interest. This system combines nontoxic GAL4-UAS and piggyBac transposon systems, allowing application to mammalian cells and improved expression of a fluorescent reporter protein for cell sorting. Case studies under conditions of c-MYC gene induction or endoplasmic reticulum (ER) stress with thapsigargin on mouse or human cell lines confirmed easy and efficient isolation of responsive reporter cells. Sequence analyses of the integrated loci of the thapsigargin-responsive clones identified responsive genes including BiP and OSBPL9. OSBPL9 is a novel ER stress-responsive gene and we confirmed that endogenous mRNA expression of OSBPL9 is upregulated by thapsigargin, and is repressed by IRE1α inhibitors, 4μ8C and toyocamycin, but not significantly by a PERK inhibitor, GSK2656157. These results demonstrate that this approach can be used to discover novel genes regulated by any stimuli without the need for microarray analysis, and that it can concomitantly produce reporter cells without identification of stimuli-responsive promoter/enhancer elements. Therefore, this system has a variety of benefits for basic and clinical research.


Introduction
Reporter cells are among the most important tools to promote the development of research [1]. They can be utilized for many valuable purposes, such as drug or toxicity evaluation [2][3][4], connecting signal pathways [5,6], and the selection of cells that survive a specific condition [7]. Among many alternatives, gene trap-based technology is the most powerful method to obtain reporter cells accompanied by responsive gene identification [8]. This technology uses a vector, which integrates randomly into a genome and is designed to express a reporter gene driven by the near cis-acting promoter/enhancer elements.
By taking advantage of this, it is possible to conduct broad genome-wide screening without using a microarray and to identify even a minor change of gene expression buried in a sea of other genes. It also allows direct usage of isolated clones as reporter cells without the difficulty of determining the promoter/enhancer region.
A green fluorescent protein (GFP)-reporter-based retroviral gene trap vector system was previously developed [9]. This system works well but could be improved upon, mainly because of a low level of fluorescent reporter gene expression making it difficult, particularly in transient conditions, to segregate stimulus-dependent clones from the enormous population of other negative cells. In addition, the efficiency of viral vector packaging would be lower when an introduced trap cassette sequence is made longer or more complicated, such as for transcription termination, to improve its sensitivity. It is also known that retroviral genomic integration preferentially occurs upstream of actively transcribed genes [10][11][12], restricting the number of gene targets. To resolve these problems, a GAL4-UAS system was adopted to improve the sensitivity of the trap vector. In addition, the piggyBac transposon vector system was employed as a delivery vehicle for rapid and easy transfection, creating almost no limitations regarding the delivered DNA elements and the length of the trapping cassette, and practically overcoming the limitation of there being a preferential site for genomic integration. Here, we report the dramatic improvement of the trap vector system, especially in its sensitivity, and case studies of the isolation of reporter cells responding to c-MYC gene induction and endoplasmic reticulum (ER) stress by thapsigargin.

Vector construction
All vectors used are listed in Supplementary Table S1. Polymerase chain reaction (PCR)-amplified, enzymatically digested, and/or annealed oligo DNA fragments were joined by either a ligase reaction (Nippon Gene, Tokyo, Japan, or Takara, Kyoto, Japan) or an In-Fusion recombination reaction (Takara).

Cell culture, viral infection, and plasmid transfection
NMuMG cells (kindly provided by K. Miyazawa, University of Yamanashi, Japan) were cultured in Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% fetal bovine serum (FBS), 100 μg/ml streptomycin sulfate, 100 U/ml penicillin G potassium (SMPG), 10 μg/ml insulin, and 0.45% glucose. HeLa cells were cultured in DMEM (10% FBS, SMPG). One day before transfection, 2 × 10⁵ cells/well for NMuMG cells or 7 × 10⁴ cells/well for HeLa cells were plated on 12-well plates. Transposon helper and donor vectors were introduced by mixing 1-2 μg of DNA (the ratio is shown in Tables 1 and 2) and 4-6 μg of polyethylenimine (PEI) (Polysciences, Warrington, PA) in 100 μl of Opti-MEM per 1 ml of culture volume. The procedure for the preparation of retroviral and lentiviral vectors and their infection was previously described [13]. Pantropic VSV-G was used as their envelope. The HeLa cells used for screening thapsigargin-responsive cells were obtained by three rounds of overnight treatment with 10 nM thapsigargin followed by expansion. For analysis or screening, cells were treated overnight or for the time indicated with 100 nM thapsigargin (Sigma, St Louis, MO), 10 μg/ml tunicamycin (Sigma), or 200 μM forskolin (Wako). When IRE1α inhibitors [4μ8C (Sigma) and toyocamycin (Cayman Chemicals, Ann Arbor, MI)] or a PERK inhibitor (GSK2656157, Adooq Bioscience, Irvine, CA) were used, cells were treated with these reagents for 1.5-2 h before the addition of 100 nM thapsigargin and collected after a further 15 h or the time indicated.

Real-time PCR
RNA was extracted from the cultured cells using Isogen (Nippon Gene). Reverse transcription was performed using the SuperScript III First-Strand Synthesis System (Thermo Fisher Scientific) with random hexamer as a primer mix. THUNDERBIRD SYBR qPCR mix (TOYOBO, Osaka, Japan) was used for real-time PCR, which was executed using the StepOnePlus Real Time PCR System (Applied Biosystems). Human HPRT1 was used as a relative control. Primers are listed in Supplementary Table S3.
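As an illustration of how expression relative to HPRT1 can be computed, the sketch below uses the common 2^-ΔΔCt calculation. The text does not state which quantification method was used, so the method, the function name `relative_expression`, and all Ct values here are assumptions for illustration only.

```python
def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Fold change of a target gene versus an untreated control (2^-ddCt).

    Assumes roughly 100% amplification efficiency for both primer pairs,
    which is an assumption of this sketch and not stated in the text.
    """
    # Normalize the target to the reference gene (e.g. HPRT1) in each sample.
    delta_ct_sample = ct_target - ct_reference
    delta_ct_control = ct_target_ctrl - ct_reference_ctrl
    # Compare the treated sample with the untreated control.
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: target and reference in treated cells,
    # then target and reference in untreated cells.
    fold = relative_expression(24.0, 20.0, 25.0, 20.0)
    print(f"fold change: {fold:.2f}")
```

With these made-up values the target crosses threshold one cycle earlier after treatment, relative to the reference gene, which corresponds to a twofold increase.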

Luciferase assay
After washing once with PBS, cells were lysed with LCβ buffer (Toyo Inki, Tokyo, Japan) at room temperature for over 30 min with intense agitation. Luciferase activity in the cleared lysates was measured after mixing with a luciferin substrate (Promega, Madison, WI) using a TriStar² S LB942 (Berthold Technologies, Bad Wildbad, Germany). The values of luciferase activity were standardized by the protein concentration determined with the BCA protein assay kit (Thermo Fisher Scientific).
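The standardization step above can be sketched as a simple division of the luminescence reading by the protein concentration of the same lysate. The function name, the optional background subtraction, and the numbers are hypothetical; the text specifies only that luciferase activity was standardized by protein concentration.

```python
def normalize_luciferase(rlu, protein_conc, background_rlu=0.0):
    """Return luciferase activity per unit of protein concentration.

    rlu            -- raw relative light units of the cleared lysate
    protein_conc   -- protein concentration of the same lysate (e.g. mg/ml, BCA assay)
    background_rlu -- optional blank reading (an assumption of this sketch)
    """
    if protein_conc <= 0:
        raise ValueError("protein concentration must be positive")
    return (rlu - background_rlu) / protein_conc


if __name__ == "__main__":
    # Hypothetical readings for one well.
    print(normalize_luciferase(1200.0, 2.0, background_rlu=200.0))
```

Normalizing in this way makes wells comparable even when cell numbers differ after drug treatment.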

Construction of highly sensitive trap vector
To increase the sensitivity of promoter/enhancer trap vectors, we utilized a nontoxic GAL4-UAS system developed for vertebrate species [16,17]. The modified Gal4, called GAL4FF, consists of an extremely trimmed minimal DNA-binding site of yeast Gal4 transcription factor and a few repeats of minimal transcription activation module from VP16. This engineered Gal4 can bind to an upstream activating sequence (UAS) and strongly activates downstream reporter genes. We placed eight repeats of the UAS 5′-cggagtactgtcctccgag-3′ upstream of the EGFP gene (Fig. 1A).
As a vehicle for delivering the trapping cassette, we chose a transposon vector system [18], which allows fairly random integration events compared with viral vector systems [11,19], loading of a long DNA element (over 100 kb) [20][21][22], as well as rapid and easy manipulation upon introduction into cells. We inserted trapping modules between the TR (terminal repeat)/IR (inverted repeat) sequences of the piggyBac transposon in the 3′-to-5′ direction of its original designation to reduce leaky expression activity [23]. By coexpression of a helper transposase vector [here, we used the hyperactive variant of piggyBac transposase (hyPBase) [24]], DNA elements between the TR/IRs can be integrated into the host genome via a cut-and-paste mechanism. At the head of the trapping module, strong synthetic splicing branch and acceptor sites were placed, aiming to receive the splicing donor of an endogenously expressed transcript. To express GAL4FF at trapped loci, we designed two types of vector structure. One has a three-frame stop codon followed by an IRES (10 repeats of Gtx-m3 [25,26]; Fig. 1A, GAL4-UAS, IRES). The other has three-frame patterns of the porcine teschovirus-1 2A self-cleaving peptide configuration [27][28][29] (Fig. 1A, GAL4-UAS, three-frame P2A). In our observation, the latter (three different frame vectors) is more stable (Supplementary Figure S1) and has lower, although occasionally appropriate, trapping efficiency. Besides the EGFP reporter gene, a firefly luciferase (Fluc) gene was introduced with the aim of proceeding to a quantitative assay after the isolation of reporter cells. Here, we introduced the Luc2CP gene (from pGL4.16; Promega), which encodes a version of Fluc destabilized by hCL1 and hPEST peptide sequences, in front of GAL4FF, linked by a P2A sequence.
To test whether the GAL4-UAS trapping vectors work in human cells, we cotransfected the helper and donor vectors into HeLa cells. Compared with the non-GAL4-UAS vector (Fig. 1A, ΔGAL4-UAS, IRES), our developed vector dramatically enhanced the expression of EGFP reporter protein (about 100-fold in maximum EGFP signal), ensuring easy differentiation from EGFP-negative cells (Fig. 1B). Importantly, the EGFP expression levels were evenly distributed from low to high, suggesting that the donor sequence is inserted into myriad loci without a strong bias of gene expression and of transcriptional amplification by the GAL4-UAS system.

Case studies using the highly sensitive trap vector system for isolating reporter cells

Isolation of mouse reporter cells responsive to c-MYC gene expression
To demonstrate the practicality of our system, we first attempted to isolate cells responding to the expression of a gene of interest.
Here, we tested a transcription factor, c-MYC. c-MYC is one of the best-known oncogenes, which, in its most typical mechanism, functions in heterodimeric form to activate many genes involved in cell cycle progression and survival. We chose the mouse mammary epithelial cell line NMuMG because its tumor aggressiveness is known to be regulated by c-MYC expression [30] and because it can expand from a single cell, which is essential for the cloning step. To be able to switch the gene expression on and off repeatedly, which is important for extracting highly responsive cells, we introduced the Tet-On system into the NMuMG cells, where c-MYC expression is monitored by Keima (hmKeimaRed) (PEF1-Tet3G-IRES-neoR or hygroR, TRE3G-c-MYC-IRES-Keima; Fig. 2A). To obtain cells that can strongly induce c-MYC en masse, we performed several cycles of positive and negative cell sorting under doxycycline [Dox(+) and Dox(−)] conditions, respectively (Supplementary Figure S2). After the strongly c-MYC-inducible cells had been established, the cells, seeded at approximately 7 × 10⁴ per sample for transfection, were co-transfected with the donor (Fig. 1A) and hyPBase helper vectors. After further rounds of negative and positive sorting under Dox(−) and Dox(+) conditions, respectively, we obtained two clones from a total of about 2 × 10⁶ cells seeded for transfection (Figs 2B and 3A, Table 1). The specific response to c-MYC expression was confirmed using a retroviral expression system instead of the Tet-On system (Fig. 3B). Doxycycline-induced c-Myc protein expression was also confirmed by immunoblot analysis (Fig. 3C). The level of induced c-Myc protein declined at 24 h after induction, probably because of protein degradation or doxycycline inactivation. Time course analyses of these clones by flow cytometry after doxycycline induction revealed hardly any time lag between EGFP expression and Keima expression (Fig. 3D).
Despite single-cell sorting for clone isolation, the #B3F8 clone apparently contained a nonresponsive cell population even at 24 h after Dox induction. In contrast, the #E-H1 clone responded in almost all cells. Note that, in our drug-responsive reporter cell collections, we observed that some clones showed decreased reactivity after multiple passages, which was not recovered even after a further cloning procedure, while other clones conversely showed increased reactivity after enrichment of a responsive fraction or after a further cloning procedure. The mechanism determining this cell fate decision is still unclear, but one possibility is a reversible or irreversible feedback mechanism involving epigenetic regulation such as DNA methylation. Genomic mapping of the integration sites by splinkerette PCR [14] and subsequent sequence analyses identified two known c-Myc-regulated genes (Supplementary Figure S3). One is Hsph1 (heat shock protein H1; also known as Hsp105), known to be expressed in response to c-Myc in human leukemia cells (e.g. see ref. [31]), whose product was recently demonstrated to physically bind to c-Myc protein as a chaperone and to be required for aggressiveness in human lymphoma [32]. The other is Ddx21 [DEAD (Asp-Glu-Ala-Asp) box helicase 21], which has been demonstrated to be a coexpression marker with c-MYC in colorectal cancer [33] and is known to be directly transcribed by c-Myc [34]. These results demonstrate that our system easily isolates responsive clones and has great potential for identifying downstream or target genes by chance.

Isolation of HeLa reporter cells responsive to external stimuli
We next investigated whether this system could also isolate a cell clone responding to certain external stimuli. For this purpose, we tested two arbitrarily chosen reagents administered to HeLa cells. One is thapsigargin, best known for its activity of inhibiting Ca²⁺ ion pump ATPases residing in intracellular membranes. The other is forskolin, known for activating adenylate cyclase. The introduction of donor and helper vectors into HeLa cells, followed by a few cycles of negative and positive cell sorting under stimulus (−) and (+), respectively, resulted in the successful isolation of two independent clones for forskolin and six independent clones for thapsigargin, from totals of 1.2 × 10⁶ and 2.2 × 10⁶ seeded cells, respectively (Table 2). Sequence analyses of their integration sites (Supplementary Figure S4) revealed that one of the corresponding genes for thapsigargin is HSPA5 (also known as BiP or GRP78) [35] and another is a novel thapsigargin-responsive gene, OSBPL9 (also known as ORP9), known for regulating Golgi structure and function through cholesterol binding and transfer activity [36][37][38]. Time course analysis revealed that EGFP reporter protein expression in these clones was detectable as early as 6 h after administration (Fig. 4A). These clones did not react with another unrelated reagent (Fig. 4B), confirming the specificity of responsiveness to each stimulus. In contrast, the thapsigargin-responsive clones responded to the glycosylation inhibitor tunicamycin as they had responded to thapsigargin, both of which are known to induce ER stress (Fig. 4B, right panels). We confirmed that endogenous OSBPL9 mRNA is induced by thapsigargin (Fig. 5A). To determine which pathway is involved in OSBPL9 expression upon ER stress, several known inhibitors were tested. OSBPL9 mRNA was repressed by the IRE1α inhibitors 4μ8C and toyocamycin (Fig. 5B), but not significantly by the PERK inhibitor GSK2656157 (Fig. 5C and D), suggesting that OSBPL9 is regulated by the IRE1 pathway upon the unfolded protein response [39]. The EGFP reporter expression and the firefly luciferase reporter expression (Fig. 1A) induced by thapsigargin in the isolated cell clone (#B2) were also inhibited by 4μ8C (Fig. 5E and F), similar to the endogenous gene expression (Fig. 5B). These results demonstrate that this trapping system enables us to isolate reporter cells accompanied by the identification of novel regulated genes.
Figure 2 (caption fragment): … Figure S2), one of the trapping vectors (Fig. 1A) was co-introduced with a helper (transposase) vector (here, we used hyPBase [24]). (B) Procedure for isolation of cells responsive to c-MYC expression. The cells transfected with the trapping vectors were treated without (−) or with (+) 100 ng/ml doxycycline (Dox) for 1 day. Cells were collected and fluorescence-activated cell sorting was performed. The collected area is indicated by yellow arrows. In some cases, steps (3) and (4) were conducted repeatedly until decreases in the proportion of stably expressing EGFP(+) cells, which were unwanted, no longer occurred.

Improvement of the reporter gene expression of the trap vector
Isolating reporter cells by using trapping technologies has an advantage over producing them by a knock-in or transgenic method. Specifically, it does not require consideration of responsive genomic elements and their length as well as the distance from a reporter gene to which they are linked. For reporter cell isolation, both drug-resistant selection and cell sorting approaches would be considered. The drug-resistant approach requires prolonged culture during positive cell selection to remove negative cells, and thus is restricted to limited conditions leading to a long-lasting response of cells. In contrast, a cell sorting approach can be used more broadly, even in transient stimulus conditions, if its expression is sufficient to detect. Moreover, both weakly and strongly responsive cells can be selected directly. Genes encoding fluorescent proteins have been used as reporter genes in the cell sorting approach, but their expression levels in conventional systems using EGFP were too low to segregate positive cells from the enormous number of negative ones, especially in transient stimulus conditions. In this study, we demonstrated that the amplification of reporter gene expression by the GAL4-UAS system dramatically improved trapping sensitivity (Fig. 1B), solving this problem. Using EGFP as a reporter, the newly devised system can detect responsive signals as early as 6 h after stimulus (Fig. 4A).
This ability enables us to isolate reporter cells efficiently. In fact, our improved trap vectors identified the OSBPL9 gene as a gene that responds weakly but significantly to ER stresses. This gene was not found among the top 250 significantly regulated candidate genes (as determined by GEO2R analysis) among public microarray data available from the NCBI Gene Expression Omnibus (GEO) for samples of human cell lines treated with thapsigargin. Among the public microarray data, we found that, in human neuroblastoma cells, SH-SY5Y (GEO accession: GSE24497) and IMR-32 (GSE6976) [40], OSBPL9 expression is significantly upregulated upon thapsigargin treatment. However, its ranking of statistical significance (by P-value) for the difference in expression was too low to identify OSBPL9 as an ER-stress-responsive gene. This example illustrates the difficulty in identifying a regulated gene from many catalogs of microarray data. Genetic screening as performed in this study is thus considered to be a more attractive method to highlight novel regulated genes.
(Caption fragment: … Figure S5 for splinkerette PCR analysis.)
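To make the ranking argument concrete, the sketch below orders genes by P-value and checks whether a gene of interest falls within a top-N cutoff. It is only an illustration of the reasoning and is not the GEO2R procedure itself; the function names, the placeholder gene `GENE_X`, and all P-values are hypothetical and do not come from the datasets cited above.

```python
def rank_by_p_value(p_values, gene):
    """Return the 1-based rank of `gene` when genes are sorted by ascending P-value."""
    ordered = sorted(p_values, key=p_values.get)
    return ordered.index(gene) + 1


def in_top_n(p_values, gene, n=250):
    """True if `gene` is among the n most significant genes."""
    return rank_by_p_value(p_values, gene) <= n


if __name__ == "__main__":
    # Hypothetical P-values, for illustration only.
    p_values = {"HSPA5": 1e-9, "GENE_X": 1e-7, "OSBPL9": 0.01}
    print(rank_by_p_value(p_values, "OSBPL9"))
    print(in_top_n(p_values, "OSBPL9", n=2))
```

A gene can be significantly regulated and still fall outside a fixed top-N list, which is the situation described for OSBPL9.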

Technical considerations for efficient isolation of reporter cells and determination of responsive genes
To optimize the efficiency with which reporter cells can be isolated, an optimal repertoire of trapped cells needs to be prepared. Through analyzing integration sites of the isolated clones from the same transfection sample, we noted that, occasionally, several unique integration sites other than one common site (e.g. Hsph1 locus) were seen among probable sister clones. [In the Hsph1 clone, we identified at least seven probable sisters out of 14 isolated colonies, which share a common integration site in the Hsph1 gene (Supplementary Figure S5).] This suggests that, after the first genomic integration in the parent clone, transpositional events had continued during several cell divisions, presumably because of the presence of residual donor vector and transposase. If this is true, the number of cells in the repertoire could be unexpectedly but conveniently increased. This may be one reason why we can successfully isolate expected clones efficiently from a small-scale culture. Paradoxically, a large repertoire is not always desirable for reporter isolation. We occasionally observed that a small number of cells expressing the EGFP reporter could not be removed despite multiple cycles of negative sorting to remove the population constitutively expressing the EGFP reporter. One possible explanation for this is that a specific cell-cycle-dependent gene is trapped and cannot be removed from the bulk culture. This background could be a major obstacle to isolation of the target clone. Thus, for HeLa and NMuMG cells, we currently perform transfection using a 12-well plate [in which 7 × 10⁴ (NMuMG) or 2 × 10⁵ (HeLa) cells/well were seeded], preparing multiple lots, discarding any undesired lot (about 1 out of 3-10 lots) containing a population of permanently unremoved cells expressing EGFP, and finally mixing the desired lots (about 10-12) to obtain a large repertoire. This kind of procedure may also be required to obtain responsive clones in other cell types.
Determination of the number of trap vectors integrated per cell is also important because the more trap vectors are integrated into a cell, the more frequently the expression specific to a certain condition is canceled out by other types of expression independent of the stimulus condition (such as constitutive expression). Moreover, the identification of the integration site, if necessary, becomes more difficult. We referred to the constitutively expressing EGFP(+)% of cells before the first negative sorting as an indicator for avoiding excessive multiple integrations and yet preserving a sufficient repertoire. This is a convenient approach that does not require Southern blotting or NGS analysis. The donor vector:helper vector ratio may be one of the most important conditions critically affecting the number of integrations per cell. In our experiments, if the donor vector ratio increased [up to 10:1 (OD260)], the EGFP(+)% and thus the integration number increased. Adjusting transfection time or total DNA amount also controlled the value of the EGFP(+)%. To avoid excessive integrations, we prepared transfected cells with an EGFP(+)% of roughly 0.50%-2.0% when using PEI as a transfection reagent. Similarly, for other cell types or other transfection reagents, simultaneous preparation of transfected cells under several conditions and determination of the rough optimum range of EGFP(+)% may help to successfully isolate reporter cells.
Figure 4 (caption fragment): … (Table 2) and used as a control. Thapsigargin-responsive clone specifically reacted with thapsigargin, but not with forskolin. Thapsigargin-reacted clones also responded to the glycosylation inhibitor tunicamycin, both of which are known to induce ER stress.
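The EGFP(+)% criterion described above can be sketched as a simple filter over transfection conditions prepared in parallel. The 0.5%-2.0% window is taken from the text (for PEI transfection); the function names, condition labels, and cell counts are hypothetical.

```python
def egfp_positive_percent(n_positive, n_total):
    """Percentage of constitutively EGFP(+) cells before the first negative sort."""
    if n_total <= 0:
        raise ValueError("total cell count must be positive")
    return 100.0 * n_positive / n_total


def select_conditions(counts, low=0.5, high=2.0):
    """Return the conditions whose EGFP(+)% lies within [low, high].

    counts maps a condition label to (EGFP-positive events, total events).
    """
    return [
        name
        for name, (n_positive, n_total) in counts.items()
        if low <= egfp_positive_percent(n_positive, n_total) <= high
    ]


if __name__ == "__main__":
    # Hypothetical flow cytometry counts for three donor:helper ratios.
    counts = {
        "donor:helper 1:1": (30, 10000),    # 0.3%, repertoire may be too small
        "donor:helper 3:1": (120, 10000),   # 1.2%, within the window
        "donor:helper 10:1": (450, 10000),  # 4.5%, risk of multiple integrations
    }
    print(select_conditions(counts))
```

For other cell types or transfection reagents the window itself would need to be determined empirically, as the text notes.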
Multiple integrations make it difficult to determine responsive genes. Fortunately, here we succeeded in identifying a gene (OSBPL9) as a novel candidate for responding to thapsigargin by splinkerette PCR and confirmed the responsive endogenous gene expression by RT-PCR. However, other responsive genes remain elusive in some clones (Table 2) because of the multiple integrations, and also because of the limited applicability of splinkerette PCR, which relies on restriction enzyme digestion. Recent reports described an alternative method, named semiquantitative insertion site sequencing (QIseq), allowing high-throughput analysis of transposon insertion sites using acoustic shearing of DNA and optimized NGS [41][42][43]. These strategies based on genomic DNA amplification, however, are not appropriate for identifying novel or poorly annotated transcripts. Therefore, 5′ RACE or RNA-seq analyses would be an alternative approach because they can directly identify the trapped gene even if it expresses unannotated novel transcripts. These kinds of protocol should facilitate the precise identification of responsive genes in isolated reporter cells. Even after identification of a candidate responsive gene, there still remains the possibility that another trapped gene may have an additional effect on the reporter gene expression. To rule out this possibility, specific removal of the trap vector from the candidate gene by a genome editing technique may help to confirm that the identified gene is the sole factor responsible for the reporter expression.
In this study, we used only the piggyBac transposon system as a delivery vehicle for the trapping cassette. Other transposon systems (such as Sleeping Beauty [44] and Tol2 [45]) with different properties are also available [46]. First, their short target recognition sequences differ (piggyBac targets TTAA [47,48], while Sleeping Beauty targets TA [49] and Tol2 targets weak consensus sequences including AT-rich palindrome-like sequences [50]). In addition, integration preferences are known to differ among the systems. One report demonstrated that piggyBac prefers to integrate into transcription start sites, while Sleeping Beauty displays more random integration [51]. Another report found differences in insertion preference among piggyBac, Sleeping Beauty, and Tol2 in analyses based on various factors involved in the 3D organization of chromatin [52]. Therefore, their parallel use may provide further opportunities to find nonredundant responsive elements [53].
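The difference in target-site availability between transposon systems can be illustrated by scanning a sequence for candidate motifs. The sketch below assumes the TTAA tetranucleotide commonly reported for piggyBac and the TA dinucleotide for Sleeping Beauty; Tol2 is omitted because it has no strict consensus. The function name and the example sequence are hypothetical, and the presence of a motif says nothing about actual integration frequency.

```python
# Short target motifs; both are palindromic, so scanning one strand is sufficient.
TARGET_MOTIFS = {"piggyBac": "TTAA", "Sleeping Beauty": "TA"}


def find_sites(sequence, motif):
    """Return 0-based start positions of every (possibly overlapping) motif match."""
    seq = sequence.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        # Advance by one base so that overlapping matches are also reported.
        start = seq.find(motif, start + 1)
    return positions


if __name__ == "__main__":
    example = "ggttaacctagg"  # made-up sequence for illustration
    for system, motif in TARGET_MOTIFS.items():
        print(system, motif, find_sites(example, motif))
```

Because TA occurs far more often than TTAA in most sequences, the two systems sample different sets of candidate positions, which is one reason their parallel use may be complementary.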

Vector variations for quantitative analysis and putative applications
Here, we used vectors with a destabilized version of the firefly luciferase gene for a quantitative assay. During the preparation of this manuscript, we also constructed additional vectors in which, e.g., a stable (non-destabilized) type of Fluc gene is placed after UAS repeats to achieve high luciferase activities by accumulation. The availability of multiple vectors helps to optimize the evaluation system according to the purpose of the particular research for which it is used.
Figure 5 (caption fragment): … from two experiments is shown. *P < 0.001, **P < 0.0001 by t-test of four repeated data measurements of the representative experiment. (C) RT-PCR analysis for the effects of the PERK inhibitor GSK2656157 (1.5 h) on Tg-induced OSBPL9 expression. n = 3, **P < 0.0001 by t-test, NS, not significant. (D) Immunoblot analysis using anti-ATF4 for confirming the repression of the PERK pathway by GSK2656157 shown in (C). Anti-α-tubulin blot is shown as a loading control. (E) Evaluation of thapsigargin response with EGFP reporter expression. #B2 clone, in which the OSBPL9 gene was trapped (Fig. 4A), was treated with the indicated concentration of 4μ8C for 2 h, followed by the addition of 100 nM thapsigargin for 15 h and analysis by a flow cytometer. (F) Evaluation of thapsigargin response with Fluc reporter expression. #B2 clone was treated with the indicated concentration of 4μ8C for 2 h, followed by the addition of 100 nM thapsigargin for 4 h. Cell lysates were subjected to Luc assay. n = 9 in three experiments, *P < 0.05, **P < 0.0001 by t-test.
In principle, this approach should make it possible to isolate cells responding to a variety of other conditions, such as drug stimulation, tumor malignancy, and hypoxia. We expect our tool to be a powerful means of directly and efficiently isolating reporter cells and identifying responsive genes, which may be extremely useful for many basic and clinical applications.

Availability
All vectors that we produced and their predicted full sequences are available upon request.

Supplementary data
Supplementary data are available at Biology Methods and Protocols online.