Technologies of high-throughput tissue-specific gene expression profiling

Gene expression profiling is an important strategy for studying animal development, responses to stimuli and disease. RNAs measured in gene expression profiling experiments are frequently purified from mixtures of multiple cell types. The resulting data have low resolution: they cannot distinguish the transcriptomes of different cell types and are likely biased towards genes up-regulated in the dominant tissues. These problems can be solved by obtaining tissue-specific gene expression profiles. Over the past decades, several strategies have been developed to isolate specific tissues or to purify RNAs from a tissue of interest, and these have been combined with high-throughput RNA assays to generate transcriptomes of various specific tissues or cell types. This review introduces the basic principles of these methods and their application in large-scale transcriptome analysis, and discusses their advantages and limitations.


Introduction
In multicellular organisms, vital processes including development, physiology and response to stimuli require precise regulation of the genome at the molecular level, which is manifested as temporal and spatial patterns of gene expression. Gene expression profiling measures these expression patterns under various conditions and contributes to a global, genome-wide picture of gene transcription. Tissue-specific transcriptome analysis built on this basis provides comprehensive insights into the complexity of the genome.
Fluorescence microscopy has played a significant role among approaches to profiling tissue-specific gene expression. Fluorescence in situ hybridization (FISH) [1] uses fluorescent probes to bind specific nucleic acid targets, while immunofluorescence uses fluorophore-linked antibodies to target specific proteins [2]. Recombinant DNA technology allows a fluorescent protein to be fused covalently to the product of a target gene [3,4]. Generally, gene expression profiling technologies based on fluorescence microscopy are not only intuitive but also precise enough to reach single-molecule resolution [5]. However, these technologies are expensive and labor intensive in large-scale gene expression profiling projects, and image analysis is not yet automated enough to improve the throughput. Methods for large-scale gene expression profiling have also been under development for decades. Microarrays can produce large amounts of gene expression data based on hybridization of mRNA to probes [6]. Meanwhile, approaches based on next-generation sequencing (NGS), such as serial analysis of gene expression (SAGE) [7], RNA-Seq [8], chromatin immunoprecipitation sequencing (ChIP-Seq) [9,10], DNase I hypersensitive sites sequencing (DNase-Seq) [11,12] and assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-Seq) [13], can profile gene expression and its regulation in the form of sequence reads, which are amenable to quantification and automated analysis. In this way, gene expression patterns and regulatory networks can be characterized systematically. However, although these methods can be scaled up, the biological meaning and regulatory mechanisms behind the data often remain ambiguous. One key reason is that sample preparation sometimes cannot reach tissue-level resolution, especially when a specific cell type has to be isolated or cultured from small organisms such as Caenorhabditis elegans and Drosophila melanogaster.
Recent work in functional genomics has enabled leaps in methods for efficient profiling of tissue-specific transcriptomes, including cell sorting, RNA immunoprecipitation and RNA tagging. New designs are constantly being developed, with improvements in both throughput and precision. The focus of this review is to outline the basic principles of these methods and their application in large-scale transcriptome analysis, and to discuss their advantages and limitations.
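As a rough illustration of the resolution problem described above, the following sketch models a bulk sample as a weighted average of cell-type profiles. The linear mixing model, the cell fractions and all expression values are assumptions chosen for illustration only; they are not data from any cited study.

```python
def bulk_expression(fractions, profiles):
    """Model bulk expression of each gene as the cell-fraction-weighted
    average of per-cell-type expression (a simplifying assumption)."""
    genes = next(iter(profiles.values())).keys()
    return {
        gene: sum(fractions[ct] * profiles[ct][gene] for ct in fractions)
        for gene in genes
    }

# Hypothetical sample: a rare neuron type makes up 1% of the cells.
fractions = {"muscle": 0.99, "neuron": 0.01}
control = {
    "muscle": {"gene_A": 100.0, "gene_B": 10.0},
    "neuron": {"gene_A": 5.0, "gene_B": 10.0},
}
# In the treated sample, gene_B is induced 8-fold, but only in neurons.
treated = {
    "muscle": {"gene_A": 100.0, "gene_B": 10.0},
    "neuron": {"gene_A": 5.0, "gene_B": 80.0},
}

bulk_control = bulk_expression(fractions, control)
bulk_treated = bulk_expression(fractions, treated)

fold_change_bulk = bulk_treated["gene_B"] / bulk_control["gene_B"]
fold_change_neuron = treated["neuron"]["gene_B"] / control["neuron"]["gene_B"]

print(f"bulk fold change:   {fold_change_bulk:.2f}")    # 1.07
print(f"neuron fold change: {fold_change_neuron:.2f}")  # 8.00
```

Under these assumed numbers, an 8-fold induction confined to the rare cell type appears as a change of only about 7% in the bulk profile, which would be hard to separate from measurement noise.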

Cell sorting
The idea of obtaining tissue-specific transcriptomes by cell sorting has been developed over decades. Depending on the tissue type, frequently used sample dissection techniques include direct isolation [14], enzymatic disaggregation [15], laser microdissection [16] and direct single-cell RNA sequencing [17]. For samples that are easily dissociated, cells can be fluorescently labelled using a fluorescent protein reporter or immunofluorescence targeting tissue-specific antigens on the cell membrane, and fluorescence-activated cell sorting (FACS) is then used to collect the labelled cells for gene expression profiling. A recent study has shown that gene expression profiling based on cell sorting can reach single-cell resolution in mammalian embryos. After gastrulating mouse embryos were dissected and dissociated into a single-cell suspension, cells from the epiblast and nascent Flk1+ mesoderm were labelled by immunofluorescence before FACS. In this way, researchers were able to investigate early mesoderm diversification through single-cell RNA sequencing [18]. Cell sorting is feasible not only in mammalian tissue and cell culture, but also in other model organisms. For instance, in order to identify genes related to touch receptor neurons in C. elegans, researchers generated worm strains expressing green fluorescent protein (GFP) only in mechanosensory neurons and their precursor cells. After worm embryos were disaggregated into single cells, the cells were cultured in vitro until some of them expressed GFP, and these cells were then isolated by FACS. By comparing microarray results for cells from the wild type and from the mec-3 mutant, which fails to develop mechanosensory neurons, researchers discovered genes that are up- or down-regulated during mechanosensory neuron development and unveiled some previously unknown regulators [19]. Cells from the maize shoot apical meristem (SAM) have also been isolated using laser microdissection. Next-generation sequencing of the cDNA library from the selected cells led to the discovery of new transcripts [20].
However, the efficiency of methods based on cell sorting depends strongly on the sample type. Cell sorting is therefore frequently used for embryos, blood, bone marrow and similar samples, while in other cases additional manipulations are usually needed. Taking C. elegans as an example, to circumvent the tough cuticle of larvae and adults, neurons and muscle cells can be cultured in vitro after isolation from embryos [21]. This, however, gives only an indirect picture of biological processes in vivo. In another approach, researchers applied the isolation of nuclei tagged in specific cell types (INTACT) technique [22,23]. In this method, nuclei are affinity-labeled through transgenic co-expression of a nuclear tagging fusion (NTF) carrying a biotin ligase recognition peptide (BLRP) together with biotin ligase (BirA) in the cell type of interest. After total nuclei are isolated, biotin-labeled nuclei are purified using streptavidin-coated magnetic beads and used for transcriptome and epigenetic analysis. INTACT has been successfully applied in many model organisms, e.g. C. elegans, D. melanogaster and Arabidopsis thaliana [22-24].

RNA immunoprecipitation
RNA immunoprecipitation (RIP) is a method that captures interactions between RNA and protein in vivo. The basic principle of RIP is to generate protein-RNA cross-links between molecules in close proximity in vivo, and then to isolate the RNA molecules cross-linked to a given protein by immunoprecipitating that protein [25]. A RIP assay targeting poly(A)-binding protein (PAB) expressed under a tissue-specific promoter can collect messenger RNAs from a defined group of cells. This method, called PAB-RIP, was first tested in C. elegans muscle [26]. PAB-RIP is very sensitive, down to the single-cell level. In a recent study, two transgenic worm strains were generated that express PAB separately in the amphid sensilla neurons ASEL and ASER. Using PAB-RIP and genome-wide microarrays, researchers compared the gene expression profiles of these two functionally divergent neurons of the C. elegans gustatory neuron class ASE and discovered 188 genes with asymmetric expression between ASEL and ASER [27]. PAB-RIP has also played an important role in C. elegans functional genomic studies. In the model organism ENCyclopedia Of DNA Elements (modENCODE) project, PAB-RIP was used to measure the transcriptomes of 25 different tissues from distinct developmental stages. Tiling array profiles revealed spatial and temporal expression patterns for 75% of genes, and more than 200 long non-coding RNAs (lncRNAs) were discovered, most of which also showed tissue-specific expression patterns [28]. In general, PAB-RIP generates large-scale tissue-specific transcriptome data, which lays the foundation for high-throughput and precise gene expression profiling.
The shortcoming of PAB-RIP lies in its insufficient signal-to-noise ratio when a very small number of tagged cells is targeted within whole-tissue extracts from large organisms. RIP also requires the non-covalent interaction between mRNA and PAB to be fixed by formaldehyde cross-linking, and optimizing this step of the protocol is time consuming. To reduce the background noise, the sample can first be dissected and selected to remove unnecessary input. In an enhanced technique named translating ribosome affinity purification (TRAP), researchers introduced fusions of ribosomal proteins with enhanced green fluorescent protein (EGFP) into several specific types of mouse neurons. Here EGFP not only allowed the target cells to be visualized, and thus more conveniently isolated under a microscope, but also served as the affinity tag for purifying the tagged ribosomes with anti-EGFP-coated magnetic beads. Using TRAP, researchers compared mRNA profiles from EGFP-bound ribosomes with those from unbound fractions to distinguish two morphologically identical, intermixed subclasses of medium spiny neurons at the transcriptome level [29,30].
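The comparison between bound and unbound fractions can be sketched as a simple enrichment score. The code below is a generic illustration, not the analysis used in the cited studies: the function names, the counts-per-million scaling, the pseudocount and the threshold of 1.0 are all assumptions, and the read counts are invented.

```python
import math

def counts_per_million(counts):
    """Scale raw read counts so that each library sums to one million."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}

def log2_enrichment(ip_counts, unbound_counts, pseudocount=1.0):
    """log2 ratio of immunoprecipitated (bound) to unbound signal per gene.
    The pseudocount avoids division by zero for undetected genes."""
    ip_cpm = counts_per_million(ip_counts)
    unbound_cpm = counts_per_million(unbound_counts)
    return {
        gene: math.log2(
            (ip_cpm[gene] + pseudocount) / (unbound_cpm[gene] + pseudocount)
        )
        for gene in ip_counts
        if gene in unbound_cpm
    }

# Invented read counts for three genes in the two fractions.
ip = {"gene_A": 900, "gene_B": 50, "gene_C": 50}
unbound = {"gene_A": 100, "gene_B": 450, "gene_C": 450}

scores = log2_enrichment(ip, unbound)
# Call a gene enriched in the tagged cell type if it is more than
# 2-fold higher in the bound fraction (arbitrary threshold).
enriched = sorted(gene for gene, score in scores.items() if score > 1.0)
print(enriched)  # ['gene_A']
```

In practice, replicate samples and a statistical test would be needed before calling a transcript enriched; the fixed threshold here only shows the shape of the comparison.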

RNA tagging
Protozoan genomes encode a few nucleotide synthetases that can convert nucleoside analogues into nucleotide analogues, which are then incorporated into RNA molecules. The principle of tissue-specific gene expression profiling by covalent RNA labeling is to express one of these nucleotide synthetases under a tissue-specific promoter in the cells of interest and to label the RNA molecules in these cells with nucleoside analogues. By isolating the labeled RNAs, the cell type-specific transcriptome is obtained. For example, uracil phosphoribosyltransferase (UPRT) from Toxoplasma gondii can convert the thio-substituted compound 4-thiouracil (4tU) into 4tUMP [31]. Researchers used the GAL4/UAS system to drive UPRT expression specifically in glial cells of Drosophila larval brains. After total RNA was extracted from transgenic larvae fed with 4tU, RNA molecules carrying 4tU were biotin labeled and then purified using streptavidin affinity chromatography. The results showed that glia-specific RNAs were effectively isolated by TU-tagging without prior cell dissociation [32]. The bias whereby transcripts containing more uracils had a greater chance of being recovered was removed by normalization using a regression equation [32]. Gene expression profiling by TU-tagging depends not only on the specific expression of UPRT but also on the 4tU treatment, which gives researchers an opportunity to investigate temporal and spatial expression patterns through a chemical/genetic intersectional method. In this design, spatial specificity is set by the cell-specific expression of UPRT, while temporal specificity is set by the timing of the 4tU treatment [33]. However, the TU-tagging method is not yet widely used because of its complexity.
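The exact regression used in [32] is not described here, so the following is only a minimal sketch of the general idea: fit the trend between uracil content and enrichment, then keep the residuals. A straight-line fit, the function names and all numbers are assumptions for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def remove_uracil_trend(uracil_counts, log_enrichment):
    """Subtract the fitted uracil-dependent trend from each transcript's
    log enrichment, leaving residuals that no longer scale with uracil count."""
    slope, intercept = fit_line(uracil_counts, log_enrichment)
    return [
        e - (intercept + slope * u)
        for u, e in zip(uracil_counts, log_enrichment)
    ]

# Invented data: number of uracils per transcript and the observed
# log enrichment (purified versus total RNA). The fourth transcript
# is assumed to be genuinely enriched in the tagged cell type.
uracils = [100, 200, 300, 400, 500]
observed = [0.1, 0.2, 0.3, 1.4, 0.5]

corrected = remove_uracil_trend(uracils, observed)
top = max(range(len(corrected)), key=lambda i: corrected[i])
print(top)  # 3, the index of the fourth transcript
```

After correction, the residuals have no remaining linear dependence on uracil count, so a long uracil-rich transcript is no longer favoured simply because it offers more sites for labeling.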
Another RNA tagging method, named trans-Splicing based RNA Tagging (SRT), was developed based on the mechanism of trans-splicing between the spliced leader (SL) RNA and pre-mRNA in C. elegans [34]. By driving the expression of tagged SL1 transgenes with a tissue-specific promoter, the cell type-specific transcriptome can be separated by its tag. For example, the Illumina adaptor sequence was used to tag SL1, and the tagged SL1 was driven by the tissue-specific myo-3 promoter. Researchers were thereby able to profile gene expression in body muscle and discovered some new transcripts there. However, trans-splicing between SL RNA and pre-mRNA does not exist in most organisms. Even in C. elegans, about 30% of genes are not trans-spliced by this mechanism and thus cannot be profiled.

Discussion
The main purpose of studying gene regulation is to determine the precise time and place of gene expression. Tools for gene expression profiling reflect the substantial progress made in genomics. For example, SRT builds on a thorough understanding of spliced-leader trans-splicing, TU-tagging depends on the discovery of UPRT, and PAB-RIP relies on the discovery of PAB and on developments in the immunoprecipitation assay. Therefore, with new discoveries and progress in genomics, we can anticipate new methods for high-throughput gene expression profiling and further development of existing techniques. Together with advances in imaging, statistical methods and computational approaches, gene expression profiling will guide us to a deeper understanding of gene regulation.
In conclusion, precise measurement of the amount, time and place of gene expression is a foundation for novel discoveries across all areas of biology. Tools for high-throughput tissue-specific gene expression profiling provide the means to test the models of gene regulation that have been built in functional genomic studies. With further development of gene expression profiling techniques, we will be able to understand gene regulation from a more fundamental perspective.

Conflicting interests
The authors have declared that no conflict of interest exists.