In depth analysis of the Sox4 gene locus that consists of sense and natural antisense transcripts

SRY (Sex Determining Region Y)-Box 4, or Sox4, is an important regulator of pan-neuronal gene expression during post-mitotic cell differentiation within the mammalian brain. The Sox4 gene locus has previously been characterized with multiple sense and overlapping natural antisense transcripts [1], [2]. Here we provide accompanying data on the various analyses performed and described in Ling et al. [2]. The data include a detailed description of various features found at the Sox4 gene locus and additional experimental data derived from RNA-Fluorescence in situ Hybridization (RNA-FISH), Western blotting, strand-specific reverse-transcription quantitative polymerase chain reaction (RT-qPCR), gain-of-function and in situ hybridization (ISH) experiments. All the additional data provided here support the existence of an endogenous small interfering- or PIWI interacting-like small RNA known as Sox4_sir3, whose origin was found within the overlapping region of a sense transcript and a natural antisense transcript known as Sox4ot1.

© 2016 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Subject area
Biology.

More specific subject area
RNA Biology or Neurogenetics.

Type of data
GenBank file, table, bar charts, micrographs, MOV files and statistical analysis.

How data was acquired
C57BL/6 mice, Artemis visualization tool, LightCycler® 480 System, Zeiss Axioplan 2 Imaging upright microscope with Axiovision software, ImageJ software, GraphPad Prism®.

Data format
Filtered and analyzed.

Experimental factors
Real-time/Reverse-transcription quantitative polymerase chain reaction (RT-qPCR), Western and Southern blotting analyses, Rapid Amplification of cDNA Ends (RACE), RNA-Fluorescence in situ Hybridization on different brain cells, LNA-ISH of the developing embryo/adult brain and overexpression analysis.

Experimental features
Multi-approach molecular and cellular characterization of the Sox4 gene locus in an experimental house mouse model (Mus musculus).

Data source location
Universiti Putra Malaysia, Selangor, Malaysia and University of Adelaide, South Australia, Australia.

Data accessibility
The data is available with this article.

Value of the data
The data describe the derivation of an endogenous small RNA from a double-stranded RNA template in the mouse. Such an event is rare in mammalian genomes but common in plants.
The data provide a modified method for brain cell fixation and immobilisation on glass slides for effective RNA-FISH analysis.
Comparison of two different Sox4 natural antisense transcripts, known as Sox4ot1 and Sox4ot2, in the production of Sox4_sir3 in vitro.
Compilation of all the information within the Sox4 gene locus allows clear, concise and easy visualisation of various features defined in the region by using Artemis software.

1. Data, experimental design, materials and methods

Genomics mapping of various features within Sox4 gene locus
The data reported here consist of information related to the Sox4 gene locus, which features multiple overlapping sense and natural antisense transcripts (NATs) [1,2]. Various approaches, such as Serial Analysis of Gene Expression (SAGE) [1] and Rapid Amplification of cDNA Ends (RACE) in combination with strand-specific Southern blotting analysis [2], were used to characterize the locus. In silico data mining and mapping were also carried out to enrich the features within the locus, and the detailed information is summarized in GenBank file format as the Supplementary GenBank File. A snapshot of the annotated Sox4 gene locus, visualized using the Artemis Genome Browser and Annotation Tool [3], is shown in Fig. 1. Information embedded within the Supplementary GenBank File includes the sequences and loci of predicted NATs based on RACE-Southern analysis, probes/primers used, TATA box, poly-A site, mapped small RNAs, mapped FANTOM Paired-End Ditags (PET) sequences obtained from the Ensembl website (www.ensembl.org), Sox4ot1, Sox4ot2, Sox4_sir3, untranslated regions, coding region and exons/introns.
The most important information within the Supplementary GenBank File is the mapped FANTOM PET sequences. Twelve pairs of PET sequences were mapped to the locus, indicating the presence of 6 different NATs. These NATs were named PET1-6, and 4 of them were successfully cloned and further analysed in Ling et al. [2]: PET2 (3214 bp), PET3 (1919 bp), PET5 (807 bp) and PET6 (1824 bp).

RNA Fluorescence in situ Hybridization (RNA FISH)
The data article also describes the results of RNA-FISH experiments performed on cells isolated from different regions of the mouse brain (Fig. 2). All cells presented here were treated with RNase A prior to the hybridization step. In the micrographs, the Sox4 sense signal was generally diffuse throughout the cytoplasm, whereas Sox4 NATs appeared as aggregates within the cytoplasm. Wherever Sox4 NAT aggregates were observed, Sox4 sense aggregates were found at the same loci within the cytoplasm.
As a control for the RNase A-treated Sox4 FISH experiments, RNA FISH was performed on cells obtained from P1.5 olfactory bulbs using probes against the Hmbs housekeeping gene (Fig. 2). To avoid bias, fluorescent micrographs were captured using a fixed exposure time for each channel: 500 ms for both the FITC (sense transcripts) and TexasRed (antisense transcripts) channels, and 10 ms for the DAPI (nucleus) channel. Three untreated and 3 RNase A-treated cells are shown in Fig. 3. Multiple images were acquired along the Z-axis and compiled into 8 different movie files, which have been compressed and provided as Supplementary Movies.

ImageJ pixelation analysis of bands generated from Western blotting experiments
We used ImageJ software (http://rsb.info.nih.gov/ij) to quantitatively estimate the intensity of bands in Western blotting experiments. Pixels for each band from two independent experiments were calculated using a fixed rectangular selection approach (Fig. 3). Only the area below each peak (defined by a shoulder-to-shoulder cutoff) and above the background noise was considered for pixel calculation. The total pixels of the Sox4 band from each group were then normalized against the total pixels calculated from the actin band of the corresponding group. The same steps were repeated for trial 2 of the experiment. An unpaired t-test (2-tailed) was used to compare the PET/pcDNA3 and control groups, but none of the p-values was less than 0.05.
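The normalization and comparison steps described above can be sketched in a few lines of Python. This is only an illustration: the pixel totals are made-up numbers, the function names `normalize` and `unpaired_t` are hypothetical, and the pooled-variance (Student) form of the unpaired t-test is an assumption, since the text does not state which variant was used. The two-tailed p-value would then be read from a t-distribution with the returned degrees of freedom, for example in a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def normalize(target_pixels, loading_pixels):
    """Divide each target band total by the matching loading-control total."""
    return [t / l for t, l in zip(target_pixels, loading_pixels)]


def unpaired_t(a, b):
    """Student's unpaired t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical pixel totals from two trials per group (not the published values).
control_sox4, control_actin = [1000.0, 1200.0], [2000.0, 2000.0]
pet_sox4, pet_actin = [1100.0, 1500.0], [2000.0, 2500.0]

control_ratio = normalize(control_sox4, control_actin)
pet_ratio = normalize(pet_sox4, pet_actin)
t_stat, dof = unpaired_t(pet_ratio, control_ratio)
print(f"t = {t_stat:.4f}, df = {dof}")
```

With only two trials per group the test has 2 degrees of freedom, so a large difference in normalized intensity would be needed to reach p < 0.05.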

Mapping of small RNA sequences at Sox4 gene locus
To determine whether the overlapping Sox4 gene locus gives rise to any small RNAs, we compared each Sox4 gene sequence with ~3.7 million small RNA sequences generated from a mouse E15.5 whole brain using a massively parallel sequencing platform, the Illumina Genome Analyzer II (GSE22653) [4]. Only 7 small RNAs matched and were mapped to the Sox4 gene locus (Table 1), all of them to the sense strand of the Sox4 gene. A schematic diagram depicting the mapping of these small RNAs at the Sox4 gene locus is shown in Fig. 4A.
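The idea of comparing small RNA reads against a locus sequence on both strands can be illustrated with a minimal Python sketch. It assumes exact matching, which the text does not specify, and uses a toy reference and toy reads rather than Sox4 sequence; `map_reads` and the other names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(haystack, needle):
    """Return every 0-based start position of needle in haystack."""
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def map_reads(reference, reads):
    """Exact-match each read to the sense (+) and antisense (-) strand.

    Positions are 0-based starts on the sense strand of the reference.
    """
    hits = []
    for read in reads:
        hits.extend((read, "+", pos) for pos in find_all(reference, read))
        rc = reverse_complement(read)
        if rc != read:  # avoid reporting palindromic reads twice
            hits.extend((read, "-", pos) for pos in find_all(reference, rc))
    return hits


# Toy data for illustration only (not Sox4 sequence).
reference = "ATGGTACAGCAGACCAACAACGCGGAGAACACTG"
reads = ["CAGCAGACC", "GTTGTTGG", "TTTTTTTT"]

for read, strand, pos in map_reads(reference, reads):
    print(read, strand, pos)
```

A read reported on the "+" strand has the same sequence as the sense transcript, which is the situation described for all 7 mapped small RNAs.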

Transfection analysis involving PET3 and PET6 NATs
Of all the mapped small RNAs, only Sox4_sir3 was determined to be a legitimate small RNA originating from the Sox4 sense transcript. To determine whether Sox4_sir3 biogenesis requires the presence of any Sox4 NATs, we transfected NIH/3T3 cells with plasmids expressing PET3 (a NAT that does not overlap the Sox4_sir3 origin site) and PET6 (a NAT that overlaps the Sox4_sir3 origin site). Overexpression of neither PET3 nor PET6 altered the level of sense transcript expression. As expected, expression of the Sox4 NATs at the regions overlapped by PET3 and PET6 was significantly upregulated (Fig. 4B and C).

Full-length sequencing of unspliced PET6 (Sox4ot1) and spliced PET6 (Sox4ot2)
PET6 NATs were isolated from PET6-transfected NIH/3T3 cells. Amplifications were performed using the paired-end ditag sequences as primers (see Supplementary GenBank File). Amplicons were analysed using agarose gel electrophoresis to estimate the size of PET6. The analysis showed that there were 2 forms of PET6 NATs, one unspliced and the other spliced (Fig. 5). Sanger DNA sequencing of the purified amplicons confirmed both PET6 sequence variants. Subsequent transfection analysis using both PET6 variants showed that only the unspliced PET6 was involved in the induction of the Sox4_sir3 small RNA (Fig. 6).

It is important that all in situ hybridization experiments are appropriately controlled to avoid misinterpretation of noisy signals. Locked Nucleic Acid (LNA)-in situ hybridization (ISH) for small RNA is usually controlled with a scramble probe, a mutated antisense probe or a sense probe. As the control for the Sox4_sir3 LNA-ISH reported in Ling et al. [2], all corresponding serial whole embryo or brain sections were probed with the scramble probe (Exiqon) at the same temperature, washing stringency and colour development duration as set for the Sox4_sir3 probe (Fig. 7). The scramble control experiments showed low background colour development, suggesting a successful LNA-ISH experiment.

RT-qPCR and statistical analysis
We used reverse-transcription quantitative polymerase chain reaction (RT-qPCR) to determine the relative expression levels of various Sox4 sense transcripts and NATs. All RT-qPCR data presented here conform to the criteria described elsewhere [1,2,4]. In all relative quantification analyses, one-way Analysis of Variance (ANOVA) was used to compare the expression levels among groups, brain tissues or mouse organs. The detailed statistical analyses for these data and the other data presented in [2] are provided in Supplementary Results.
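For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be computed as in the following Python sketch. The expression values are invented for illustration and the function name `one_way_anova_f` is hypothetical; the published analyses were run in statistical software, and the p-value would be obtained from an F-distribution with the returned degrees of freedom.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical relative expression values for three tissues.
tissues = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(tissues)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F indicates that the differences between group means are large relative to the spread within each group.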