RNA-sequencing data highlighting the time-of-day-dependent transcriptome of the central circadian pacemaker in Sox2-deficient mice

SOX2 is a stem cell-associated pluripotency transcription factor whose role in neuronal populations is undefined. Here we present the RNA-sequencing based transcriptome profiles of control (Sox2fl/fl) and SOX2 conditional knock-out (Vgat-cre;Sox2fl/fl) mice at four time points in one 24-h circadian cycle. The raw sequencing data were deposited to ArrayExpress database at EMBL-EBI (https://www.ebi.ac.uk/arrayexpress) under the accession number E-MTAB-7496. Results of rhythmicity analysis, differential expression analysis, network prediction, and potential target identification stemming from the RNA-sequencing dataset are also given in this article. The interpretation and discussion of these data can be found in the related research article entitled “SOX2-dependent transcription in clock neurons promotes the robustness of the central circadian pacemaker.” Cheng et al. 2019.


a b s t r a c t
SOX2 is a stem cell-associated pluripotency transcription factor whose role in neuronal populations is undefined. Here we present the RNA-sequencing based transcriptome profiles of control (Sox2 fl/fl ) and SOX2 conditional knock-out (Vgat-cre;Sox2 fl/fl ) mice at four time points in one 24-h circadian cycle. The raw sequencing data were deposited to ArrayExpress database at EMBL-EBI (https://www.ebi.ac.uk/arrayexpress) under the accession number E-MTAB-7496. Results of rhythmicity analysis, differential expression analysis, network prediction, and potential target identification stemming from the RNAsequencing dataset are also given in this article. The interpretation and discussion of these data can be found in the related research article entitled "SOX2-dependent transcription in clock neurons promotes the robustness of the central circadian pacemaker." Cheng

Data
To understand the effects of Sox2 ablation within the central circadian pacemaker in mammals, the suprachiasmatic nucleus (SCN), we sequenced the SCN transcriptomes of Sox2 fl/fl (control) and Vgatcre;Sox2 fl/fl (SOX2 conditional knock-out) mice at 4 circadian times (CT0, 6, 12, and 18; n ¼ 5 per CT). Raw RNA sequencing data were deposited in the ArrayExpress database at EMBL-EBI (E-MTAB-7496). Alignment metrics can be found in Data S1.
Rhythmicity analysis of the SCN transcriptome of control and Vgat-cre;Sox2 fl/fl mice was conducted with MetaCycle and detailed results are summarized in Fig. 1 and Data S2. Phase, period, and relative amplitude of the rhythmic transcriptomes were compiled and shown in Fig. 1BeD. Expression profile of rhythmic genes are visualized by heatmaps in Fig. 1EeG. The changes in rhythmic component of rhythmic core clock genes (CCGs) were calculated and depicted in Fig. 1HeK.
Differential expression analysis for all transcripts was performed with DESeq2, yielding a list of 233 differentially expressed genes (DEGs) with FDR 0.05 (Data S3). Protein-protein interaction networks among the set of 233 DEGs was constructed with STRING and presented in Fig. 2 and Data S4. Potential targets of 11 differentially expressed transcription factors were identified with iRegulon and the resulting amalgamated metatargetome is shown in Fig. 3 and Data S5. Transcription factor networks with enriched regulatory features among the set of 233 DEGs were combined with the metatargetome of SOX2 and presented in Fig. 4 and Data S6.
The list of DEGs was then compared with a previously reported SOX2 ChIP-seq dataset from murine embryonic stem cells (ESCs) and ESC-derived neural progenitor cells (NPC). The gene overlap from this cross-referencing is documented in Fig. 5AeB and Data S7. The binding of SOX2 to 7 predicted genomic regions was validated in SCN tissues by ChIP-qPCR (Fig. 5CeI). Value of the data The dataset reported here is the first RNA-sequencing data of the murine suprachiasmatic nucleus with samplings throughout a complete 24-h circadian cycle The rhythmicity analysis of the suprachiasmatic nucleus transcriptomes provides foundation and crucial information for future studies of mammalian circadian rhythms Differentially expressed gene analysis, network prediction, and regulatory target identification provide insight into the role of Sox2 in circadian rhythm regulation 2. Experimental design, materials, and methods

Sample collection and RNA sequencing
All animal handling and experimental procedures were performed at the University of Toronto Mississauga (UTM) Animal Facility and were approved by the UTM Animal Care Committee, complying with guidelines established by the University of Toronto Animal Care Committee and the Canadian Council on Animal Care. Detailed procedures for tissue harvesting, RNA extraction, library preparation, high throughput sequencing, read trimming, read mapping, and transcript annotation were described in the related research article [1]. Briefly, SCN tissues from control and Vgat-cre;Sox2 fl/fl mice were harvested at circadian time (CT) 0, 6, 12, and 18. RNA were extracted and then submitted to the G enome Qu ebec Innovation Centre at McGill University (Montr eal, Canada) for RNA-seq library preparation. RNA libraries were sequenced on an Illumina HiSeq4000 platform and the reads were trimmed using Trimmomatic v.0.36. Reads for each replicate were aligned to the Mus musculus reference genome (GRCm38) using Bowtie2. The Mus musculus gene annotation from Ensembl (release 90) was loaded as gene model using makeTxDbFromGFF from the GenomicFeatures package. The function summa-rizeOverlaps from the GenomicAlignments package was used to generate a SummarizedExperiment object that contained metadata and number of reads per gene. Genes with counts-per-million (cpm) below 0.1 in at least 35 of the 40 libraries were not considered to be expressed in the SCN and were thus discarded from further analysis. Expression data of the remaining 26,798 genes were used for downstream analysis.

Bioinformatics and in silico analysis
To identify rhythmic genes, meta2d algorithm from the R package MetaCycle was used on the normalized counts of 26,798 expressed genes in the SCN [3]. Minimum and maximum period were set to 12 and 24 hours, respectively. "JTK" and "LS" integrated period, phase, amplitude, relative amplitude, and benjamini-hochberg q (BH.Q) values were used to compare control and Vgat-cre;Sox2 fl/fl SCN transcriptomes. Genes showing a significant profile (BH.Q < 0.05) were classified as rhythmic genes.
For constructing heatmaps and hierarchical clustering of rhythmic genes, variance stabilizing transformations (VST) were first applied to the normalized counts to remove the dependence of the variance on the mean [4]. The amount by which each gene deviates in a specific sample from the average VST transformed values across all samples was used to construct clustered heat maps.
Differential expression analysis was performed using the DESeq2 package with the assumption of negative binomial distribution for RNA-seq data [2]. For each time point, genes were considered differentially expressed in Vgat-cre;Sox2 fl/fl SCN compared to controls when the reported benjaminihochberg adjusted p values (PADJ) is less than 0.05.
Protein-protein interaction networks among the set of 233 DEGs was constructed with STRING [5]. Out of 211 mapped DEGs, 101 were predicted to interact with at least one other DEG. There was a total of 163 interactions, with the largest network consisting of 73 DEGs.
Using Cytoscape plugin iRegulon [6], we amalgamated the metatargetome of 11 differentially expressed transcription factors with positive queries. The occurrence count threshold and number of nodes to return were set to 5 and 1000, respectively. GeneSigDB, Ganesh clusters, and MSigDB were used as the databases of signatures/gene sets.
The 233 DEGs were submitted to i-cisTarget regulatory features and cis-regulatory modules prediction [7]. Minimum fraction of overlap, normalized enrichment score (NES) threshold, and AUC threshold were set to 0.4, 3.0, and 0.005, respectively. Enriched predicted regulatory motifs and the associated transcription factor were documented in Data S6. Number of potential regulatory features were calculated for each regulator-target pair and represented by the stroke weight in the regulatory network. DEGs with potential SOX2 binding sites were identified with iRegulon and mapped onto the same network.
ChIP-seq data of bound promoters, active enhancers, and poised enhancers for SOX2 in ESCs and NPCs were extracted from Table S2 of Lodato et al. (2013) [8]. Active enhancers and poised enhancers were grouped together for comparison. The list of SOX2 binding sites were cross-checked with our list  SCN in terms of the period for the entire set of rhythmic genes (Fig. 1C, gray þ teal vs. gray þ lilac bars). Among the set of common, rhythmic genes, there was a significant reduction in median period in Vgat-cre;Sox2 fl/fl animals (Mann-Whitney [M -W] U test, p ¼ 0.011) (Fig. 1C, gray bars). For the genotype-specific rhythmic genes, the median period was significantly longer (M -W U test, p ¼ 0.003), and the distribution of period was altered (K-S test, p ¼ 0.004), in Vgat-cre;Sox2 fl/fl animals (Fig. 1C, teal vs. lilac bars). The distribution of relative amplitude for the genotype-specific rhythmic genes was different between control and Sox2-deficient mice (K-S test, p ¼ 0.012) (Fig. 5D,    of DEGs. Seven genes from the set of DEGs that overlapped with the Lodato dataset were selected and binding of SOX2 to their reported regulatory region was confirmed with ChIP-qPCR using SCN tissues extracted at four CTs. Comprehensive procedure on tissue harvesting, ChIP, and qPCR were described in the related research article [1].

Statistical analysis
Data were analyzed using two-way ANOVA, Mann-Whitney U test, and Kolmogorov-Smirnov test with R version 3.5.0. Post hoc significance of pairwise comparisons was assessed using Tukey's Honest Significant Difference test with a set at 0.05.

Transparency document
Transparency document associated with this article can be found in the online version at https:// doi.org/10.1016/j.dib.2019.103909.