Estradiol and genistein effects on the sea bass (Dicentrarchus labrax) scales: Transcriptome dataset

Fish scales are mineralized structures that play important roles in protection and mineral homeostasis. This tissue expresses multiple estrogen receptor subtypes and can be targeted by estrogens or estrogenic endocrine-disrupting compounds, but their effects are poorly explored. The transcriptome data here presented support the findings reported in the research article “Genistein and estradiol have common and specific impacts on the sea bass (Dicentrarchus labrax) skin-scale barrier” [1]. Juvenile sea bass were exposed to estradiol and the phytoestrogen genistein for 1 and 5 days, by intraperitoneal injections, and the effects on scale transcript expression were analysed by RNA-seq using an Illumina Hi-seq 1500. The raw reads of the 30 libraries produced have been deposited in the NCBI-SRA database with the project accession number SRP102504. Mapping of RNA-seq reads against the sea bass reference genome using the Cufflinks/TopHat package identified 371 genes that had significant (FDR<0.05) differential expression with the estradiol or genistein treatments in relation to the control scales at each exposure time, 254 of which presented more than a 2-fold change in expression. The identity of the differentially expressed genes was obtained using both automatic and manual annotations against multiple public sequence databases and they were grouped according to their patterns of expression using hierarchical clustering and heat-maps. The biological processes and KEGG pathways most significantly affected by the estradiol and/or genistein treatments were identified using Cytoscape/ClueGO enrichment analyses.


a b s t r a c t
Fish scales are mineralized structures that play important roles in protection and mineral homeostasis. This tissue expresses multiple estrogen receptor subtypes and can be targeted by estrogens or estrogenic endocrine-disrupting compounds, but their effects are poorly explored. The transcriptome data here presented support the findings reported in the research article "Genistein and estradiol have common and specific impacts on the sea bass (Dicentrarchus labrax) skin-scale barrier" [1]. Juvenile sea bass were exposed to estradiol and the phytoestrogen genistein for 1 and 5 days, by intraperitoneal injections, and the effects on scale transcript expression were analysed by RNA-seq using an Illumina Hi-seq 1500. The raw reads of the 30 libraries produced have been deposited in the NCBI-SRA database with the project accession number SRP102504. Mapping of RNA-seq reads against the sea bass reference genome using the Cufflinks/TopHat package identified 371 genes that had significant (FDR<0.05) differential expression with the estradiol or genistein treatments in relation to the control scales at each exposure time, 254 of which presented more than a 2-fold change in expression. The identity of the DOI of original article: https://doi.org/10.1016/j.jsbmb.2019.105448. Table 1 presents the detailed RNA-seq sequence statistics for each of the thirty replicate libraries that were constructed from sea bass scale RNA, grouped according to treatment: fish exposed to estradiol, E2 (E1d and E5d libraries, corresponding to 1 or 5 days of exposure respectively); fish exposed to genistein, Gen (Gen1d and Gen5d) and control fish (C1d and C5d). The transcriptome data released at the NCBI SRA database (Project SRP102504) contains the raw data files of the 30 RNA-seq libraries that were produced from the scales of E2 or Gen-exposed sea bass for each of the six experimental conditions: C1d (experiment with accession SRX2673957, n ¼ 6 replicate libraries), E1d (SRX2673962, n ¼ 5 libraries), Gen1d (SRX2674307, n ¼ 5 libraries), C5d (SRX2674405, n ¼ 5 libraries), E5d (SRX2674467, n ¼ 5 libraries) and Gen5d (SRX2675383, n ¼ 4 libraries).

Data
A total of 749 differentially expressed (DE) genes were identified. DE genes in the scale transcriptomes of E2-treated or Gen-treated sea bass were identified by comparison with the corresponding controls at each sampling time and genes changing expression in the control groups when comparing 1 day vs 5 days of exposure. The expression levels, fold changes and identities of the 332 DE genes in sea bass scales, that had more than a two-fold change in expression ("DE ! 2 FC"), are listed in Supplementary and an abbreviated version is presented in Table 3. Table 4 presents the detailed results from the annotation of the total list of 749 DE genes (that were used for global enrichment analyses), as well as Table 1 Detailed RNA-seq statistics for each library, grouped by treatment. Individual libraries (n ¼ 4e6 from individual fish) were prepared from RNA extracted from scales of fish sampled 1 day after treatment with the vehicle (C1d), estradiol (E1d) or genistein (Gen1d) and scales sampled five days after each treatment (C5d, E5d and Gen5d, respectively). The number of raw and filtered reads (in millions), percentage of mapped reads (for each individual library or on average) and the total numbers of reads produced in the study are presented.  Table 2 Selected list (first 20 genes) of the genes differentially expressed with a !2-fold difference in sea bass scales from treated versus control animals. The complete list with the expression and annotation details for the 332 genes with significant differential expression (FDR < 0.05 and FC ! 2) in each comparison can be consulted in Supplementary Table 1. Normalized expression levels in each experimental group are presented in fragments per kilobase of exon model per million reads mapped (fpkm) and significant differential expression for each comparison is presented in Log2 fold change (FC). The final annotation reached using the preference order strategy ( Fig. 1 and Table 4) and revised by manual curation is presented, with the accession numbers from Swiss-Prot, Genbank or from the sea bass genome.  Table 3 Selected list (first 20 genes) of the genes differentially expressed at FDR < 0.05 and at < 2-fold change difference, in sea bass scales from treated versus control animals. The complete list with the expression and annotation for the 417 genes with differential expression under these limits (FDR < 0.05 and FC < 2) can be consulted in Supplementary Table 2. Normalized expression levels in each experimental group are presented in fragments per kilobase of exon model per million reads mapped (fpkm) and significant differential expression for each comparison is presented in Log2 fold change (FC). The final annotation reached using the preference order strategy ( Fig. 1 and Table 4) is presented, with the accession numbers from Swiss-Prot, Genbank or from the sea bass genome. for the selection of the 332 DE genes with ! 2-fold change. Fig. 1 summarizes the preference order strategy used to automatically annotate the 332 selected DE genes using multiple databases, which was followed by careful manual curation. More details about the annotation strategy can be found in the methods below and in the methods and results of the associated JSBMB manuscript [1]. Fig. 2 summarizes the number of genes that were DE in response to the treatments (E2 or Gen) or changed expression between sampling times, when considering different stringency levels: False Discovery Rate (FDR) < 0.05 and < or ! 2-fold change in expression. Fig. 3 presents a heatmap showing the grouping of the six treatment groups, according to the identified transcriptome changes in sea bass scales. Tables 5 and 6 present the significantly enriched GO Biological Processes (GO-BP) and KEGG pathways, respectively, of all genes that presented significant changes in expression in response to the treatments E2 and/or Gen (analysis "All"); Tables 7 and 8 present the significantly enriched GO-BP and KEGGs when directly comparing the E2-or Gen-responsive genes, irrespective of the sampling time (analyses "E2" vs "Gen"); Tables 9 and 10 list the significantly enriched GO-BP and KEGGs when comparing responsive gene lists between 1 day and 5 days (analyses "1d" vs "5d").

Experimental set-up and sampling
The experimental set-up generating the analysed RNAs has been previously described by Pinto et al. [1,2]. Immature sea bass (n ¼ 10/experimental group) received intraperitoneal injections of 5 mg/kg E2 or 5 mg/kg Gen in coconut oil or injection of coconut oil alone (control groups). Individual scales were plucked with forceps from the same region of the skin (below the dorsal fin) in each fish, frozen in liquid nitrogen and stored at À80 C until total RNA extraction.

Total RNA extraction
An automated Maxwell 16 Instrument and the SEV (standard elution volume) total RNA purification kit (Promega, Madison, Wisconsin, USA) were used for the extraction of total RNA from n ¼ 15 scales/ individual sea bass. Scales in lysis buffer were mechanically disrupted with an Ultra Turrax Table 4 Detailed results from the annotation. Annotation results (in number of genes or percentage from total) are shown for the 749 genes differentially expressed with q < 0.05 (columns "DE all") and for the selection of 332 genes differentially expressed at q < 0.05 with a minimum 2-fold change (columns "DE ! 2 FC").   A. Venn diagrams representing common (Com.) and specific genes differentially expressed in sea bass scales in response to the treatments (17b-estradiol, E2, and/or genistein, Gen, compared to the corresponding controls at each sampling time 1 day and 5 days), compared to the differential expression in the control groups over time. The number of genes by treatment or time and their respective percentage are shown for two levels of stringency. "DE genes" above the diagram corresponds to the 749 genes differentially expressed at an FDR <0.05. Below the diagram the DE genes (332) differentially expressed with a minimum of 2-fold change are considered ("DE ! 2 FC" and FDR < 0.05). 51% of the "DE genes" changed expression only in control scales over time but these changes were of low magnitude (average fold change of 1.9-fold between C1d and C5d) and when the analysis stringency was increased to a minimum of 2-fold change, only 24% of these 332 genes changed expression in the control scales over time. The 254 genes (76%) significantly regulated by E2 or Gen were selected for further analyses. B. Shows the comparison of common/specific DE genes between E2 and Gen at the two stringency levels, FDR <0.05 or !2 FC and FDR <0.05 (irrespective of the sampling times). For the number of genes regulated by E2 and/or Gen at each sampling time see Fig. 2B of the associated paper in JSBMB [1]. . The red gradient indicates high abundance, the green gradient indicates low abundance and black indicates equal abundance for each gene and condition relative to the average. 1 day after treatment the DE genes of the E2 group clustered more closely to the control than Gen1d. 5 days after treatment both E2-and Gen-treated scales clearly separated from the control and clustered together, suggesting a similar response at this time point. Table 5 Enrichment of GO Biological Processes (GO-BP) using all genes found to be differentially expressed in response to E2 and/or Gen (analysis "All"). Significantly enriched biological processes (FDR < 0.05) were identified by ID and term description and grouped into 23 functionally related networks (GO group) obtained by ClueGO analysis. Each group is named after its most significant term (lowest FDR) and highlighted in bold, which was chosen for GOTerm representation in Fig. 3     RNA quantity and quality were measured in a NanoDrop 1000 spectrophotometer (Thermo Fisher Scientific, Waltham, Massachusetts, USA). Between 3 and 10 mg of RNA were digested with DNase, using a rigorous treatment regime by carrying out two sequential digestions of 30 min with 2U of TURBO  Table 6 Enrichment of KEGG pathways using all genes found to be differentially expressed in response to E2 and/or Gen (analysis "All").
Significantly enriched pathways (FDR < 0.05) were identified by their KEGG identifier and have been grouped according to 7 functionally related groups obtained by ClueGO analysis. Each group is named after its most significant term (lowest FDR), which is indicated in bold.  Table 7 Enrichment of GO Biological Processes (GO-BP) by the genes differentially expressed in response to E2 or to Gen (analyses "E2" vs "Gen"). Significantly enriched biological processes (FDR < 0.05), identified by their ID and term description, are grouped into 12 functionally related networks (GO group) obtained by ClueGO analysis. Each group is named after its most significant term (lowest FDR), highlighted in bold, which was chosen for group representation in Fig. 3 of the associated MS in JSBMB [1]. Functionally related groups are sorted by the highest enrichment score, calculated as [-Log2 (group FDR)]. The classification of each term as specifically enriched in response to the E2 or Gen treatments (or both), obtained by ClueGO cluster analysis, is also shown. For groups including terms enriched in more than one treatment, the classification of the leading term was adopted for the group.  DNAse, as recommended in the instructions of the DNA-free kit (Ambion, Thermo Fisher Scientific, Waltham, Massachusetts, USA).

RNA-seq library preparation
The prepared DNase treated RNAs were precipitated with 5 vol of 100% ethanol and shipped in refrigerated conditions to the Shanghai Ocean University Sequencing Service, Shanghai, China. Their quality was assessed using a Bioanalyser 2100 (Agilent Technologies, Santa Clara, California, USA), to confirm that all RNAs had an RNA integrity number (RIN) greater than 8. Library preparation was conducted using a TruSeq mRNA library prep kit (Illumina, San Diego, California, USA), following the suppliers' instructions. For each library, 0.5 mg of each of the 30 individual RNAs were used (n ¼ 4e6 individual libraries per treatment, see Table 1). Sequencing of paired-end (100 bp) reads was carried out using an Illumina Hi-Seq 1500, at the Shanghai Ocean University Sequencing Service.

Sequencing and data analysis
Quality control and trimming of the produced reads was carried out with FastQC and Cutadapt [3,4], using a Phred quality score cut-off of 20.
Good quality (filtered) reads (Table 1) were then mapped and assembled using the sea bass reference genome assembly from June 2012 (dicLab v1.0c) and the annotation from July 2013 [5,6], by running TopHat and Cufflinks packages with the data and using the default parameters [7,8]. Relative expression levels were obtained in fpkm (fragments per kilobase of transcript per million fragments mapped) for each sea bass gene and differential expression between experimental conditions was evaluated using Cuffdiff.
Differential expression was evaluated using pairwise comparisons between a) the scales of E2-or Gen-treated fish compared to control fish at the same sampling time (E1d vs C1d, Gen1d vs C1d, E5d vs C5d or Gen5d vs C5d) or b) between the control groups over time (C1d vs C5d). Two different stringency conditions were used to identify differentially expressed genes: those passing the condition FDR <0.05 and those satisfying both conditions FDR <0.05 and a ! 2-fold change in expression between the two compared groups. Table 8 Enrichment of KEGG pathways by the genes differentially expressed in response to E2 or to Gen (Analyses "E2" vs "Gen"). Significantly enriched pathways (FDR < 0.05), identified by their KEGG identifier, are grouped according to 6 functionally related groups obtained by ClueGO analysis. Each group is named after its most significant term (lowest FDR) and is highlighted in bold. Functionally related groups are sorted by highest enrichment score, calculated as [-Log2 (group FDR)].  Table 9 Enrichment of GO Biological Processes by the genes differentially expressed after 1 day or 5 days (Analyses "1d" vs "5d"). Significantly enriched biological processes (FDR < 0.05), identified by their ID and term description, are grouped according to 11 functionally related networks (GO group) obtained by ClueGO analysis. Each group is named after its most significant term (lowest FDR), highlighted in bold, which was chosen for GOTerm representation in Fig. 3 of the associated MS in JSBMB [1]. Functionally related groups are sorted by highest enrichment score, calculated as [-Log2 (group FDR)]. The classification of each term as specifically enriched in the responses after 1 day or 5 days (or both), obtained by ClueGO cluster analysis, is also shown. For groups including terms enriched in more than one list the classification of the leading term was adopted for the whole group.  The common or specific genes identified between different lists of E2 or Gen DE genes were represented by area proportional Venn diagrams generated with BioVenn [9,10], using their Cufflinks XLOC identifiers (see Supplementary Table 1). Similarities between global transcriptome changes identified for the six experimental conditions (C1d, E1d, Gen1d, C5d, E5d and Gen5d) were evaluated by hierarchical clustering with Cluster 3.0 [11,12], using median centered expression data after Log2 transformation of normalized gene expression (fpkm) levels. The uncentered correlation option and complete linkage options were used to cluster group arrays (experimental conditions).

Gene and functional annotations
The 749 genes for which differential expression was detected with FDR <0.05 were subjected to a multistep automatic gene annotation strategy. Stand-alone Blastx analyses of the corresponding Cufflinks transcripts (individual sequences extracted from the sea bass annotated genes coding sequences, between the mapping positions) were run against the Swiss-Prot curated protein database [13,14] and the unreviewed GenBank protein database [15,16], with expect values (E) set at a maximum of 10 À10 . These Blastx results were then combined and compared with the Cufflinks mappings to the annotated sea bass genome [5,6]. For genome mapping, the annotation of the closest gene was used as long as a maximum distance of 1000 bp existed between this gene and the mapping position of the Cufflinks transcript.
A hierarchical preference order was defined to assign annotation matches to the differentially expressed genes, which was Swiss-Prot hits > sea bass genome hits > GenBank hits, as represented in Fig. 1. Genes were annotated using Swiss-Prot hits for all genes that gave significant hits with this curated database; when no Swiss-Prot hits were found genes were annotated using genome mapping and genes with no significant annotation using the two previous databases were annotated using the non-curated Genbank protein database.
In addition, 54% of the 332 DE genes with ! 2-fold change were manually curated to verify the accuracy of the annotations. For this process, individual Cufflinks transcript sequences extracted from the mapping positions were re-annotated using individual BLAT versus the sea bass genome [5,6], compared with the sea bass annotations of these and close by genes. Their predicted coding sequences were blasted against the Swiss-Prot and GenBank protein databases, followed by a careful verification by multisequence alignments carried out with MultAlin [17,18].
Gene ontology (GO) and pathway (KEGG) enrichment analyses were carried out using Cytoscape v3.5.1 [19] and ClueGO plug-in v2.3.2 [20], with Cluepedia v1.3.2. The zebrafish (Danio rerio) Table 10 Enrichment of KEGG pathways by the genes differentially expressed after 1 day or 5 days (analyses "1d" vs "5d"). Significantly enriched pathways (FDR < 0.05), identified by their KEGG identifier, are grouped according to 7 functionally related groups obtained by ClueGO analysis.  [21]). D. rerio Ensembl protein IDs corresponding to each list of DE genes ("All" ¼ all genes regulated by E 2 and/or Gen; "E 2 " or "Gen" ¼ genes regulated by each treatment irrespective of the sampling time and "1d" or "5d" ¼ genes regulated after 1 or 5 days by either E 2 or Gen) were then submitted to the Cytoscape/ClueGO plug-in. This was run using the following settings: enrichment analysis (right-sided hypergeometric test) using GO Biological Process (GO BP, levels 3e8) terms for D. rerio updated on 13/05/2017; BenjaminieHochberg false discovery rate (FDR) correction with only terms with FDR 0.05 and a minimum of three genes/4% being considered significant; Initial group size (for grouping into functionally related networks of enriched terms) set as 1 and group merging at 50%, with a Kappa-statistics score threshold set at 0.4. The enrichment score for each functionally related network group was calculated as -Log2 (group FDR). The leading terms for each group were selected based on their highest enrichment score (lowest term FDR) and were used for group naming in tables and condensed bar plot representations and for evaluation of treatment specific enrichment. Parallel enrichment analyses were carried out for the Kyoto Encyclopaedia of Genes and Genomes (KEGG) pathways for each DE list, using the same parameters and strategy as described for GO BP.