Data of RNA-seq transcriptomes in the brain associated with aggression in males of the fish Betta splendens

Siamese fighting fish (Betta splendens) are notorious for their aggressiveness, and males of this species have been widely used to study aggression. However, the brain transcriptome signature associated with aggression during male-male interaction in this fish remains poorly understood. Herein, RNA-seq transcriptome data from 37 brain samples collected at different fighting stages are described. These brain samples were collected before fighting (B), during fighting (D20 and D60), and after fighting (A0 and A30). The raw data were analyzed for differential gene expression using the edgeR package in R. A criterion of FDR cut-off ≤ 0.05 and an absolute fold change (FC) of 0 or greater were used to identify the top upregulated and downregulated genes in the fighting groups (D20, D60, A0, and A30) relative to the non-fighting group (B). The data presented here enable fundamental studies on the genes and molecular events mediating aggressive behavior in this fish and lay a valuable foundation for future research on aggression in vertebrates.

Subject: Biology
Specific subject area: Molecular Biology, Transcriptomics
Type of data: Transcriptomics (RNA-seq)
How data were acquired: High-throughput sequencing (Illumina HiSeq 2500)
Data format: Abundance estimates of raw transcripts (genes) are generated by featureCounts [1]. Normalized gene expression values are presented as median of ratio values generated by the edgeR package [2]. Differential brain gene expression estimates of the fighting groups (D20, D60, A0, and A30) versus the non-fighting group (B) are calculated using the edgeR package [2].
Parameters for data collection: Whole brains from 37 males of Betta splendens collected at different fighting stages, namely non-fighting (B, 5 individuals), during fighting for 20 min (D20, 10 individuals), during fighting for 60 min (D60, 10 individuals), immediately after the shift in social status, i.e., once the winner/loser has emerged (A0, 6 individuals), and 30 min after this shift (A30, 6 individuals) (Fig. S1A).
Description of data collection: RNA was isolated using TRIzol from the whole brains of 37 male Betta splendens. The genomics core constructed and sequenced the libraries as described in the materials and methods.
Data

Value of the Data
• This data set provides insight into brain gene expression changes in males of the fish B. splendens in the context of male-male interaction and can further provide insights into other fish species.
• This data set facilitates a comprehensive understanding of aggression at the molecular level among scientists working on fish biology, neuroscientists, and molecular evolutionary biologists.
• This data set provides information on genes potentially associated with long-term memory, hibernation state, and autism spectrum disorder, which will be a valuable resource for future studies in these fields.

Data Description
To address how a complex social behavior such as aggression is influenced by both genetic and environmental factors [3, 4], we identified differentially expressed genes using RNA sequencing [5], applied to male Betta splendens collected at different fighting stages, namely non-fighting (B), during fighting (D20 and D60), and after fighting (A0 and A30). The results of the sequence quality assessment for all samples are summarized in Table 1 as the number of raw reads, mapped reads, and uniquely mapped reads (reads that matched the reference genome at only one position). Two types of sequencing were used, single-end and paired-end, with read lengths of 51, 101, or 126 base pairs (bp). For paired-end sequencing, the average numbers of raw reads, mapped reads, and uniquely mapped reads were 39,407,972; 31,888,554; and 30,580,374, respectively. For single-end sequencing, the average numbers of raw reads, mapped reads, and uniquely mapped reads were 25,957,661; 21,801,529; and 20,823,889, respectively. The mapping rates for paired-end and single-end sequencing were 79.6% and 79.0%, respectively. Because the samples were sequenced with two different methods, we used a multidimensional scaling (MDS) plot to examine whether this introduced any bias into the data. The MDS plot, color-coded by sequencing method, showed that the two methods introduced little or no bias, as all samples clustered together (Fig. S1B).

Table footnotes: * When the FDR value of the expression level of the same gene between the two groups was less than 0.05, the difference was considered significant and the number is indicated. ** Expression of the gene is higher in the G2 group than in the G1 group. *** Expression of the gene is lower in the G2 group than in the G1 group.

Table 2 shows the number of differentially expressed genes (DEGs) in the fighting groups (D20, D60, A0, and A30) relative to the non-fighting group (B) using a criterion of FDR < 0.05. In this analysis, normalization of gene expression is required to obtain more objective values because the number of mapped reads varies with the length of a gene. To this end, the trimmed mean of M-values (TMM) method was applied using the edgeR package in R. The gene ID, p-value, FDR value, logFC, etc., for each section can be seen in Table S2 of the co-published article. Table 3 shows the enriched biological processes in the DEG lists for D20, D60, A0, and A30 relative to the B group. Detailed GO IDs, gene names, descriptions, etc., are provided in Table S4 of the co-published article.

Sample collection
Several males of B. splendens (average standard length, 5.2 ± 1.1 cm) were imported from a local fish shop in Thailand. After they were brought to the laboratory for testing, all experimental males were isolated for at least one week. All fish were fed commercial food daily and kept on a 12 h light/12 h dark cycle. The aggressive behaviors of this fish have been described previously [6, 7]. Briefly, for the behavioral test, several pairs of male B. splendens were introduced to fight each other in a small 1.7-L PVC tank (18 × 12.5 × 7.5 cm). The fight followed a sequence that began with display behavior, in which the two individuals spread their fins and their body colors turned bright; next, they circled to examine each other. Then, they bit/struck each other and went up to the surface to take in oxygen (surface-breathing) or performed mouth-locking behaviors. Finally, one fish chased the other; this chasing period signified that the fight had ended and that the winner and loser had become evident (Fig. 1A, B).

Time of sample collection
Two fighting experiments (n) were conducted per day beginning at 1 PM (t0), and fish were immediately sacrificed at specific time points (t1), e.g., 20 min or 60 min, by submersion in a lethal dose of MS-222. It took one day to collect the five individuals for Set1, three days to collect the five pairs for Set2, another three days to collect the five pairs for Set3, and a further three days for Set4. Brains for RNA-seq were collected before fighting (B, Set1), during fighting (D20 and D60, Set2 and Set3), and after fighting (A0 and A30, Set4). The heads were frozen in liquid nitrogen, and the whole brains were carefully dissected and placed individually in Eppendorf tubes containing 1 mL of TRIzol Reagent (Life Technologies). After sacrifice, the samples were immediately transferred to a −80 °C freezer and stored there until subsequent brain dissection, RNA extraction, and RNA sequencing.

RNA extraction
Total RNA was isolated using TRIzol Reagent according to the manufacturer's recommendation and was subsequently purified on columns with Quick-RNA MiniPrep (Zymo Research, USA).
RNA was eluted in a total volume of 30 μL of RNase-free water. Samples were treated with DNase (QIAGEN) to remove genomic DNA. RNA quantity was assessed using a Qubit (Eugene, Oregon, USA), and RNA quality was assessed using the Agilent Bioanalyzer 2100 Nano kit (Agilent, USA) (RNA Integrity Number, RIN: 6.3-8.8). RNA was immediately stored at −80 °C until it was used to prepare the sequencing libraries.

RNA-seq library preparation
RNA-seq libraries were constructed using the TruSeq Stranded mRNA Library Prep kit (Illumina, USA) with proper quality controls, and the molar concentrations were normalized using a KAPA Library Quantification kit (Kapa Biosystems, USA). Libraries were sequenced on the Illumina HiSeq 2500 system at Yourgene Bioscience Co., Ltd. (Taipei, Taiwan) and on the Illumina HiSeq 2500 system at the NGS High Throughput Genomics Core (Biodiversity Research Center, Academia Sinica, Taiwan).

Behavioral analysis
All the fighting pairs were videotaped with a camera (Nikon Cool Pix E5400). The Video Marker tool was then used to track the behavioral events, e.g., biting/striking, surface-breathing, and mouth-locking, for the behavioral analyses (Fig. 2). This tool allowed us to break down the time into minute:second:millisecond. Additionally, a record sheet was designed to mark all the behavioral events, e.g., biting/striking, surface-breathing, and mouth-locking, across the fighting process. Taking the fighting pair F5 vs. F8 as an example, fish F5 performed the first bite/strike at 2:42:73, and fish F8 then performed this behavior at 3:8:13, as shown. The other events, surface-breathing and mouth-locking, can also be seen in the record sheet. A total of seven fighting pairs were analyzed for the frequency and duration of each behavioral event over 80 min.
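As an illustration of how such time stamps can be turned into latencies, the minimal sketch below converts the minute:second:millisecond notation into seconds and computes the gap between the first bite/strike of F5 and that of F8. It assumes, as the text states, that the last field is in milliseconds; the function name `to_seconds` and the dictionary layout are hypothetical and are not part of the Video Marker tool.

```python
def to_seconds(stamp: str) -> float:
    """Convert a 'minute:second:millisecond' stamp (e.g. '2:42:73') to seconds.

    Assumption: the third field is milliseconds, as described in the text.
    """
    minutes, seconds, millis = (int(part) for part in stamp.split(":"))
    return minutes * 60 + seconds + millis / 1000.0


# First bite/strike times quoted in the text for the pair F5 vs. F8.
first_bite = {"F5": "2:42:73", "F8": "3:8:13"}

# Latency of the first bite/strike for each fish, in seconds.
latency = {fish: to_seconds(stamp) for fish, stamp in first_bite.items()}

# Time elapsed between the first bite of F5 and the first bite of F8.
gap = latency["F8"] - latency["F5"]
print(latency, round(gap, 3))
```

The same conversion could be applied to every row of the record sheet before counting event frequencies and durations.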

RNA-seq data analyses
The FastQC tool was used to assess the quality of the reads [8]. Adaptor sequences and low-quality bases were clipped from 50 bp single-end and paired-end sequences using the Cutadapt tool [9]. Reads were aligned to the B. splendens reference genome [10] using TopHat version 2.1.1 [11] and Bowtie2 version 2.1.0 [12] with the default settings. The uniquely mapped reads (reads that matched the reference genome at only one position) were extracted using Samtools [13]. The exon-mapped reads were counted with featureCounts [1]. The normalized expression levels of genes, represented by the trimmed mean of M-values (TMM), were generated with the edgeR package [2] in R.
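To make the relationship between raw counts and normalized expression concrete, the sketch below computes counts per million (CPM) from a toy count matrix. It does not estimate TMM factors itself: it assumes per-sample scaling factors have already been obtained (for example from edgeR) and are passed in as `norm_factors`. The function name, the toy counts, and the factor values are hypothetical and are not taken from the data set.

```python
def cpm(counts, norm_factors=None):
    """Counts per million for a genes x samples matrix (list of rows).

    Each library size is multiplied by its normalization factor to give an
    effective library size, which is how TMM factors are typically applied.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    if norm_factors is None:
        norm_factors = [1.0] * n_samples  # no scaling: plain CPM
    effective = [size * f for size, f in zip(lib_sizes, norm_factors)]
    return [[row[j] / effective[j] * 1e6 for j in range(n_samples)]
            for row in counts]


# Toy matrix: 2 genes (rows) x 2 samples (columns); values are made up.
toy_counts = [[10, 0],
              [90, 100]]

plain = cpm(toy_counts)
scaled = cpm(toy_counts, norm_factors=[1.25, 0.8])  # hypothetical factors
print(plain)
print(scaled)
```

A factor above 1 enlarges the effective library size and therefore lowers the CPM values of that sample, and a factor below 1 does the opposite.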

Accession code
The RNA-seq data are accessible on DDBJ ( https://www.ddbj.nig.ac.jp/index-e.html ) under the accession numbers DRA009599 and PRJDB11439.

Statistical analysis
To define DEGs, we included genes with at least one count per million (cpm) in at least one sample. Count data were normalized by TMM using edgeR in R [14]. To assess differential expression, a nested interaction model was fitted in edgeR. A tagwise dispersion estimate was used after computing the common and trended dispersions. We adjusted the p-values from all contrasts at once to control the false discovery rate (FDR). Two criteria were used to call DEGs: (i) a relaxed version using FDR < 0.05 alone, and (ii) a stringent version using both the FDR and FC values, with FDR < 0.05 and |logFC| > 2; both were implemented with the edgeR package in R.
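The filtering and DEG-calling rules above can be sketched as follows. This is an illustration only, not the authors' edgeR code: it assumes the FDR adjustment is the Benjamini-Hochberg procedure (the text does not name the method), and the function names and toy values are hypothetical.

```python
def keep_expressed(cpm_rows, min_cpm=1.0, min_samples=1):
    """Flag genes with at least `min_cpm` CPM in at least `min_samples` samples."""
    return [sum(v >= min_cpm for v in row) >= min_samples for row in cpm_rows]


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(fdr, logfc, fdr_cut=0.05, lfc_cut=None):
    """Relaxed rule (FDR only) when lfc_cut is None, stringent rule otherwise."""
    flags = []
    for q, lfc in zip(fdr, logfc):
        ok = q < fdr_cut
        if lfc_cut is not None:
            ok = ok and abs(lfc) > lfc_cut
        flags.append(ok)
    return flags


# Toy values (made up) to exercise the three steps.
kept = keep_expressed([[0.2, 1.5], [0.1, 0.4]])
adj = bh_adjust([0.01, 0.04, 0.03, 0.20])
relaxed = call_degs([0.01, 0.03, 0.20], [2.5, -0.8, 3.0])
stringent = call_degs([0.01, 0.03, 0.20], [2.5, -0.8, 3.0], lfc_cut=2.0)
print(kept, adj, relaxed, stringent)
```

Pooling the p-values from all contrasts into a single call to `bh_adjust` would mirror the statement that the contrasts were adjusted at once.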
The significantly enriched GO terms (biological process and molecular function terms) and KEGG pathways were identified by DAVID [15] . We tested for the overrepresentation of transcripts with a raw p -value of < 0.05 (Bayesian statistic).
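For readers unfamiliar with over-representation testing, the sketch below shows the basic idea with a one-sided hypergeometric test. It is not DAVID's implementation: DAVID reports a modified Fisher exact statistic (the EASE score), so the p-values here would differ from those in the data. The function name and the toy gene counts are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of seeing at least k annotated genes in a list.

    N: genes in the background, K: background genes carrying the term,
    n: genes in the DEG list, k: DEG-list genes carrying the term.
    """
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy example (made-up numbers): 10 background genes, 5 annotated with a
# term, and a 5-gene DEG list in which all 5 carry the term.
p_enriched = hypergeom_sf(5, 10, 5, 5)
p_trivial = hypergeom_sf(0, 10, 5, 5)  # "at least zero" is always certain
print(p_enriched, p_trivial)
```

A term would be reported as enriched when this p-value falls below the chosen threshold, here the raw p-value cut-off of 0.05 mentioned above.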

Ethics Statement
The animal experimentation procedures used in this study were approved by the Institutional Animal Care and Use Committee (IACUC) (Approval No. 106171) of the National Cheng Kung University, Tainan, Taiwan.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.

Acknowledgments
… the opportunity to pursue this study and for useful discussions. Computational resources were provided by the Data Integration and Analysis Facility, National Institute for Basic Biology, Japan.

Supplementary Materials
Supplementary material associated with this article can be found in the online version at doi: 10.1016/j.dib.2021.107448 .