The brain transcriptome of the wolf spider, Schizocosa ocreata

Objectives: Arachnids have fascinating and unique biology, particularly with respect to sex differences and behavior, creating the potential to develop powerful emerging models in this group. Recent advances in genomic techniques have paved the way for a significant increase in the breadth of genomic studies in non-model organisms. One growing area of research is comparative transcriptomics. When phylogenetic relationships to model organisms are known, comparative genomic studies provide context for analysis of homologous genes and pathways. The goal of this study was to lay the groundwork for comparative transcriptomics of sex differences in the brain of wolf spiders, a non-model organism of the phylum Euarthropoda, by generating transcriptomes and analyzing gene expression.
Data description: To examine sex-differential gene expression, short-read transcript sequencing and de novo transcriptome assembly were performed. Messenger RNA was isolated from brain tissue of male and female subadult and mature wolf spiders (Schizocosa ocreata). The raw data consist of sequences for the two life stages in each sex. Computational analyses of these data include de novo transcriptome assembly and differential expression analyses. Sample-specific and combined transcriptomes, gene annotations, and differential expression results are described in this data note and are available from public databases.


Objectives
Arachnids, including spiders, have diverse and unique reproductive behavior, including sexual cannibalism and female aggression, copulatory wounding, and elaborate courtship with sexual dimorphism in morphology and coloration [1][2][3][4][5]. The development of genomic resources in arachnids will allow for key comparisons not only in genome biology, but also in evolution and in the biology of sex. Comparative studies between arachnids and model organisms in other arthropod classes can support a broader set of inferences than model organisms alone. For example, copulatory wounding, male leg ornamentation, and elaborate courtship are well studied in Drosophila, and comparative genomic studies in arachnids can reveal parallel or divergent mechanisms [6][7][8][9][10][11][12][13][14][15].
Several arachnid genomes and transcriptomes, including those of spiders, mites and scorpions, have recently become available [16][17][18]. Given that spiders have unique sex-specific behaviors and that progress is ongoing in developing arachnid genomics, our goal was to generate transcriptomes and gene expression data using mRNA from brains of males and females of the wolf spider, Schizocosa ocreata. Studies of wolf spider sexual dimorphism in morphology and behavior have revealed intriguing parallels to textbook examples of sexual dimorphism in well-studied model organisms [19][20][21][22][23][24][25]. The data presented here lay necessary groundwork for broad comparative functional genomics of sex differences in brain and behavior across arthropods.
Stribling et al. BMC Res Notes (2021) 14:236

Data description
mRNA was isolated from brain samples of immature (Imm; subadult) and mature (Mat; adult) male and female Schizocosa ocreata, collected in Lancaster County, Nebraska (see detailed methods: NCBI Gene Expression Omnibus [GEO] Series accession number GSE168766). Illumina paired-end sequencing was performed with libraries generated from mRNA derived from individual brain samples, with three replicates for each sex/stage combination (Data set 1; NCBI SRA: SRP302932). Sequence reads were processed to remove index and low-quality sequences; quality assessments are provided (Data file 1; Table 1; GEO GSE168766).
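As a rough illustration of the low-quality read removal described above, the following minimal sketch filters FASTQ-style records by mean Phred score. It is not the processing used in this study: the tool, the filtering criterion, and the cutoff are not stated here, so the function names, the Phred+33 encoding, and the `min_mean_quality` value of 20 are all assumptions chosen for the example.

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def filter_fastq_records(records, min_mean_quality=20.0):
    """Keep (header, sequence, quality) records whose mean Phred score meets the cutoff.

    The cutoff is a hypothetical value for illustration, not one taken from the study.
    """
    return [
        rec for rec in records
        if rec[1] and mean_phred(rec[2]) >= min_mean_quality
    ]


# Toy records: 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
reads = [
    ("@read1", "ACGT", "IIII"),
    ("@read2", "ACGT", "####"),
]
kept = filter_fastq_records(reads)  # only read1 passes the cutoff
```

In practice a dedicated read-trimming tool would also handle index/adapter removal and paired-end synchronization, which this sketch omits.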
Transcriptome assembly was performed for each sample using Trinity followed by CAP3 [26,27]. A consensus transcriptome was assembled by combining all individual assemblies using CAP3. Transcriptome quality was evaluated for the individual sample and consensus assemblies based on the number of conserved protein-coding genes identified from the Core Eukaryotic Genes Mapping Approach (CEGMA) and Benchmarking sets of Universal Single-Copy Orthologs (BUSCO) Arthropod databases (Data file 2; GEO GSE168766) [28][29][30]. Both CEGMA and BUSCO alignments used the Basic Local Alignment Search Tool (BLAST) with a default threshold E-value of 1e−20 [31]. In the consensus assembly, 99% of CEGMA and 95% of BUSCO genes were identified, indicating high assembly quality. To facilitate this workflow, the Transcriptome-Flow (TFLOW) pipeline was developed (Python 2.7; Zenodo; Data file 6). Each individual assembly was filtered to remove contaminant sequences and uploaded to the NCBI Transcriptome Shotgun Assembly (TSA) database (Data sets 2-11; Table 1). Putative protein-coding sequences from the consensus assembly were extracted using TransDecoder v5.3.0 [32]. Coding sequences were annotated using Trinotate [33], aligning coding sequences (CDS) and predicted protein sequences against several databases, including UniProt (October 2018), NCBI nr, and the FlyBase Drosophila melanogaster v6.23 draft genome (Data file 3; GEO GSE168766). These alignments were used to identify Protein Family (Pfam) domains, Gene Ontology (GO) terms, and Kyoto Encyclopedia of Genes and Genomes (KEGG) identifiers [34][35][36][37][38][39].
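The completeness figures above (percentage of conserved genes recovered at an E-value threshold of 1e−20) can be illustrated with a minimal sketch that counts conserved genes having at least one qualifying BLAST hit. This is not the TFLOW implementation: the function name, the use of standard 12-column BLAST tabular output, and the assumption that the conserved gene is the query (first column) are all choices made for the example.

```python
def completeness(blast_lines, core_gene_ids, max_evalue=1e-20):
    """Fraction of core genes with at least one hit at or below max_evalue.

    blast_lines: iterable of 12-column BLAST tabular lines, where column 1 is the
    query ID (assumed here to be the core gene) and column 11 is the E-value.
    """
    found = set()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query_id, evalue = fields[0], float(fields[10])
        if query_id in core_gene_ids and evalue <= max_evalue:
            found.add(query_id)
    return len(found) / len(core_gene_ids)


# Toy data: four core genes, three with hits, one hit too weak to count.
core = {"geneA", "geneB", "geneC", "geneD"}
hits = [
    "geneA\tcontig1\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-50\t400",
    "geneB\tcontig2\t90.0\t250\t25\t0\t1\t250\t1\t250\t1e-25\t200",
    "geneC\tcontig3\t70.0\t80\t24\t0\t1\t80\t1\t80\t1e-05\t50",
]
score = completeness(hits, core)  # geneA and geneB qualify: 2 of 4
```

Counting distinct genes rather than hits prevents a conserved gene that matches several contigs from inflating the score.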
Annotated coding sequences were clustered into genes based on sequence similarity determined by an all-vs-all BLASTn analysis, with software archived on Zenodo (Data file 7). Sequences identified as contaminants were removed and consensus CDS sequences were uploaded to the TSA database (Data set 12: TSA GIZQ00000000). For analysis of differential expression, reads were aligned to consensus CDS sequences and assigned to gene clusters, with expression estimated as read counts per gene (Data file 4; GEO GSE168766). Read alignment was performed using Burrows-Wheeler Aligner (BWA-MEM, version 0.6.1-r104) [40,41]. A generalized linear model was fit and tested with the glmLRT function in edgeR (version 3.1.2) using default (trimmed mean of M values, TMM) normalization [42][43][44]. Likelihood ratio tests were constructed for the following comparisons: (1) immature vs. mature data within each sex; (2) all immature vs. all mature data from both sexes; (3) male vs. female data at each stage; and (4) all male vs. all female data from both stages. The calculated log fold-change (logFC), log counts-per-million (logCPM), likelihood ratio (LR), p-value, and false discovery rate (FDR) adjusted p-value are reported (Data file 5; GEO GSE168766). The R script has been archived on Zenodo (Data file 8).
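The clustering and counting steps described above can be sketched as follows. This is an illustration only, not the archived software: it assumes single-linkage clustering (any pair of sequences with a similarity hit ends up in the same gene), which is not specified here, and the function names and toy identifiers are hypothetical.

```python
from collections import defaultdict


def cluster_sequences(seq_ids, similar_pairs):
    """Single-linkage clustering: sequences joined by any similarity hit share a cluster."""
    parent = {s: s for s in seq_ids}

    def find(x):
        # Follow parent links to the cluster root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    return {s: find(s) for s in seq_ids}


def counts_per_gene(seq_to_gene, read_counts):
    """Sum per-sequence read counts into per-gene (cluster) counts."""
    totals = defaultdict(int)
    for seq_id, n in read_counts.items():
        totals[seq_to_gene[seq_id]] += n
    return dict(totals)


# Toy data: cds1-cds2 and cds2-cds3 are similar, so all three form one gene.
seqs = ["cds1", "cds2", "cds3", "cds4"]
pairs = [("cds1", "cds2"), ("cds2", "cds3")]
genes = cluster_sequences(seqs, pairs)
gene_counts = counts_per_gene(genes, {"cds1": 10, "cds2": 5, "cds3": 1, "cds4": 7})
```

Single linkage is the simplest rule to state, but it can chain loosely related sequences together; a real pipeline would typically also filter hits by identity or alignment coverage before merging.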

Limitations
Following read processing and quality assessment, two libraries (immature female 1 and adult male 1) were excluded from further analysis due to low sequence coverage. This limits the power to detect differential expression in the corresponding comparisons. The quality of the transcriptomes could be improved by coupling these data with long-read sequencing data in future work. Since the completion of this study, the CEGMA annotation database has been discontinued. The TFLOW software package was developed in Python 2.7, which is no longer actively supported. Archival versions of Python 2.7 may be used to run TFLOW, or the software can be converted to a currently supported version of Python using an automated Python version-conversion tool.

Funding
This study was supported by research start-up funds from the Florida State College of Medicine (awarded to MNA) and National Science Foundation (NSF) Grant 1751296 (awarded to RMG).

Declarations
Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare no competing interests.