Transcriptomic data during development of a two-spotted cricket Gryllus bimaculatus

The two-spotted cricket Gryllus bimaculatus is a popular food for reptiles and other insectivorous animals, for the ease of breeding and rich nutrients. It goes through eight moulting cycles until it grows into an adult of size around 30–40 mm, but different larval instars are also used for their sizes matching the fed animals. We therefore provide a transcriptomic resource on different developmental stages of G. bimaculatus to understand the inner molecular workings of these stages contributing to varying nutrients. The raw RNA sequence data is available at NCBI Sequence Read Archive (SRA) under the BioProject PRJNA716138 and the assembled contigs are available as a supplementary data of this report.


Specifications
Entomology and insect science Specific subject area Transcriptomics Type of data RNA-seq data (paired-end) and assembly of reads How data were acquired Illumina NextSeq 500 sequencing platform Data format Raw sequence reads (FASTQ), assembled contigs (FASTA), and expression quantification (Tab delimited text of TPM values) Parameters for data collection 1st instar, 2nd instar, 3rd instar were purchased from Tsukiyono Farm, Japan ( https://tsukiyonofarm.jp ), and wingless juvenile of size 6-10 mm (size S), wingless juvenile of size 10-15 mm (size S), wingless juvenile of size 15-20 mm (size ML), and winged adult female of Gryllus bimaculatus purchased from Mito-Korogi, Japan. 1st instar, 2nd instar, 3rd instar stages were based on 1st, 2nd, and 3rd week after hatching for the molting cycle takes about a week in these stages, and they were further confirmed by the body sizes of 2.5-3.5 mm, 3.5-5.0 mm, and 5.0-8.0 mm in these stages, respectively. Description of data collection Total RNA was extracted and purified from the whole body of a cricket specimen, which was sequenced as paired-end reads. Reference assembly was produced by merging reads from all conditions and subsampling 6 M paired reads, using Bridger software.

Value of the Data
• The transcriptome data for G. bimaculatus taken in multiple developmental stages would facilitate developmental study of arthropods to study the comprehensive gene expression dynamics through these life stages. • Entomologists studying insect development, as well as researchers rearing insectivorous animals that are fed with different developmental stages of crickets can use this data to explore the expression changes between the stages, and to gain insights into the varying nutrients produced. • Due to the ease of breeding, crickets are also gaining attention as a possible alternative protein source for animal-free meat. Transcriptome data for different developmental stages will allow these applications of cricket as food source by elucidating the differential metabolism and biosynthesis of various nutrients produced.

Data Description
The dataset contains raw RNA-Seq data obtained from seven life stage samples of G. bimaculatus , namely, 1st instar, 2nd instar, 3rd instar were purchased from Tsukiyono Farm, Japan ( https://tsukiyonofarm.jp ), and wingless juvenile of size 6-10 mm (size S), wingless juvenile of size 10-15 mm (size S), wingless juvenile of size 15-20 mm (size ML), and winged adult female, purchased from Mito-Korogi, Japan ( http://www2u.biglobe.ne.jp/ ∼m-korogi/ ). Summary of raw read data is shown in Table 1 . 12-18 M paired reads were obtained, where more than 80% of the reads had average quality value above Q30. The raw RNA sequence data is available at NCBI Sequence Read Archive (SRA) under BioProject PRJNA716138.
Reference transcriptome assembly was constructed from subsampled reads from all samples to achieve maximum coverage. Since we have previously seen a small but certain level of crosscontaminations with multiplexed sequencing, we have eliminated assembled transcripts that are less than 1 TPM. However, even after this screening, as the summary statistics shows in Table 2 , The assembly contains a large number of assembled contigs, with Complete + Partial BUSCO score of 99.44%, indicating that this is a very comprehensive assembly. This assembly file in FASTA format is available as Supplementary File 1. This reference was used to quantify and visualize the expression levels of the seven RNA-Seq datasets. Tab-delimited text data of all expression values of all genes is attached as Supplementary File 2. The expression profiles shown as heatmap clustering ( Fig. 1 ) shows reproducible expression in neighboring life stages, and specific gene expression patterns in early and late developmental stages. Correlation among the samples are also shown in Fig. 2 , where strong correlation is seen among 1st three instars and among later stages (S, M, ML, and adult samples).

Experimental Design, Materials and Methods
Sample preservation, RNA extraction, sequencing and assembly were conducted using methods previously described for spiders [1] , with some modifications. Briefly, a whole body of the single specimen of each of the developmental stages of G. bimaculatus was snap-frozen with liquid nitrogen and stored at −80 °C, and RNA was extracted using TRIzol reagent (Thermo Scientific) after homogenization with a metal cone using the Multi Beads Shocker (Yasui Kikai). RNA was further purified using RNeasy Plus Mini Kit (Qiagen). Purified total RNA was quantified with a NanoDrop 20 0 0 (Thermo Scientific) and a Qubit Broad Range (BR) RNA assay (Life Technologies). The integrity of RNA was estimated by electrophoresis using TapeStation 2200 with RNA Screen Tape (Agilent Technologies). The sequence library was prepared using NEB Next Ultra II RNA Library Prep Kit for Illumina (NEB) according to manufacturer's protocol, and then was sequenced on a NextSeq 500 (Illumina) with 300 cycles of high-output mode as pairedend reads. Sequences were base called and demultiplexed, and adaptor sequences were removed with bcls2fastq v.2 software (Illumina).
To generate a reference transcriptome assembly, duplicate reads were first removed using CD-HIT, and 6 M paired reads were subsampled from the seven samples for computational efficiency. Transcriptome assembly was then performed on the subsampled data using Bridger software with default parameters [2] . Expression levels of transcripts were quantified with Kallisto v.0.44 [3] using the assembled transcriptome, firstly using the subsampled data, and contigs with less than 1 TPM (transcript per million) expression were removed to eliminate possible cross-contaminations during multiplexed sequencing. Assembly completeness was assessed using BUSCO v.3 [4] with the Arthropoda dataset through the gVolante server [5] . Using the refined reference, expression levels of all conditions were calculated with Kallisto. Statistical analyses were performed using JMP software v.14.1 (SAS Institute).

Ethics Statement
All experiments were conducted following the Japanese law and guidelines from The Science Council of Japan, as well as the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan.

Declaration of Competing Interest
Hiroyuki Nakamura and Rintaro Ohtoshi are employees of Spiber Inc., a company selling protein materials. However, Spiber Inc. had no role in study design and formal analysis.