MicroRNA dataset of normal and Nosema ceranae-infected midguts of Apis cerana cerana workers

Nosema ceranae is a widespread fungal pathogen of honeybees that infects all castes in the colony, including queens, drones and workers. Nosemosis caused by N. ceranae poses a major challenge for apiculture worldwide. Here, midguts of normal and N. ceranae-infected Apis cerana cerana workers at 7 and 10 days post infection were sequenced using small RNA sequencing (sRNA-seq) technology. In total, more than 150.54 Mb raw reads were produced, and over 144.26 Mb high-quality clean reads, with a mean ratio of 95.83%, were obtained after strict filtering and quality control. For more insight, please see “Comparative identification of microRNAs in Apis cerana cerana workers' midguts responding to Nosema ceranae invasion” (Chen et al., 2019). Raw data are available in the NCBI Sequence Read Archive (SRA) database under BioProject number PRJNA487111. Our data can be used to investigate differentially expressed microRNAs (miRNAs) and piRNAs and their regulatory roles in the A. c. cerana response to N. ceranae infection, and to identify potential candidates for uncovering the molecular mechanisms regulating eastern honeybee-microsporidian interactions.


Data
N. ceranae spores (Fig. 1A) were purified by Percoll discontinuous density gradient centrifugation, followed by validation with specific primers and agarose gel electrophoresis (Fig. 1B). After being starved for 2 h, each A. c. cerana worker was artificially inoculated with 50% sucrose solution containing N. ceranae spores (Fig. 1C). The shared miRNA dataset was derived from normal and N. ceranae-infected midguts of A. c. cerana workers [1]. On average, sRNA-seq yielded more than 12.54 Mb raw reads per group, and over 12.02 Mb (95.83%) clean reads were obtained after strict filtering and quality control (Table 1). Additionally, Pearson correlation coefficients between biological replicates within each control and N. ceranae-infected group were above 0.9768 and 0.9912, respectively (Fig. 2) [1]. In total, 14 differentially expressed miRNAs (DEmiRNAs) were observed in the midgut at 7 days post inoculation (dpi) with N. ceranae (AcT1) compared with the corresponding normal midgut (AcCK1), including eight up-regulated and six down-regulated miRNAs (Table 2), while 12 DEmiRNAs were detected in the midgut at 10 dpi with N. ceranae (AcT2) compared with the corresponding normal midgut (AcCK2), including nine up-regulated and three down-regulated ones (Table 3). The raw data were deposited in the Sequence Read Archive (SRA) database (http://www.ncbi.nlm.nih.gov/sra/) and linked to BioProject PRJNA487111.
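The replicate agreement reported above can be checked with a few lines of code. The following is a minimal sketch of a Pearson correlation between two replicate libraries, assuming each library is represented as a list of per-miRNA expression values in the same miRNA order. The function name `pearson` and the input layout are illustrative assumptions, and the source does not state whether raw or transformed values were correlated.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length value lists.

    x and y are assumed to hold expression values for the same miRNAs,
    in the same order, from two replicate libraries (hypothetical layout).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(ss_x * ss_y)
```

Applied to every pair of replicates within a group, the smallest coefficient obtained would correspond to the "above" thresholds quoted in the text.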

Specifications table

Subject
Biology

Specific subject area
Transcriptomics

Type of data
Table, Figure

Experimental features
Midgut samples in the control groups were harvested from A. c. cerana workers inoculated with sterile sucrose solution, while midgut samples in the treatment groups were harvested from workers inoculated with sterile sucrose solution containing N. ceranae spores. Total RNA from the control and N. ceranae-infected groups was extracted, followed by small RNA library construction and next-generation sequencing using the single-end strategy.

Data source location
College of Bee Science, Fujian Agriculture and Forestry University, Fuzhou, China

Value of the data
The datasets offer comprehensive information on small RNAs, including miRNAs and piRNAs, in normal and N. ceranae-infected A. c. cerana workers. Our data provide a valuable genetic resource and potential candidates for further investigation of the regulatory roles of miRNAs involved in the response of A. c. cerana to N. ceranae. These data are beneficial for deciphering the molecular mechanisms regulating eastern honeybee-microsporidian interactions.

Honeybee midgut sample preparation
Frames of a sealed brood comb from a healthy colony of A. c. cerana were kept in an incubator at 34 ± 2 °C to provide newly emerged Nosema-free workers. Workers 24 h after eclosion were used for artificial inoculation, following the previously developed standard method [2]. In brief, each worker in the N. ceranae-treated group was fed with 5 μL of a 50% sucrose (w/w in water) solution containing 1 × 10^6 N. ceranae spores [3], while each worker in the control group was fed with 5 μL of a 50% sucrose solution without N. ceranae spores. There were three cages (30 workers per cage) for each N. ceranae-treated group and three cages (30 workers per cage) for each control group. Midguts of nine workers from each cage in the N. ceranae-treated and control groups were collected at 7 dpi and 10 dpi, respectively, then immediately pooled, frozen in liquid nitrogen, and stored at −80 °C until deep sequencing.

Small RNA library construction and next-generation sequencing
Small RNA libraries were constructed according to the general protocol [1]. Briefly, total RNA of each midgut sample in the N. ceranae-treated and control groups was extracted using TRIzol Reagent, followed by small RNA library construction and next-generation sequencing using the single-end strategy. The libraries were as follows: AcCK1-1, AcCK1-2 and AcCK1-3 as replicate libraries for normal midguts at 7 dpi with sucrose solution; AcT1-1, AcT1-2 and AcT1-3 as replicate libraries for midguts at 7 dpi with sucrose solution containing N. ceranae spores; AcCK2-1, AcCK2-2 and AcCK2-3 as replicate libraries for normal midguts at 10 dpi with sucrose solution; AcT2-1, AcT2-2 and AcT2-3 as replicate libraries for midguts at 10 dpi with sucrose solution containing N. ceranae spores. All sRNA sequencing data produced in our study are available in the NCBI SRA database under BioProject number PRJNA487111.

Quality control and sequence analysis
The raw data generated from the platform were pre-processed to exclude low-quality reads (reads shorter than 20 nt and reads containing ambiguous N bases) as well as 5′ adapter, 3′ adapter and poly(A) sequences. The resulting clean reads were aligned against the NCBI GenBank and Rfam databases to remove noncoding RNAs such as rRNA, scRNA, snoRNA, snRNA and tRNA, and were then compared with exons and introns in the A. cerana genome (assembly ACSNU-2.0) to classify mRNA degradation products and repeat-associated miRNA sequences. All downstream analyses were carried out using the high-quality clean reads.
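The read-level filters described above can be illustrated with a short sketch. It assumes adapters have already been trimmed and that reads are plain sequence strings. The function names and the poly(A) cutoff (`max_a_fraction=0.7`) are hypothetical choices made for illustration, since the source does not state how poly(A) reads were defined.

```python
def is_clean_read(seq, min_len=20, max_a_fraction=0.7):
    """Return True if an adapter-trimmed read passes the sketch's filters.

    min_len follows the 20 nt cutoff given in the text; max_a_fraction is
    an assumed stand-in for the poly(A) filter, not a value from the source.
    """
    seq = seq.upper()
    if len(seq) < min_len:
        return False  # too short
    if "N" in seq:
        return False  # contains an ambiguous base
    if seq.count("A") / len(seq) > max_a_fraction:
        return False  # treated as a poly(A) read under the assumed cutoff
    return True


def filter_reads(reads):
    """Keep only the reads that pass is_clean_read, preserving order."""
    return [read for read in reads if is_clean_read(read)]
```

The removal of rRNA, scRNA, snoRNA, snRNA and tRNA by database alignment is a separate step and is not covered by this sketch.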
Bowtie (v 1.1.0) [4] was used to align the filtered sequences against miRBase 21.0, allowing at most two mismatches outside the seed region, and small RNAs that matched existing miRNAs of other animal species in miRBase were identified as known miRNAs. The sequences that did not match known miRNAs were used to predict potentially novel miRNA candidates using RNAfold software [5]. Only sequences with typical stem-loop hairpins, a mature length between 18 nt and 26 nt, and a free energy lower than −20 kcal/mol were considered potential novel miRNAs. The suffixes "-x" and "-y" indicate that a miRNA derives from the processing of the 5′ and 3′ arms of its precursor, respectively, while the suffix "-z" indicates a miRNA with unknown processing direction.
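The three criteria for novel miRNA candidates and the suffix convention can be expressed as a simple filter. This is a minimal sketch, assuming the hairpin check and the free energy have already been computed elsewhere (for example from RNAfold output) and are passed in as plain values. The `Candidate` class, its field names, the arm labels "5p" and "3p", and the inclusive length bounds are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one predicted precursor/mature pair."""
    name: str
    mature_length: int    # length of the mature sequence, in nt
    free_energy: float    # folding free energy, in kcal/mol
    has_stem_loop: bool   # True if a typical stem-loop hairpin was found


def is_potential_novel_mirna(c, min_len=18, max_len=26, max_energy=-20.0):
    """Apply the three criteria from the text (length bounds assumed inclusive)."""
    return (
        c.has_stem_loop
        and min_len <= c.mature_length <= max_len
        and c.free_energy < max_energy
    )


def arm_suffix(arm):
    """Map a precursor arm label to the suffix used in the dataset.

    "5p" and "3p" are assumed labels; anything else is treated as unknown.
    """
    return {"5p": "-x", "3p": "-y"}.get(arm, "-z")
```

For instance, a 22 nt candidate with a hairpin and a free energy of −35.2 kcal/mol passes, whereas the same candidate at −15.0 kcal/mol does not.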
The miRNA expression levels in each sample were normalized to tags per million (TPM) using the formula: normalized expression = (mapped read count / total reads) × 10^6. The differential expression of miRNAs in each comparison group was analyzed using the DEGseq R package [6]. The criteria of p value < 0.05 and |log2(fold change)| > 1 were set as the threshold for statistically significant differential expression, and p values were adjusted using the q value.
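The normalization formula and the significance thresholds above can be sketched as follows. This is not the DEGseq procedure itself: the p value is assumed to come from a separate statistical test and is simply passed in. The function names and the pseudocount used to avoid division by zero are illustrative assumptions.

```python
import math


def tpm(mapped_read_count, total_reads):
    """Tags per million: (mapped read count / total reads) x 10^6."""
    return mapped_read_count * 1e6 / total_reads


def classify_mirna(tpm_treatment, tpm_control, p_value, pseudocount=0.01):
    """Return "up", "down", or None using the thresholds stated in the text.

    pseudocount is an assumed small constant so that miRNAs absent from one
    group still yield a finite fold change; it is not taken from the source.
    """
    log2_fc = math.log2((tpm_treatment + pseudocount) / (tpm_control + pseudocount))
    if p_value < 0.05 and abs(log2_fc) > 1:
        return "up" if log2_fc > 0 else "down"
    return None
```

As a worked example, a miRNA with 50 mapped reads in a library of 1,000,000 reads has a TPM of 50. A miRNA at 40 TPM in the infected group and 10 TPM in the control group has a log2 fold change of about 2, so it counts as up-regulated only if its p value is also below 0.05.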