Metabolomic and transcriptomic data on major metabolic/biosynthetic pathways in workers and soldiers of the termite Prorhinotermes simplex (Isoptera: Rhinotermitidae) and chemical synthesis of intermediates of defensive (E)-nitropentadec-1-ene biosynthesis

The production of nitro compounds has only rarely been recorded in arthropods. The aliphatic nitroalkene (E)-nitropentadec-1-ene (NPD), identified in soldiers of the termite genus Prorhinotermes, was the first case documented in insects, in the early 1970s. Yet the biosynthetic origin of NPD has long remained unknown. We previously proposed that NPD arises through the condensation of the amino acids glycine and/or L-serine with tetradecanoic acid along a biosynthetic pathway analogous to the formation of sphingolipids. Here, we provide metabolomic and transcriptomic data for workers and soldiers of the termite Prorhinotermes simplex. The data relate to NPD biosynthesis in P. simplex soldiers. The original metabolomic data were deposited in the MetaboLights metabolomics database and became publicly available upon publication of the original article. Additionally, the chemical synthesis of biosynthetic intermediates of NPD in unlabeled and stable-isotope-labeled forms is reported. The data extend the limited knowledge of arthropod metabolomes and transcriptomes and should be useful for comparative studies in termites or other arthropods. The data were used for de-replication of NPD biosynthesis and published separately (Jirošová et al., 2017) [1].

Metabolomic data exemplified in Fig. 2 and raw data submitted under xxx in Dryad will be useful for comparative studies on the metabolomics of other termite species as well as of other arthropods.
Transcriptomic data exemplified in Figs. 3-5 and raw data submitted under xxx in Dryad will be useful for comparative studies on the transcriptomes of other termite species as well as of other arthropods. We welcome other groups using our data and establishing cooperation in the future.
The chemical synthetic methods used here can be extended to other amino alcohols, amino ketones and nitro compounds with different chain lengths and substitution patterns.

Data
Detailed description of chemical synthesis of standards and incubation probes used as biosynthetic precursors for NPD biosynthesis pathway elucidation (Figs. 1, 6 and 7).

Synthesis of intermediates and metabolites
The following standards of precursors and putative intermediates were synthesized: 1-nitropentadecan-2-one, 1-nitropentadecan-2-ol, 1-aminopentadecan-2-ol and NPD. The following analogs labeled with deuterium at 13 carbon atoms of the aliphatic chain were synthesized:

Sample preparation
For analyses on the UHPLC-ESI-MS/MS system (Q-Exactive, Thermo), soldiers and control workers were homogenized in a glass-Teflon homogenizer and extracted in dichloromethane:methanol (2:1, v/v) (5 individuals per sample in 250 µl) for 40 min at room temperature. After sonication, the samples were filtered through extracted cotton wool in glass Pasteur pipettes and 10 µl was injected into the UHPLC system. For GC-FID analyses, individual soldiers were placed into glass vials containing 50 µl of dichloromethane, homogenized and sonicated for 5 min. The liquid fraction was injected into the GC.

UHPLC-ESI-MS
Samples were analyzed on an Ultimate 3000 series RSLC system (Dionex, Sunnyvale, CA, USA) coupled to a Q-Exactive Plus Hybrid Quadrupole-Orbitrap Mass Spectrometer (Thermo Fisher Scientific, Bremen, Germany) equipped with an ESI source. The separation was performed on an Acclaim RSLC column (2.1 mm × 150 mm, C18, 2.2 µm particles with 120 Å pore diameter, Thermo Scientific) using water with 0.1% formic acid as solvent A and acetonitrile with 0.1% formic acid as solvent B at a flow rate of 300 µl. The gradient was as follows

Data analysis
For semi-quantification, NPD and its precursors and likely intermediates were identified in termite extracts based on the retention times and mass spectra of reference compounds. Several primary metabolites involved in the Krebs cycle and in fatty acid (FA) metabolism, as well as amino acids, were identified based on accurate mass data and, in some cases, on CID spectra. The peaks of these compounds were integrated in Xcalibur and the area under the curve was calculated.
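The area-under-the-curve step can be illustrated with a minimal sketch. It uses plain trapezoidal integration over a retention-time window, which is an assumption made here for illustration; the integration algorithm and settings used in Xcalibur are not described in this article. The function names, the window limits and the example trace are all hypothetical.

```python
def peak_area(times, intensities, start, end):
    """Trapezoidal area under a chromatographic trace between start and end.

    times       -- retention times (ascending), e.g. in minutes
    intensities -- signal intensities at those retention times
    start, end  -- inclusive retention-time window of the peak (assumed known)
    """
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    # Keep only the data points that fall inside the integration window.
    points = [(t, y) for t, y in zip(times, intensities) if start <= t <= end]
    area = 0.0
    # Sum the trapezoids formed by each pair of neighbouring points.
    for (t0, y0), (t1, y1) in zip(points, points[1:]):
        area += 0.5 * (y0 + y1) * (t1 - t0)
    return area


def relative_abundance(sample_area, reference_area):
    """Ratio of two peak areas, a simple semi-quantitative comparison."""
    if reference_area <= 0:
        raise ValueError("reference_area must be positive")
    return sample_area / reference_area


# Hypothetical triangular peak sampled at five time points.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
trace = [0.0, 2.0, 4.0, 2.0, 0.0]
full_area = peak_area(times, trace, 0.0, 4.0)
core_area = peak_area(times, trace, 1.0, 3.0)
```

Comparing areas of the same compound between soldier and worker extracts with `relative_abundance` gives only a relative measure; absolute amounts would require calibration with the reference compounds.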

RNA-Seq and differential gene expression analysis
RNA was extracted from the following dissected tissues and body parts of the P. simplex soldier and worker castes: 1) soldier abdominal cavity without gut (pooled tissue from 10 specimens), 2) soldier legs (10 specimens), 3) soldier frontal glands (50 specimens), and 4) worker abdominal cavity without gut (10 specimens). The tissues were stored in TRIzol (Invitrogen) at −80°C prior to RNA extraction. Total RNA was extracted using the standard phenol-chloroform procedure with TRIzol according to the manufacturer's protocol (Life Technologies), followed by digestion of DNA contaminants with TURBO DNase (Ambion) at 37°C for 1 h and subsequent RNA purification using the RNeasy Mini Kit (Qiagen) according to the manufacturer's protocol for RNA cleanup. The quantity of RNA was determined using a Nanodrop ND-1000 UV/Vis spectrophotometer (Thermo Fisher Scientific). The integrity of the RNA was verified using an Agilent 2100 Bioanalyzer and an RNA 6000 Nano Kit (Agilent Technologies, Palo Alto, CA).
Tissue-specific transcriptome sequencing of the four different RNA samples was performed with poly(A)+ enriched mRNA fragmented to an average of 150 nucleotides. Sequencing was carried out by the Max Planck Genome Center Cologne (MPGCC) on an Illumina HiSeq 2500 Genome Analyzer platform using paired-end (2×100 bp) reads. This yielded approximately 25 million paired-end reads for each of the four samples. Quality control measures, including the filtering of high-quality reads based on the score given in fastq files, removal of reads containing primer/adaptor sequences and trimming of read length, were carried out using CLC Genomics Workbench v8.1 (http://www.clcbio.com). The de novo transcriptome assembly was carried out with the same software, combining all four RNA-Seq samples and selecting the presumed optimal consensus transcriptome as described in Vogel et al. [5]. The resulting final de novo reference transcriptome assembly (backbone) of P. simplex contained 79,916 contigs (minimum contig size = 300 bp) with an N50 contig size of 1486 bp and a maximum contig length of 27,056 bp. The transcriptome was annotated using BLAST, Gene Ontology and InterProScan searches using BLAST2GO PRO v3.1 (www.blast2go.de).
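For readers unfamiliar with the assembly statistics quoted above, the following sketch shows how an N50 contig size is commonly computed from a list of contig lengths, together with a minimum-length filter like the 300 bp cut-off. It follows the usual definition of N50 and is not taken from CLC Genomics Workbench; the function names and the toy contig lengths are hypothetical.

```python
def filter_contigs(contig_lengths, min_length=300):
    """Discard contigs shorter than min_length (in bp)."""
    return [length for length in contig_lengths if length >= min_length]


def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy assembly: lengths in bp, two of them below the 300 bp cut-off.
toy_lengths = [150, 250, 300, 400, 500, 800, 1000]
kept = filter_contigs(toy_lengths)
toy_n50 = n50(kept)
```

In the toy example the kept contigs total 3000 bp, and the two longest (1000 and 800 bp) already cover more than half of that, so the N50 is 800 bp.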
Digital gene expression analysis was carried out using CLC Genomics Workbench v8.1 to generate BAM (mapping) files, and QSeq Software (DNAStar Inc., Madison, WI, USA) was then used to remap the Illumina reads from all four samples onto the reference transcriptome followed by counting the sequences to estimate expression levels, using previously described parameters for read mapping and normalization [5].
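The idea of turning read counts into comparable expression estimates can be sketched as follows. The normalization actually applied is the one described in [5] and is not restated in this article, so the RPKM formula (reads per kilobase of contig per million mapped reads) and the log2 fold change with a pseudocount shown here are assumptions chosen for illustration only. All names and numbers are hypothetical.

```python
import math


def rpkm(read_count, contig_length_bp, total_mapped_reads):
    """Reads per kilobase of contig per million mapped reads (assumed metric)."""
    if contig_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("contig length and total mapped reads must be positive")
    return read_count * 1_000_000_000 / (contig_length_bp * total_mapped_reads)


def log2_fold_change(value_a, value_b, pseudocount=1.0):
    """log2 ratio of two expression values; the pseudocount avoids division by zero."""
    return math.log2((value_a + pseudocount) / (value_b + pseudocount))


# Hypothetical contig of 1000 bp with 500 mapped reads in a library of 25 million reads.
example_rpkm = rpkm(500, 1000, 25_000_000)

# Hypothetical comparison of one contig between two tissues.
example_lfc = log2_fold_change(3.0, 1.0)
```

Dividing by contig length and library size makes counts comparable between contigs of different lengths and between the four tissue libraries, which is the purpose of any such normalization regardless of the exact formula used.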

Transparency document. Supporting information
Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2018.04.052.