Metabolome and transcriptome related dataset for pheromone biosynthesis in an aggressive forest pest Ips typographus

The Eurasian spruce bark beetle, Ips typographus, is an aggressive pest of spruce forests. Host tree colonization by I. typographus is mediated by an aggregation pheromone consisting of 2-methyl-3-buten-2-ol and cis-verbenol, produced in the beetle gut. Other biologically active compounds, such as ipsdienol and verbenone, have also been detected. 2-Methyl-3-buten-2-ol and ipsdienol are produced de novo via the mevalonate pathway, whereas cis-verbenol is formed by oxidation of α-pinene sequestered from the host. Pheromone production is presumably connected with further changes in the primary and secondary metabolism of the beetle. To evaluate this possibility, we obtained qualitative metabolomic data from the analysis of beetle guts at different life stages using ultra-high-performance liquid chromatography-electrospray ionization-high resolution tandem mass spectrometry (UHPLC-ESI-HRMS/MS). The data were dereplicated using metabolomic software (XCMS, CAMERA, and Bioconductor), and approximately 3000 features were extracted. Metabolites were identified using GNPS databases and de novo annotation in the SIRIUS program, followed by manual curation. Further, we obtained differential gene expression (DGE) data from RNA sequencing for mevalonate pathway genes and cytochrome P450 (CyP450) genes from the gut tissue of the beetle to delineate their role in life stage-specific pheromone biosynthesis. CyP450 gene families were classified into subclasses, and their individual expression patterns are given as heat maps. The relative expression of three mevalonate pathway genes and five CyP450 genes was analyzed by quantitative real-time (qRT) PCR in the gut tissue of male and female beetles at different life stages, extending the related research article (Ramakrishnan et al., 2022). These data provide essential information on pheromone biosynthesis at the molecular level and support further research on pheromone biosynthesis and detoxification in conifer bark beetles.


Value of the Data
• The dataset of metabolites and related gene families from the gut tissue of Ips typographus is valuable for researchers interested in studying different life stages of the bark beetle.
• Metabolomic data from UHPLC-HRMS/MS analysis provide insight into the metabolites detected in different measurement modes and are shared via a Dryad link. The processing methods used for these data (with bioinformatics software such as GNPS and SIRIUS) are important for aiding similar analyses in the future and for developing bioinformatics tools for high-throughput metabolomics.
• RNA seq. data reveal expression patterns of key gene families in the gut tissue of bark beetle life stages. This insight allows researchers to follow up the present study with further research questions focused on the identified gene families.
• Information on standardized housekeeping genes and the quantitative real-time (qRT)-PCR data fill a knowledge gap, as they were not included in the related research article.
• The data listed in this article will therefore help researchers to understand pheromone biosynthesis and the metabolism of related compounds in I. typographus and other bark beetle species, and thus help to interrupt beetle aggregation on spruce.

Data Description
The dataset presented here was obtained from the gut tissue of different life stages of the bark beetle, I. typographus. Ultra-high-performance liquid chromatography-electrospray ionization-high resolution tandem mass spectrometry (UHPLC-ESI-HRMS/MS) identified various metabolites in the gut extracts in positive and negative ion mode; the results are listed in Table 1. Multivariate analysis of the UHPLC-HRMS/MS data is shown for both positive and negative mode (Fig. 1a and b). Clustering of the identified compounds by partial least squares-discriminant analysis (PLS-DA) for the different life stages of the beetle is indicated by different colours in Fig. 1a and b. The masses of specific compounds responsible for the separation of life stages are listed with their m/z ratio and retention time (RT) in Fig. 1a, B-G for positive mode and in Fig. 1b, B-H for negative mode. Quantitative data for fatty acids (C16 and C18) across life stages are shown in Fig. 2. The proportions of the metabolite classes identified in Table 1 are shown as Venn diagrams for both positive and negative ion mode in Fig. 3. Di-glycosylated monoterpene alcohols were measured in both modes, and their masses are shown as peaks and CID spectra (Fig. 4). Mevalonate pathway compounds such as isopentenyl diphosphate (IPP)/dimethylallyl pyrophosphate (DMAPP) were detected in negative mode with the help of synthetic standards and are shown as peaks and CID spectra (Fig. 5).

Furthermore, RNA sequencing data are shown as heat maps of the expression patterns of the gene families of interest across the life stages of the beetle. First, the expression of the mevalonate pathway genes is shown as a heat map (Fig. 7), and the expression of downstream sesquiterpene-producing genes of the pathway was analyzed by quantitative real-time (qRT)-PCR (Fig. 8). The overall expression of the 56 identified cytochrome P450 (CyP450) genes is given as a heat map (Fig. 9), with specific subclusters based on names acquired from the Gene Ontology (GO) web reference using a sequence similarity approach (Table 2). The expression patterns of the seven CyP450 gene subclusters are provided separately as heat maps in Figs. 10A, 10B, 10C and 10D, which belong to CyP450 6 like, CyP450 9e2 like, CyP450 9a1 like, CyP450 4 like and unknown CyP450, respectively. Furthermore, we provide qRT-PCR data for CyP450 genes of known function, identified by sequence similarity to genes from other bark beetle species, and compare their expression levels between mated male and mated female gut tissue of I. typographus in Fig. 11. In addition, a list of thirteen housekeeping genes, ranked after standardization, is provided (Fig. 6); it supports the related research article [5] and future gene expression studies in this tissue of the beetle.

Table 1

CID spectra from isolated fixed precursor ion scans (lower panel): In negative ion mode, the intense molecular peak is visible at m/z 445.20685 with molecular composition C21H33O10. The ion at m/z 161.0445 with molecular composition C6H5O5 is a deprotonated hexose. Two fragment ions, C16H25O6 and C11H17O9, result from the loss of dehydro-pentose or monoterpene neutrals, respectively. In positive ion mode, the molecular adduct peak is not visible, but low-mass carbocation fragments (such as C10H15 and C7H9) indicate a monoterpene aglycone. The proposed structure of the carbocation is given in the last CID spectrum.
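The assignment of a molecular composition to an accurate mass, as in the CID spectra description above, can be checked with a short calculation. The following is a minimal sketch, not the authors' workflow: it assumes the composition C21H33O10 refers to a singly charged anion, uses standard monoisotopic masses, and the helper names `anion_mz` and `ppm_error` are hypothetical.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes (standard values).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
ELECTRON_MASS = 0.00054858


def anion_mz(formula: str) -> float:
    """Monoisotopic m/z of a singly charged anion with the given ionic formula."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass + ELECTRON_MASS  # the anion carries one extra electron


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass deviation in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


theoretical = anion_mz("C21H33O10")
deviation = ppm_error(445.20685, theoretical)
print(f"theoretical m/z = {theoretical:.4f}, deviation = {deviation:.1f} ppm")
```

Under these assumptions the theoretical value is about 445.2079, so the reported m/z 445.20685 deviates by roughly -2.4 ppm, which is consistent with the stated composition.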

Table 2
The 56 genes identified from the cytochrome P450 gene family in the RNA seq. data were clustered as shown in the table. The seven subclusters, based on multiple sequence alignment and maximum likelihood analysis in Unipro UGENE v33.0, are distinguished by colour. Subclusters 5, 6, and 7 are shown in similar colours because they are closely related. "Cytochrome" is abbreviated as Cy in the RNA seq. data, and names were assigned based on the GO web reference using CLC Workbench software. Tissues compared: fed male gut and immature male gut.

Experimental Design, Materials and Methods
Beetle rearing conditions and gut dissections are described in the related research article [5]. Guts were dissected from beetles of different life stages before analysis.

Ultra-high-performance liquid chromatography-electrospray ionization-high resolution tandem mass spectrometry (UHPLC-ESI-HRMS/MS) analysis
Gut tissue was dissected (5 guts/sample) and collected in ethyl acetate (5 μl/gut) for storage at −80 °C before analysis. The gut extract (solvent without gut) was removed and used as the nonpolar fraction. For polar extraction, the remaining solvent was evaporated under a gentle stream of nitrogen, and the remaining tissue was extracted (7 ml/gut) with a MeOH/water/acetic acid (70/30/0.5 v/v) mixture containing 13C2-myristic acid (1 μg/ml) as a standard. After sonication on ice (5 min), the tissue was disrupted with a pre-chilled Eppendorf tip and sonicated for an additional 5 min. The samples were then centrifuged at 4000 rpm for 3 min, and the supernatant was collected in a new vial with a 100 μl glass insert. Both the nonpolar and polar fractions of the gut extracts were used for UHPLC-HRMS/MS analysis [5].
UHPLC-ESI-HRMS/MS was performed on an UltiMate 3000 series RSLC (Dionex) system coupled to a Q-Exactive HF-X mass spectrometer (Thermo Fisher Scientific, Waltham, USA). Water (solvent A) and acetonitrile (solvent B, LiChrosolv hyper grade for LC-MS; Merck, Darmstadt, Germany), both with 0.1% (v/v) formic acid (Eluent for LC-MS, Sigma Aldrich, Steinheim, Germany), were used as the binary solvent system. After injection of 10 μl of extract, chromatographic separation was performed at a constant flow rate of 300 μl/min using an Acclaim C18 column (150 × 2.1 mm, 2.2 μm; Dionex, Borgenteich, Germany). The solvent gradient was: B 0.5-100% (v/v) for 15 min; 100% B for 5 min; 100-0.5% (v/v) for 0.1 min; 0.5% for 5 min. Ionization in the HESI ion source was achieved with 4.2 kV cone voltage, 35 V capillary voltage, and 300 °C capillary temperature in the transfer tube in positive ion mode, and with 3.3 kV cone voltage, 35 V capillary voltage, and 320 °C capillary temperature in negative mode. Mass spectra were recorded in positive and negative ion mode over the m/z 80-800 mass range in duplicate. Data-dependent acquisition using the TOP5 routine was applied, with one survey scan at a mass resolution of 60,000 (FWHM) and 5 CID scans at a resolution of 7500 in ca. 0.3 s. Collision-induced dissociation (HCD) of the quadrupole-selected precursor (0.8 Da mass window) was carried out in a collision cell, typically at a normalised fragmentation energy of 30 eV. For identification, the accurate masses of the ions and of their collision-induced dissociation fragments, together with the retention time values, were interpreted using XCALIBUR software (Thermo Fisher Scientific, Waltham, USA).
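The gradient program above can be written as a small lookup function, which is convenient for relating a retention time to the eluent composition at that moment. This is an illustrative sketch only: it assumes each gradient step is linear and that time is non-negative, and the names `SEGMENTS` and `percent_b` are hypothetical.

```python
# Gradient program from the text: (duration in min, start %B, end %B).
SEGMENTS = [
    (15.0, 0.5, 100.0),   # ramp up
    (5.0, 100.0, 100.0),  # hold at 100% B
    (0.1, 100.0, 0.5),    # return to starting conditions
    (5.0, 0.5, 0.5),      # re-equilibration
]


def percent_b(t_min: float) -> float:
    """Percentage of solvent B at time t_min (assumes linear steps, t_min >= 0)."""
    start = 0.0
    for duration, b_start, b_end in SEGMENTS:
        if t_min <= start + duration:
            fraction = (t_min - start) / duration
            return b_start + fraction * (b_end - b_start)
        start += duration
    return SEGMENTS[-1][2]  # hold the final composition after the program ends


total_run_time = sum(duration for duration, _, _ in SEGMENTS)
print(total_run_time, percent_b(7.5), percent_b(18.0))
```

With these segments the total run time is 25.1 min per injection.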
To identify metabolites, samples were compared and statistically evaluated using the software MetaboAnalyst 5.0 [3, 8], and the determined masses were compared with the database. The high-resolution LC-MS raw spectra were first centroided by converting them to mzXML format using the MSConvert tool of ProteoWizard 3.0.18324. Data processing was subsequently carried out in RStudio v1.1.463 using the Bioconductor XCMS package v3.4.2 [1, 9, 10], which contains algorithms for peak detection, peak deconvolution, peak alignment, and gap filling. The resulting peak list was uploaded into MetaboAnalyst 5.0 [3, 8], a web-based tool for metabolomics data processing, statistical analysis, and functional interpretation, where statistical analysis and modeling were performed. Missing values were replaced using K-nearest neighbor (KNN) missing value estimation. Data filtering was implemented by detecting and removing non-informative variables, characterized by near-constant values throughout the experimental conditions, by comparing their robust interquartile range (IQR) estimates. The features retained from the 3020 mass features originally detected were auto-scaled and analyzed by partial least squares discriminant analysis (PLS-DA) [4].
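The three preprocessing steps named above (KNN imputation, IQR-based filtering, auto-scaling) can be illustrated on a toy peak table. The sketch below is a simplified stand-in, not MetaboAnalyst's implementation: the neighbour definition, the value of `k`, the `keep_fraction` parameter, and the intensity values are all assumptions made for illustration.

```python
import math
import statistics


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature in the k most similar samples.

    rows: list of samples, each a list of feature intensities (None = missing).
    Assumes every feature is observed in at least one other sample.
    """
    def distance(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is None:
                donors = [(distance(row, other), other[j])
                          for m, other in enumerate(rows)
                          if m != i and other[j] is not None]
                donors.sort(key=lambda pair: pair[0])
                filled[i][j] = statistics.fmean(v for _, v in donors[:k])
    return filled


def filter_by_iqr(rows, keep_fraction=0.75):
    """Keep the features with the largest IQR; near-constant features are dropped."""
    n_features = len(rows[0])
    spreads = []
    for j in range(n_features):
        q1, _, q3 = statistics.quantiles([row[j] for row in rows], n=4)
        spreads.append(q3 - q1)
    n_keep = max(1, round(n_features * keep_fraction))
    ranked = sorted(range(n_features), key=lambda j: spreads[j], reverse=True)
    keep = sorted(ranked[:n_keep])
    return [[row[j] for j in keep] for row in rows], keep


def autoscale(rows):
    """Mean-center each feature and divide by its sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


# Toy peak table: 4 samples x 4 mass features (invented intensities).
rows = [
    [10.0, 5.0, 1.0, 200.0],
    [12.0, 5.0, None, 180.0],
    [30.0, 5.0, 3.0, 90.0],
    [28.0, 5.1, 3.2, 100.0],
]
filled = knn_impute(rows)
filtered, keep = filter_by_iqr(filled)
scaled = autoscale(filtered)
print(keep)
```

In this toy table the second feature is nearly constant across samples, so it is the one removed by the IQR filter before scaling.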
To identify candidate metabolites, the individual mass features that contributed to the separation between the different classes were further characterized by applying a range of univariate and multivariate statistical tests to determine their importance, including PLS-DA variable importance, t-test, and Random Forest. This information, along with retention time, accurate mass, and MS/MS spectra, was used to search the existing literature and databases. MS/MS spectra files were also centroided and imported into GNPS [11] for spectral matching and classical molecular networking. The obtained database hits were manually evaluated: we first assessed the quality of the mass spectral peak matching and then retained only reasonable hits. Hits related to contaminations were determined at this stage and are labeled in black. The obtained hits are collected in Table 1 and colored according to the biosynthetic class of the described compounds.

RNA sequencing (RNA seq.) analysis
Dissected gut tissues were placed in RNAlater solution (10 μl/gut), and 10 guts were used per biological sample. RNA extraction was performed using the pre-optimized protocol [6, 5]. The quality and quantity of the extracted RNA were evaluated using agarose gel and Qubit, respectively. Integrity was determined using the 2100 Bioanalyzer system (Agilent Technologies, Inc). High-quality RNA samples (RIN > 7) were sent for sequencing (150 bp paired-end reads, minimum 30 million reads per sample) to the Novogene sequencing company, China [5].
Gene expression was quantified from the RNA sequencing data using CLC Workbench with pre-optimized settings, mapping reads exclusively to exon regions of the reference genome. Biases in the sequence datasets and differences in transcript size were corrected using the TPM algorithm to obtain correct estimates of relative expression levels. Finally, empirical analysis of differential gene expression (DGE) was performed using the recommended parameters [6, 7]. For DGE, an FDR-corrected p-value < 0.05 and a fold change of ±4 were used as thresholds for significance. Differentially expressed genes were functionally annotated using the "cloud blast" feature within the "Blast2GO plugin" in CLC Genomics Workbench. Nucleotide BLAST was run against the arthropod database with an E-value cut-off of 1.0E-10. Both ANNEX and GO-Slim were used to further improve GO term identification by crossing the three GO categories (biological process, molecular function, and cellular component) to search for name similarities, GO terms, and enzyme relationships within the KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway database [5].
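The TPM normalization and the significance thresholds described above can be sketched in a few lines. This is a generic illustration rather than the CLC Workbench implementation: the function names are hypothetical, the read counts and gene lengths are invented, and treating the ±4-fold cut-off as inclusive is an assumption.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million for one sample.

    counts: mapped reads per gene; lengths_bp: gene (exon) lengths in base pairs.
    """
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]  # reads per kb
    scale = sum(rates) / 1e6
    return [rate / scale for rate in rates]


def is_significant(fold_change, fdr_p, fc_cutoff=4.0, fdr_cutoff=0.05):
    """Thresholds from the text: FDR-corrected p < 0.05 and at least 4-fold change.

    fold_change is signed (negative values denote down-regulation).
    """
    return fdr_p < fdr_cutoff and abs(fold_change) >= fc_cutoff


# Invented example: three genes whose read counts scale with their length.
values = tpm(counts=[100, 200, 300], lengths_bp=[1000, 2000, 3000])
print(values)
print(is_significant(-5.2, 0.01), is_significant(3.0, 0.001), is_significant(8.0, 0.2))
```

Because the counts in this example are proportional to gene length, all three genes receive the same TPM value, which is the length correction the normalization is meant to provide.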

Quantitative real-time-PCR (qRT-PCR) analysis
qRT-PCR was used to validate the list of selected genes. Primers were designed using IDT's primer design software and are given in Tables 3 and 4. cDNA for qRT-PCR was synthesized from RNA of the respective gut tissue samples using an M-MLV reverse transcriptase kit following the manufacturer's protocol. The resulting cDNA samples were diluted up to 1:4 with nuclease-free water, and qRT-PCR was performed using SYBR™ Green PCR master mix (Applied Biosystems, USA) under the following parameters: 95 °C for 3 min, followed by 40 cycles of 95 °C for 3 s and 60 °C for 34 s [2, 7, 5]. Melt curves were generated to ensure single product amplification. The expression levels of the target genes were calculated using the 2^(-ΔΔCt) method, with two optimized housekeeping genes as references for normalization and four biological replicates.

Table 3
Primers designed for the mevalonate pathway gene family with the IDT PrimerQuest design tool, with a primer length of 18-25 bp, Tm 55-65 °C, GC 50-60%, and amplicon size of 100-150 bp. Modified from
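The relative expression calculation in the qRT-PCR paragraph above can be sketched as follows. This is a minimal example under stated assumptions: it takes the comparative Ct (2^(-ΔΔCt)) form of the calculation, combines the two housekeeping genes by averaging their Ct values, assumes roughly 100% amplification efficiency, and uses invented Ct values; the function names are hypothetical.

```python
from statistics import fmean


def delta_ct(target_ct, reference_cts):
    """Ct of the target gene minus the mean Ct of the reference (housekeeping) genes."""
    return target_ct - fmean(reference_cts)


def relative_expression(sample, calibrator):
    """Fold change of sample relative to calibrator as 2^(-ddCt).

    Each argument is a (target_ct, [reference_ct, ...]) pair.
    """
    dd_ct = delta_ct(*sample) - delta_ct(*calibrator)
    return 2.0 ** (-dd_ct)


# Invented Ct values: target gene and two housekeeping genes in two tissues.
sample = (22.0, [18.0, 20.0])      # e.g. the tissue of interest
calibrator = (25.0, [18.0, 20.0])  # e.g. the tissue used as baseline
fold = relative_expression(sample, calibrator)
print(fold)
```

Here the target gene crosses the threshold three cycles earlier in the sample than in the calibrator at equal reference Ct, which corresponds to an eight-fold higher relative expression.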

Ethics Statement
All beetle experiments comply with the ARRIVE guidelines and were carried out in accordance with the U.K. Animals (Scientific Procedures) Act, 1986 and associated guidelines, EU Directive 2010/63/EU for animal experiments, or the National Institutes of Health guide for the care and use of laboratory animals (NIH Publications No. 8023, revised 1978).

Declaration of Competing Interest
None.