A Transcriptome Survey Spanning Life Stages and Sexes of the Harlequin Bug, Murgantia histrionica

The harlequin bug, Murgantia histrionica (Hahn), is an agricultural pest in the continental United States, particularly in southern states. Reliable gene sequence data are especially useful to the development of species-specific, environmentally friendly molecular biopesticides and effective biolures for this insect. Here, mRNAs were sampled from whole insects at the 2nd and 4th nymphal instars, as well as sexed adults, and sequenced using Illumina RNA-Seq technology. A global assembly of these data identified 72,540 putative unique transcripts bearing high levels of similarity to transcripts identified in other taxa, with over 99% of conserved single-copy orthologs among insects being detected. Gene ontology and protein family analyses were conducted to explore the functional potential of the harlequin bug’s gene repertoire, and phylogenetic analyses were conducted on gene families germane to xenobiotic detoxification, including glutathione S-transferases, carboxylesterases and cytochrome P450s. Genic content in harlequin bug was compared with that of the closely related invasive pest, the brown marmorated stink bug, Halyomorpha halys (Stål). Quantitative analyses of harlequin bug gene expression levels, experimentally validated using quantitative real-time PCR, identified genes differentially expressed between life stages and/or sexes.


Introduction
The harlequin bug, Murgantia histrionica (Hahn) (Hemiptera: Pentatomidae), though native to Mexico and Central America, has been a serious pest of mustard-family crops (Brassicaceae) in the southern USA for~150 years, and also infests plants in the caper family (Capparaceae) [1][2][3]. It is limited in its northern spread by cold winter temperatures, and through the mid-20th century, there were destructive harlequin bug outbreaks following mild winters [3][4][5][6]. Earlier still, the insect's unexpected arrival in Confederate states during the Civil War prompted a formal accusation that the Union had introduced it as a deliberate act of biological warfare [7].
Destruction of crop debris in which bugs overwinter, use of soaps and oil emulsions, handpicking, and trap cropping with attractive plants such as early cruciferous crops, were recommended controls in the 19th and early 20th centuries [3]. With the advent of synthetic organic insecticides, harlequin bug was suppressed by broad-spectrum organophosphates and carbamates, and later by pyrethroids and Another class of gene targets involves enzymes used in detoxification of xenobiotic compounds, which includes synthetic insecticides. Glutathione S-transferases (GSTs), carboxylesterases (COEs), and CYPs are three classes of enzymes that have been widely associated with rising levels of insecticide resistance, and their characterization is of urgent concern in maintaining feasible and effective pest management programs. In particular, RNAi-mediated knockdown of these genes may result in more efficacious and/or efficient use of chemical pesticides, thereby reducing the monetary and environmental costs associated with their use in modern agricultural production operations.
In insects, GSTs are typically split into two groups: cytosolic and microsomal. Cytosolic GSTs are ordinarily divided into six subclasses: Delta, Epsilon, Sigma, Omega, Theta and Zeta [18,19]. In particular, Delta and Epsilon groups are unique to insects and have been widely associated with insecticide resistance; although the Delta type are observed among the Insecta in general, the Epsilon subclass appears to be unique to the Holometabola [18]. In contrast to cytosolic GSTs, those of the Insecticide resistance among hemipterans has been associated with multiple carboxylesterase groupings. Point mutations of acetylcholinesterases, as well as gene duplication and over-expression of E4 and FE4 β-esterases, have been reported in Myzus persicae [20] and Aphis gossypii [21]. The most recent classification of carboxylesterases divides these genes into three major clades [22][23][24]. The first and most basal is the neurodevelopment and cell adhesion clade, which comprises neuroligins, glioactins, neurotactins, acetylcholinesterases and glutacin enzymes. With the exception of acetylcholinesterases, all members of the neurodevelopmental clade are non-catalytic. The second is the hormone and semiochemical processing clade, which contains secreted β-esterases, integument esterases and juvenile hormone esterases. The third clade is associated with dietary and detoxification functions and contains α-esterases.
Cytochrome P450s have a two-fold role in gene-environment interactions, participating both in host biosynthetic pathways, as well as in detoxification of xenobiotic compounds. Biosynthesis of critical endogenous molecules via P450s can be targeted by pesticides. Molting, hormone/pheromone synthesis and turnover, and cuticular hydrocarbon waterproofing may all be targets to exploit for pest control. Conversely, insects have evolved modified P450s to detoxify exogenous chemicals like pesticides, leading to resistance. These two classes of P450s are easily observed in phylogenetic trees. Highly conserved one-to-one orthologs between insect species are parts of pathways to make essential biomolecules like ecdysone (Halloween genes: CYP302, CYP306, CYP307, CYP314, and CYP315; [25]), juvenile hormone (CYP15; [26]), as well as fatty-acid-derived alkanes and alkenes for exoskeleton coating (CYP4G; [27,28]).
Resistance has been associated with numerous cytochrome P450s, often members of "gene blooms", which are large expansions of P450s in tandem duplication arrays on chromosomes. These are not highly conserved or even limited to one CYP clan. Almost any P450 family can become adapted to detoxify a pesticide [29][30][31]. Resistance may not only be due to pesticide inactivation, but it may be caused by blocking pesticide entry via thickening of the cuticular hydrocarbon barrier [32]. On the biocontrol side, entomopathogenic fungi kill insects by using P450s like CYP52X1 to degrade and penetrate the hydrocarbon coating on insects [33].
Mating in many if not most insects is dependent on pheromone signaling. The volatile chemicals must be synthesized, released and rapidly turned over at the receptors in antennae. P450s are actively involved in both areas [34][35][36]. Even the status of the queen in social insects, such as the termite Cryptotermes secundus, depends on molecules made by P450s [37]. Disruption of these signals by P450 inhibitors may also constitute an effective pesticidal mode of action.
In this study, we provide the first large scale assembly of transcriptomic sequences in the important agricultural pest, Murgantia histrionica (Hahn). We also identify a number of pheromone synthesis-and insecticide resistance-related genes with potential utility in the development of molecular biopesticides and/or transgenic organisms for insect biocontrol.

Materials and Methods
Field-collected harlequin bugs were cultured for several generations in a laboratory setting without exposure to insecticides. Original collections were by hand from their host plants in gardens and small farms within 80 km of Beltsville, MD. They were reared under 16:8 L:D photoperiod on collards in a greenhouse with temperature 18 to 28 • C. Messenger RNA populations were extracted from whole-insect preparations from 2nd and 4th instar nymphs, as well as female and male adults. For 2nd instar nymphs, 15 insects were pooled per bioreplicate, whereas five insects were pooled for all other samples. Only one bioreplicate was submitted for PE100 transcriptome sequencing (i.e., paired-end reads, 100 bp apiece) on an Illumina HiSeq instrument: RNA libraries were prepared using the TruSeq RNA v2 kit per manufacturer's protocol (Illumina, San Diego, CA, USA). Total RNA quality was checked using an Agilent Bioanalyzer 2100 (Agilent, Santa Clara, CA, USA), and enrichment for mRNA was done using poly-dT beads. Libraries were resuspended in Qiagen EB buffer (Qiagen, Germantown, MD, USA) and a Fragment Analyzer NGS (Advanced Analytical, Ankeny, IA, USA) was used to check library size and quality; library concentration was checked using a Qubit fluorometer (ThermoFisher Scientific/Life Technologies, Waltham, MA, USA).
Raw sequencing volumes achieved are shown in Table 1, and these sequencing data were uploaded to the NCBI Sequence Read Archive (SRA) division under the BioProject accession identifier, PRJNA302154. Raw reads were normalized in a sample-specific manner using utilities supplied with the Trinity RNA-Seq assembly software package, version 2.2.0 [38]. A global transcriptome assembly was produced using Trinity: default parameter settings were used, with the exceptions that the read normalization (as here applied to the set of pooled reads) and Trimmomatic-based quality control submodules were invoked as pre-processing steps [39]. The resulting putative unique transcripts (PUTs) were post-processed using PRINSEQ-lite [40], which in particular was used to clip any residual poly-A/T tails. Resultant PUTs having lengths of at least 200 bp were then compared with the UniVec database using the NCBI C++ Toolkit's vecscreen utility [41]-transcripts having hits flagged as "Strong" and/or "Moderate" were purged. Residual transcripts were checked for contaminants and low-quality elements using the NCBI Transcriptome Shotgun Assembly (TSA) division's internal screening system, and assessed for completeness with respect to known single-copy orthologs generally conserved among insects using the BUSCO pipeline [42]. The final assembly was also made publicly available under the aforementioned BioProject accession. NCBI-vetted PUTs were compared with the NR protein database using Blastx [43] with default parameter settings. PUTs were partitioned into gold, silver and bronze tiers accordingly to similarity with existing sequence data as described previously [44,45]: Briefly, each gold-tier entry involved a PUT at least 300 nucleotide residues long exhibiting a single high-scoring segment pair in its Blastx hit against a reference protein of at least 100 amino acids; at least 75% of aligned residues had to be positively similar, with the ratio of hit length to subject sequence length being at least 90%. Silver-tier PUTs were a minimum of 100 nucleotide residues long with a hit spanning at least 75% of an NR reference protein sequence's length. Bronze-tier entries involved PUTs at least 100 nucleotides long with hits covering at least 30% of the NR protein's length.
Protein family analyses were conducted using HMMER3 with default settings in conjunction with the Pfam-A (version 30) database [46,47]. Each PUT could be associated with zero, one or more distinct Pfams. Inferred translational products identified in the gold-tier PUT subset were analyzed for Pfam family content. An additional analysis was performed using the complete set of PUT sequences by comparing the longest ORF present in each (considered across all six frames) with Pfam-A. Annotations based on gene ontology (GO) terms were determined using the pfam2go mapping table made available by Pfam, for each of the ontology's cellular component, biological process and molecular function aspects. Fine-grained GO terms were resolved to their respective penultimate ancestral nodes in the directed acyclic graph (DAG) used to represent the ontology, thereby providing a more abstract view of the functional capacity inherent in the harlequin bug transcriptome-more specifically, these GO-Slim terms were recovered through traversals of "is_a" paths in the DAG using custom scripts.
Isoform-level expression amounts were estimated using the RSEM package and were recorded using the Transcripts per Million (TPM) measure [48]. Three contrasts were made based on these abundance estimates: 2nd vs. 4th instar nymphs, female vs. male adults, and nymphs vs. adults-reads from 2nd and 4th instar nymphs were pooled to compose the "nymphal" group, and those from adult males and females for the "adults" set. A listing of all PUTs exhibiting at least a two-fold expression differential is presented in Table S1. Validation of gene expression differences was performed on a subset of 16 transcripts using quantitative real-time PCR, incorporating three biological replicates and three technical replicates. Primers were designed using PrimerPlex version 2.62 (PREMIER Biosoft, Palo Alto, CA, USA); these data are also presented in Table S1. Following manufacturer protocol, cDNA was synthesized using SuperScript TM II Reverse Transcriptase (ThermoFisher Scientific). Primers targeting harlequin bug 18S rRNA (forward: 5 -TTTTATCCAGAAAATCCCGATCA-3 , reverse: 5 -ACAACAAGTCCTCCGAAAAACC-3 ) were used to control for RNA quantity, and those targeting harlequin bug elongation factor 1-alpha (forward: 5 -CGAGAAAGAGGCTCAGGAGA-3 , reverse: 5 -TCAGCCTGAGAAGTCCCTGT-3 ) were used as an internal reference for gene expression.
Real-time PCR was ran on an ABI 7500 instrument using Power SYBR Green master mix (ThermoFisher Scientific), as follows: 40 cycles of 50 • C for 2 min, 95 • C for 10 min, 95 • C for 15 s, 60 • C for 1 min; followed by 95 • C for 15 s, 60 • C for 1 min, 95 • C for 30 s, and finally 60 • C for 15 s.
To identify harlequin bug transcripts involved in insecticide resistance, proteins identified in the Gnomon-based annotation of the brown marmorated stink bug (Halyomorpha halys) genome as being carboxylesterases (COE) or glutathione-S-transferases (GST) were retrieved. Blastp was used to search the full H. halys inferred protein set to identify any COE or GST proteins that had not been properly labeled as such by automated methods. Following manual scrutiny, these refined sets were used as Blastp queries against the harlequin bug data-in particular, H. halys sequences were compared against the longest ORF present in each harlequin bug transcript to test for homology. Inclusion of an M. histrionica gene in the COE or GST gene family required its top high-scoring segment pair to exhibit a bit score of at least 75, a subject sequence length of at least 100 amino acid residues, a hit length to subject sequence length ratio of 0.90 or greater, a hit length to query sequence length ratio of 0.75 or greater, and a ratio of positively similar aligned residues to hit length of at least 0.80. Independently for each of these gene families, combined M. histrionica and H. halys protein sets were multiply aligned using the MUSCLE program [49], from which a maximum likelihood-based phylogeny was generated using the method of Le and Gascuel [50] as implemented in PhyML [51] using 100 bootstrap replicates. Phylogenies were visually rendered using the R package, phytools [52]. For both gene family analyses, GenomeThreader [53] was used to align M. histrionica proteins to the brown marmorated stink bug genome (available at https://i5k.nal.usda.gov/Halyomorpha_halys) in an effort to elucidate patterns of molecular evolution across this taxonomic divide.
The full PUT set was batch tBlastn searched with each cytochrome P450 subfamily from Halyomorpha halys (brown marmorated stink bug; 48 sequences). The top 50 hits for each search were kept using an expect value of 10. Blast results were imported into a spreadsheet and duplicate accessions were removed by filtering for unique entries only. In total, 478 PUTs were identified. These sequences were batch Blastx searched against 9125 named insect P450 sequences and the best hit was retained with percent identity and alignment length. The results were sorted by best blast hit which clustered closely related sequences together. The longest member of each group was translated and used to search the 478 PUTs for exact or near-exact matches. These sequences were binned into a collection from a single gene. Additional P450 PUTs were identified by HMMER searches of Pfam-A and Blastx searches of the NCBI protein database, NR. These were mostly short (i.e., <175 aa) fragments.
To determine whether harlequin bug harbors an iflavirus similar to that observed in the brown marmorated stink bug [54], raw RNA-Seq reads were mapped against the viral genome sequence available under NCBI Accession number KF699344.1 using bowtie2 [55]. PUTs potentially associated with organisms of the genera Pantoea, Erwinia and Nosema, which correspond to known pentatomid symbionts [45,56,57], were identified through Blastx comparisons with representative sequences present in the NR database. These comparisons involved the full set of assembled PUTs rather than those that had been explicitly vetted by the NCBI TSA, a vetting which involves removal of putative contaminant sequences, comprising microbial symbionts.

Results
Assembly of the full RNA-Seq dataset generated in this study produced 526,403 putative unique transcripts (PUTs) comprising 427,532,460 bases. Following data cleaning routines, a total of 523,102 NCBI-vetted PUTs, containing 425,689,726 assembled bases, was made available under BioProject accession number PRJNA302154. A total of 95.7% of conserved single-copy orthologs in the Insecta class were fully represented in this assembly, with 3.6% fragmented and only 0.7% absent. Sorting these sequences on the basis of extrinsic homology information resulted in the labeling of 19,303 gold-tier, 19,755 silver-tier and 33,482 bronze-tier PUTs. Pfam family analysis of translation products inferred from gold-tier PUT sequences identified 4518 unique families in total. Assaying Pfam family presence among longest ORFs present in the comprehensive PUT collection resulted in detection of 5352 distinct Pfam families in total. Table 2 presents the 25 most abundant Pfam entries detected in both datasets. These described a fairly typical array of protein family content, evincing presence of genes associated with cell signaling, DNA binding and innate insect immunity. Analysis of the complete PUT dataset yielded 552 distinct GO terms associated with the ontology's Biological Process aspect, 199 with Cellular Component and 627 with Molecular Function. Analysis on the gold-tier PUT subset resulted in identification of 463, 162 and 554 distinct terms corresponding to the Biological Process, Cellular Component and Molecular Function aspects, respectively. Supplementary Figure S1 presents the 10 most frequently encountered GO terms seen within each of these dimensions, for each set of PUTs; no appreciable differences were evident with which to distinguish functional capacity of the complete PUT set relative to the gold-tier subset. Analogous to its full-ontology counterpart, Supplementary Figure S2 presents the five most abundant GO-Slim terms observed. Similarly, the distribution of GO-Slim terms between the two datasets appeared equitable. Of the PUTs that exhibited the greatest fold changes in expression levels between 2nd and 4th instar nymphs, the transcripts seemingly most abundantly up-regulated in the former relative to the latter comprised a cuticular protein, perhaps associated with the molting process (see Table 3). Those seemingly more abundantly expressed among 4th instar nymphs were usually digestion or stress-response related. Transcripts predominantly expressed in females included a juvenile hormone acid O-methyltransferase and a muscle component, troponin C. Transcripts that suggested male-dominant expression patterns included an odorant binding protein, which may be involved in mate recognition. No clear functional trend seemed evident among genes whose fold changes suggested more abundant expression in adults relative to nymphs, but in the reverse case, transcripts were often associated with cuticular tissues. In all three comparisons, transcripts with the greatest perturbations in expression fold changes included a number of distinct uncharacterized proteins. Gene expression patterns for a subset of these genes were validated experimentally using quantitative real-time PCR, and these results generally corroborated findings suggested by RNA-Seq data (see Figure 2). Table 3. Gold-tier transcripts exhibiting the greatest gene expression fold changes within each of the three harlequin bug mRNA population comparisons performed. Expression levels were conveyed using the Transcripts Per Million (TPM) measure, and binary logarithms were used to rank sample-specific ratios of these amounts. A floor on TPM values of 5.0 was imposed. Forty-one H. halys and 44 M. histrionica glutathione S-transferase associated transcripts encode 35 and 34 unique protein sequences, respectively. A phylogeny of these genes' unique translation products is shown in Figure 3. Four distinct clades corresponding to GST class were observed: Theta, Delta/Epsilon, Sigma, and microsomal (which was positioned within the Sigma clade). Although members of the Delta/Epsilon classes are not distinguished in annotations made available by the NCBI, the observation that Epsilon-type GSTs are exclusive to the Holometabola [18] suggests genes from these two Paraneopteran species are of the Delta type. The majority of protein sequences were placed in the Sigma clade. Two H. halys prostaglandin E synthase isoforms were observed and subsumed under the phylogeny's microsomal GST glade; no harlequin bug homologs to these enzymes were evident, however. Thirty-one of the 44 unique M. histrionica GST associated protein sequences could be reliably splice-aligned to the H. halys genome, but only 19 of these aligned to unique loci. After further accounting for isoforms, an additional 14 GST homologs were identified: one Theta, two Delta, eight Sigma and three microsomal GSTs. Homologs of GSTs from the Zeta and Omega subclasses were not apparent in these data. Forty-one H. halys and 44 M. histrionica glutathione S-transferase associated transcripts encode 35 and 34 unique protein sequences, respectively. A phylogeny of these genes' unique translation products is shown in Figure 3. Four distinct clades corresponding to GST class were observed: Theta, Delta/Epsilon, Sigma, and microsomal (which was positioned within the Sigma clade). Although members of the Delta/Epsilon classes are not distinguished in annotations made available by the NCBI, the observation that Epsilon-type GSTs are exclusive to the Holometabola [18] suggests genes from these two Paraneopteran species are of the Delta type. The majority of protein sequences were placed in the Sigma clade. Two H. halys prostaglandin E synthase isoforms were observed and subsumed under the phylogeny's microsomal GST glade; no harlequin bug homologs to these enzymes were evident, however. Thirty-one of the 44 unique M. histrionica GST associated protein sequences could be reliably splice-aligned to the H. halys genome, but only 19 of these aligned to unique loci. After further accounting for isoforms, an additional 14 GST homologs were identified: one Theta, two Delta, eight Sigma and three microsomal GSTs. Homologs of GSTs from the Zeta and Omega subclasses were not apparent in these data. Ninety H. halys and 88 M. histrionica carboxylesterase associated transcripts encode 82 and 70 unique translation sequences, respectively. A maximum likelihood phylogeny of these unique translation products is shown in Supplementary Figure S3. A preponderance of sequences were phylogenetically positioned in a clade containing carboxylesterases proper, comprising carboxylesterase 4A-like, carboxylesterase 5A-like and β-esterase clades. The large β-esterase clade contains venom carboxylesterase 6-like enzymes, as well as E4 and FE4 esterases. Acetylcholinesterase-, neurotactin-and neuroligin-associated sequences were all basal to the large carboxylesterase clade.
Fifty-five of the 70 unique M. histrionica carboxylesterase associated transcripts map to the H. halys genome, suggesting two acetylcholinesterase (four mapped transcripts), three neuroligin (11 mapped transcripts), one neurotactin (three mapped transcripts) and 19 carboxylesterase (37 mapped transcripts) homologs. All fifteen unique, unmapped M. histrionica carboxylesterase associated transcripts are located within the carboxylesterase clade, perhaps suggesting a lineage-specific expansion of these genes in harlequin bug (see Supplementary Figure S3).
Some H. halys scaffolds contain multiple esterase genes in close proximity, specifically scaffolds 261 and 2592, which comprise tandem arrays of eleven and six esterase genes, respectively. On both scaffolds 261 and 2592, M. histrionica esterase proteins mapped to three distinct, homologous esterase H. halys loci, suggesting a potentially elevated esterase gene copy number in this latter species relative to the former (see Figure 4).  H. halys scaffolds 261, 2592, 159, 1429 and 116 contain, respectively, 11, 6, 6, 4 and 2 carboxylesterase genes in close proximity. On both scaffolds 261 and 2592, M. histrionica carboxylesterase associated transcripts map to three distinct, homologous H. halys carboxylesterase associated loci (see Figure 4). All other M. histrionica carboxylesterase associated transcripts map to H. halys scaffolds containing just one carboxylesterase associated loci. associated transcripts are located within the carboxylesterase clade, perhaps suggesting a lineage-specific expansion of these genes in harlequin bug (see Supplementary Figure S3). Some H. halys scaffolds contain multiple esterase genes in close proximity, specifically scaffolds 261 and 2592, which comprise tandem arrays of eleven and six esterase genes, respectively. On both scaffolds 261 and 2592, M. histrionica esterase proteins mapped to three distinct, homologous esterase H. halys loci, suggesting a potentially elevated esterase gene copy number in this latter species relative to the former (see Figure 4).
H. halys scaffolds 261, 2592, 159, 1429 and 116 contain, respectively, 11, 6, 6, 4 and 2 carboxylesterase genes in close proximity. On both scaffolds 261 and 2592, M. histrionica carboxylesterase associated transcripts map to three distinct, homologous H. halys carboxylesterase associated loci (see Figure 4). All other M. histrionica carboxylesterase associated transcripts map to H. halys scaffolds containing just one carboxylesterase associated loci. A total of 133 novel cytochrome P450 sequences in M. histrionica were assigned names based on their relationship to named P450s from other insects. Many were orthologs to P450s of Halyomorpha halys. Every P450 family in brown marmorated stink bug was found in harlequin bug except CYP3228A1. This sequence was searched for specifically and found as a pseudogene fragment. An NJ tree was made using Clustal Omega using midpoint rooting ( Figure 5). The four insect clans are present. The CYP2 clan and the mito clans are small, containing all of the Halloween genes and some other highly conserved genes like CYP15, CYP18, CYP301 and CYP305. The CYP3 clan is large containing about half of the sequences with a gene bloom in the CYP6LV subfamily. The CYP4 clan has about one quarter of the total sequences. These two clans contain the least conserved genes with the exception of the CYP4G genes (cuticular hydrocarbon synthesis). A total of 133 novel cytochrome P450 sequences in M. histrionica were assigned names based on their relationship to named P450s from other insects. Many were orthologs to P450s of Halyomorpha halys. Every P450 family in brown marmorated stink bug was found in harlequin bug except CYP3228A1. This sequence was searched for specifically and found as a pseudogene fragment. An NJ tree was made using Clustal Omega using midpoint rooting ( Figure 5). The four insect clans are present. The CYP2 clan and the mito clans are small, containing all of the Halloween genes and some other highly conserved genes like CYP15, CYP18, CYP301 and CYP305. The CYP3 clan is large containing about half of the sequences with a gene bloom in the CYP6LV subfamily. The CYP4 clan has about one quarter of the total sequences. These two clans contain the least conserved genes with the exception of the CYP4G genes (cuticular hydrocarbon synthesis).
A number of P450-encoding transcripts exhibited possible sex-preferential gene expression patterns. Five exhibited gene expression patterns suggestive of female-specific expression among adult insects, and seven suggested a male-dominant pattern (see Table 4). Table 4 also presents the seven PUTs having expression values of at least 5 TPM in both male and female imagoes, and exhibited at least a two-fold expression differential per RNA-Seq data-among these, six (~86%) suggested a male-dominant pattern.
In searching for M. histrionica terpene biosynthetic genes, a total of 19 sequences were identified using query sequences for the MVA and JH biosynthetic pathways from various insects ( Table 5). These sequences were orthologs to respective genes in H. halys and included transcripts of all six genes of the MVA pathway (acetoacetyl-CoA thiolase, HMG-CoA synthase, HMG-CoA reductase, mevalonate kinase, phosphomevalonate kinase and diphosphomevalonate decarboxylase) and an IDP isomerase. Furthermore, two IDS homologs of FDP synthase-like genes were observed, one of which exhibited higher expression levels in adults and male bugs (Table 6). In the JH pathway branch, multiple isoforms of farnesol dehydrogenase (3), farnesal dehydrogenase (2), and farnesoic acid methyltransferase (3) were found, but expression values for several of these isoforms were low at all stages of development (Table 6). A single transcript for methyl farnesoate epoxidase was identified; however, no homolog of farnesyl diphosphatase could be observed. While transcripts of MVA pathway genes did not show major developmental or sex specific expression differences, more distinct expression patterns were found for isoforms of JH biosynthetic genes (farnesol dehydrogenase, farnesal dehydrogenase, farnesoic acid methyltransferases) dependent on instar, nymphal/adult stage or sex. As in M. histrionica, MVA pathway gene orthologs in H. halys were equally expressed among sexes and developmental stages, while expression profiles of orthologs in JH biosynthesis differed dependent on the isoform. These expression patterns were, however, not consistent with those found in harlequin bug. Interestingly, a total of four FDP synthase-like transcripts were detected in H. halys as opposed to only two such sequences in M. histrionica. A number of P450-encoding transcripts exhibited possible sex-preferential gene expression patterns. Five exhibited gene expression patterns suggestive of female-specific expression among adult insects, and seven suggested a male-dominant pattern (see Table 4). Table 4 also presents the seven PUTs having expression values of at least 5 TPM in both male and female imagoes, and exhibited at least a two-fold expression differential per RNA-Seq data-among these, six (~86%) suggested a male-dominant pattern.   Transcripts appearing to encode proteins involved in RNAi-related processes were detected, including such RISC-associated components as Dicer, Loquacious and Argonaute, as well as homologs of Aubergine, Tarbp2, various RNA helicases and PIWI-containing proteins. No PUTs with evident homology to sid-1 or sid-2 were detected, nor were any apparent RNA-dependent RNA polymerases observed.
No evidence of an iflavirus homologous to that previously observed in the brown marmorated stink bug was apparent in harlequin bug, and only very weak evidence supporting the possible presence of a Nosema sp. microbe was present: only seven PUTs corresponded to genes observed in the Nosema genus, and these were predominantly associated with retrotransposable elements. In contrast, 653 and 67 PUTs exhibited high levels of sequence similarity with known Pantoea and Erwinia genes, respectively, consistent with the notion that these genera likely comprise the primary microbial endosymbionts of M. histrionica, as well as several other pentatomids [57,58].

Discussion
The increasing importance of stink bug pests is concurrent with the reduction in foliar insecticides in major, predominantly transgenic field crops, and the invasion of several major stink bug pests into new continents. Panizzi et al. [59] listed 25 "important" pest species in 17 genera, and about 41 "less important" pests in 16 additional genera, infesting a wide variety of agronomic and horticultural crops and other useful plants. Since that review, two of those pests hitherto considered less important, H. halys and Bagrada hilaris, have become vastly more important following continental-scale invasions, and a number of other species are potentially serious threats for invasion [59,60]. Complicating management is the fact that several predatory pentatomid species in the subfamily Asopinae are important and widespread natural enemies in agroecosystems [61]. All of these trends have prompted increasing interest in new management tools as well as avoidance of insecticide resistance in this important group.
The current study addressed the harlequin bug gene space from both quantitative and qualitative perspectives. That only one biological replicate's transcriptome was sequenced implies that any inferred fold changes in gene expression per RNA-Seq data are at most suggestive of differential expression (DE), but these cannot support the notion of true DE with any statistical significance. However, a subset of transcripts exhibiting high fold changes was further probed using quantitative real-time PCR combined with rigorous statistical analyses-for these cases, DE could be asserted, and these findings represent compelling targets on which to perform RNAi-based reverse genetics. The lack of biological replication among RNA-Seq data per se has no impact on qualitative results presented herein; in particular, these qualitative findings afford useful insights into the inherent enzymatic capacity of the harlequin bug to detoxify xenobiotic compounds and to synthesize terpenoid compounds.
M. histrionica appears to possess four distinct Delta-class GST genes, although these resolve to two pairs, each of which contain two genes exhibiting very high levels of sequence similarity. Whether these correspond to recent duplication events or to potential RNA sequencing and assembly artifacts will require exploring the genome of M. histrionica, which is beyond the scope of this study. The H. halys genome seems to encode two Delta-type GSTs. These counts are lower than what has been seen in other hemipterans: M. persicae has eight and Acyrthosiphon pisum has ten such genes [62]. They are also lower than what has been observed in the Dipterans Drosophila melanogaster (11) and Anopheles gambiae (12) [63,64], although slightly higher than that seen in the Hymenopteran species, Apis mellifera (one; [23]). The low count of Delta-type GSTs in both harlequin bug and brown marmorated stink bug makes them appealing as knockdown/knockout targets for transcriptional disruption. Their potential utility may, however, be overshadowed by the generally highly conserved nature of genes from the Delta (as well as Epsilon) cytosolic GST subtypes, thereby incurring risks of off-target effects in practical applications. Transcriptome sequencing and analysis of closely related insect taxa should help clarify the extent to which such off-target gene silencing risks might manifest in unanticipated ecological damage.
Some H. halys scaffolds contain multiple β-esterase genes in close proximity, specifically scaffolds 261 and 2592, which comprise tandem arrays of eleven and six β-esterase genes, respectively. On both scaffolds 261 and 2592, M. histrionica β-esterase proteins mapped to three distinct, homologous β-esterase H. halys loci. This may suggest a potentially elevated β-esterase gene copy number in H. halys when compared to M. histrionica. However, fifteen unique M. histrionica carboxylesterase-associated protein sequences that did not map to the H. halys genome via GenomeThreader analysis are all located within the carboxylesterase clade (Supplementary Figure S3). These unmapped M. histrionica carboxylesterase-associated protein sequences may represent multiple M. histrionica-specific carboxylesterases. Currently, with only M. histrionica transcript data, it is not possible to identify novel gene duplication or to identify divergent homologs.
Inspection of genome sequence organization hints towards an increase in copy number of H. halys β-esterase genes through tandem gene duplication, which may have been effected by chromosomal slippage during DNA replication, for example. Results considered herein suggest that tandem duplication of β-esterase genes is also present in M. histrionica, albeit to a lesser extent than observed in H. halys. Whether such shared duplications occurred in a common ancestor or were realized as independent events subsequent to speciation remains uncertain, and will require genome sequence analysis of a broader taxonomic sampling of pentatomid species to resolve.
It is possible that β-esterase gene duplication is a common method of generating insecticide resistance in the Hemiptera. Gene duplication and over expression of β-esterases has been observed as one of the primary insecticide resistance mechanisms in M. persicae [65,66]. The presence of numerous carboxylesterase genes being actively transcribed in cultures of H. halys and M. histrionica, neither of which have been exposed to insecticides for at least 30 and 10 generations, respectively, suggests these constitute a standing defense against insecticide and/or other toxin exposure. Future investigation of differential expression between insecticide exposed and non-exposed groups could elucidate gene expression regulation associated with insecticide exposure. Identification of up regulated genes in response to insecticide exposure could help narrow the field of potential β-esterase gene targets, as well as identify regulation mechanisms that may also be of interest for combatting insecticide resistance.
Harlequin bug cytochrome P450s are most similar to brown marmorated stink bug (BMSB) P450s. BMSB has 128 named P450s in 26 families and 11 fragments that were too short to name. Harlequin bug has 87 named P450s and 17 fragments in 25 families. Sixty-three (72%) were considered orthologs. The differences occurred in gene clusters where there was not a 1:1 relationship between the species. The BMSB had more P450s in some large gene blooms like CYP6LV (20:9) and CYP6LT (8:4). One CYP3 clan family CYP3228A1 was found in BMSB but not in harlequin bug. Only two new subfamilies were present in harlequin bug CYP3226C and CYP4KC1. The harlequin bug data are based on transcriptome sequences. If some genes were not expressed they would not be represented. This may explain some of the differences. The large number of genes in the CYP3 (43 genes) and CYP4 (30 genes) clans provides the raw material for developing pesticide resistance. The genes in the CYP2 and mito clans might be good targets to block molting (Halloween genes and CYP18) and juvenile hormone synthesis (CYP15). Inhibition of the CYP4G P450s could act to block or reduce cuticular hydrocarbon synthesis, though this subfamily has six genes so inhibition might be difficult to achieve. Differential P450 expression in antennae could identify potential pheromone metabolizing P450s. Although the mating process in M. histrionica is not exclusively controlled by olfaction, but also via behavioral cues including visual and vibrational signals, inhibitors of P450s may nonetheless have utility in disrupting successful mating.
In the core MVA terpenoid biosynthetic pathway of M. histrionica and H. halys, all enzymes except for acetoacetyl-CoA synthase appear to be represented by single-copy genes, which is consistent with MVA pathways from other insects (e.g., [67][68][69]). On the contrary, enzymes involved in the conversion of FDP to farnesoic acid in JH biosynthesis are known to be encoded by families of genes with expression in different tissues based on the importance of farnesol and farnesal homeostasis for a variety of cellular functions [70]. Accordingly, we found isoforms of M. histrionica and H. halys farnesol and farnesal dehydrogenase; however, we were unable to identify orthologs of farnesyl diphosphatase. Furthermore, in agreement with findings from other insects [70], harlequin bug and brown marmorated stink bug methyl farnesoate epoxidase appears to be represented by a single-copy gene transcript; unexpectedly, we found several potential isoforms for juvenile hormone acid methyltransferase. To build a biosynthetic branch for the formation of murgantiol, it is possible that the pathway is initiated by one of the two identified FDP synthases because of the higher male specific expression of this gene (at least for M. histrionica). It can further be assumed that P450 enzymes with similarity to methyl farnesoate epoxidase are involved in the final epoxidation step in murgantiol biosynthesis because of the same position of the epoxy group at C10-C11 in both pheromone and JH molecules. Further comparative expression profiling paired with gene functional analysis will be required to elucidate the murgantiol biosynthetic pathway in M. histrionica.

Conclusions
This study generated the first high-volume collection of transcriptome sequence data for the important insect pest Murgantia histrionica. A number of gene targets for transcriptional disruption were identified, which will require further experimental study in an in vivo context. These genes include elements of three families implicated in the development of resistance to synthetic insecticides (glutathione S-transferases, carboxyesterases and cytochrome P450s), as well as genes involved pheromone biosynthesis pathways. These data will be of further utility in studies addressing additional gene families, in performing comparative transcriptomics among various insect species, and in supporting gene annotation efforts for any possible genome sequencing efforts made specifically for the harlequin bug.