Transcriptome-informed identification and characterization of Planococcus citri cis- and trans-isoprenyl diphosphate synthase genes

Summary Insect physiology and reproduction depend on several terpenoid compounds, whose biosynthesis is mainly unknown. One enigmatic group of insect monoterpenoids are mealybug sex pheromones, presumably resulting from the irregular coupling activity of unidentified isoprenyl diphosphate synthases (IDSs). Here, we performed a comprehensive search for IDS coding sequences of the pest mealybug Planococcus citri. We queried the available genomic and newly generated short- and long-read P. citri transcriptomic data and identified 18 putative IDS genes, whose phylogenetic analysis indicates several gene family expansion events. In vitro testing confirmed regular short-chain coupling activity with five gene products. With the candidate with highest IDS activity, we also detected low amounts of irregular coupling products, and determined amino acid residues important for chain-length preference and irregular coupling activity. This work therefore provides an important foundation for deciphering terpenoid biosynthesis in mealybugs, including the sex pheromone biosynthesis in P. citri.


INTRODUCTION
Terpenoids are a large class of metabolites produced by organisms in every branch of life. In insects, terpenoids have important roles in development (e.g., juvenile and moulting hormones) and as semiochemicals in intra- as well as interspecific interactions (e.g., sex, aggregation, alarm, dispersal, maturation, anti-aphrodisiac or trail pheromones, and defense compounds).1,2 Of special interest are the terpenoid sex pheromones produced by some mealybug and scale insect species (order Hemiptera, superfamily Coccoidea). Sex pheromones are chemical signals that mediate communication between males and females of a species and are important for mating. In Coccoidea, they are produced by females to facilitate male navigation during mating and are thus emitted by virgin females, with production ceasing after mating.3 The majority of mealybug (family Pseudococcidae) and armoured scale insect (family Diaspididae) sex pheromones identified to date are irregular terpenoid compounds, mostly esterified mono- or sesquiterpenoid alcohols and carboxylic acids.3,46][7][8] Although an extensive body of research has been dedicated to the identification and study of mealybug and scale insect sex pheromones, the biosynthesis of their irregular backbone remains elusive.
IDSs can be classified as either cis- or trans-, depending on the stereochemistry of the double bond in the reaction product. The two classes are evolutionarily and structurally distinct. Trans-IDSs adopt an α-fold and contain two conserved aspartate-rich motifs known as FARM (first aspartate-rich motif) and SARM (second aspartate-rich motif), which mainly occur as "DDxx(x)D". Cis-IDSs, however, lack distinct conserved motifs and adopt the ζ-fold.9 Aspartate-rich motifs in trans-IDSs coordinate the Mg2+ ions important for the generation of the carbocation in the allylic substrate (DMAPP, GPP, FPP, or other), which can be attacked by IPP to form the new carbon-carbon (C-C) bond in the alkylation reaction. The coupling step catalyzed by IDSs can also be classified based on the orientation of the alkylation reaction. Most IDS enzymes catalyze the formation of a regular 1′-4 (head-to-tail) C-C bond between the allylic substrate and IPP, while irregular head-to-middle reactions have been described for some cis- and trans-IDS enzymes, coupling two DMAPP units or DMAPP with an allylic substrate, forming branched or cyclic polyprenyl chains.10 To date, only a few enzymes with irregular coupling activity have been identified from plants (e.g., 11-14), bacteria (e.g., 15,16), and archaea (e.g., 17,18).
Planococcus citri (Risso, 1813), the citrus mealybug, is a small sucking insect. It is a widespread pest of numerous crops and ornamental plants, causing serious damage and economic losses.19 Its sex pheromone, planococcyl acetate, has a cyclobutane backbone, presumably resulting from irregular IDS coupling.1][22][23] Alternative management strategies, such as mating disruption, demand larger quantities and, therefore, require low-cost, scalable production methods.3 Biotechnological production of sex pheromones has been demonstrated recently for lepidopteran pheromones,24,25 and biotechnological synthesis has been proposed as a sustainable route for large-scale, low-cost pheromone production. However, these methods require knowledge of the biosynthetic pathways.
To that end, we performed a comprehensive search of the P. citri genome, supported by newly generated short- and long-read sequence datasets of P. citri females, to identify candidate cis- and trans-IDSs. Eleven of the identified IDS sequences were tested for regular and irregular coupling activity, with a focus on the production of short-chain backbones. We confirmed regular coupling activity for five trans-IDS candidate enzymes, one of which also produced small amounts of the irregular prenyl diphosphates lavandulyl and maconellyl diphosphate. Our work contributes to the understanding of mealybug terpenoid biosynthesis, providing a foundation for its biotechnological exploitation.

Planococcus citri short- and long-read transcriptome resource
To complement the available P. citri genomic sequence data (Pcitri.v1 genome assembly, deposited at MealyBugBase26), we generated short and long transcriptome reads by performing RNA-seq and Iso-seq. With the transcriptome data, we aimed to obtain reliable coding sequence information at high depths, enabling a comprehensive search for IDS candidates expressed in P. citri or symbiotic organisms. Additionally, virgin and mated female-specific transcriptomic data were generated to perform differential expression analysis and detect possible pheromone synthesis-related changes in gene expression, and thereby support the characterization of putative IDSs involved in sex pheromone biosynthesis. This was based on the hypothesis that sex pheromone biosynthesis genes are expressed in virgin females only, as they are also the ones producing the pheromone compounds.27
Short Illumina reads from virgin and mated P. citri females generated in this study were de novo assembled on their own (Figure S1A), and together with other available P. citri short-read sequence data, including reads from P. citri males (Figure S1B). Both assemblies were combined, resulting in 389,962 unique sequences (Figure S1C). After the merge with the long-read transcriptome (Figure S1D), the full consolidated set contained 440,881 unique transcript sequences (Figure S1E), with an average length of 1,356 nucleotides and an N50 of 2,908 nucleotides (Table S1). On average, around 80% of short reads from each sample mapped back to the de novo transcriptome assembly (Table S2). To retain as much variability as possible, we did not do any further assembly thinning and the presented dataset is, therefore, redundant. Approximately 53% of the transcripts mapped to the Pcitri.v1 genome, with a third of them mapping to the annotated Pcitri.v1 gene models. Completeness assessment with BUSCO28 found 98.7% complete BUSCOs for the consolidated transcriptome, the majority of which were duplicated (Figure S2). Transcript IDs with their InterPro annotations, information on mapping to the Pcitri.v1 genome, and virgin versus mated female differential expression values are available in Table S3A, as well as on FAIRDOMHub along with a fasta file with all sequences from the consolidated transcriptome (see Data and code availability).
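For readers unfamiliar with the N50 statistic reported above, the following is a minimal Python sketch of how it is commonly computed from a list of sequence lengths. It is an illustration only, not the pipeline used in this study; the function name `n50` and the toy lengths are assumptions introduced here.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    The N50 is the length L such that sequences of length >= L
    together contain at least half of all bases in the set.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths (not data from the study)
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))
```

Because the statistic is weighted by sequence length, a redundant assembly with many short fragments can have an N50 well above its mean length, as is the case for the consolidated transcriptome.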
Due to the elusive nature of terpenoid biosynthesis in insects, it has been hypothesized that the origin of some terpenoids (or their precursors) might not be the insects themselves, but rather endosymbiotic bacteria or even plant food.29 For a comprehensive analysis of the biosynthetic capacity for terpenoids found in P. citri, our transcriptomic resource, therefore, contains transcripts of all taxonomic origins, as we elected not to exclude non-insect reads. Taxonomic analyses revealed that out of 207,753 protein sequences determined from the consolidated transcriptome, 26% were unclassified, with the majority of classified sequences assigned to insects (Figure S3). A total of 2,641 sequences were assigned to bacteria, 1,010 of which correlated best with sequences from Candidatus Moranella endobia, a secondary endosymbiont of P. citri. The acquired data also include 2,364 viral, 1,629 plant, and 4,299 fungal sequences. Of the plant sequences, most were assigned to potato (Solanum tuberosum), probably originating from the P. citri food source. Of the fungal sequences, most were assigned to Wallemia mellicola, a cosmopolitan fungus with air-disseminated spores.30

Identification of putative IDS coding sequences
To find sequences with homology to cis- and trans-IDS enzymes, we searched the Pcitri.v1 genome assembly as well as the P. citri consolidated transcriptome (Figure S1E) for sequences with assigned InterPro family annotations IPR001441 and IPR036424 (for cis-IDS sequences) or IPR000092 (for trans-IDS sequences). We also performed a MAST motif search on the Pcitri.v1 genome assembly. All extracted sequences (Tables S4A and S4B) were collapsed into clusters based on their sequence similarity to manually inspect possible assembly or sequencing errors and determine the most probable consensus sequence. This resulted in a final number of 30 putative IDS sequences (Table S4C).
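The annotation-based selection step can be sketched in Python as follows. This is a hedged illustration, not the authors' code: the function `classify_ids`, the input layout (a dictionary from sequence ID to a set of InterPro accessions), and the toy identifiers are assumptions, and we assume that carrying any one of the listed accessions is enough to select a sequence.

```python
# InterPro accessions named in the text
CIS_IDS_ACCESSIONS = {"IPR001441", "IPR036424"}
TRANS_IDS_ACCESSIONS = {"IPR000092"}


def classify_ids(annotations):
    """Map each sequence ID to 'cis', 'trans', or 'cis+trans'.

    `annotations` maps a sequence ID to the set of InterPro accessions
    assigned to it. Sequences without any IDS accession are omitted.
    Assumption: a single matching accession is sufficient.
    """
    hits = {}
    for seq_id, accessions in annotations.items():
        labels = []
        if accessions & CIS_IDS_ACCESSIONS:
            labels.append("cis")
        if accessions & TRANS_IDS_ACCESSIONS:
            labels.append("trans")
        if labels:
            hits[seq_id] = "+".join(labels)
    return hits


# Hypothetical toy annotations (identifiers are made up)
toy = {
    "seq_a": {"IPR000092", "IPR008949"},
    "seq_b": {"IPR001441"},
    "seq_c": {"IPR011009"},
}
print(classify_ids(toy))
```

Hits selected this way would still need the clustering and manual inspection described above, since redundant transcripts and gene models of the same locus are all retained by such a filter.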
Newly acquired long- and short-read data provided alternative transcript sequences to matching Pcitri.v1 gene models of the same cluster for seven IDS sequences, as well as information on possible sequence variability (Figures 1 and S4). For five candidates (cisIDS1, transIDS4, transIDS5, transIDS11, and transIDS12) we were able to confirm the sequences predicted by the transcriptome data with amplification (Figure 1). Two different amplicons were obtained in the case of transIDS4 (Figure 1A), one identical to the gene model and one with an N-terminal truncation and an insertion of 25 amino acids. For transIDS5 (Figure 1B), the gene model was missing 283 amino acids at the N-terminus. For transIDS11 (Figure 1C), we amplified a sequence with a frameshift close to the C-terminus, resulting in a C-terminal truncation of 21 amino acids. In the case of transIDS12 (Figure 1D), the structure of the amplified sequence differed significantly from the annotated gene models. Additionally, two variations of the same full-length transIDS12 sequence were amplified, differing in a single amino acid position: L328 versus Q328 (Figure 1D). We also detected other single nucleotide polymorphisms (SNPs). For example, the translated amplicon of cisIDS1 has Leu (L327) instead of His at position 327 (H327), the latter predicted in the gene model, while the transcripts predicted from the de novo assembly or fully sequenced all have L327. Examining the mapping of short reads to the genome assembly, we observed that approximately 80-90% of the short reads correspond to L327 and the rest to H327.
Figure 1. Multiple sequence alignments for clusters of overlapping transcripts and gene models for candidate sequences transIDS4 (A), transIDS5 (B), transIDS11 (C), and transIDS12 (D). Each cluster includes translated coding sequences obtained from Pcitri.v1 gene models (in black text), long-read transcripts (in blue text), and transcripts de novo assembled from short reads (in orange text). All sequences are available in Table S4C. Sequences confirmed with amplification from cDNA and Sanger sequencing are marked with black dots. Color-coded sequence similarity is given in the legend in the bottom left. The alignment around the L328 versus Q328 variation is shown below the full-length alignments for transIDS12 in (D). Multiple sequence alignment was done in MEGAX using the MUSCLE algorithm and visualized with Geneious software.
Eight of the identified IDS sequences most likely originated from other species: transIDS14 has only one mismatch to an FPPS (farnesyl diphosphate synthase) from potato, while transIDS15, cisIDS7, and cisIDS10 have 100% identity to different IDS sequences from the fungus Wallemia mellicola (Table S4C).TransIDS6, transIDS7, cisIDS4, and cisIDS6 have homology to insect IDS sequences.However, as determined by phylogenetic analysis (Figures S5-S8), these grouped with homologous sequences from Chalcidoidea wasps (Hymenoptera).Additionally, one sequence can be assigned to a P. citri symbiont: cisIDS8 is identical to a protein annotated as a UPPS (undecaprenyl diphosphate synthase) from Candidatus Moranella endobia.

In silico analysis reveals diverse putative functions of candidate P. citri IDSs
We further analyzed the sequence features and phylogeny of identified P. citri IDS sequences, which suggested their potential functions (Table 1; Figure 2 for trans-IDS candidates). Two candidate sequences (transIDS2, transIDS4) are similar to decaprenyl diphosphate synthases (DPPSs), long-chain IDSs that catalyze the formation of the ubiquinone prenyl sidechain, important in aerobic cellular respiration. The functional DPPS enzyme forms a heterodimer of subunits 1 and 2, of which subunit 1 has a higher sequence homology to IDSs, while subunit 2 acts as a regulatory subunit.31,32 TransIDS2 is similar to sequences of DPPS subunit 1 (Figures 2 and S5), while transIDS4 lacks the D-rich motifs and is similar to sequences of DPPS subunit 2 (Table 1 and Figure S10).
TransIDS3 and transIDS5 have high similarity to FPPSs (Table 1; Figures 2 and S6). Additionally, there are seven sequences (transIDS8, transIDS9, transIDS10, transIDS11, transIDS12, transIDS13, transIDS18) with similarity to FPPS-like sequences from other Coccoidea species (Figure S11). All seven of them have a canonical FARM (DDxxD), but not a canonical SARM (Table 1). Phylogenetically, the FPPS-like sequences are separated from the mealybug FPPS sequences, with low sequence similarity between the two groups (Figure S11). A similar diversification pattern of IDS genes within Coccomorpha has been indicated previously.36 TransIDS3 and transIDS5, as well as transIDS12 and transIDS13, also have high pairwise sequence similarity (78% and 75%, respectively), indicating a possibility of more recent duplication events.
Table 1 notes: [...] S4C) and are marked with an asterisk if the differential expression was statistically significant (pAdj <0.05) for at least one of the sequences in each cluster. "/": activity test not performed. Asterisk in the "Candidates" column denotes the sequence originating from Candidatus Moranella endobia. DPPS, decaprenyl diphosphate synthase; FPPS, farnesyl diphosphate synthase; GGPPS, geranylgeranyl diphosphate synthase; DHPPS, dehydrodolichyl diphosphate synthase.
TransIDS16 is similar to geranylgeranyl diphosphate synthases (GGPPS), catalyzing the formation of the diterpenoid precursor (Figures 2 and S12). Sequences of transIDS1, transIDS19, and transIDS20 do not have canonical D-rich motifs. Additionally, transIDS1 is similar to E3 ubiquitin protein ligases, while we could not find any similarity matches for transIDS19 and transIDS20. TransIDS17 is similar to otubain-like ubiquitin thioesterases (Figure S13) and includes a DDxxD motif aligning with FARM, as do all other otubain-like orthologs identified from mealybugs and included in the phylogenetic analysis (Figure S13). Out of the cis-IDS candidates, cisIDS1, cisIDS2, cisIDS3, and cisIDS5 are homologous to the catalytic subunit of heterodimeric dehydrodolichyl diphosphate synthase (DHPPS) (Figures S7 and S14), involved in the biosynthesis of dolichol phosphate, the lipid carrier of glycosyl units used for N-glycosylation in eukaryotes. Candidates cisIDS2, cisIDS3, and cisIDS5 have some closer orthologs within the Coccoidea; however, they are very distant from other DHPPS sequences and even from cisIDS1 (Figure S14). On the other hand, cisIDS9 has the highest similarity to sequences of the DHPPS regulatory subunit (Figure S8).
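The presence of aspartate-rich motifs of the form "DDxx(x)D" can be screened with a simple regular expression. The sketch below is only an illustration of that pattern check; it is not the MAST search used in this study, and the function name and toy sequences are assumptions. A pattern match alone does not show that the motif sits at the FARM or SARM position, which requires an alignment.

```python
import re

# "DDxx(x)D": two aspartates, two or three arbitrary residues, one aspartate.
# The lookahead lets matches that start at neighboring positions be reported.
D_RICH = re.compile(r"(?=(DD.{2,3}D))")


def find_d_rich_motifs(protein):
    """Return (1-based start position, matched motif) for each DDxx(x)D hit."""
    return [(m.start() + 1, m.group(1)) for m in D_RICH.finditer(protein.upper())]


# Hypothetical toy sequences (not P. citri proteins)
print(find_d_rich_motifs("MKDDIIDAA"))
print(find_d_rich_motifs("AADDLYKDAA"))
print(find_d_rich_motifs("MKLVA"))
```

At each start position the pattern tries the three-residue spacer first and falls back to two residues, so at most one motif is reported per position.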

Regular IDS activity was shown for five and irregular activity for one P. citri candidate
With the aim of identifying enzymes involved in the synthesis of the irregular C10 sex pheromone, we tested the activity of P. citri IDS candidates for producing short polyprenyl chains. We selected 11 P. citri candidate sequences with homology to trans- or cis-IDSs for which the full sequence was confirmed with long-read transcriptome sequencing (Tables 1 and S4C). Two of these sequences (transIDS3 and transIDS10) were synthesized according to long-read sequencing data, and nine sequences (transIDS2, transIDS4, transIDS5, transIDS11, transIDS12, transIDS16, transIDS17, cisIDS1, and cisIDS8) were amplified from P. citri cDNA (Table S4C). All sequences were expressed in E. coli, and purified recombinant proteins were tested in vitro to identify regular and irregular IDS catalytic activity forming C10 or C15 backbones (see Figure S15 for SDS-PAGE of expressed and purified candidates). For transIDS4 and transIDS12, two amplicons were obtained, cloned, and tested (Figures 1A and 1D, and Table S4C).
Five trans-IDS-like proteins (transIDS2, transIDS3, transIDS5, transIDS11, and transIDS17) displayed regular IDS activity (Table 1 and Figure 3A). The products of transIDS2, transIDS5, transIDS11, and transIDS17 were identified as geranyl and farnesyl diphosphates based on chromatographic behavior and mass spectra of diphosphates and their dephosphorylated alcohol derivatives (Figures 3A, 3C, and S16-S24). Candidate transIDS3 had an unusual pattern of regular prenyl diphosphate products. It formed two isomers of C10 prenyl diphosphate, geranyl and iso-geranyl diphosphates, that were further elongated to two C15 isomers: farnesyl and iso-farnesyl diphosphate (Figures 3C and S16-S21). Both isomers were identified based on the MS reference library spectra. Specific activities of transIDS5 were significantly higher than those of the other enzymes, with transIDS3 also exhibiting a relatively high activity for the production of C10 prenyl diphosphates (Figure 3A and Table S5). Both transIDS3 and transIDS5 sequences were expressed in P. citri females, with transIDS3 also differentially expressed between virgin and mated females, showing higher expression in virgin females (Tables 1 and S4C).
Figure 2. The predicted functions are shown in colored rectangles (DPPS, decaprenyl diphosphate synthase; GGPPS, geranylgeranyl diphosphate synthase; FPPS, farnesyl diphosphate synthase). The maximum-likelihood tree is drawn to scale, with branch lengths measured in the number of substitutions per site (scale on the bottom left). Bootstrap values (1,000 replicates) are shown next to nodes. We included all P. citri trans-IDS candidates, except for transIDS17, a predicted ubiquitin thioesterase, and examples of IDS sequences from Hemiptera species, which were characterized in vitro. The latter include sequences from aphids (Megoura viciae,32,33 Aphis gossypii,31 Myzus persicae,34 and Acyrthosiphon pisum35) and from shield bugs (Halyomorpha halys,36 Nezara viridula,36 and Murgantia histrionica37). See Figure S9 for the alignment of P. citri trans-IDS sequences included in the phylogenetic tree.
Figure 3. [...] S5. Note the y axis break due to the much higher activity of transIDS5. We did not detect any C10 or C15 products with other tested candidates (Table 1). Chromatograms in (B) are also shown in Figure S16. MS spectra of highlighted peaks are given in Figure S17. Chromatograms and MS spectra of products obtained with other candidates are available in Figures S16-S27. In (C), dimethylallyl diphosphate (DMAPP) can be joined to its isomer isopentenyl diphosphate (IPP) to form regular geranyl and iso-geranyl diphosphates (GPP and iso-GPP) and isomers of farnesyl diphosphate (FPP). FPP is a precursor for juvenile hormone biosynthesis, as denoted by the dashed arrow. Alternatively, two DMAPP units can be assembled into irregular lavandulyl diphosphate (LPP) and maconellyl diphosphate (MPP). The structure of planococcyl acetate (PAc), the sex pheromone of P. citri, is shown in brackets and is presumed to result from the coupling of two DMAPP units as well.
Moreover, candidate transIDS5 also had a low level of irregular IDS activity when supplied with DMAPP only (Figure 3C). The enzyme produced two compounds, which were identified as lavandulyl (LPP) and maconellyl diphosphate (MPP), based on GC-MS data of their dephosphorylated derivatives (Figures S25-S27).
To assess the activities of candidate proteins in vivo in the eukaryotic cell environment, the candidate IDS sequences were transiently expressed in the plant heterologous expression model Nicotiana benthamiana. Plants utilize two separately compartmentalized pathways for the production of the C5 isoprene building blocks IPP and DMAPP. The mevalonate (MVA) pathway is localized in the cytoplasm, while the methylerythritol phosphate (MEP) pathway operates in chloroplasts. Regular monoterpenes in plants derive mainly from the MEP pathway, although cross-talk between the precursor pools is possible.38 The known irregular IDS enzymes of plant origin (CPPS from Tanacetum cinerariifolium and LPPS from Lavandula x intermedia) are localized to chloroplasts.11,13 In our experiments, the deletion of the chloroplast-targeting peptide from these enzymes resulted in activity loss in plant cells. Therefore, each candidate gene was tested in its native form and with the addition of a chloroplast transit peptide. HPLC-MS analysis of leaf extracts did not reveal irregular prenyl diphosphates. Similarly, no production of volatile organic compounds derived from irregular monoterpenoids was detected by GC-MS in agroinfiltrated N. benthamiana leaves. Regular isoprenyl diphosphate synthase activity of candidate proteins could not be assessed in this system because native plant GPP and FPP interfere with the detection of any additional GPP and FPP formed by heterologously expressed enzymes.

Mutagenesis of transIDS5 active site revealed functional importance of active site residues for its enzymatic activity
We next set out to identify the amino acid residues important for the irregular coupling activity of transIDS5 by performing mutagenesis. It was suggested that replacement of the first aspartate (D) in SARM with asparagine (N) played an important role in the evolution of irregular coupling activity.39 However, in transIDS5, the analogous mutation (D308N) caused a complete loss of irregular activity. The same was observed after the exchange of two other aspartate residues in SARM (D309N and D312N), and the first aspartate in FARM (D166N).
Moreover, the regular activity of the aspartate mutants was affected to different degrees (Figure 4A). The regular activity of the D166N mutant was significantly diminished, although the ratio between C10 GPP and C15 FPP was similar to that of the wild-type (WT) transIDS5 (Figure 4A). The D308N and D312N substitutions had an impact on the regular product length. Both mutants performed the initial coupling of IPP and DMAPP to C10 chains at levels comparable to the WT enzyme, but the addition of the second IPP to form the C15 chain was hampered compared to the WT (Figure 4A). The regular activity was completely abolished after the D309N exchange (Figure 4A).
To find additional residues important for the catalytic activity of transIDS5, we built a computational model of its 3D structure (Figure 4B). We identified a positively charged lysine (K) residue in position 120 located at a comparatively short distance from the negatively charged diphosphate moiety of the IPP substrate (Figure 4B). We exchanged this residue with alanine (K120A), polar uncharged glutamine (K120Q), and negatively charged glutamate (K120E). All three substitutions resulted in the loss of irregular coupling activity, as well as in diminished elongation of GPP (C10) to FPP (C15). As expected, the K120E exchange, introducing a negative charge in the active site, had the strongest inhibitory effect, precluding FPP formation entirely (Figure 4A).
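The substitution notation used in this section (e.g., D308N or K120E: wild-type residue, 1-based position, replacement residue) can be made explicit with a short Python sketch. The helper `apply_substitution` and the toy sequence are hypothetical and serve only to illustrate the notation; the real transIDS5 sequence is available in Table S4C and is not reproduced here.

```python
import re


def apply_substitution(protein, mutation):
    """Apply a point substitution such as 'D308N' to a protein sequence.

    The notation is <wild-type residue><1-based position><new residue>.
    Raises ValueError if the notation is malformed, the position is out
    of range, or the sequence does not carry the stated wild-type residue.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {protein[position - 1]}"
        )
    return protein[: position - 1] + replacement + protein[position:]


# Hypothetical toy sequence (not transIDS5)
print(apply_substitution("MKDDA", "D3N"))
```

Checking the stated wild-type residue before substituting guards against off-by-one numbering errors, which are easy to introduce when gene models and amplified sequences differ at the N-terminus.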

Heteromeric associations did not change IDS activity
In most cases, trans- and cis-IDSs are active in the form of homodimers.402][43][44][45][46] To determine if irregular coupling might depend on the presence of other subunits, we tested P. citri IDS candidates in combinations with potential interactors (Table S6). In the P. citri genome, we identified homologs of the DPPS and DHPPS regulatory subunits (transIDS4 and cisIDS9, respectively), and both were combined with all IDS-like candidates in irregular activity assays (Table S6). In addition, we examined two P. citri non-IDS-like proteins for their IDS-modifying activities: the protein product of g23689.t1, annotated as a pheromone-binding protein (PBP) and highly expressed in virgin females (Table S3B), and g36424.t1, identified as a P. citri homolog of isopentenyl diphosphate isomerase (IDI), an enzyme catalyzing the reversible isomerization of IPP to DMAPP (Table S6). Addition of P. citri IDI to the reaction with IDS candidates supported the formation of regular products when only IPP or only DMAPP was added to the reaction. However, none of the tested putative regulatory subunits altered or spurred irregular coupling activity of IDS candidates.
We also tested the hypothesis that heterodimeric complexes consisting of two IDS subunits are required for irregular coupling.Activity assays with DMAPP as the only substrate were carried out with pairwise combinations of P. citri IDS candidate proteins with each other as well as with IDS proteins forming irregular prenyl diphosphates (Table S6).The latter allows for the detection of terpene synthase (TPS) activities of P. citri IDS candidates, converting the prenyl diphosphate substrates into corresponding prenyl alcohols.Combined assays did not reveal any changes to the enzymatic activities of IDS candidates, which also did not demonstrate TPS activity on the tested irregular prenyl diphosphates.

DISCUSSION
Sustainable insect pest management, supported by state-of-the-art knowledge and technology, is an integral part of sustainable food production in a changing climate, in which the geographical ranges of many insect species are expected to widen.47 However, the use of biotechnology and genetic solutions depends on detailed information about gene function. In this work, we present the results of a comprehensive search for candidate genes coding for isoprenyl diphosphate synthase (IDS) activity from the citrus mealybug, P. citri. The identification of these genes has the potential to enable either biological production of P. citri sex pheromones or gene silencing approaches, both of which provide novel options for pest management (reviewed in the study by Mateos Fernandez et al.48).
Functional genomics of many insect species is hindered by the lack of well-annotated, high-quality genome sequences.49,50 To improve the available genetic resources for P. citri,26 we complemented the existing genome data with short- and long-read transcriptome data (Figure S1). The transcriptomic dataset enabled us to predict and amplify alternative transcript sequences compared to the available genome models (Figures 1 and S4). In addition, we found a novel coding sequence (cisIDS5) among the de novo assembled transcripts, for which we were unable to find an aligning gene model in the current P. citri genome assembly. Using all resources, we were able to identify 18 candidate P. citri IDS sequences. Among these were putative DHPPS, FPPS, FPPS-like, GGPPS, DPPS, and DPPS-like coding sequences, as well as a ubiquitin thioesterase-like sequence with a D-rich motif (Table 1). Apart from the search for IDS coding sequences, our virgin and mated female-specific dataset could be useful for the identification of other genes related to mating. Among the differentially expressed genes, we were, for example, able to identify a putative pheromone-binding protein (g23689.t1).
However, inferring function from sequence analyses alone is inconclusive for IDSs, as even small changes can alter substrate and chain-length specificity.51 To test the activity of IDS candidates, we focused on the detection of C10 and C15 coupling products. We, therefore, did not confirm if the predicted DHPPS, GGPPS, and DPPS had longer chain-producing activities. TransIDS16, a predicted GGPPS, did not produce any C10 or C15 products. Similarly, neither of the two tested cis-IDS-like proteins produced detectable levels of regular or irregular C10 or C15 backbones, as both were predicted to be long-chain IDSs, namely DHPPS (cisIDS1) and UPPS (cisIDS8) (Table 1). Interestingly, transIDS2, a predicted long-chain IDS (DPPS catalytic subunit), produced low amounts of FPP and GPP (Figure 3A). This might be attributed to functional promiscuity in vitro, producing polyprenyl chains of different lengths, as reported for other IDS enzymes.9 Additionally, transIDS17, which was not homologous to IDS sequences but rather to otubain-like ubiquitin thioesterases, was able to produce regular C10 and C15 backbones, albeit at the lowest rate of all candidates (Figure 3A). It remains unknown if otubain-like sequences from other mealybug species have IDS activity, and whether this activity has a biological function.
Figure 4. [...] S5. (B) Mutated amino acids and their position in the active site, based on a computational model of the active center of transIDS5 with two bound substrate (IPP) molecules, showing the aspartate of the first (D166) and the aspartates of the second (D308, D309, D312) aspartate-rich motif and lysine 120 (K120). Oxygen atoms in aspartate side chains and in pyrophosphate moieties (PPi) of the substrates are shown in red; the nitrogen atom of the lysine side chain is highlighted in blue.
The two putative FPPS sequences (transIDS3 and transIDS5) both displayed regular coupling activity resulting in C10 and C15 units. Our enzymatic activity and gene expression results indicate that transIDS5 might be the major source of C15 terpenes in P. citri, while the C10 pool could mainly consist of transIDS5 and possibly also transIDS3 products. The highest specific activity was measured for transIDS5, which also produced low amounts of the irregular prenyl diphosphates LPP and MPP (Figure 3C), which have not been reported in P. citri. The 2-methylbutanoate esters of both compounds have been identified as the sex pheromone of Maconellicoccus hirsutus, the pink hibiscus mealybug.52 Esterified lavandulol compounds also act as sex pheromones in some other Planococcus, Pseudococcus, and Dysmicoccus species.3 However, the irregular coupling activity of transIDS5 only takes place in the absence of IPP and might, therefore, be unlikely to occur in vivo, where the presence of IPP is expected. Although we did not find conclusive evidence for the role of transIDS5 in the biosynthesis of the P. citri sex pheromone, we propose that it is the main source of regular sesquiterpenes in P. citri, possibly including the juvenile hormone (Figure 3C). Using structure-informed mutagenesis, we additionally demonstrated the importance of K120, D166, D308, D309, and D312 for the irregular coupling mechanism observed for transIDS5 in vitro (Figure 4A).
Terpenoids are known for their extensive evolutionary divergence, with many instances of lineage-specific pathways and, therefore, metabolites. Despite the occurrence of many common and unique terpenoids in animals, terpenoid diversification is better understood in plants and microbes.2,53 In plants, terpenoid diversity is mainly determined by the activity of TPSs, which have high evolutionary divergence and functional plasticity.54 However, sequences with homology to plant and microbial TPSs have not been identified in insect genomes. Instead, numerous instances of IDS gene-family expansion have been observed, with frequent gene duplications resulting in functional redundancy or neofunctionalization.36,37,55-57 Among the P. citri IDS sequences, we observed expansions of FPPS-like sequences and DHPPS-like sequences (Figures S11 and S14). The former extends the previously reported FPPS-like diversification within the Coccomorpha.36 Three FPPS-like sequences were successfully cloned and tested (transIDS10, transIDS11, and transIDS12); however, only one candidate (transIDS11) exhibited IDS activity, producing low amounts of GPP and FPP (Figure 3A). Interestingly, while we were not able to confirm GPPS or FPPS activity for transIDS12, its expression was observed to be strongly upregulated in virgin compared to mated P. citri females, which was also true for transIDS9, transIDS10, and transIDS11 (Table 1). This differential expression suggests a possible role in mating or reproduction and needs further investigation.
Despite detecting two irregular prenyl diphosphate products, none of the IDS-like candidates that we tested displayed coupling activity leading to cis-planococcyl diphosphate under our experimental conditions. Another option, which could be explored in the future, is the expression of candidate genes in insect cells, as their cellular environment should be closer to in vivo conditions and could contribute to correct folding and glycosylation of the holoenzyme. In vitro functional characterization of IDS enzymes can be unreliable, as activity can depend on many factors, including the formation of heteromers. 31 To that end, we tested for modulating activity of several potential interactors but did not detect any changes in the activity of P. citri IDS candidates.
The model of irregular terpenoid biosynthesis via a genome-encoded IDS has been challenged. Alternative hypotheses include the possibility that insects sequester plant-derived compounds or utilize products of their endosymbionts' metabolism. Many studies reveal a straightforward strategy of modification and accumulation of plant defense compounds, 29 but, in some cases, the derivatives of plant secondary metabolites are also used in sexual communication. For example, fragments of plant pyrrolizidine alkaloids and phenylpropanoid metabolites are reported to make danaine butterfly males more attractive to conspecific females. 58 However, the presence of irregular monoterpene structures has not been reported for known P. citri host plants, which makes the sequestration hypothesis improbable. Like other insects feeding on nutritionally unbalanced plant sap, mealybugs rely on obligate endosymbiotic bacteria as a source of amino acids and vitamins. 59 P. citri is part of a nested tripartite endosymbiotic system, where the insect harbors the Betaproteobacteria Candidatus Tremblaya princeps, which contains the Gammaproteobacteria Candidatus Moranella endobia. 60,61 Although we were unable to identify any short-chain IDS-coding sequences within the genomes of P. citri endosymbionts, we cloned and tested one IDS sequence, a putative UPPS, from Candidatus Moranella endobia (cisIDS8). However, this did not produce any regular or irregular short prenyl chains. Moreover, genetic crosses between P. citri and Planococcus minor (the latter producing a branched lavandulol-type monoterpene) indicate that the enzyme responsible for the formation of the irregular terpene skeleton is encoded in the insect nuclear genome and is present at a single locus. 62 The structure of the sex pheromone is also not inherited maternally, 62 as would be expected if synthesis required the transfer of egg-transmitted endosymbionts.
Another possibility is that the mealybug sex pheromones are biosynthesized via a different catalytic mechanism, executed by a different, and as yet unidentified, class of enzymes. Novel terpene synthases have been described, expanding on the canonical models of terpene biosynthesis. 63 A recent example is the elucidation of the biosynthetic pathway of the iridoid pheromone found in the pea aphid, Acyrthosiphon pisum. Biosynthesis of this molecule proceeds through the same intermediates as in plants; however, it was shown to involve a set of enzymes that lack homology to the plant counterparts. 64 The insect enzymes were identified by transcriptomic analysis of the female hind legs, the site of pheromone biosynthesis. 64,66-68 At present, however, such approaches are not possible in P. citri as the site of the pheromone gland remains unknown.
The identification of the P. citri IDS gene family provides an important foundation for deciphering terpenoid biosynthesis in this species as well as other mealybugs. The candidate FPPS and FPPS-like sequences with confirmed regular and irregular activity might also present a starting point for protein engineering; mutations have been shown to increase or even spur irregular coupling activity in IDSs. 14,69 Diversification of IDS sequences in Coccoidea is an intriguing genomic enigma for future investigation, especially in relation to their capacity for species-specific irregular monoterpene synthesis, a unique feature in this insect superfamily, which still remains elusive.

10 mM MgCl2 and 5 mM β-mercaptoethanol at 30 °C for 30 or 60 min. Regular coupling activity was determined using IPP and DMAPP as substrates, both at 100 µM, while the assays for irregular activity contained 200 µM of only DMAPP. The reaction was stopped by adding 100 µL of chloroform, and the proteins were precipitated by vigorous shaking followed by centrifugation (10 min at 16,000 × g). The water phase (injection volume 5 µL) was used for LC-MS analysis performed on a 1260 Infinity HPLC system coupled to a G6120B quadrupole mass spectrometry detector (Agilent Technologies, Santa Clara, CA, USA). The separation was carried out on a Poroshell 120 EC-C18 (3.0 × 50 mm, 2.7 µm) column (Agilent Technologies, Santa Clara, CA, USA) using the method described in Nagel et al., 2012. 99 The detection of C10 and C15 prenyl diphosphates was performed in negative single ion mode; m/z 313 and 381, respectively. The quantity of formed products was calculated based on peak areas using a calibration line built for GPP.
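The two monitored ions and the calibration step can be illustrated with a short Python sketch. It assumes the neutral free-acid formulas C10H20O7P2 (GPP and its C10 isomers) and C15H28O7P2 (FPP) and singly deprotonated [M-H]- ions; the calibration points are hypothetical placeholders, not data from this study, and the function names are illustrative only.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the electron mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462, "P": 30.97376200}
ELECTRON = 0.00054858

# Assumed neutral free-acid formulas of the prenyl diphosphates.
GPP = {"C": 10, "H": 20, "O": 7, "P": 2}  # also valid for C10 isomers
FPP = {"C": 15, "H": 28, "O": 7, "P": 2}


def monoisotopic_mass(formula):
    """Sum of monoisotopic masses for an element -> count mapping."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def mz_deprotonated(formula):
    """m/z of the singly charged [M-H]- ion (loss of H+, i.e. H minus one electron)."""
    return monoisotopic_mass(formula) - MONOISOTOPIC["H"] + ELECTRON


def fit_line(amounts, areas):
    """Ordinary least-squares line: area = slope * amount + intercept."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(areas) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, areas))
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_area(area, slope, intercept):
    """Invert the calibration line to estimate the amount of product."""
    return (area - intercept) / slope


if __name__ == "__main__":
    print(f"GPP [M-H]-: {mz_deprotonated(GPP):.4f}")
    print(f"FPP [M-H]-: {mz_deprotonated(FPP):.4f}")
    # Hypothetical GPP standards (nmol) and peak areas (arbitrary units).
    slope, intercept = fit_line([0.5, 1.0, 2.0, 4.0], [1050, 2050, 4050, 8050])
    print(f"Estimated amount: {amount_from_area(4050, slope, intercept):.2f} nmol")
```

Because regular and irregular C10 diphosphates share one elemental formula, a single ion trace at the C10 m/z cannot distinguish them by mass alone; they have to be resolved chromatographically.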
For analysis of regular terpene alcohols, the reactions were carried out with 70-500 µg of protein in a total volume of 400 µL in 35 mM HEPES buffer, pH 7.4, containing 10 mM MgCl2, and substrates (IPP and DMAPP, 1 mM for transIDS5 and transIDS3, 10 mM for transIDS2, transIDS11, and transIDS17) at 28 °C overnight. For dephosphorylation of prenyl diphosphates, the reaction mixture was supplemented with 80 µL of 0.5 M glycine-NaOH buffer (pH 10.5) containing 5 mM ZnSO4; 80 units of calf intestinal alkaline phosphatase (Promega, Madison, WI, USA) were added and the mixture was incubated at 37 °C for 1 h. The alcohols were extracted 3 times with 400 µL of methyl tert-butyl ether (MTBE); the organic phase was evaporated to approximately 150 µL under compressed air flow. GC-MS analysis for detection of C10 products was carried out on a QP2010 Ultra (Shimadzu, Duisburg, Germany). The gas chromatograph (GC) was equipped with a ZB-5MS fused silica capillary column (30 m, 0.25 mm ID, df = 0.25 µm) from Phenomenex (Aschaffenburg, Germany). Sample aliquots of 1-3 µL (depending on concentrations) were injected using an AOC-20i autosampler system (Shimadzu, Duisburg, Germany) into a PTV split/splitless injector (Optic 4, ATAS GL, Eindhoven, Netherlands), which operated in splitless mode. The PTV injection temperature was programmed from an initial 50 °C to 250 °C (heating rate of 50 °C/s), followed by an isothermal hold until the end of the GC run. Hydrogen was used as carrier gas with a constant flow rate of 1.3 mL/min. The GC temperature profile was 60 °C for 2 min, then from 60 °C to 120 °C at 2 °C per min, from 120 °C to 230 °C at 10 °C per min, and from 230 °C to 320 °C at 20 °C per min, with a final temperature hold for 5 min. Electron ionization mass spectra were recorded at 70 eV from m/z 40 to 250. The ion source of the mass spectrometer and the transfer line were kept at 250 °C.
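The GC oven program for the C10 analysis can be written as a piecewise-linear temperature profile, which makes the total run time and the oven temperature at a given retention time easy to check. The following Python sketch is illustrative only; it assumes that the last ramp rate (20 °C) is per minute like the other ramps, and the helper names are hypothetical.

```python
def build_segments(initial_c, initial_hold_min, ramps, final_hold_min):
    """Return (start_temp, end_temp, duration_min) segments of an oven program.

    ramps is a list of (target_temp_c, rate_c_per_min) tuples.
    """
    segments = [(initial_c, initial_c, initial_hold_min)]
    current = initial_c
    for target, rate in ramps:
        segments.append((current, target, (target - current) / rate))
        current = target
    segments.append((current, current, final_hold_min))
    return segments


def total_runtime(segments):
    """Total program length in minutes."""
    return sum(duration for _, _, duration in segments)


def temperature_at(segments, t_min):
    """Oven temperature (°C) at time t_min, by linear interpolation."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for start, end, duration in segments:
        if t_min <= elapsed + duration:
            fraction = (t_min - elapsed) / duration if duration > 0 else 1.0
            return start + fraction * (end - start)
        elapsed += duration
    return segments[-1][1]  # after the program ends, stay at the final temperature


# C10 program as described in the text (last rate assumed to be per minute).
C10_PROGRAM = build_segments(
    initial_c=60, initial_hold_min=2,
    ramps=[(120, 2), (230, 10), (320, 20)],
    final_hold_min=5,
)

if __name__ == "__main__":
    print(f"Total run time: {total_runtime(C10_PROGRAM)} min")
    print(f"Oven temperature at 17 min: {temperature_at(C10_PROGRAM, 17)} °C")
```

Under these assumptions the program lasts 52.5 min, most of which is spent in the slow 2 °C per min ramp between 60 °C and 120 °C.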
GC-MS analysis for detection of C15 products was carried out on a Cyclosil-B capillary column (30 m × 0.250 mm × 0.25 µm; Agilent Technologies, Santa Clara, CA, USA) with the following temperature gradient: 50 °C for 3 min, from 50 °C to 120 °C at 2 °C per min, and from 120 °C to 230 °C at 10 °C per min. Helium was applied as the carrier gas at a flow rate of 1 mL/min. Injection volume was 1 µL (1:100 split). For identification of irregular products of transIDS5, 1 mg of protein was incubated with 2 mg of DMAPP tri-ammonium salt in 1.6 mL of 35 mM HEPES buffer, pH 7.4, containing 10 mM MgCl2 at 28 °C overnight. The products were dephosphorylated and extracted as described above, and analysed by GC-MS using the protocol for C10 regular products. Maconelliol standard was synthesized by the protocol of Dunkelblum et al., 2002 22 and its structure was confirmed by 1H- and 13C-NMR measurements.
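As a rough orientation, the substrate and protein concentrations in the preparative assay for irregular products can be estimated from the stated amounts. The Python sketch below assumes that the DMAPP tri-ammonium salt has the composition C5H12O7P2·3NH3 (C5H21N3O7P2), that the stated 2 mg, 1 mg, and 1.6 mL are milligrams and millilitres, and that the salt is anhydrous and pure; these are assumptions, not values taken from the study.

```python
# Standard average atomic weights (g/mol), rounded to three decimals.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "P": 30.974}

# Assumed composition of DMAPP tri-ammonium salt: C5H12O7P2 + 3 NH3.
DMAPP_TRIAMMONIUM = {"C": 5, "H": 21, "N": 3, "O": 7, "P": 2}


def molar_mass(formula):
    """Average molar mass (g/mol) for an element -> count mapping."""
    return sum(ATOMIC_WEIGHT[element] * count for element, count in formula.items())


def concentration_mM(mass_mg, molar_mass_g_per_mol, volume_mL):
    """Molar concentration in mM: mg / (g/mol) = mmol; mmol / mL = M; times 1000 = mM."""
    return mass_mg / molar_mass_g_per_mol / volume_mL * 1000.0


if __name__ == "__main__":
    mw = molar_mass(DMAPP_TRIAMMONIUM)
    print(f"Molar mass of DMAPP tri-ammonium salt: {mw:.2f} g/mol")
    print(f"DMAPP concentration: {concentration_mM(2.0, mw, 1.6):.2f} mM")
    print(f"Protein concentration: {1.0 / 1.6:.3f} mg/mL")
```

Under these assumptions the assay contains DMAPP at roughly 4 mM, the same order of magnitude as the substrate concentrations used for the overnight assays of regular terpene alcohols.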

Candidate expression in Nicotiana benthamiana
Genetic constructs for candidate expression in plants were assembled with the Golden Braid (GB) cloning system. 100 Transcriptional units in alpha-level GB vectors included IDS coding sequences fused with a C-terminal His-tag under control of the 35S CaMV promoter. Each candidate gene was tested in its native form and with the addition of a modified chloroplast transit peptide (cTP) of ribulose bisphosphate carboxylase small subunit from Nicotiana sylvestris (GenBank: XP_009794476.1). If a different (e.g., mitochondrial) localization signal was predicted, it was removed before adding the cTP. Transient expression in N. benthamiana leaves was carried out as described earlier. 101 For detection of irregular IDS activity, the plant material (200 mg) was frozen in liquid nitrogen, ground, and extracted with 80% methanol (400 µL) by 30 min incubation in an ice-cooled ultrasonic bath. Supernatants after two centrifugation steps (10 min at 16,000 × g) were analysed using a 1260 Infinity HPLC system coupled to a G6120B quadrupole mass spectrometry detector (Agilent Technologies, Santa Clara, CA, USA). The separation was carried out on a Zorbax Extend-C18 (4.6 × 150 mm, 3.5 µm) column (Agilent Technologies, Santa Clara, CA, USA) with a mobile phase consisting of 5 mM ammonium bicarbonate in water as solvent A and acetonitrile as solvent B, applying the following gradient (% B): starting at 0, 0 to 8 within 2 min, holding 8 for 15 min, 8 to 64 within 3 min, 64 to 100 within 2 min, holding 100 for 4 min, 100 to 0 within 1 min, re-equilibrating at 0 for 12 min. The detection of C10 and C15 prenyl diphosphates was performed in negative single ion mode; m/z 313 and 381, respectively. For detection of volatile compounds, samples from agroinfiltrated leaves were collected 5 dpi (days post infiltration), snap-frozen in liquid nitrogen and analysed by GC-MS as described previously. 24

Computational model and site-directed mutagenesis of transIDS5
The 3-D model of transIDS5 was built on the SWISS-MODEL homology-modelling server 89 using the crystal structure of Saccharomyces cerevisiae geranylgeranyl pyrophosphate synthase in complex with magnesium and IPP 70 (PDB: 2E8U, https://doi.org/10.2210/pdb2E8U/pdb) as a template. For structure visualisation and analysis, UCSF Chimera v.1.14 was applied. 102 The location of prenyl substrates was derived by superposition of the transIDS5 model and the template structure. Mutations were introduced using the Q5-SDM kit (New England Biolabs) with primers listed in Table S8.

QUANTIFICATION AND STATISTICAL ANALYSIS
Short read mapping counts for all three datasets were used for differential transcript expression analysis based on R packages edgeR and limma. 74,75,94

Figure 1. Identification and confirmation of IDS coding sequences predicted by transcriptome data
Multiple sequence alignments for clusters of overlapping transcripts and gene models for candidate sequences transIDS4 (A), transIDS5 (B), transIDS11 (C), and transIDS12 (D). Each cluster includes translated coding sequences obtained from Pcitri.v1 gene models (in black text), long-read transcripts (in blue text), and transcripts de novo assembled from short reads (in orange text). All sequences are available in Table S4C. Sequences confirmed with amplification from cDNA and Sanger sequencing are marked with black dots. Color-coded sequence similarity is given in the legend in the bottom left. The alignment around the L328 versus Q328 variation is shown below the full-length alignments for transIDS12 in (D). Multiple sequence alignment was done in MEGAX using the MUSCLE algorithm and visualized with Geneious software.

Figure 2. Phylogenetic tree of P. citri trans-IDS candidates and characterized trans-IDS sequences from Hemiptera species
The predicted functions are shown in colored rectangles (DPPS, decaprenyl diphosphate synthase; GGPPS, geranylgeranyl diphosphate synthase; FPPS, farnesyl diphosphate synthase). The maximum-likelihood tree is drawn to scale, with branch lengths measured in the number of substitutions per site (scale on the bottom left). Bootstrap values (1,000 replicates) are shown next to nodes. We included all P. citri trans-IDS candidates, except for transIDS17, a predicted ubiquitin thioesterase, and examples of IDS sequences from Hemiptera species, which were characterized in vitro. The latter include sequences from aphids (Megoura viciae, 32,33 Aphis gossypii, 31 Myzus persicae, 34 and Acyrthosiphon pisum 35) and from shield bugs (Halyomorpha halys, 36 Nezara viridula, 36 and Murgantia histrionica 37). See Figure S9 for the alignment of P. citri trans-IDS sequences included in the phylogenetic tree.

Figure 3. In vitro activity of P. citri IDS sequences expressed in E. coli
Measured enzymatic activity for production of C10 and C15 prenyl diphosphates (A), chromatogram of standard geraniol (peak 1A in upper panel) and transIDS5 product (peak 1B in lower panel) (B), and schematic representation of confirmed coupling reactions performed by P. citri trans-IDS candidates (C). TransIDS5, transIDS11, transIDS2, and transIDS17 produced C10 geranyl diphosphate (GPP) and C15 farnesyl pyrophosphate (FPP); transIDS3 produced C10 GPP and cis-isogeranyl diphosphate and two C15 FPP isomers. (A) Activity for production of C10 prenyl diphosphates (light blue bars) and activity for production of C15 prenyl diphosphates (dark blue bars). Height of the bars represents the mean of three separate measurements, plotted as black dots. Raw data are available in Table S5. Note the y axis break due to the much higher activity of transIDS5. We did not detect any C10 or C15 products with other tested candidates (Table 1). Chromatograms in (B) are also shown in Figure S16. MS spectra of highlighted peaks are given in Figure S17. Chromatograms and MS spectra of products obtained with other candidates are available in Figures S16-S27. In (C), dimethylallyl diphosphate (DMAPP) can be joined to its isomer isopentenyl diphosphate (IPP) to form regular geranyl and iso-geranyl diphosphates (GPP and iso-GPP) and isomers of farnesyl diphosphate (FPP). FPP is a precursor for juvenile hormone biosynthesis, as denoted by the dashed arrow. Alternatively, two DMAPP units can be assembled into irregular lavandulyl diphosphate (LPP) and maconellyl diphosphate (MPP). The structure of planococcyl acetate (PAc), the sex pheromone of P. citri, is shown in brackets and is presumed to result from the coupling of two DMAPP units as well.

Figure 4. Mutagenesis of transIDS5 active site residues
(A) Activity of transIDS5 and its mutants expressed in E. coli. Activity for the production of C10 geranyl diphosphate (GPP, light blue bars) and activity for the production of C15 farnesyl diphosphate (FPP, dark blue bars) are plotted. Height of the bars represents the mean of three separate measurements, plotted as black dots. Raw data are available in Table S5. (B) Mutated amino acids and their position in the active site, based on a computational model of the active center of transIDS5 with two bound substrate (IPP) molecules, showing the aspartates of the first (D166) and the second (D308, D309, D312) aspartate-rich motifs and lysine 120 (K120). Oxygen atoms in aspartate side chains and in pyrophosphate moieties (PPi) of the substrates are shown in red; the nitrogen atom of the lysine side chain is highlighted in blue.

Table 1. P. citri cis- and trans-IDS candidates
For each candidate sequence, its putative function, differential expression values (logFC), contrasting samples from virgin to mated females, and results of in vitro regular (R) and irregular (I) C10 or C15 coupling activity tests of the proteins expressed in E. coli are given. For trans-IDS candidates, amino acids positioned at the first and second D-rich motif (FARM, SARM) according to the multiple sequence alignment are given as well. Conserved aspartate (D) residues are underlined. LogFC (log2-fold changes) values were obtained by averaging logFC values for all sequences in the sequence cluster of each candidate (Table