Analysis of the agrotis segetum pheromone gland transcriptome in the light of Sex pheromone biosynthesis

Moths rely heavily on pheromone communication for mate finding. The pheromone components of most moths are modified from the products of normal fatty acid metabolism by a set of tissue-specific enzymes. The turnip moth, Agrotis segetum uses a series of homologous fatty-alcohol acetate esters ((Z)-5-decenyl, (Z)-7-dodecenyl, and (Z)-9 tetradecenyl acetate) as its sex pheromone components. The ratio of the components differs between populations, making this species an interesting subject for studies of the enzymes involved in the biosynthetic pathway and their influence on sex pheromone variation. Illumina sequencing and comparative analysis of the transcriptomes of the pheromone gland and abdominal epidermal tissue, enabled us to identify genes coding for putative key enzymes involved in the pheromone biosynthetic pathway, such as fatty acid synthase, β-oxidation enzymes, fatty-acyl desaturases (FAD), fatty-acyl reductases (FAR), and acetyltransferases. We functionally assayed the previously identified ∆11-desaturase [GenBank: ES583599, JX679209] and FAR [GenBank: JX679210] and candidate acetyltransferases (34 genes) by heterologous expression in yeast. The functional assay confirmed that the ∆11-desaturase interacts with palmitate and produces (Z)-11-hexadecenoate, which is the common unsaturated precursor of three homologous pheromone component acetates produced by subsequent chain-shortening, reduction and acetylation. Much lower, but still visible, activity on 14C and 12C saturated acids may account for minor pheromone compounds previously observed in the pheromone gland. The FAR characterized can operate on various unsaturated fatty acids that are the immediate acyl precursors of the different A. segetum pheromone components. None of the putative acetyltransferases that we expressed heterologously did acetylate any of the fatty alcohols tested as substrates. The massive sequencing technology generates enormous amounts of candidate genes potentially involved in pheromone biosynthesis but testing their function by heterologous expression or gene silencing is a bottleneck. We confirmed the function of a previously identified desaturase gene and a fatty-acyl reductase gene by heterologous expression, but the acetyltransferase postulated to be involved in pheromone biosynthesis remains illusive, in spite of 34 candidates being assayed. We also generated lists of gene candidates that may be useful for characterizing the acetyl-CoA carboxylase, fatty acid synthetase and β-oxidation enzymes.


Background
Moths rely heavily on sex pheromones for mate finding [1]. Usually the females produce and emit the pheromone from a specialized structure, the sex pheromone gland located at the intersegmental membrane between the 8 th and 9 th abdominal segment, associated with the ovipositor at the end of the adult female abdomen [2]. Over the last five decades, sex pheromones have been identified from more than 700 species [3,4]. The pheromone compounds are mostly fatty acid derivatives, with carbon chain length C10-C18, with 0-4 double bonds [3], and an oxygenated functional group (alcohol, aldehyde, acetate ester) [3][4][5]. A substantial portion is made up by acetate esters [3]. Most moths use a combination of two or more compounds in a specific ratio, constituting a more or less species-specific blend. The pheromone biosynthesis pathways have been studied extensively and are well documented in many moth species [6,7]. Characterization of the enzymes involved in the process of pheromone biosynthesis not only helps to understand the evolution of sexual communication and speciation, but could ultimately also aid in pest control by allowing the design of drugs that block the biosynthetic machinery or by allowing the synthetic biologist to produce species-specific pheromones for mass trapping or mating disruption in biological systems like cell factories or genetically modified plants [8][9][10].
During the last two decades, desaturases introducing double bonds into the acyl chain in Δ6 [11], Δ9 [12,13], Δ10 [14], Δ11 [12,13,[15][16][17] and Δ14 [18] position have been cloned from many moth species, and their functions have been characterized in various heterologous expression systems. Also an omega-desaturase that introduces a double bond in the methyl terminal carbon has been cloned and characterized [19]. In some cases, two double bonds can be produced by one desaturase [17] or by the consecutive activities of two desaturases [11]. These studies highlight desaturases as key players for the diversity of pheromone structures in moth species and their role in reproductive isolation and speciation. A variety of desaturases in combination with limited chain-shortening or chain-elongation account for the diversity of double bond isomerism observed among moth sex pheromone compounds.
After the double bonds are in place and the acyl chain length is adjusted, the carbonyl carbon is modified to form a functional group. Firstly it requires a step that converts the fatty-acyl precursors into fatty alcohols. Great progress has been made since the first fatty-acyl reductase gene was identified in Bombyx mori [20]. Several Ostrinia spp. FARs have been characterized [21,22]. The Ostrinia FARs are very essential to determine the final pheromone compositions [22], minimal changes in sequence can cause the pheromone component ratio to shift dramatically [23]. On the other hand, there are also FARs that are very versatile in terms of substrate specificity [24,25], involved in the biosynthesis of multicomponent pheromones.
Fatty alcohols serve as the actual pheromone components in a number of moth species [3], but mostly, fatty alcohols will be either oxidized to aldehydes [26][27][28] or esterified to form acetate esters [29][30][31]. Acetyltransferases are important enzymes since the acetate esters are very commonly occurring pheromone components among moth species. They have not been cloned from insects yet, but have been investigated in some cases by in vivo labeling studies [32][33][34]. Unlike FARs, these functional group modification enzymes have not been studied extensively.
The chain-shortening pathway has not been characterized at the enzymatic level in insects, but it was noted in a couple of cases that mutations in the β-oxidation pathway did affect the final pheromone compositions. The major pheromone component of cabbage looper moth Trichoplusia ni is Z7-12:OAc, whereas a mutant strain produced a greatly increased amount of Z9-14:OAc [35,36].
An EST-library was previously constructed from the A. segetum pheromone gland, revealing candidate genes involved in pheromone production [42] and a Δ11 desaturase and a FAR involved in pheromone production were already characterized [9]. The EST library, however, contains only 2, 285 objects. We now report a more extensive database of candidate genes potentially involved in the A. segetum sex pheromone biosynthetic pathway, generated by next generation sequencing technology (NGS). We constructed transcriptome libraries from two tissues of A. segetum: pheromone gland (As_PG) and abdomen (As_AB). Furthermore, to test the function of several of the candidate genes we assayed them in a yeast heterologous expression system.

Results and discussion
Illumina sequencing, unigene assembly, and analysis of transcripts In total 53 million raw clean reads were obtained from each (As_PG and As_AB) library, with a total of 4.8G clean nucleotides in each ( Table 1). The clean reads from the two libraries were assembled into 62,165 consensus contigs (Table 2), including 22,633 distinct clusters (referred to as CL) and 39,532 distinct singletons (referred to as Unigene). These consensus contigs have a mean length of 733 nt, and N50 [43] =1,150, total length 45 Mb. Size distributions of the unigenes can be seen in Fig. 1 and their differential expression in the two tissues is displayed in Fig. 2. This transcriptome shotgun assembly project has been deposited at DDBJ/EMBL/GenBank under the accession GBCW00000000. The version described in this paper is the first version, GBCW01000000.
After alignment by blastx to protein databases NCBI Nr, Swiss-Prot, KEGG and COG (e-value < 0.00001), and alignment by blastn to nucleotide database NCBI Nt, annotations were retrieved from the highest sequence similarity with the given unigenes along with their protein functional annotations. About half of the unigenes have a hit in one or more of the databases (Table 3). 14,108 unigenes hit the first record in the Nr database with Bombyx mori (Table 4). A. ipsilon got 293 hits, but this does not mean the A. segetum is closer to B. mori than A. ipsilon, it is just because the B. mori got more records deposited in GenBank than A. ipsilon. In terms of clusters of orthologous groups (COG), 19,911 unigenes were classed in one or more of the 26 COG functional categories (Fig. 3).
By running the Blast2GO program [44] using Nr annotation, 11,246 (18 %) unigenes were assigned to one or more GO categories. The GO terms cellular process, binding, catalytic activity, and metabolic process were the most abundantly represented categories (over 5,000 unigenes, details see Fig. 4). These numbers and percentages are similar to the results that Gu and coauthors [45] presented from A. ipsilon.
The biosynthetic pathway (Fig. 5) leading to the pheromone components of A. segetum is similar to what has been reported for many other moth species [37,45]. The key players among enzymes involved or postulated to be involved are desaturases, β-oxidation enzymes, fatty-acyl reductases, and acetyltransferases. Pheromone biosynthesis is reported to be under control by a pheromone biosynthesis activating neuropeptide (PBAN) [46]. In the following we present candidate genes related to each step in the biosynthetic pathway, their expression levels and their functional assay in yeast heterologous expression systems.
Pheromone biosynthesis activating neuropeptide (PBAN) receptor PBAN is released from the subesophageal ganglion (located near the brain) and is transported through hemolymph (or via the ventral nerve cord) to the pheromone gland. Upon binding to the PBAN receptor present on the pheromone gland cell membrane [47], it will induce the opening of calcium channels causing influx of extracellular calcium [7]. Then the calcium binds to calmodulin, that stimulates phosphatase (and/or kinase), which subsequently activates the FAR in the case of Bombyx mori [48] or in other cases the acetyl-CoA carboxylase [49] or the acetyltransferase [50] maybe regulated. In As_PG, we found one gene, Ase_17579 [GenBank: KJ622075], which is 97 % identical to Helicoverpa zea PBAN receptor. Its expression level is similar in As_PG and As_AB, 15.6 FPKM and 16.9 FPKM, respectively.

Acetyl-CoA carboxylase
The rate-limiting step in fatty acid biosynthesis [51] is the ATP-dependent carboxylation of acetyl-CoA to malonyl-CoA catalyzed by acetyl-CoA carboxylase (ACCase) [52], the first step in saturated long chain fatty acid biosynthesis. ACCase is a large protein with multiple catalytic activities, working coordinately and providing malonyl-CoA substrate for the biosynthesis of fatty acids [50]. In A. segetum, we found one contig Ase_7442 (KJ622074), encoding the full-length of this protein. It shares 69 % amino acid identity with the ACCase of Tribolium castaneum and 88 % aa identity with B. mori ACCase. The expression level of Ase_7442 is not significantly different in As_AB (37 FPKM) and in As_PG (26 FPKM).

Fatty acid synthase
Fatty acid synthase (FAS) is a multifunctional protein [52] that produces saturated fatty acids using malonyl-CoA and acetyl-CoA as substrate and that requires NADPH as reducing agent, in a cyclic process in which an acetyl primer undergoes a series of decarboxylative condensations with several malonyl moieties [53]. The resulting products are palmitic and stearic acid in insects, Q20 percentage is proportion of nucleotides with quality value larger than 20 in reads (sequencing error rate = 1 %); N percentage is proportion of unknown nucleotides in clean reads. GC percentage is proportion of guanidine and cytosine nucleotides among total nucleotides as proven by labeling studies [49,53,54]. We found six unigenes (KJ622068-KJ622073) that are homologous to the Aip_FAS_JX989151 [45]. In total, these unigenes are seven times more expressed (significant) in As_PG than in As_AB (FPKM 67.1 / FPKM 9).

Desaturases
Fatty acid desaturases catalyze the introduction of double bonds into acyl chains with strict regioselectivity and stereoselectivity, and can be divided into four categories [55]: 1) first desaturases, inserting a double bond into the saturated acyl chain; 2) frontend desaturases, introducing a double bond between an existing double bond and the carboxylic end; 3) omega desaturases, inserting a double bond between an existing double bond and the methyl end; 4) sphingolipid desaturases, introducing double bonds into sphingolipids which are important components of eukaryotic plasma membranes. We found 10 desaturase candidates from the two tissues, and they belong to the first desaturase (7/10), the front-end desaturase (2/10), and the sphingolipid desaturase (1/10) subfamilies. Among the first desaturase subfamily, several groups have been recognized based on phylogenetic and functional analysis and a four-letter "signature motif" has been suggested for each of them as shown in Fig. 6. These signature motifs strongly associate with the location of double bond that the desaturase is inserting in the fatty-acyl chain, with less emphasis on the chain length selectivity: Δ9_KPSE (C16 > C18) desaturases and Δ9_NPVE (C18 > C16) desaturases that are mostly involved in fatty acid metabolism [56], and Δ11,Δ10 and bifunctional desaturases with the "xxxQ" motif (with a few exceptions having "xxxE" motif ) exclusively involved in pheromone biosynthesis [13,57] (Fig. 6). The Ase_1623 and Ase_4567 have the signature motif of KPSE and NPVE, respectively, that strongly suggest their roles in ordinary metabolic pathways as Δ9 desaturases. The Ase_21308 and Ase_5534 both have "xxxQ" motif, suggesting that they represent Δ11 desaturases. The Ase_5534 is the obvious candidate for pheromone biosynthesis since   its expression level in As_PG is 1-2 magnitudes higher than any other candidate, and it has a very low expression level in As_AB. The functional assay confirms that it possess the ability of creating a Δ11double bond mainly on palmitate (Fig. 7), corroborating that Ase_5534 [GenBank:KJ622051] is the desaturase responsible for pheromone production in A. segetum. This result is consistent with the recent study in which this gene was expressed in different yeast strains [9] and this desaturase was as a matter of fact found already in the EST analysis reported by Strandh et al. [42]. This desaturase introduces a double bond with Z configuration on 16:Acyl. Besides Z11-16:Acyl, some other minor products were detected: Δ11-12:Acyl, Z11-14:Acyl, E11-14:Acyl, and Z11-15:Acyl (Fig. 7). The Ase_21308 displayed no activity in our yeast expression system. Gu et al. [45] found a desaturase that is very close to our Ase_5534 (as shown in Fig. 6) and which should be the one involved in pheromone biosynthesis in A. ipsilon. Compared to [42,45,58], our dataset includes a larger set of desaturases isolated from the moth pheromone gland and surrounding tissues.

β-oxidation enzymes
After the palmitate has undergone Δ11 desaturation in the A. segetum pheromone gland, it is subject to limited chain shortening by β-oxidation, resulting in three homologous fatty-acyl pheromone precursors with 14C, 12C, and 10C chain length (Fig. 5). Chain-shortening by βoxidation is the action of a series of enzymes, working sequentially and forming a reaction spiral.
In the first reaction, acyl-CoA is converted to E2enoyl-CoA, by acyl-CoA oxidases (in peroxisomes) and acyl-CoA dehydrogenases (in mitochondria). Four acyl-CoA dehydrogenases with different chain length specificities cooperate to assure that the complete degradation of all fatty acids with different chain length. The names of the four dehydrogenases, short-chain, medium-chain, long-chain, and very long-chain acyl-CoA dehydrogenases, reflect their chain-length specificities. Short-chain acyl-CoA dehydrogenase only acts on short-chain substrates like butyryl-CoA and hexanoyl-CoA. Mediumchain acyl-CoA dehydrogenase is most active with substrates from hexanoyl-CoA to dodecanoyl-CoA, whereas long-chain acyl-CoA dehydrogenase preferentially acts on octanoyl-CoA and longer-chain substrates [59]. Verylong-chain acyl-CoA dehydrogenase extends the activity spectrum to longer-chain substrates, including those having acyl chains of 22 and 24 carbon atoms [60]. We found a full spectrum of dehydrogenases and oxidases in A. segetum (Table 5). It is noteworthy that unigene6715 was expressed 72 times higher (386FPKM) in As_PG than in As_AB (5FPKM). In addition, we found two unigenes of isovaleryl-CoA dehydrogenase, which is specific for metabolism of branched-chain fatty acids [61].
The second step of β-oxidation E2-enoyl-CoA is reversibly hydrated by enoyl-CoA hydratase to L-3hydroxyacyl-CoA. Two categories of enoyl-CoA hydratases have been identified in mitochondria [62]. One is specialized for crotonyl-CoA (4C). The second one is long-chain enoyl-CoA hydratase, effectively hydrates medium-chain and long-chain substrates. Long-chain enoyl-CoA hydratase is a component enzyme of the trifunctional β-oxidation complex, which additionally exhibits long-chain activities of L-3-hydroxyacyl-CoA dehydrogenase and 3-ketoacyl-CoA thiolase [63], and resides in the inner mitochondrial membrane. We found six enoyl-CoA hydratases from A. segetum (Table 5), with similar expression level in both As_PG and As_AB.
The third reaction is the reversible dehydrogenation of L-3-hydroxyacyl-CoA to 3-ketoacyl-CoA catalyzed by L-3hydroxyacyl-CoA dehydrogenase. Four categories of L-3hydroxyacyl-CoA dehydrogenases have been identified in mitochondria. Long chain L-3-hydroxyacyl-CoA dehydrogenase is a component enzyme of the trifunctional β-oxidation [63], which is most active with long-chain substrates. Medium-chain and short-chain L-3-hydroxyacyl-CoA dehydrogenase, are both soluble matrix enzymes, processing medium-and short-chain substrates. These three enzymes complement each other and thus assure high rates of dehydrogenation over the whole spectrum of β-oxidation intermediates [63]. In A. segetum this group of enzymes is represented by five unigenes (Table 5), with similar expression level in both tissues.
The final step in which 3-ketoacyl-CoA is cleaved by thiolase between its α and β carbon atoms, makes the  The first column shows the species with the highest number of similar genes in descending order, the second column indicates the number of these annotated genes, and the last column shows the percentage of genes with respect to the total annotated genes sequences substrate two carbons shorter. Three classes of thiolase exist in motochondria: acetoacetyl-CoA thiolase or acetyl-CoA acetyltransferase (specific for acetoacetyl-CoA), 3ketoacyl-CoA thiolase or acetyl-CoA acyltransferase (act on C4-C16), and long-chain 3-ketoacyl-CoA thiolase that is a component enzyme of the membrane-bond trifunctional βoxidation complex, whereas the first two thiolases are soluble matrix enzymes [62]. We found six unigenes of 3ketoacyl-CoA thiolase from A. segetum (Table 5), and all of them expressed at similar level in As_PG and As_AB. The degradation of unsaturated fatty acids requires auxiliary enzymes like Δ3,Δ2-enoyl-CoA isomerase and 2,4-dienoyl-CoA reductase to modify the structure of double bonds during the β-oxidation process to ensure a continuous flow of intermediates through the β-oxidation spiral [64]. We found two unigenes of the Δ3,Δ2-enoyl-CoA isomerase, one mitochondrial type and one peroxisomal. In addition, we found a Δ3,5Δ2,4-dienoyl-CoA isomerase that is specialized for processing odd-numbered double bonds [61].
Insects in general have the ability to shorten longchain fatty acids to a specific shorter chain length [65]. Jurenka [5,66] suggests that this kind of limited chain shortening takes place in the peroxisomes. Unlike the β- oxidation in mitochondria in which substrates are thoroughly degraded into two-carbon units, the β-oxidation in peroxisomes ceases at the formation of medium-chain fatty-acyl-CoAs, because acyl-CoA oxidase is inactive toward substrates having acyl moieties of eight or fewer carbon atoms [66]. Moth pheromone components are commonly 12C and 14C, suggesting that there is still something specific about the chain-shortening being involved in moth pheromone biosynthesis.
In the present study we found representatives of all the key players and auxiliary enzymes of β-oxidation, and some of them are very high in expression level and differentially expressed among the As_PG and As_AB (Table 5), forming promising candidates to tackle the role of β-oxidation on pheromone biosynthesis either by heterologous expression or by RNAi.

Fatty-acyl reductases
Chain-shortened fatty-acyl precursors are reduced to the corresponding alcohols by Fatty-Acyl Reductases (FARs) [20]. FARs have been cloned and characterized from several moth species [21][22][23] since the first one was found in B. mori [20]. We found ten full-length unigenes in both the As_PG and As_AB (Fig. 8) and two of them cluster within the pgFAR clade [25]. We expressed the two ORFs that belong to the pheromone-producing clade in our yeast expression system. The results showed that it is the one named Ase_1929 that is responsible for reducing all the three fatty-acyl precursors (Z5-10:CoA, Z7-12:CoA, Z9-14:CoA) into their corresponding fatty alcohols (Fig. 9), and it is the most abundant transcript among all these ten unigenes (Fig. 8). This result is consistent with previous findings that a single FAR takes a wide range of the fatty-acyl substrates and convert them into fatty alcohols [24,25,67]. Our results complement the findings of Hagström et al. [9], who presented the Ase_1929 with longer chain acyl substrates (16C) and found it to be very active on them. But what appears in Fig. 9 as if the Ase_1929 when expressed in yeast is more active towards longer chain length substrates than towards shorter ones, does not necessarily represent its actual activity in the pheromone gland. This difference could partly be due to the volatility of the substrates and products, since shorter chain methyl esters and alcohols are more volatile and thus may escape from the yeast expression system during incubation. The other unigene Ase_20982 which also clustered with the pgFAR clade, did not show any function in our heterologous expression system. The FAR [GenBank: JX989146, protein ID: AGR49323] found by Gu et al. [45] clustered very close to our Ase_1929, suggesting that it is likely involved in pheromone biosynthesis in A. ipsilon.

Acetyltransferases
The genes involved in acetylation of fatty pheromone alcohols have not been cloned from any insect species. Acetyltransferases probably belong to a huge family of acyl CoA-utilizing enzymes whose products include a variety of chemicals, such as neurotransmitters [68], plant volatile esters [69,70], constitutive defense compounds, waxes [71], phytoalexins, lignin, phenolics, alkaloids, and anthocyanins [72][73][74], which makes it very difficult to make functional predictions from primary sequence information alone. Attempts have been made but ended up with getting other member of the family [75]. This step is the last step in the A. segetum pheromone biosynthetic pathway. Most likely it does not significantly influence the ratio between the homologous acetate pheromone components [32,33], although in several moth species the Z isomers are produced faster than the E isomers and yield more products, when the alcohol substrates were supplied to the pheromone gland homogenate [34]. Previously, a number of acetyltransferases that were cloned from plant species were studied. For example, the kiwi alcohol acetyltransferase AT9 Fig. 5 Biosynthetic pathway leading to the sex pheromone of Agrotis segetum, modified from [37]. It starts with carboxylation of acetyl-CoA to malonyl-CoA, and then they are entered to a cycle of fatty acid synthesis and end up with common fatty acids stearate and palmitate. The Δ11 desaturase inserts a double bond in the acyl chain and then the unsaturated fatty acid is subjected to three rounds of chain-shortening by β-oxidation, forming three acyl-chains different by two carbon atoms. These acyl-chains are then reduced by fatty-acyl CoA reductase (FAR) to make fatty alcohols, which are then acetylated to acetate esters, the final A. segetum pheromones. Thick arrows represent steps are functionally assayed in heterologous systems produced butyl acetate and butyl propionate with highest catalytic activities when tested with short chain alcohols [69]. Another example is the apple alcohol acyltransferases (MpAAT1) that use coenzyme A (CoA) donors together with alcohol acceptors as substrates, which were cloned and characterized [76]. The MpAAT1 recombinant enzyme can utilize a range of alcohol substrates from short to medium straight chain (C3-C10), branched chain, aromatic and terpene alcohols. The enzyme can also utilize a range of short to medium chain CoAs [76], but alcohols longer than C10 have not been tested. Similar work was done with enzymes forming volatiles esters from banana, melon, and strawberry. The recombinant enzymes were capable of producing esters from a wide range of alcohols and acyl-CoA [70,77,78]. Overall, these enzymes are members of the BAHD [73,74], generally recognized by their active site motif (HXXXD) and a conserved region (DFGWG) with likely structural significance [69]. A tBLASTn search with current published BAHDs as query against our A. segetum transcriptome got no hit, suggesting this moth may not express this gene family in its pheromone gland or they have undergone substantial evolutionary changes. But according to previously published putative acetyltransferase [45,58] and the annotation results we found 34 candidates in our transcriptome (Table 6) and we heterologously expressed them in our yeast system. The ORF of each of these putative acetyltransferases was thus cloned into a yeast expression vector, pYES-DEST52, under the control of galactose inducible promoter. The ΔATF1 yeast strain (knock out strain that lacking most of the acetylation activity) was transformed with an individual construct and incubated in liquid culture with the culture medium supplemented with a mixture of Z9-14:OH, Z7-12:OH and Z5-10:OH. After incubation for 2 days, the total lipids were extracted and analyzed by GC-MS for acetate esters. The results did not reveal any of insect candidate genes being capable of esterifying fatty alcohols into acetate esters (Fig. 10), whereas the strain overexpressing ATF1 was highly active (positive control).

Conclusions
We explored the data obtained from massive sequencing of pheromone producing tissue of the turnip moth and compared the expression levels of candidate unigenes with their expression levels in the abdominal epidermal tissue. This allowed identification of key parts involved in the pheromone biosynthesis pathway such as: βoxidation enzymes, a desaturase, a fatty-acyl reductase, putative acetyltransferases, and other components involved in the fatty-acid metabolism, like acetyl-CoA carboxylase and fatty-acid synthetase. By phylogenetic analyses of desaturases and FARs, we found the most promising candidates for each gene and confirmed their function in pheromone biosynthesis. The Δ11-desaturase is a specialist interacting preferentially with 16:CoA and producing Z11-16:CoA. The FAR we found is a generalist that can reduce a broad range of saturated and unsaturated acyl substrates from 10C to 16C. We had specifically hoped to be able to clone and characterize an acetyltransferase involved in pheromone biosynthesis that has been postulated in many studies. We tested 34 genes that were annotated to be acetyltransferases in our yeast expression system (which can successfully express plant-derived acetyltransferase [10]) but it turned out that none of them was functional in comparison to the ATF1 positive control. The nature of the acetyltransferase involved in moth pheromone biosynthesis remains illusive. In addition, we generated a list of β-oxidation enzymes, ACC and FAS that can be further tested either by heterologous expression or by RNAi.

Insects and tissue collection
Agrotis segetum were obtained from a laboratory culture, continuously maintained for more than 20 years in Lund, but repeatedly rejuvenated by addition of field-collected insects. The larvae were reared on a semisynthetic beanbased diet [79] and kept at 25°C under a 16 h:8 h light: dark cycle. Pupae were separated by sex and placed in different jars. The third day after they emerged, 30 pheromone glands [58,80] and abdominal epidermal tissue [6] from 3 individuals of female moths were dissected 3-4 h into scotophase [81] and stored in −80°C freezer until RNA extraction.

RNA extraction
Total RNA from pheromone gland and abdominal tissue were extracted using the TRIzol reagent (Life (See figure on previous page.) Fig. 6 The neighbor-joining tree of selected lepidopteran desaturase genes, constructed using amino-acid sequences. Desaturases described in this study are indicated by different shapes (with signature motif displayed for the First Desaturase), followed by unigene expression levels in the gland and abdomen library, respectively (As_PG_FPKM /As_AB_FPKM). Desaturases in previous studies are named as follows: biochemical activities (if known) are indicated in connection to the species name, followed by accession number in parenthesis. Most of the desaturases used in here are First Desaturases that introduce double bond into saturated fatty acids. Among the First Desaturase, four distinctive groups formed that separate their biological functions. The Δ9 desaturases are usually used for normal fatty acid metabolism, with the "KPSE" group having preference on C16 and "NPVE" group mainly modifying C18. The Δ11,Δ10 and bifunctional desaturases with the "xxxQ" motif (with a few exceptions having "xxxE" motif) exclusively involved in pheromone biosynthesis. The Δ5,Δ6,Δ14 group contain a mixture of different signature motifs derived from the Δ9 and Δ11 groups, and their biological function are also diverged. The tree was rooted on the Δ9-desaturase-KPSE (C16 > C18) functional class Technologies, Lidingö, Sweden) according to the manufacturer's instructions except for three additional ethanol washes before dissolving RNA in water. RNA concentration and purity were checked on Nano-Drop2000 (Thermo Scientific, Saveen Werner, Malmö, Sweden).

Illumina sequencing and bioinformatic analysis
Twenty μg of total RNA from each sample were sent to BGI (Hong Kong Co., Ltd) for library construction, Illumina sequencing and subsequent bioinformatic analysis.
Reads were assembled using Trinity [82] into contigs. Then the reads are mapped back to contigs, get sequences  without Ns and cannot be extended on either end. Such sequences are defined as unigenes. Then TGICL [83] is used to assemble all the unigenes from As_PG and As_AB to form a single set of non-redundant unigenes. Then gene family clustering was performed and unigenes were divided to two classes. One is clusters, which the prefix is CL and the cluster id is behind. In one cluster, there are several unigenes which similarity between them is more than 70 %. The other is singletons, which the prefix is unigene. Unigene sequences are aligned with blastdb using blastx (E-value < 0.00001). Sequence orientations are determined according to the best hit in the database. Bioinformatic data were viewed and further processed using Geneious version (6.1.6), created by Biomatters and available from http://www.geneious.com. The calculation of unigene expression was performed using the FPKM method (Fragments Per kb per Million fragments) [84,85], the formula is FPKM = 10 6 C/NL/ 10 3 . Set FPKM to be the expression of unigene A, and C to be number of fragments that uniquely aligned to unigene A, N to be total number of fragments that uniquely aligned to all unigenes, and L to be the base number in the CDS of unigene A. The FPKM method eliminates the influence of different gene lengths and sequencing levels on the calculation of gene expression. Therefore the calculated gene expression can be directly used for comparing the differences in gene expression between samples [85].
Functional annotations of unigenes were conducted, based on protein sequence similarity, towards the KEGG Pathway, COG [86] and Gene Ontoloty (GO) databases. Briefly, we search all unigene sequences against protein databases (NR, SwissProt, KEGG, COG) using blastx (E-value < 0.00001). Based on NR annotation, we use Blast2GO program [44] to get GO annotation of all unigenes. After getting GO annotation for every unigene, we use WEGO software [87] to do GO functional classification for all unigenes.

Phylogenetic reconstruction
Sequences used for phylogenetic reconstructions were retrieved from the GenBank (http://www.ncbi.nlm.nih.gov) database. Neighbor-Joining trees were constructed using Mega version 4.0 [88]. Briefly, multiple sequence alignments were run using the MAFFT (online version) and the output FASTA format data were plugged Table 5 The gene candidates found in As_PG that may be involved in p-oxidation processes (Continued) were from our laboratory collection of pheromone compounds.

Construction of expression vector for functional assay
For the construction of yeast expression vectors containing the candidate genes, specific primers with attB1 and attB2 sites incorporated were designed for amplifying the ORF of genes of interest. The PCR products were subjected to agarose gel electrophoresis and purified using the Wizard® SV Gel and PCR Clean up system (Promega Biotech AB, Nacka, Sweden). The ORFs were subcloned into the pDONR221 vector in presence of BP    [19]. In the acetyltransferase assay the yeasts were supplemented with a mixture of the three alcohols Z9-14:OH, Z7-12:OH and Z5-10:OH. Yeasts were cultured in 30°C in a shaking incubator at 30°C.

Fatty acid/alcohol/acetate analysis
After 48 h of incubation yeast cells were harvested by centrifugation at 3,000 rpm. For the analysis of desaturase products, total lipids were extracted using 3.75 mL of methanol/chloroform (2:1, v/v), in a glass tube. One mL of HAc (0.15 M) and 1.25 mL of water were added to the tube to wash the chloroform phase. Tubes were vortexed vigorously and centrifuged at Fig. 10 Functional assay of putative acetyltransferase genes. Y-axis represents the total amount of acetate esters (sum of Z5-10:OAc, Z7-12:OAc, and Z9-14:OAc) produced by the yeast cells transformed with candidate genes (±95 % confidence interval, n = 3). Negative control is yeast cell (ΔATF1) transformed with empty vector and the yeast strain overexpressing ATF1 gene serves as positive control. None of the 34 candidate genes produces significantly higher amount of acetate esters compared to the negative control (overlapping 95 % confidence intervals), whereas the ATF1 produces a 45-fold increase in acetate production 2000 rpm for 2 min. The bottom chloroform phase, about 1 mL, containing the total lipids, were transferred to a new glass tube. Fatty acid methylesters (FAMEs) were made from this total lipid extract. The solution of total lipids was evaporated to dryness under gentle nitrogen flow. One mL of sulfuric acid (2 % in methanol) was added to the tube, which was then vortexed vigorously and incubated at 90°C for an hour. After incubation, 1 mL of water was added, mixed well, and then 1 mL of hexane was used to extract the FAMEs [10]. Fatty alcohols and acetates were extracted from cells using 800 μL of hexane plus sonication. After brief centrifugation, the supernatant was transferred to a new tube and subjected to GC-MS analysis.
Double bond positions were confirmed by dimethyl disulfide (DMDS) derivatization [13], followed by GC-MS analysis. FAMEs (50 μL) were transferred to a new tube and 50 μL DMDS was added and incubated at 40°C overnight, in the presence of 5 μL of iodine (5 % in diethyl ether) as catalyst. Hexane (200 μL) was added to the sample and the reaction was neutralized by addition of 50-100 μL Na 2 S 2 O 3 (5 % in water). The organic phase was recovered and concentrated under a gentle nitrogen stream to 40-50 μL.

Gas chromatography -mass spectrometry (GC-MS)
The methylesters, fatty alcohols and acetates were subjected to GC-MS analyses on a Hewlett Packard 6890 GC coupled to a mass spectrometer HP 5973. The GC was equipped with an INNOWax column (30 m × 0.25 mm i.d. × 0.25 μm film thickness, Agilent Technologies), and helium was used as carrier gas (average velocity: 33 cm/s). The MS was operated in electron impact mode (70 eV), scaning between m/z 30 and m/z 400, and the injector was configured in splitless mode at 220°C. The oven temperature was set to 80°C for 1 min, then increased at a rate of 10°C/min up to 210°C, followed by a hold at 210°C for 15 min, and then increased at a rate of 10°C/ min up to 230°C followed by a hold at 230°C for 20 min.
DMDS derivatives were analyzed on an Agilent 6890 GC system equipped with HP-5MS capillary column (30 m × 0.25 mm i.d. × 0.25 μm film thickness, Agilent Technologies) coupled with an HP 5973 mass spectrometer. The oven temperature was set at 80°C for 1 min, raised to 140°C at a rate of 20°C/min, then to 250°C at a rate of 4°C/min and held for 20 min [11].
Data were analyzed using the ChemStation software (Agilent, Technologies, USA).

Competing interests
The authors declare that they have no competing interests.
Authors' contributions BJD and CL conceived the study, BJD and CL designed the study, BJD performed the research, BJD analyzed the data, BJD and CL wrote the manuscript. Both authors read and approved the final manuscript.