Molecular Cloning, Characterization and Positively Selected Sites of the Glutathione S-Transferase Family from Locusta migratoria

Glutathione S-transferases (GSTs) are multifunctional enzymes that are involved in the metabolism of endogenous and exogenous compounds and are related to insecticide resistance. The purpose of this study was to provide new information on the molecular characteristics and the positive selection of locust GSTs. Based on the transcriptome database, we sequenced 28 cytosolic GSTs and 4 microsomal GSTs from the migratory locust (Locusta migratoria). We assigned the 28 cytosolic GSTs into 6 classes—sigma, epsilon, delta, theta, omega and zeta, and the 4 microsomal GSTs into 2 subclasses—insect and MGST3. The tissue- and stage-expression patterns of the GSTs differed at the mRNA level. Further, the substrate specificities and kinetic constants of the cytosolic GSTs differed markedly at the protein level. The results of likelihood ratio tests provided strong evidence for positive selection in the delta class. The result of Bayes Empirical Bayes analysis identified 4 amino acid sites in the delta class as positive selection sites. These sites were located on the protein surface. Our findings will facilitate the elucidation of the molecular characteristics and evolutionary aspects of insect GST superfamily.


Introduction
The migratory locust (Locusta migratoria) is an important agricultural pest in East and South Asia and the Pacific Region. The locust has developed insecticide resistance because of frequent applications of insecticides. Increased glutathione Editor: Joshua B. Benoit, University of Cincinnati, United States of America information on the GSTs from L. migratoria and help researchers uncover the molecular characteristics and evolutionary aspects of insect GSTs.
In the present study, we sequenced and classified 32 full-length GST sequences from L. migratoria, and analyzed their tissue-and stage-dependent expression patterns by using real time quantitative PCR (RT-qPCR). We heterologously expressed 25 of these GST genes by using E. coli expression system We purified the GSTs by using NTA-Ni 2+ affinity chromatography. Finally, we identified positive selection sites of the GSTs by using the branch model, site model, and Bayes empirical Bayes (BEB) methods, and labeled these sites by using 3-dimensional homology models. Our findings will facilitate the elucidation of the molecular characteristics and evolutionary history of insect GSTs.

Molecular cloning and phylogenetic analysis
Total RNA was extracted by using TRIzol (TaKaRa, Japan), and mRNA was subsequently isolated by using the PolyATtract mRNA isolation systems. The cDNA was synthesized by using the SMARTer RACE cDNA amplification kit. RACE PCR was performed by using Fastpfu DNA polymerase, according to the manufacturer's instructions. The PCR product was subcloned into the pEASY-Blunt zero vector and sequenced by Sangon Biotech (Shanghai, China). The phylogenetic tree was constructed based on full-length DNA sequences, by using the neighbor-joining algorithm with mega 5.0 software.

Analysis of tissue-and stage-dependent expression patterns
For the tissue-specific expression profile analysis, we dissected 12 different tissues, including the foregut, midgut, hindgut, gastric caecum, Malpighian tubules, cuticle, spermary, ovary, trachea, fat bodies, antenna, and muscles (from legs and back) from adult individuals of L. migratoria. For the stage-specific expression profile analysis, we collected whole bodies of L. migratoria individuals at 8 different developmental stages, including the egg (5 days and 10 days), and the 1st, 2nd, 3rd, 4th, and 5th-instar nymphs and adults. We extracted total RNA from these samples, and used 4 mg total RNA to synthesize cDNA, by using oligo-d(T) and RevertAid H minus reverse transcriptase. The primers used for RT-qPCR analysis are shown in S1 Table. The thermal cycling profile consisted of two steps, including 94˚C for 5 s, 60˚C for 31 s, followed by melting curve analysis. The fluorescence intensity was monitored during each annealing step. Relative transcript levels were determined by using the double standard curve method with b-actin as a reference gene. The experiment was repeated with 3 biological replicates, each with 2 technical replicates.All data were analyzed by the one-way ANOVA, LSD test.

Protein expression and enzymatic activity assay
The open reading frames of the GSTs were subcloned into the pET28a vector, and the recombinant vectors were used to transform E. coli BL21 (DE3). The recombinant GSTs were induced by using 1 mM isopropyl-b-D-thiogalactopyranoside (IPTG) at 25˚C, and purified by using NTA-Ni 2+ -affinity chromatography. The enzymatic activities were measured spectrophotometrically, as described previously [3]. The protein concentration was determined by using the BCA method with BSA as the standard protein.

Homology structure modeling
The sequence of LmGSTD1 as a target sequence was submitted to BLASTP program (http://blast.ncbi.nlm.nih.gov/). Then the program searched against Protein Database (PDB) of known protein structures, and then the X-ray structure of Delta GST (3EIN) from Drosophila melanogaster were selected as template structure. Finally, the homology model of LmGSTD1 was generated by MODELLER 9v12 through comparative protein structure prediction [22].

Positive selection site analysis
To investigate the selective pressure among phylogenies, we used the branch model of the PAML package [23]. The simplest model (one-ratio) assumes a single v for the entire tree, whereas the most general model (free-ratio) assumes a v for each branch. To determine whether positive selection had acted on specific amino acid sites of the GSTs, we explored 5 models of CODEML [24]-the oneratio model (M0), nearly neutral model (M1a), positive-selection model (M2a), beta model (M7), and beta and v model (M8). We conducted two likelihood ratio tests (LRTs), which compared M1a with M2a, and M7 with M8, to verify whether the difference ratio v differed significantly from 1 for each pairwise comparison. The LRT compared the likelihood scores of the pairwise models, with the constraint of v51, and without such constraint: LR52 |ln12ln2|. If the LRT was significant, we used the BEB [25] method to identify positive selection sites in the alignments. Finally, we labeled the positive selection sites on the homology structure of the corresponding GSTs.

Identification of GST genes from L. migratoria
Based on the transcriptome of L. migratoria (S1 Data), we identified candidate GST gene fragments by keyword searching. We subsequently cloned 28 cytosolic GSTs and 4 microsomal GSTs from L. migratoria, including 10 known cytosolic GST genes of L. migratoria (S2 Data, Table 1), by using RACE PCR. The estimated numbers of GSTs may be lower than the actual numbers of GSTs in the locust genome, because of the limitations of transcriptome technology. However, our data are in good agreement with the number of GSTs in the genome of L. migratoria (28 cytosolic GSTs) [26], implying that almost all of the cytosolic GST genes of L. migratoria were detected based on the transcriptome, and were identified and sequenced by using RACE PCR.
With the exception of LmGSTe2 and LmGSTe3, the cytosolic GSTs of L. migratoria had an acidic theoretical pI, and a relatively high molecular mass (22.1-27.0 kDa) ( Table 1). To enhance the accuracy of our phylogenetic tree, we retrieved.130 known GST sequences of the representative insects (S3 Data). Based on sequence similarity, the results of our phylogenetic analysis identified 28 cytosolic GSTs, belonging to 6 different cytosolic classes-sigma, epsilon, delta, omega, theta, and zeta ( Fig. 1, S2 Table). It was shown that each class of GSTs from most insects clustered in a single clade with a high bootstrap value, thereby further supporting the classification of the L. migratoria GSTs. The sigma GSTs were most numerous (10 members), and were followed by the delta and epsilon GSTs (7 and 5 members, respectively), the omega and theta GSTs (3 and 2 members, respectively), and the zeta class (1 member). The former unclassified GST (LmGSTu1) was reassigned with high confidence into the delta class (LmGSTd6). Furthermore, 9 of the 10 sigma GSTs contained a class-specific Tyr9 residue in the G-site; the omega GSTs had an ancestral Cys9 residue; and the zeta, theta, and delta GSTs contained a conserved Ser in the G-site (S1 Figure).
On the other hand, the 4 microsomal GSTs had an alkaline theoretical pI and a relatively low molecular mass (14.2-17.9 kDa, Table 1). The secondary structure prediction indicated that the 4 microsomal GSTs consisted of 3 or 4 transmembrane regions (S2 Figure). To construct our phylogenetic tree, we retrieved.100 known microsomal GSTs from GenBank [15] (S4 Data). Based on high bootstrap support, we clearly identified 9 subclasses of microsomal GSTs (Fig. 2). Among these, 3 microsomal GSTs (LmGSTm1, LmGSTm2, and LmGSTm3) were classified into class 1 (insect subclass), and LmGSTm4 was assigned into class 3 (MGST3 subclass). To the best of our knowledge, our study is the first to report an insect GST (namely, LmGSTm4) as a member of the MGST3 subclass.
The results of stage-dependent expression pattern analysis revealed that the GSTs could be classified into 2 groups, including the egg stage-expression group, and the nymph and adult stages-expression group (Fig. 3B). We observed that 12 GSTs (LmGSTd1, LmGSTd3, LmGSTd5, LmGSTd7, LmGSTe2, LmGSTe3, LmGSTe4, LmGSTe5, LmGSTs3, LmGSTs8, LmGSTo1, and LmGSTz1) showed relatively high expression in the egg stage (P,0.05). Among these, 7 GSTs (LmGSTd1, LmGSTd3, LmGSTd5, LmGSTe2, LmGSTe4, LmGSTs3, and LmGSTs8) showed a high expression level in the early egg stage (Fig. 3C) (P,0.05), 1 GST (LmGSTo1) was primarily expressed in the late egg stage, and 3 GSTs (LmGSTd7, LmGSTe3, and LmGSTe5) were expressed in the early and late egg stages (P,0.05). The remaining GSTs were mainly expressed in the nymphal and adult stages. The different mRNA expression patterns within each GST class may suggest different roles of these genes in different stages and tissues.

Enzymatic properties of recombinant GSTs from L. migratoria
The locust GSTs were selected for protein expression and purification. Most of the GSTs were successfully purified by using a Ni 2+ -NTA column. The exceptions were 4 microsomal GSTs, 1 sigma GST (LmGSTs6) and 2 epsilon GSTs (LmGSTe2 and LmGSTe3). However, the reason for the no-expression of these genes is unclear.
In order to explain the possible relationships between the catalytic activity and substrate diversity, we determined the substrate specificity of the recombinant locust GSTs toward 6 electrophilic substrates, including CDNB, FDNB, DCNB, NDB-Cl, pNPC, and pNPB ( Table 2). We observed that all the 25 purified cytosolic GSTs showed activity toward CDNB, FDNB, and NDB-Cl; 19 of the purified cytosolic GSTs showed activity towards DCNB (the exceptions being the 3 omega and 1 zeta GSTs, and 2 of the sigma GSTs (LmGSTS10 and LmGSTS2); 8 GSTs were able to catalyze pNPB (the exceptions being the 9 sigma, 3 omega, 4 delta and 1 zeta GSTs); and 18 GSTs were capable of catalyzing pNPC, (the exceptions being the 3 omega and 1 zeta GSTs, and 3 sigma GSTs).
We determined the kinetic constants of the 25 GSTs, by using CDNB and GSH as substrates ( Table 3). The apparent K m CDNB varied ,70-fold among the GST classes (61.5-fold among the delta class, 28.1-fold among the sigma class, 14.6-fold among the epsilon class, and 3.9-fold among the omega class). By contrast, the catalytic efficiency (k cat /K m ) of CDNB differed markedly among the GST classes  (675.4-fold among the delta class, 103.3-fold among the sigma class, 12.6-fold among the omega class, and 6.0-fold among the epsilon class). On the other hand, the apparent K m GSH varied ,50-fold among the GST classes (45.5-fold among the sigma class, 30.2-fold among the theta class, 8.7-fold among the delta class, and 1.6-fold among the epsilon class). By contrast, the catalytic efficiency (k cat /K m ) of GSH differed markedly among the GST classes (2482.4-fold among the delta class, 761.8-fold among the sigma class, 7.5-fold among the omega class, and 6.3-fold among the epsilon class).
The observed variations of enzyme catalytic properties not only reflect environmental changes in the GSTs active site, but also embody the diversity of the physiological substrates. This may partially explain why insects possess so many different GSTs. Positive selective sites of GSTs from L. migratoria In order to detect the influence of selection in the delta GSTs, we used branch method to detect selective pressure [27][28][29]. Under the one-ratio model, the loglikelihood values were lnL524573.55, with estimates of v50.10449 for the delta GSTs. Under the free-ratio model, the log-likelihood values were lnL524558.31 for the delta GSTs. The LRTs of the 2 assumptions indicated that the free-ratio model was significantly more likely (P,0.01) for the delta GSTs, implying that the selective pressure varied among branches. The estimated v values in most branches of the delta GSTs were ,1, indicating that the GST genes in the delta GST had been under purifying selection. The three tests, 0-A, 0-B, and 0-C, suggested that v A , v B and v C were significantly greater than the background ratio v 0 (Table 4). While the other three tests, A-A9, B-B9, and C-C9, suggested that v A , v B and v C were also significantly greater than one. These results suggested that the three major branches of delta class were under positive selection pressure. Our present results focused on whether the specific sites have undergone positive selection. Therefore, we chose 5 models of CODEML to detect positive selection sites. We conducted 2 LRTs, which compared M1a with M2a, and M7 with M8, to test whether each GST class had evolved under positive selection (Table 5 and S5 Data). The one-ratio model (M0) yielded estimated v values of 10.449% for the delta GSTs; this result implied that, on average, negative selection played a dominant role during evolution. We further showed that the M8 model was considerably better than the M7 model for delta GSTs, and provided strong evidence for positive selection. Under the M8 model, 0.7% of delta sites were positively selected. Further, the results of BEB analysis revealed that 4 amino acid sites of the delta GSTs were identified as positive selection sites. Among the 4 candidate sites, 2 sites (I193 and T199) were located in the C-domain of GSTs, whereas the other 2 sites (V9 and G11) were located in the N-domain of GSTs.
The results of protein 3-dimensional-structure modeling suggested that the overall structures of the delta GSTs, which are involved in positive selection pressure, were relatively conserved, especially regarding the b-sheet and a-helix (S3 Figure). Therefore, in order to clearly display detailed protein structures, we selected the homology structure of LmGSTD1 as a representative structure, and used the structure to label the positive selection sites (Fig. 4). We observed that most of the positive selection sites were situated either at the functional a-helix or at nearby essential sites. For instance, in delta GSTs, the V9 and G11 sites were close not only to the G-site, but also to the dimer interface. Further, the T199 site was close to the interface between the N-and C-domain residues.

Discussion
In the present study, we identified and sequenced cDNAs putatively encoding 28 cytosolic GSTs and 4 microsomal GSTs based on the transcriptome database. Our phylogenetic analysis revealed that the 28 cytosolic GSTs belonging to 6 different cytosolic classes (sigma, epsilon, delta, omega, theta, and zeta), and the 4 microsomal GSTs belonging to 2 classes-class 1 (insect subclass) and class 3 (MGST3 subclass). The sigma class (10 members), delta class (7 members), epsilon class (5 members), and omega class (3 members) have apparently undergone gene expansion. However, it remains unclear whether these genes play different roles in different tissues and at different developmental stages. To explore these matters, we determined the gene expression patterns at the mRNA level, and the enzymatic properties at the protein level.
We observed 12 GSTs highly expressed during the egg stage. Among the GSTs that were highly expressed in the egg stage, some showed a high expression level in the early egg stage, others were primarily expressed in the late egg stage, and others were expressed in both the early and late stages (Fig. 3C). Because egg is a non-mobile and non-feeding stage, expression of these genes may imply their roles in metabolizing endogenous substances during the embryonic development. Other GSTs were mainly expressed in the larval and adult stages, and are therefore more likely responsible for detoxification in the larval and adult stages. Tissue-specific expressed GSTs may play a vital part in the respective organs or tissues through their detoxification activity. For instance, the pi, rho, and omega GSTs of coho salmon (Oncorhynchus kisutch) are located in the olfactory rosettes and are critical to maintaining olfaction [30]. GSTM1 is situated in the mouse aorta, and regulates the proliferation and migration of vascular smooth muscle cells [31]. The expression level of omega GSTs in the human brain is associated with risk of Alzheimer's and Parkinson's disease [32]. The existence of differential gene expression profiles of locust GSTs suggests that their detoxification abilities are restricted within corresponding tissues or organs, according to physiological demands or regulation mechanisms in vivo. In the present study, we observed differences in the substrate specificities and steady kinetic constants of locust GSTs. The architecture of the functional region may have been altered by amino acid substitutions. The G-site residues of the N-terminal domain are known to be relatively conserved; on the other hand, the H-site residues of the C-terminal domain are relatively flexible, thereby contributing to adaptation to a wider spectrum of toxic reagents. Classic studies in the literature have paid more attention to the obvious functional changes caused by amino acid replacements in critical regions [33]. A single amino acid replacement located directly in the active site can markedly alter the substrate specificity of N-acetyl-L-ornithine transcarbamylase and N-succinyl-L-ornithine transcarbamylase [34]. Similarly, 6 substitutions located directly in the adenosine-binding pocket of E. coli isocitrate dehydrogenase systematically inverted coenzyme specificity [35]. The results of recent studies have shown that mutations located close to the active sites and other functional regions can alter the substrate preference or catalytic activity. In pine, tau GSTs contains positively selected residues located adjacent to the G-site or distant from the active center, and are related to the enzymatic activities [33]. Mutations of superoxide dismutase also indicated that residues located close to the active site can markedly alter the catalytic efficiency [36]. These observations imply that the positively selected residues of the delta GSTs, which are located close to essential catalytic residues, represent a mechanism for substrate diversity of the GSTs family [33]. The active sites of GSTs require specific amino acid residues and accurate orientation in protein structures, thereby resulting in greater functional and structural constraints. On the other hand, the protein surface region is generally relatively flexible. This may explain why the 4 positive selection sites of the delta class were spread on protein surface.