Integrated Transcriptomic and Proteomic Analyses Suggest the Participation of Endogenous Protease Inhibitors in the Regulation of Protease Gene Expression in Helicoverpa armigera

Insects adapt to plant protease inhibitors (PIs) present in their diet by differentially regulating multiple digestive proteases. However, mechanisms regulating protease gene expression in insects are largely enigmatic. Ingestion of multi-domain recombinant Capsicum annuum protease inhibitor-7 (CanPI-7) arrests growth and development of Helicoverpa armigera (Lepidoptera: Noctuidae). Using de novo RNA sequencing and proteomic analysis, we examined the response of H. armigera larvae fed on recombinant CanPI-7 at different time intervals. Here, we present evidence supporting a dynamic transition in H. armigera protease expression on CanPI-7 feeding, with general down-regulation of protease genes at early time points (0.5 to 6 h) and significant up-regulation of specific trypsin, chymotrypsin and aminopeptidase genes at later time points (12 to 48 h). Further, coexpression of H. armigera endogenous PIs with several digestive protease genes was apparent. In addition to the differential expression of endogenous H. armigera PIs, we also observed a distinct novel isoform of endogenous PI in CanPI-7 fed H. armigera larvae. Based on present and earlier studies, we propose a potential mechanism of protease regulation in H. armigera and a subsequent adaptation strategy to cope with anti-nutritional components of plants.


Adaptations of insects to a variety of plant compounds are partly attributed to their ability to break down or eliminate different phytotoxins (1). Although plants produce various antifeedants and insecticidal molecules to deter insects from feeding on them, insects are known to overcome these using various specialized mechanisms (1,2). For instance, plant defensive protease inhibitors (PIs) interfere with insect protein digestion, whereas insects counteract by producing a battery of digestive proteases with wide specificity. Gut proteolytic activity, especially in lepidopteran insects, comprises a variety of proteases with broad activity and diverse specificity. The expression and activity of insect proteases respond to the type and content of the diet. Resistance to plant PIs has been noted in several insects feeding on a variety of inhibitors (3-7).
These studies point to remarkable diversity and plasticity in insect digestive processes as counter-defense mechanisms. In general, three adaptive mechanisms to counter ingestion of plant PIs have been prominently observed in insects: (1) overproduction of proteases so that the concentration of PIs is insufficient to inhibit proteolytic activity (5,6), (2) alteration of the PI binding site of the protease to make it PI insensitive (4,6), and (3) expression of proteases that can recognize the cleavage site in the PIs and degrade them (3). Insects also overexpress proteases from one class to compensate for the inhibition of another class (8).
Many insects use a combination of multiple strategies to avoid the antinutritional effects of plant PIs. However, the mechanisms or factors by which phytophagous insects recognize the presence of PIs and adjust their gene expression accordingly remain largely unknown. Hyperproduction of proteases to compensate for the inhibition of activity has been attributed to feedback mechanisms in response to dietary PIs. Blood-feeding insects, including mosquito and blackfly, showed induction of trypsin-encoding genes after a blood meal (9,10). Promoters of inhibitor-induced trypsin and chymotrypsin genes from larval midguts of Helicoverpa zea and Agrotis ipsilon showed the presence of different regulatory motifs, suggesting diverse regulatory mechanisms of protease expression (11). Analysis of CmCatB, a cathepsin B gene from the cowpea bruchid (Callosobruchus maculatus) larval midgut, revealed intricate transcriptional regulation underlying the adaptive measures (12). Later, CmCatB was found to play a role in cowpea bruchid adaptation by rendering it less susceptible to soybean cysteine PI inhibition (12,13). It is yet to be determined how insects sense the presence of dietary PIs and how the signal of amino acid deficiency is transmitted, leading to subsequent activation of counter-defense-related genes. Investigations in mammals indicate that cholecystokinin (CCK) is the most significant regulator of the physiological pathways related to the secretion of intestinal digestive enzymes (14). Two other peptides, a monitor peptide and the CCK-releasing factor (CCK-RF), stimulate the release of CCK by interacting with cell surface receptors in the intestine, which in turn triggers the secretion of digestive enzymes into the intestine. CCK-RF-like or monitor-peptide-like factors have not yet been identified in phytophagous insects, but similar mechanisms may be responsible for regulation of digestive enzymes in the midgut (6). For instance, a hemolymph-circulating decapeptide, trypsin-modulating oostatic factor (TMOF), has been found to downregulate the synthesis of trypsin in the larval gut of Heliothis virescens (15).
Altogether, various studies suggest the presence of several mechanisms for regulation of proteases at both the transcriptional and the post-translational level. However, the post-translational regulation of protease activity by endogenous PIs has received less attention in insects and remains a major challenge despite biochemical and comparative genomic data on insect endogenous PIs. Insect endogenous PIs with varied proteinase inhibitory activities seem to regulate a range of physiological responses, like their mammalian counterparts. In addition to acting as inhibitors of digestive enzymes and pathogen-encoded virulence factors, endogenous protease inhibitors in Drosophila melanogaster have been shown to be involved in many aspects of the innate immune response (16). Therefore, in the present study we sought to determine the potential role of endogenous PIs in the regulation of protease gene expression against plant PIs in H. armigera. First, we comprehensively examined transcriptional and proteomic responses of H. armigera to ingested multi-domain recombinant Capsicum annuum protease inhibitor (CanPI-7). High throughput transcriptomic and proteomic analysis suggested that CanPI-7 ingestion influences general metabolism of the insect as well as proteolytic enzymes and endogenous protease inhibitors. Further, functional assays were performed to evaluate qualitative and quantitative changes in protease and PI activities in the larval digestive tract and hemolymph of H. armigera feeding on CanPI-7. Finally, the probable role of insect endogenous protease inhibitors in the regulation of gut proteases after ingestion of a plant protease inhibitor is discussed.

EXPERIMENTAL PROCEDURES
Insect Culture and Feeding Assays-H. armigera larvae were procured from the Division of Insect Ecology, Indian Council of Agricultural Research-National Bureau of Agricultural Insect Resources, Bangalore, Karnataka, India. Feeding experiments were carried out with neonates of H. armigera maintained in the laboratory at optimum growth conditions (27 ± 2°C, 60 ± 5% relative humidity and a photoperiod of 14 h light and 10 h dark). Artificial diet (AD) was prepared based on our earlier report (17). The major ingredients of the diet were chickpea flour, sorbic acid, ascorbic acid, methyl p-hydroxybenzoate, and vitamin and mineral mix. For the feeding bioassays, 150 µg of recombinant Capsicum annuum protease inhibitor (CanPI-7) was incorporated per gram of AD.
Experimental Design and Statistical Rationale-In the present study, first instar larvae were fed on AD or CanPI-7 incorporated AD for 48 h. Each larva was maintained in an individual vial. A set of 200 larvae (100 larvae each on AD and on PI-incorporated AD) was maintained in the laboratory at optimum growth conditions (27 ± 2°C, 60 ± 5% relative humidity and a photoperiod of 14 h light and 10 h dark) and whole larvae were harvested at various time intervals (0.5, 2, 6, 12, 24, and 48 h). At each stage of the bioassay, the harvested insect tissues were snap frozen in liquid nitrogen and stored at -80°C until further use. These tissues were used for transcriptome sequencing. For proteomics, samples obtained from different time points of CanPI-7 exposure were pooled into three stages, i.e. early (0.5, 2, and 6 h), mid (12 and 24 h) and late (48 h). Each sample was acquired on an AB-Sciex 5600 Triple TOF mass spectrometer using two biological replicates and two technical replicates in DDA mode for spectral library preparation. SWATH-MS was carried out using three replicates to minimize retention time variation and to support statistical multivariate data analysis.
RNA Sequencing-H. armigera larval samples (in triplicate) from each time point of the feeding assay were processed for RNA library preparation. Total RNA was isolated from whole-body homogenates of insect tissues using Trizol reagent (Invitrogen, Carlsbad, CA) based on the manufacturer's protocol. RNA was quantified and checked for purity and integrity using agarose gel electrophoresis, Nanodrop (Thermo Scientific, Waltham, MA) and the Agilent 2100 Bioanalyzer. Total RNA (1 µg) passing all the quality checks was used to isolate poly(A)-tailed mRNA using poly-T oligo beads. The purified mRNA was fragmented in the range of 100 to 140 bases (the optimum at around 120 bases) and cDNA was synthesized using the TruSeq RNA sample preparation kit v2 (Illumina, San Diego, CA) according to the manufacturer's protocol. End repair, A-tailing and adapter ligation were performed as per the manufacturer's instructions. The libraries were then subjected to PCR enrichment (15 cycles) and again validated using the Bioanalyzer. Libraries were then sequenced (in triplicate) in a paired-end 100 base run using the TruSeq PE Cluster Kit v3-cBot-HS (Illumina) for cluster generation on C-Bot and the TruSeq SBS Kit v3-HS for sequencing on the Illumina HiSeq1000 platform according to manufacturer-recommended protocols. RNA sequencing was performed using the HiSeq1000 sequencing system from Illumina at the Centre for Cellular and Molecular Platforms (C-CAMP), Bengaluru, Karnataka, India. Raw sequencing data were processed using CASAVA software from Illumina to generate files in FASTQ format along with a QC report. The sequence tags obtained from Illumina sequencing were subjected to primary analysis in which low-quality tags and adaptor contaminants were discarded.
Sequence Analysis, Functional Annotation, De Novo Assembly of Transcripts and Gene Expression Quantification-Adapters were trimmed using Cutadapt v1.2.1 and read quality was assessed using the FastQC v0.10.1 program (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Low-quality reads were removed and reads with Phred score ≥30 were processed for further analysis. The two FASTQ files (R1 and R2) generated for each sequenced sample were merged and subjected to assembly using Trinity software (http://trinityrnaseq.sourceforge.net/) with the default k-mer size of 25 bp (18). The non-redundant full-length transcriptome was then generated by clustering the assembled transcripts longer than 200 bp at >90% identity using the uclust v1.2.22q clustering tool (19) and by extracting the centroid sequence of each cluster. The non-redundant transcriptome was validated by mapping back the good-quality reads from all 12 libraries separately using the Bowtie2 v2.2.1 program (20) with default parameters (http://bowtie-bio.sourceforge.net/index.shtml). Transcripts with 70% or more mapping coverage and a minimum alignment of 5 reads from either of the libraries (control and treated libraries of a particular time point) were termed true transcripts. A set of non-redundant transcripts (unigenes) from all the libraries was assembled in the final assembly. The assembly was then annotated by homology search against the National Center for Biotechnology Information Non-Redundant (NCBI-NR) protein database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/) using the standalone BLAST package v2.2.29+-1.x86_64.rpm (ftp://ftp.ncbi.nih.gov/blast/executables/LATEST/). BLAST hits with e-value ≤ 0.001 and query coverage above 50% were considered best hits. Protein identifiers were then mapped to these hits using the UniProt ID mapping tool (http://www.uniprot.org/mapping/).
Similarly, information such as Entrez gene identifiers, species information, gene ontology (GO) and pathway annotation for these hits was extracted using either UniProtKB (http://www.uniprot.org/), DAVID Bioinformatics (http://david.abcc.ncifcrf.gov/) or Flink at NCBI (http://www.ncbi.nlm.nih.gov/Structure/flink/flink.cgi). We also used an online annotation server, Fast Annotator (http://fastannotator.cgu.edu.tw), to annotate these transcripts, particularly with enzyme EC numbers and Pfam domain information. Expression levels of true transcripts in the individual libraries (all the control and treated samples at different time points) were assessed by mapping good-quality reads using Bowtie2 (20). Mapped reads were further normalized using the DESeq method (21). Transcripts with fold change ≥1.5 over control and p value ≤0.05 were considered differentially expressed. The Padj value provided by DESeq was used to determine the False Discovery Rate (FDR) for statistical analysis of significant mRNA changes.
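The differential expression cutoffs stated above can be illustrated with a minimal Python sketch. It is not the DESeq implementation: the function names are hypothetical, the Benjamini-Hochberg procedure is shown only because DESeq reports Padj values adjusted in that way, and treating down-regulation symmetrically (fold change of 1/1.5 or lower) is an assumption not stated in the text.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def is_differential(treated, control, p_value, min_fold=1.5, max_p=0.05):
    """Apply the fold-change and p-value cutoffs to normalized counts.

    Assumption: both counts are positive, and a transcript counts as
    changed in either direction (up >= 1.5-fold or down <= 1/1.5-fold).
    """
    if treated <= 0 or control <= 0:
        raise ValueError("normalized counts must be positive")
    fold = treated / control
    return p_value <= max_p and (fold >= min_fold or fold <= 1.0 / min_fold)
```

For example, `is_differential(200, 100, 0.01)` is true, whereas a 1.2-fold change or a p value of 0.2 fails the filter.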

Proteomics Experimental Design and Statistical Rationale-Protein Extraction and In-solution Trypsin Digestion-A high throughput label-free quantitative proteomic analysis was conducted to understand the digestive physiology of H. armigera larvae fed on CanPI-7 diet. For proteomic analysis, samples obtained from different time points of CanPI-7 exposure (each set containing 100 insects) were pooled into three stages, i.e. early (0.5, 2, and 6 h), mid (12 and 24 h) and late (48 h). Proteins from whole insects were extracted according to the method of Schuster and Davies (22) with few modifications. Whole insect tissues were finely ground using a mortar and pestle in liquid nitrogen. Tissue (100 mg) was homogenized in 1 ml of extraction buffer (0.7 M sucrose, 0.5 M Tris-HCl, 50 mM EDTA, 0.1 M KCl, 2% [v/v] beta-mercaptoethanol and 5% insoluble polyvinylpolypyrrolidone), vortexed thoroughly and centrifuged at 13,000 × g for 20 min at 4°C. The supernatant was transferred to a fresh tube and an equal volume of water-saturated phenol was added. The mixture was vortexed thoroughly and centrifuged at 13,000 × g for 30 min at 4°C. The phenol phase was precipitated overnight with 5 volumes of 0.1 M ammonium acetate in methanol at 20°C. The pellet was washed twice with 0.1 M ammonium acetate in methanol and once with 100% acetone, air dried and resuspended in lysis buffer (8 M urea, 2 M thiourea and 50 mM DTT). The suspension was centrifuged and the clear solution containing total proteins was used for tryptic digestion. Protein concentration was determined by Bradford's method (23). A total of 100 µg of protein was reduced with 100 mM DTT for 15 min at 60°C, alkylated with 200 mM iodoacetamide for 30 min in the dark and kept overnight at 37°C for tryptic digestion using sequencing grade trypsin (Promega, Madison, WI). The digestion reaction was stopped after 16 h by adding concentrated formic acid and incubating for a further 10 min at 37°C, followed by brief vortexing and centrifugation. The peptides were desalted using Zip-tip C18 (Millipore, Billerica, MA), concentrated by vacuum centrifugation and stored at -80°C until further use.
Instrumentation Method-Samples were acquired on a TripleTOF 5600 (AB Sciex) using the instrument parameters, acquisition methods, peptide spectral library, SWATH-MS and data processing described by Korwar et al. (24) with few modifications. In brief, peptide digests (3 µg) were separated on an Eksigent C18 reverse-phase column (100 × 0.3 mm, 3 µm, 120 Å) using an Eksigent MicroLC 200 system (Eksigent, Dublin, CA). The sample was loaded onto the column with 97% mobile phase A (100% water, 0.1% FA) and 3% mobile phase B (100% ACN, 0.1% FA) at a flow rate of 8 µl/min. Peptides were eluted with a 120 min linear gradient of 3 to 50% mobile phase B at a flow rate of 8 µl/min. The column temperature was set to 40°C and the autosampler to 4°C. The same chromatographic conditions were used for both DDA and SWATH acquisition.
Shot Gun Method (DDA)-All samples were analyzed on an AB-Sciex 5600 Triple TOF mass spectrometer in positive and high-sensitivity mode. The dual source parameters were optimized for better results: ion source gases GS1 and GS2 and curtain gas at 25 psi, temperature at 200°C and ion spray voltage floating (ISVF) at 5500 V. The DDA acquisition consisted of a full scan (MS) and information-dependent MS/MS. The accumulation time in full scan was 250 ms for a mass range of 350-1800 m/z. Parent ions were selected based on the following criteria: ions in the MS scan with intensities of more than 120 counts per second (CPS), charge state between +2 and +5, and mass tolerance of 50 mDa; once a precursor ion was fragmented by MS/MS, its mass and the masses of its isotopes were excluded for a period of 15 s. Ions were fragmented in the collision cell using rolling collision energy with an additional CE spread of ± 15 eV.
SWATH MS (DIA)-In SWATH-MS mode, the instrument was specifically tuned to optimize the quadrupole settings for precursor ion selection windows 25 m/z wide. Using an isolation width of 26 m/z (containing 1 m/z for the window overlap), a set of 34 overlapping windows was constructed covering the precursor mass range of 400-1250 m/z. SWATH MS/MS spectra were collected from 100 to 2000 m/z. Ions were fragmented in the collision cell using rolling collision energy with an additional CE spread of ± 15 eV. An accumulation time (dwell time) of 96 ms was used for all fragment-ion scans in high-sensitivity mode, and for each SWATH-MS cycle a survey scan in high-resolution mode was acquired for 100 ms, resulting in a duty cycle of 3.33 s. The source parameters were similar to those of the DDA acquisition.
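The window scheme described above can be laid out with a short Python sketch. The function name is hypothetical, and placing the 1 m/z overlap on the upper edge of each window is an assumption; the text gives only the step (25 m/z), the isolation width (26 m/z) and the covered range (400-1250 m/z).

```python
def swath_windows(start=400.0, stop=1250.0, step=25.0, overlap=1.0):
    """Return (lower, upper) precursor isolation windows in m/z.

    Assumption: each window starts on a 25 m/z grid point and its upper
    edge is extended by the overlap, giving a 26 m/z isolation width.
    """
    n_windows = round((stop - start) / step)
    windows = []
    for i in range(n_windows):
        lower = start + i * step
        windows.append((lower, lower + step + overlap))
    return windows


windows = swath_windows()
print(len(windows), windows[0], windows[-1])
```

With the default arguments this yields 34 windows, the first spanning 400-426 m/z, which agrees with the number of windows reported for this range.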
Peptide Spectral Library-Samples from the early, mid and late stages were acquired in two biological replicates and two technical replicates in the DDA method and three technical replicates in SWATH-MS. All DDA mass spectrometric files were searched using ProteinPilot software (version 5.0.1, AB Sciex) with the Paragon algorithm against the H. armigera six-frame translated transcriptome (number of entries in database = 213314). The search parameters were as follows: sample type: identification; cys alkylation: iodoacetamide; digestion: trypsin (specificity: C terminus of Lys and Arg); instrument: Triple TOF 5600; special factors: none; and False Discovery Rate (FDR): 5%. FDR was calculated using the Proteomics System Performance Evaluation Pipeline Software (PSPEP) installed within ProteinPilot software. The ProteinPilot output file was used as a standard spectral library. For each stage, a combined spectral library was prepared containing biological and technical replicates of both control and CanPI-7 fed samples.
SWATH Data Process-All samples were acquired in three technical replicates using the SWATH-MS data-independent acquisition method. The standard spectral library was loaded into PeakView software (version 1.2.03, AB Sciex). Spectral alignment and targeted data extraction of SWATH-MS data were performed using PeakView software with the following parameters: number of peptides per protein, 6; number of transitions per peptide, 10; peptide confidence, 95%; FDR threshold, 1%; XIC extraction window, 3 min; XIC width, 30 ppm; shared peptides were excluded. All the data-independent acquisition files were loaded and exported into MarkerView software (version 1.2.1.1, AB Sciex), where they were used for further quantitative and statistical analysis. Normalization was performed using total area sum. Statistics were performed using the t test and a p value <0.05 was considered significant. Proteins with a minimum of two peptides and coverage >30% were used for further study.
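As a rough illustration of the normalization and protein filter described above, the following Python sketch scales the peak areas of one sample by their total and applies the peptide-count and coverage cutoffs. It is a simplified stand-in for what MarkerView does internally, and the function names and the dictionary layout are hypothetical.

```python
def total_area_normalize(sample):
    """Scale the peak areas of one sample so that they sum to 1.

    `sample` maps a protein identifier to its summed peak area.
    """
    total = sum(sample.values())
    if total <= 0:
        raise ValueError("sample has no signal to normalize")
    return {protein: area / total for protein, area in sample.items()}


def passes_filter(n_peptides, coverage_pct, min_peptides=2, min_coverage=30.0):
    """Keep proteins with at least two peptides and coverage above 30%."""
    return n_peptides >= min_peptides and coverage_pct > min_coverage
```

Dividing by the total area makes samples with different overall signal comparable before CanPI-7 to AD intensity ratios are formed.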
Statistical Analysis-Multivariate data analysis was done using SIMCA-P software (version 13.0, Umetrics). Principal Component Analysis (PCA) with Pareto scaling was applied to the whole data set to check the overall trend in the data and to establish CanPI-7 induced changes in the proteome.
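Pareto scaling, the pre-treatment named above, mean-centers each variable and divides it by the square root of its standard deviation, which dampens but does not remove the dominance of high-intensity proteins. The Python sketch below shows only this scaling step on a samples-by-proteins matrix; it is not SIMCA-P's implementation, and the function name and the handling of constant columns are assumptions.

```python
import math
import statistics


def pareto_scale(matrix):
    """Pareto-scale a samples x variables matrix given as a list of rows.

    Each column is mean-centered and divided by the square root of its
    sample standard deviation. Assumption: constant columns are only
    centered, to avoid division by zero.
    """
    scaled_columns = []
    for column in zip(*matrix):
        mean = statistics.fmean(column)
        sd = statistics.stdev(column)
        divisor = math.sqrt(sd) if sd > 0 else 1.0
        scaled_columns.append([(value - mean) / divisor for value in column])
    # Transpose back to samples x variables.
    return [list(row) for row in zip(*scaled_columns)]
```

PCA would then be run on the scaled matrix.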
Coexpression Analysis-To investigate whether protease genes and endogenous PI genes were coexpressed, we constructed a weighted coexpression network using WGCNA (25) based on the expression data of test samples. A total of 20791 genes with an expression profile at all six time points were kept for the analyses. A matrix of Pearson correlation coefficients between all pairs of genes was calculated and transformed into a matrix of connection strengths (an adjacency matrix) by raising the correlation matrix to a power of 18. The resulting adjacency matrix was then converted to a topological overlap matrix (TOM) by the TOM similarity algorithm. Genes were hierarchically clustered based on TOM similarity. The Dynamic Tree Cut algorithm was used to cut the hierarchical clustering tree, and modules were defined as branches from the tree cutting. The minimum module size was set to 30 genes and the cut height was set to 0.975. As most PI genes were included in the turquoise module, which contained 9610 genes and 46171245 connections between genes, we took the top 80% quantile of strong connections (weighted connection cutoff, 0.3; the weighted connection ranging from 0 to 0.57 with 0.16 as the mean value) to visualize it in Cytoscape3 (26), and the "prefuse force directed layout" was chosen.
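The first two steps of the network construction described above (soft-threshold adjacency and topological overlap) can be sketched in plain Python. This is an illustration of the standard formulas rather than the WGCNA package itself; the function names are hypothetical, and the use of an unsigned network (absolute correlation) is an assumption, since the text does not state the network type.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(profiles, power=18):
    """Soft-threshold adjacency a_ij = |r_ij| ** power, zero on the diagonal.

    Assumption: unsigned network, so negative correlations also connect.
    """
    n = len(profiles)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weight = abs(pearson(profiles[i], profiles[j])) ** power
            adj[i][j] = adj[j][i] = weight
    return adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    connectivity = [sum(row) for row in adj]  # diagonal is zero
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Terms with u == i or u == j vanish because a_ii = a_jj = 0.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n))
            value = (shared + adj[i][j]) / (
                min(connectivity[i], connectivity[j]) + 1.0 - adj[i][j]
            )
            tom[i][j] = tom[j][i] = value
    return tom
```

Hierarchical clustering would then operate on the dissimilarity 1 - TOM. The nested loops are quadratic to cubic in the number of genes, so a data set of about 20000 genes needs the vectorized routines of a dedicated package.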
RNA Isolation, Preparation of cDNA and qRT-PCR-As described previously by Lomate et al. (27), total RNA was extracted from whole insect tissues of the early, mid and late stages using Trizol reagent (Invitrogen), and synthesis of the first strand cDNA was carried out with the High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems, Foster City, CA) using oligo-dT as per the manufacturer's protocol. Relative transcript abundance of proteases and PIs was determined by quantitative real-time PCR (qRT-PCR) using the 7900HT Fast Real-Time PCR System (Applied Biosystems) and FastStart Universal SYBR Green Master (Roche Diagnostics GmbH, Germany) with the following conditions: 95°C denaturation for 10 min, followed by 40 cycles of 95°C for 3 s, with primer annealing and extension at 60°C for 30 s. Following amplification, a melting dissociation curve was generated using a 62 to 95°C ramp with a 0.4°C increment per cycle to monitor the specificity of each primer pair. β-actin (Accession no.: AF286059) was used as the reference gene for normalization. Relative transcript abundance calculations were performed using the comparative CT (ΔCT) method described by Schmittgen et al. (28). Gene accession numbers and primer details are given in supplemental Table S1. Two biological replicates, each comprising three technical replicates, were used for the experiment.
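The comparative CT calculation referred to above can be written out as a small Python sketch. The function names are hypothetical, and the formulas assume roughly 100% amplification efficiency (a doubling per cycle) for both the target and the reference gene, which is the usual premise of this method.

```python
def relative_abundance(ct_target, ct_reference):
    """Return 2 ** -dCT, the target abundance relative to the reference gene."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** -delta_ct


def fold_change(ct_target_treated, ct_ref_treated,
                ct_target_control, ct_ref_control):
    """Return 2 ** -ddCT, the treated over control fold change."""
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)
```

For instance, a target that crosses the threshold two cycles later than the reference gene has a relative abundance of 0.25, and if that gap is four cycles in control larvae, the treated over control fold change is 4.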
Activity Assay, Electrophoretic Visualization and Proteomic Identification of H. armigera Endogenous PI-Whole gut tissue from fourth instar larvae fed on AD and CanPI-7 diet was homogenized in Tris-HCl buffer (pH 7.8) in a 1:1 ratio and kept on ice for 2 h. The suspension was centrifuged at 13,000 × g, 4°C for 20 min and the resulting supernatant was used as the source of H. armigera gut proteases (HGP). Hemolymph was collected from fourth instar larvae reared on AD and CanPI-7 diet. Pooled hemolymph from 20 insects was considered one biological replicate and three biological replicates were used for enzyme inhibition assays. Hemolymph (100 µl) was briefly centrifuged to remove insoluble material and dissolved in 1 ml of Tris-HCl buffer (pH 7.8). The solution was passed through an Amicon Ultra 3 kDa MWCO filter (Millipore) and the volume was set to 100 µl, which was used as the source of endogenous PI for enzyme assays. Inhibition of the proteolytic activity of bovine trypsin was estimated using the chromogenic substrate benzoyl-L-arginyl p-nitroanilide (BApNA). Bovine trypsin and HGP were titrated against hemolymph PIs and the IC50 was calculated. The gel X-ray film contact print method (29) was also used to visualize the inhibitory activity of hemolymph protease inhibitors. A total of 0.25 TIU of hemolymph protein was resolved on 12% native PAGE, incubated with Tris-HCl buffer for 10 min followed by 0.04% trypsin for 10 min, and then washed with the buffer. A strip of the gel was processed to visualize PI bands and the corresponding horizontal strip of the gel was further processed for proteomic analysis. The HaPI-2 band was excised and washed with 50% acetonitrile (ACN) in 50 mM ammonium bicarbonate, followed by dehydration with 100% ACN. Gel pieces were reduced with 10 mM DTT for 45 min at 56°C and thiol groups were subsequently alkylated using 55 mM iodoacetamide for 30 min in the dark at room temperature.
The gel pieces were then dehydrated to remove excess iodoacetamide and digested with 12.5 ng/µl trypsin (Promega, Madison, WI) at 37°C overnight. The resulting peptides were extracted using 50% ACN in 50 mM ammonium bicarbonate acidified with 0.1% formic acid. The extracted peptides were desalted using Zip-tip C18 (Millipore, Billerica, MA), vacuum dried and stored at -80°C until further use. The samples were acquired on a Triple TOF 5600 (AB Sciex) in duplicate using the column mentioned above. The sample was loaded onto the column with 97% mobile phase A (100% water, 0.1% FA) and 3% mobile phase B (100% ACN, 0.1% FA). Peptides were eluted with a 60 min gradient of 3% to 85% mobile phase B, keeping the column temperature at 40°C and the autosampler temperature at 4°C. The flow rate for sample loading and elution was kept at 8 µl/min. For acquisition, the DDA method (MS and MS/MS) was the same as described earlier. Raw files of both replicates were searched using ProteinPilot software (version 5.0.1, AB Sciex) with the same parameters and database used earlier, and proteins identified with <5% FDR, coverage >30% and >2 peptides were searched for protease inhibitors.

RESULTS
High Throughput RNA Sequencing and Protein Identification Data Reveal H. armigera Transcriptome and Proteomic Details-We generated twelve cDNA libraries comprising samples of six time points from a time course experiment with H. armigera larvae exposed to recombinant CanPI-7 through AD. The detailed statistics of sequencing, transcript assemblies and quality scores are described in supplemental Tables S2, S3, and S4. We obtained 83228 unique transcripts with a length of >200 bases from the 12 libraries. Details of length distribution, sequence similarity, and Gene Ontology (GO) analysis are provided in supplemental Fig. S1. In the proteomic analysis, we identified a total of 631, 609, and 614 proteins at the early (0.5, 2, and 6 h), mid (12 and 24 h) and late (48 h) time points, respectively, with 1% FDR, a minimum of 2 peptides and a minimum coverage of 30% (supplemental Table S5). Among these, 56.5% of proteins were common to all the stages; the early and mid stages shared 7.5% of proteins, the mid and late stages shared 6.6%, and the late and early stages shared 4.6% (supplemental Fig. S2).

CanPI-7 Ingestion Significantly Influenced Transcript Levels and Protein Expression in H. armigera-The number of transcripts increased with feeding stage after CanPI-7 ingestion, except at 6 h. The rapid increase in transcript number at 2 h of CanPI-7 feeding indicates a sudden metabolic response of H. armigera to the plant PI (Fig. 1A). In total, 7,426 transcripts were differentially expressed in H. armigera larvae from 0.5 to 48 h after CanPI-7 ingestion, indicating the influence of CanPI-7 on overall gene expression in H. armigera. We also specifically identified transcripts involved in regular metabolism in AD-fed larvae. These transcripts might be suppressed in CanPI-7 fed larvae. Detailed statistics of H. armigera transcripts are provided in Fig. 1A, which includes the number of transcripts i) common to CanPI-7 and AD fed larvae, ii) specific to CanPI-7 or AD fed larvae, and iii) differentially expressed on CanPI-7 feeding. Principal component analysis (PCA) of identified proteins showed a clear distinction, illustrating a significant change in the proteome between insects fed on AD and on CanPI-7 diet (Fig. 1B). Out of the total identified proteins, 71 proteins showed significant differential expression (p < 0.05 and fold change >1.5) in the early stage, 276 proteins in the mid stage and 220 proteins in the late stage, according to the ratio of normalized intensities of proteins from the CanPI-7 and AD fed insects. Of these, 22, 174 and 116 proteins were uniquely differentially expressed in the early, mid and late stages, respectively (Fig. 1C). The maximum number of proteins was differentially expressed in the mid stage (including proteases and protease inhibitors), followed by the late stage.
H. armigera Proteases Show Remarkable Expression Plasticity on CanPI-7 Exposure-We examined the transcript and protein expression patterns of proteases across all the stages of CanPI-7 feeding. Among these, the major protease types include trypsins, chymotrypsins, serine proteases, aminopeptidases and carboxypeptidases (Fig. 2A). The protease transcript abundance pattern was consistent with their expression at the protein level (Fig. 2B). A total of 53 proteases including 12 trypsins, 7 chymotrypsins, 9 serine proteases, 8 aminopeptidases, and 6 carboxypeptidases were differentially expressed at the transcript level in CanPI-7 fed larvae. Expression of 4 trypsins, 7 chymotrypsins, and 1 serine protease increased significantly at both the transcript and the protein level from 6 through 48 h, with maximum expression at 48 h after ingestion of CanPI-7 (Fig. 2A). Besides serine proteases, 6 aminopeptidases and 2 carboxypeptidases showed increased expression at 24 and 48 h after CanPI-7 ingestion (Fig. 2A). Similarly, trypsin and chymotrypsin proteins were significantly expressed at the late stage after feeding. Consistent with the transcriptomic analysis, trypsin, chymotrypsin and aminopeptidase were the major differentially expressed proteases identified in H. armigera in the proteomic analysis. These differentially expressed proteases were further analyzed for the presence of signal sequences in their transcript sequences. Based on gene ontology analysis, we found that among 48 differentially expressed proteases, 37 (77%) possess a signal sequence in the transcript sequence (supplemental Table S6), which suggests that these are mostly secretory digestive proteases. Thus, proteases with a signal sequence (excluding those named as hemolymph proteases) are possibly secreted into the gut lumen for digestion, because H. armigera contains a bank of proteases secreted into the gut lumen from the cells of the microvilli. The overall expression data were validated by qRT-PCR.
Expression of total 17 proteases was validated and the results were consistent with the expression pattern observed in the transcriptomic and the proteomic analysis (Fig. 3). In qRT-PCR analysis selected trypsins, chymotrypsins, other serine proteases and aminopeptidases showed significantly high expression at the late stage after CanPI-7 ingestion whereas two carboxypeptidases showed higher expression at the early stage (Fig. 3).
Endogenous PIs of H. armigera Differentially Expressed on CanPI-7 Feeding-During our analysis, we also found high expression of endogenous H. armigera protease inhibitors at the transcript and protein level in CanPI-7 fed larvae as compared with the AD fed larvae (Fig. 4A, 4B). Moreover, the expression of most of these endogenous PIs was significantly higher from 6 to 48 h (mid and late stage) after CanPI-7 ingestion. The expression of chymotrypsin inhibitor CI-8A, two serine protease inhibitors, a serpin 2 and an inter-alpha-trypsin inhibitor was significantly higher from 6 to 48 h after CanPI-7 ingestion (Fig. 4A). However, some endogenous H. armigera PIs, such as serine protease inhibitor 34, serpin and serpin-1 variant j, and inducible metalloproteinase inhibitor, were found to be constitutively expressed at high levels after CanPI-7 feeding (Fig. 4A). The expression levels of H. armigera endogenous PIs were also validated by qRT-PCR analysis (Fig. 4C) and the results were consistent with the transcriptomic and proteomic data (Fig. 4).

FIG. 3. Validation of relative abundance of selected differentially expressed H. armigera protease transcripts by qRT-PCR.
Two biological replicates with three technical replicates each were performed. The Y-axis in the histograms represents fold change in transcript abundance and error bars represent the standard error of the mean. Statistical differences in the expression of each gene between AD and CanPI-7 fed larvae are indicated with *, representing significant differences (p ≤ 0.05; Student's t test). Seventeen selected protease transcripts are shown, comprising certain genes expressing "very highly," "highly," "moderately," and "low." NS: Not significant.

H. armigera Endogenous PI Genes Coexpressed with Protease Genes and Inhibit H. armigera Gut Protease Activity-To assess the interactions of endogenous PIs with gut proteases, we undertook in silico coexpression analysis and biochemical activity assays. Coexpression analysis revealed 12 modules of highly correlated genes. Turquoise was the biggest module, containing around 9610 genes. Most of the PI genes (7 out of 9) were present in the turquoise module. Further, the expression of H. armigera endogenous PI genes in turn correlated with the expression status of many other genes, so that they act as hub genes (genes with a higher degree of connectivity) in the module (supplemental Fig. S3). Similarly, we observed significant correlation between endogenous PIs and multiple proteases in H. armigera (Fig. 5A). The details of coexpressing protease and endogenous PI candidates and their interactions are provided in supplemental Tables S7 and S8. The correlated expression status of multiple H. armigera endogenous PIs and proteases implies a cohesive interplay between them in regulating the overall protease complement and proteolytic environment of the insect gut. The proteases coexpressed with the H. armigera endogenous PI genes have varied functional specificities, namely trypsins, chymotrypsins, aminopeptidases, carboxypeptidases and cathepsin (Fig. 5A). Interestingly, most of these protease genes showed increased expression in the middle and/or the late stages. To test the potency of H. armigera endogenous PIs, we performed activity visualization and protease inhibition assays with insect hemolymph PIs. The hemolymph from CanPI-7 fed larvae showed the presence of an additional inhibitory band (isoform) as compared with that from AD fed larvae (Fig. 5B). Endogenous H. armigera PIs present in the larval hemolymph were further analyzed for their inhibitory potential toward bovine trypsin and HGP. The IC50 for bovine trypsin was 2.3-fold lower for hemolymph from CanPI-7 fed larvae than for that from AD fed larvae (supplemental Fig. S4A, S4B). Hemolymph PIs from AD fed larvae showed a 1.3-fold higher IC50 for the HGP from CanPI-7 diet fed larvae than for that from AD fed larvae. However, the hemolymph PIs from larvae fed on CanPI-7 diet showed comparable IC50 toward either type of HGP (supplemental Fig. S4C, S4D). Overall, after estimating the IC50 results, it was evident that the hemolymph PIs from CanPI-7 fed larvae had higher inhibitory potential toward bovine trypsin as well as the HGP as compared with the hemolymph PIs from AD fed larvae (Fig. 5C). The higher inhibitory potential of the hemolymph from CanPI-7 fed larvae can possibly be attributed to the novel isoform observed in activity visualization. To characterize this novel PI isoform, proteomic analysis was carried out. Two proteins were identified with coverage > 30% and number of peptides > 2 (supplemental Table S9). The novel H. armigera endogenous PI isoform (HaPI-2) was identified as chymotrypsin inhibitor CI-8A (accession number AAK52495).

FIG. 4. Heat maps showing differential expression of H. armigera endogenous protease inhibitors after feeding on CanPI-7. A, Relative abundance of endogenous PI transcripts in H. armigera larvae fed on CanPI-7 with respect to control. The relative expression was calculated on the basis of read count of protease transcript sequences and heat maps were generated. B, Heat map showing differential expression of endogenous PI proteins in CanPI-7 fed H. armigera larvae with respect to the larvae fed on AD. Data is consistent with the results obtained for relative abundance of protease transcripts. C, Validation of relative abundance of selected endogenous PI transcripts by qRT-PCR. Y-axis, fold change in transcript abundance.

FIG. 5. A, Coexpression network analysis was carried out using relative transcript abundance data of protease and endogenous PI transcripts. Endogenous PIs were used as bait and they show coexpression with multiple proteases. Inhibitory potency of hemolymph endogenous PIs from AD and CanPI-7 fed insects toward bovine trypsin and H. armigera gut proteases (HGP). B, Visualization of endogenous PI activity in the hemolymph of AD and CanPI-7 fed H. armigera larvae. Hemolymph was separated on polyacrylamide gel and PI activity bands were visualized on X-ray film. Results show a clear distinction in the PI activity profile from hemolymph of AD and CanPI-7 fed larvae, with an additional PI isoform present in CanPI-7 fed larvae. C, Inhibitory activity assay of endogenous PIs from AD and CanPI-7 fed larval hemolymph against bovine trypsin and HGP. The IC50 values of AD and CanPI-7 fed larval hemolymph for the inhibition of bovine trypsin and HGP vary significantly (p ≤ 0.05; Student's t test).
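The idea of using endogenous PIs as bait and asking which proteases follow the same expression profile over the feeding time course can be illustrated with a small sketch. The coexpression method itself is not restated in this section, so the code below substitutes plain Pearson correlation with a hard cutoff; the gene names, the profiles and the 0.9 threshold are all hypothetical and serve only to show the principle.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)

def coexpressed_partners(bait, profiles, threshold=0.9):
    """Genes whose profile correlates with the bait at |r| >= threshold.

    The threshold is an assumed cutoff for this sketch; the number of
    partners is a simple stand-in for the connectivity of a hub gene.
    """
    return sorted(
        gene
        for gene, profile in profiles.items()
        if gene != bait and abs(pearson(profiles[bait], profile)) >= threshold
    )

# Hypothetical relative abundances at 0.5, 2, 6, 12, 24 and 48 h.
profiles = {
    "PI_a": [1, 1, 2, 4, 6, 8],
    "trypsin_x": [2, 2, 4, 8, 12, 16],
    "chymotrypsin_y": [1.0, 1.2, 2.1, 3.9, 6.2, 7.9],
    "carboxypeptidase_z": [5, 6, 5, 6, 5, 6],
}
partners = coexpressed_partners("PI_a", profiles)
```

Here the bait PI picks up the two proteases whose abundance rises toward the late stage and ignores the one with a flat, oscillating profile.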

DISCUSSION
The efficacy of plant defensive PIs has been hampered by the rapid adaptation of insect pests to the inhibitors. The close evolutionary association between phytophagous insects and their host plants has led to sophisticated physiological responses to dietary PIs in insects (1,7,30). In-depth understanding of insect adaptation to plant defense is necessary to use naturally occurring candidate defense genes such as PIs for insect pest management. We earlier demonstrated that the C. annuum four-domain PI (CanPI-7) shows strong antimetabolic activity toward H. armigera on ingestion through artificial diet (18). Moreover, in a series of biochemical and molecular studies, we previously revealed that H. armigera tends to adapt to the effect of plant PIs by regulating protease gene expression temporally and spatially (18,31-35). The present study reveals the plasticity in the regulation of H. armigera digestive proteases, not only for digestion but also for weakening the challenge posed by CanPI-7. Our study provides the first comprehensive analysis of the transcriptional and proteomic response of H. armigera larvae feeding on CanPI-7, listing up- and down-regulated candidate proteases and endogenous PIs in response to CanPI-7. Further, the data suggest a possible role of endogenous PIs present in hemolymph in the regulation of protease gene expression in H. armigera.
CanPI-7 ingestion resulted in dramatic changes in the overall protease complement of H. armigera over time. The physiological response of H. armigera to CanPI-7 was prominent, with differential expression of a variety of proteases. Our results indicate that H. armigera compensates for the inhibition of initial protease activity with an induced late protease activity, which likely offsets the inhibition of early proteases by CanPI-7. This is caused by a decrease in expression of several initial trypsin genes and an increase in other trypsin, chymotrypsin and aminopeptidase genes at later time points. Compensation for the inhibition of endopeptidase activities by increased activities of exopeptidases was evident in H. armigera (8). Activity of "late" expressing trypsin genes is perhaps controlled by "early" expressing trypsin genes (9) and by peptides produced by the action of trypsin enzymes (and other proteases). Food proteins (PI-proteins) can also act as signals to induce the expression of the late trypsin genes (10). Transcriptome analysis has revealed the presence of numerous proteases in H. armigera, and these are differentially expressed to counteract the effect of CanPI-7. Previously, a large decrease in cysteine protease activity in Tribolium castaneum was observed against a synthetic inhibitor of cathepsin L-like proteases and E-64 (36). However, microarray analysis revealed that a few cysteine protease genes are down-regulated by E-64, whereas several others are up-regulated to compensate for the inhibited activity (37). Similarly, a sequencing study of a suppression subtractive hybridization cDNA library from the midgut of fall armyworm (Spodoptera frugiperda) larvae fed on soybean PI revealed that some constitutively expressed protease genes are overexpressed, whereas some new proteases are synthesized, when the inhibitor is present in the diet (38). Our results of transcriptomic and proteomic analysis in H. armigera are consistent with these previous studies.
How insects regulate the expression of different proteases over time to minimize the effect of plant PIs is largely unknown. A role of feedback mechanisms in response to dietary PIs, leading to hyperproduction of proteases to compensate for the loss of activity, has been suggested (16,39). While analyzing these regulatory mechanisms in the present investigation, we found a significant change in the endogenous PI complement in the larvae fed on diet supplemented with CanPI-7. Several H. armigera endogenous PIs were differentially expressed in the CanPI-7 fed larvae, suggesting their potential role in protease regulation. Internal regulation of protease activity is the primary function of PIs in several organisms (15,40,41). Endogenous serine PIs such as serpins have been found to be involved in the regulation of insect immunity related genes and to be necessary for the regulation of several serine protease cascades (41). However, our results indicate that endogenous PIs might also play an important role in the regulation of digestive protease genes in H. armigera in dealing with CanPI-7. Coexpression analysis highlights that endogenous PI genes coexpress with several digestive protease genes, suggesting their involvement in activating or deactivating protease gene expression. Time course experiments indicate that the digestive protease genes in H. armigera are probably expressed sequentially, wherein the product of early protease (trypsin) genes activates subsequent protease genes and the final product eventually regulates protease expression by a feedback mechanism.
In general, insect endogenous PIs are known to circulate in hemolymph, although they might be present in various body compartments including the gut. For transcriptomic and proteomic analysis, we used the whole insect; therefore, we could not ascertain the localization of the H. armigera endogenous PIs identified. However, in the biochemical analysis we extracted hemolymph PIs from the larvae and assessed their inhibitory potency toward bovine trypsin and HGP. Hemolymph PIs from CanPI-7 fed larvae showed significantly higher inhibitory potential toward HGP and bovine trypsin as compared with PIs isolated from AD fed larvae. Interestingly, we found an additional endogenous PI isoform in the hemolymph of CanPI-7 fed insects. Activity of induced endogenous PIs might not only inhibit gut proteases but also activate or deactivate mechanisms leading to the regulation of protease gene expression in the gut. However, an endogenous PI based mechanism of protease gene regulation in H. armigera could be highly complex. By analogy with mammalian physiology (14,42), it has been proposed that protease secretion in the lepidopteran gut is regulated by a peptide hormone system and that peptides present in the diet regulate protease gene expression in the gut (6,43). However, Bown et al. (44) proposed that it is highly unlikely that protease regulation can be mediated through peptides derived from food proteins in H. armigera. Therefore, internal factors such as monitor peptides (similar to their mammalian counterparts) or endogenous PIs could be potentially important candidates to regulate digestive proteases, because they are already known to regulate immunity related serine protease cascades (15). Immunological studies suggest that monitor peptides are present in insect gut cells, and one type has been isolated (45). Therefore, it is possible that monitor peptides sense the presence of food in the gut and transmit a signal to a receptor in the gut. Thus, binding of the monitor peptide to this receptor triggers the expression of early proteases. Furthermore, the initially synthesized proteases activate other proteases, and the end product of this cascade eventually interferes with the binding of the monitor peptide to the gut receptor, which ultimately stops protease expression. Similarly, when an external PI is present in the diet, H. armigera induces the production of endogenous PIs, and these PIs could control protease expression and activation. This could be achieved by hijacking the feedback mechanism of protease expression by inhibiting the regulatory protease that cleaves the binding of endogenous PIs to gut receptor (Fig. 6). Eventually, induced expression of proteases continues, which results in protease gene expression over time after the ingestion of CanPI-7 in H. armigera. Interaction between hemolymph PIs and gut proteases is apparent from the above results, supporting their potential role in protease regulation. The above model has been proposed based on the present conclusions as well as several related studies cited and discussed in the present paper. However, the precise molecular mechanisms involved need to be investigated further to substantiate the findings.
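The logic of the proposed feedback loop can be made concrete with a deliberately simple toy model. It is a sketch of the hypothesis only, not a fitted or validated model: the discrete time steps, the shutoff level of 3 units, and the all-or-none inhibition by induced PIs are assumptions introduced for illustration.

```python
def simulate_feedback(pi_induced, steps=8, shutoff_level=3):
    """Toy discrete-time version of the proposed regulatory loop.

    Returns a list of booleans, one per time step, stating whether
    protease expression is switched on. All quantities are arbitrary
    units chosen for illustration.
    """
    active_end_product = 0  # active final protease of the cascade
    expression_on = True    # monitor peptide is bound to the gut receptor
    history = []
    for _ in range(steps):
        history.append(expression_on)
        if expression_on and not pi_induced:
            # The cascade yields one unit of active end product per step.
            # With induced endogenous PIs, the end product is assumed to be
            # fully inhibited, so no active end product accumulates.
            active_end_product += 1
        if active_end_product >= shutoff_level:
            # Enough end product disrupts the monitor peptide-receptor
            # interaction and halts further protease expression.
            expression_on = False
    return history

normal_diet = simulate_feedback(pi_induced=False)
canpi_diet = simulate_feedback(pi_induced=True)
```

In this sketch, expression shuts off after three steps under normal feeding, whereas it stays on for the whole run when endogenous PIs are induced, mirroring the sustained protease expression described for CanPI-7 fed larvae.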
Knowledge of protease gene regulation in lepidopteran insects can be used to make plant PIs a more effective and alternative pest management strategy. Comprehensive transcriptomic and proteomic analysis of CanPI-7 fed H. armigera generated the list of differentially expressed genes and proteins. These findings suggest a complex architecture behind the well-ordered expression of H. armigera proteases on ingestion of plant PIs in the gut. Biosynthesis of insect proteases and their regulation on exposure to plant PIs is a multifaceted process that remains enigmatic. For the first time, differential expression of endogenous PIs in response to plant PIs was noted. Subsequent coexpression network analysis complemented with biochemical data indicated that the endogenous PIs of H. armigera might be involved in the regulation of protease expression. Over-production of proteases in insects to minimize the antinutritional effects of plant PIs is attributed to the increased activity of endogenous PIs. Endogenous insect PIs likely arrest the feedback mechanism and upregulate protease gene expression in H. armigera to adapt to dietary anti-nutritional constituents.

FIG. 6. Proposed mechanism of protease regulation by endogenous protease inhibitors in Helicoverpa armigera. During normal metabolism, monitor peptides interact with the gut receptor, stimulating the biosynthesis of proteases. Initially synthesized proteases activate other protease genes, which aid in efficient digestion. The end product of this protease cascade disrupts the interaction between monitor peptide and gut receptor, which eventually halts protease production. However, on ingestion of CanPI-7, endogenous PIs are induced by an unknown mechanism. These induced PIs bind the final protease of the cascade, which keeps the monitor peptide-gut receptor interaction active and results in continuous production of gut proteases. These proteases are most likely insensitive to inhibition by