Mapping Anopheles stephensi midgut proteome using high-resolution mass spectrometry

Anopheles stephensi Liston is one of the major vectors of malaria in urban areas of India. Midgut plays a central role in the vector life cycle and transmission of malaria. Because gene expression of An. stephensi midgut has not been investigated at protein level, an unbiased mass spectrometry-based proteomic analysis of midgut tissue was carried out. Midgut tissue proteins from female An. stephensi mosquitoes were extracted using 0.5% SDS and digested with trypsin using two complementary approaches, in-gel and in-solution digestion. Fractions were analysed on high-resolution mass spectrometer, which resulted in acquisition of 494,960 MS/MS spectra. The MS/MS spectra were searched against protein database comprising of known and predicted proteins reported in An. stephensi using Sequest and Mascot software. In all, 47,438 peptides were identified corresponding to 5,709 An. stephensi proteins. The identified proteins were functionally categorized based on their cellular localization, biological processes and molecular functions using Gene Ontology (GO) annotation. Several proteins identified in this data are known to mediate the interaction of the Plasmodium with vector midgut and also regulate parasite maturation inside the vector host. This study provides information about the protein composition in midgut tissue of female An. stephensi, which would be useful in understanding vector parasite interaction at molecular level and besides being useful in devising malaria transmission blocking strategies. The data of this study is related to the research article “Integrating transcriptomics and proteomics data for accurate assembly and annotation of genomes”.


a b s t r a c t
Anopheles stephensi Liston is one of the major vectors of malaria in urban areas of India. Midgut plays a central role in the vector life cycle and transmission of malaria. Because gene expression of An. stephensi midgut has not been investigated at protein level, an unbiased mass spectrometry-based proteomic analysis of midgut tissue was carried out. Midgut tissue proteins from female An. stephensi mosquitoes were extracted using 0.5% SDS and digested with trypsin using two complementary approaches, in-gel and insolution digestion. Fractions were analysed on high-resolution mass spectrometer, which resulted in acquisition of 494,960 MS/ MS spectra. The MS/MS spectra were searched against protein database comprising of known and predicted proteins reported in An. stephensi using Sequest and Mascot software. In all, 47,438 peptides were identified corresponding to 5,709 An. stephensi proteins. The identified proteins were functionally categorized based on their cellular localization, biological processes and molecular functions using Gene Ontology (GO) annotation. Several proteins identified in this data are known to mediate the interaction of the Plasmodium with vector midgut and also regulate In-gel and in-solution trypsin digestion of proteins followed by LC-MS/MS analysis using LTQ-OrbitrapVelos and LTQ-Orbitrap Elite mass spectrometer (Thermo Scientific).

Data source location
Goa and Bengaluru, India Data accessibility Data are available here and via a web application ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD001128.

Value of the data
This data set is the largest catalogue of proteins identified from the midgut tissue of female An.

stephensi.
Data provides information about midgut proteins involved in different biological and molecular functions, immunity and vector parasite interaction. Overall it enables better understanding of mosquito-parasite interaction and malaria transmission.
This data could be utilized in future for the development of novel targets for control of disease transmission.

Data
Presented here is the processed data corresponding to the proteomic analysis of midgut tissue of female Anopheles stephensi [1]. The processed data set contains 494,960 MS/MS spectra, which led to Table 1 List of agonistic proteins identified which support Plasmodium development in mosquito midgut.

S. No
An. stephensi ID (Indian strain) Genename Corresponding An. gambiae ortholog ID Protein description Retinoid and fatty-acid binding glycoprotein, also known as lipophorin or ApoII/I 5  identification of 47,438 peptides and 5,709 proteins. All the proteins and peptides identified in this study are listed in Supplementary Table S1 and S2. A total of 127 proteins, which play important roles in vector immunity, have been identified in midgut of female An. stephensi mosquitoes. Another 39 proteins, known to be involved in parasite development in the vector, were also identified in this study [2]. Of these, 16 proteins were found to be agonistic in nature thus support Plasmodium development in the mosquito host (Table 1) and 23 proteins were found to be antagonistic in nature hence inhibit Plasmodium development in mosquito host ( Table 2). The Gene Ontology annotation for all the identified proteins were fetched from the VectorBase database [3]. Protein-protein interaction networks were mapped using STRING (version 1.1.0).

Sample preparation
Female An. stephensi mosquitoes were obtained from the insectary of ICMR-National Institute of Malaria Research, Field Unit, Goa, where cyclic colony of this mosquito species is maintained at a temperature of 27 7 2°C, relative humidity of 70 7 5% and a photoperiod: scotoperiod of 12:12 h (light:dark). Midguts were dissected from the 500 female An. stephensi. The midguts collected were homogenized in 200 µl of 0.5% SDS using ultrasonication. The extracted proteins were then quantified by Bicinchoninic acid assay (Pierce ® .Cat#: 23225). The proteins extracted were then subjected to ingel and in-solution trypsin digestion followed by fractionation on off-gel fractionator and reversephase liquid chromatography [1].

In-gel digestion
Two hundred micrograms 200 µgof proteins was resolved on 10% SDS-PAGE gel. The gel was stained using Colloidal Coomassie 33 stain (Invitrogen, Carlsbad, CA). Excess stain was removed by giving multiple washings with 10% methanol. The protein lanes were cut into 22 gel pieces and subjected to in-gel trypsin digestion as described previously [4].

In-solution digestion
Four hundred micrograms of protein was subjected to in-solution trypsin digestion. The trypsindigested peptide mixtures obtained were divided into two equal parts for further separation by using off-gel fractionator and basic Reverse-Phase Liquid Chromatography (bRPLC). Off-gel fractionator (Agilent 3100) was used for fractionating the trypsin digested peptides. Peptides were separated using IPG strip (pH 3-10) by focusing for 50 kVh with maximum current of 50 µA and maximum voltage set to 4000 V. After fractionation, a total of 12 fractions were collected and acidified using 1% TFA and stored at − 80°C until LC-MS/MS analysis. The remaining digested peptides were fractionated by using bRPLC approach. Peptides were resolved using solvent B (10 m M triethyl ammonium bicarbonate, pH 8.5 in 95% Acetonitrile) with a gradient of 5-60% and 1 ml flow rate per minute for over 60 min. Ninety six fractions were collected using automatic fraction collector, which were further concatenated to 24 fractions, vacuum dried and stored in − 80°C freezer until further LC-MS/MS analysis as previously described [1].

Mass spectrometry analysis
In this study, a total of 58 LC-MS/MS runs, of which, 24 bRPLC fractions were performed on LTQ-Orbitrap Elite (Thermo Scientific, USA) mass spectrometer interfaced with Easy-nano LC II nano flow liquid chromatography system (Thermo Scientific), while the remaining 34 fractions (including in-gel and off-gel fractions) were analyzed on LTQ-OrbitrapVelos mass spectrometer interfaced with Proxeon Easy nLC system (Thermo Scientific, Bremen, Germany). The peptides from each fraction were reconstituted in 0.1% formic-acid and loaded on pre-column (75 µ × 2 cm) packed with magic C18 AQ (MichromBio-resources, USA) 5 µ particle and 100 Å pore size at flow rate of 5 µl per minute. Peptides were resolved at 250 nl/min flow rate using a linear gradient of 10-35% solvent B (0.1% formic acid in 95% Acetonitrile) over 75 min on an analytical column, of 75 µ × 60 cm, 5 µ particle and 100 Å pore size for Elite and 75 µ × 15 cm, 3 µ particle and 100 Å pore size for Velos was packed using nitrogen pressure cell at 2500 psi. To reduce the back pressure 60 cm analytical column was operated in a heated insulator at 60°C temperature using butterfly column heater (Phoenix S&T, Inc. PA, USA) and was fitted on flex ion source that was operated at 2.5 kv voltage (Only for Elite). The analysis on mass spectrometry was carried out in a data dependent manner with a full scans in the range of m/z 350-2000. Full MS scans were measured at a resolution of 120,000 for Elite and 30,000 for Velos at m/z 400 [1]. Fifteen to twenty most abundant precursor ions were selected from MS scans and fragmented using higher-energy collisional dissociation (HCD). Fragment ions were acquired at a resolution of 30,000 for Elite and 15,000 for Velos. Singly charged ions were excluded and dynamic exclusion was set to 30 s. The steps involved in the proteomic analysis of midgut tissue using mass spectrometry is shown in Fig. 1.

Data analysis
The data obtained was processed using Proteome Discoverer software (version 2.1, Thermo Fisher Scientific, Bremen, Germany) and searched using Sequest and Mascot search algorithm against VectorBase protein database of An. stephensi, i.e., Astel2.2. The search parameters included trypsin as the proteolytic enzyme allowing up to two missed cleavages, methionine oxidation was set as a dynamic modification while carbamido-methylation at cysteine was set as static modification. Peptide mass error tolerance and fragment mass error tolerance were set to 20 ppm and 0.1 Da, respectively. The protein and peptide data were extracted with search result parameters as peptide rank one and peptide confidence set as high. For the entire data set, false discovery rate (FDR) was calculated by enabling the peptide sequence analysis using a decoy database and a cut-off of 1% was used for identifications. The identified proteins were functionally categorized based on their subcellular localization, biological processes and molecular function using gene ontology (GO) based annotations available for An. stephensi (SDA 500) strain in VectorBase database Supplementary Table  S3. Proteins identified were found to be involved in different molecular functions such as catalytic activity (48%), binding activity (30%), transporter activity (8%), structural (8%), receptors (2%) and others (1%). Biological process-based categorization showed that a majority of proteins played a role in metabolism (32%), cellular processes (31%), localization (10%), biogenesis (8%), response to stimulus (5%), biological regulation (5%), development (2%), multicellular organismal process and others (2%). The proteins have been described based on their cellular localization as shown in Fig. 2A-C. The information for An. stephensi protein orthologs in Anopheles gambiae was fetched using Biomart tool provided through VectorBase Supplementary Table S4. Thirty nine proteins were identified that are  known to be involved in parasite development in mosquito. A total of 127 immunogenic proteins were identified using ImmunoDB (http://cegg.unige.ch/Insecta/immunodb/) Supplementary Table S5. The proteins identified were analyzed using online STRING tool to generate an interacting map for all the midgut proteins (Fig. 3, Supplementary Table S6) [5,6].