Proteome dataset of chili pepper plant (Capsicum frutescens) infested by broad mite (Polyphagotarsonemus latus)

The dataset presented in this article is associated with the TMT (tandem mass tag)-labeled proteomics of chili pepper plants (Capsicum frutescens) infested by broad mite (Polyphagotarsonemus latus). Data were captured using a nano-liquid chromatography system coupled with a high-resolution Orbitrap Fusion Tribrid mass spectrometer. Proteomics data were analyzed with Proteome Discoverer version 2.4 using the MASCOT and SequestHT algorithms. We identified a total of 5,807 proteins supported by 48,555 unique peptides and 1,279,655 peptide-spectrum matches. Individually, 5,186 proteins were detected in healthy leaf samples, 5,193 in infested leaf samples, 5,194 in healthy meristem samples, and 5,196 in infested meristem samples. Datasets obtained from reciprocal BLAST against the Arabidopsis thaliana proteome database enabled the prediction of protein-protein interactions and subcellular localization of differentially expressed proteins, which are also included in this article. The data have been deposited in the ProteomeXchange Consortium via the PRIDE repository and can be accessed through the accession ID PXD018653.


Sample collection was carried out in a broad mite-infested field. Plants grew at an ambient temperature of 27 °C with a relative humidity of 65-85%. Meristem and leaf samples of 12-week-old healthy and infected plants were collected for the proteomics analysis.

Description of data collection
Plant samples were homogenized in liquid nitrogen and proteins were extracted; disulfide bonds were reduced and alkylated, and the proteins were digested with trypsin. Peptides were labeled with tandem mass tags (TMT) and fractionated using basic reverse-phase liquid chromatography (bRPLC). Peptide fractions were subjected to high-resolution mass spectrometry analysis to obtain peptide sequence information. Data were searched against the Capsicum annuum (Pepper Zunla 1 Ref_v1.0) protein database to obtain peptide-spectrum matches, from which a list of corresponding peptides and proteins was derived. Identified proteins were subjected to bioinformatic analysis.

Value of the Data
• This study provides insight into the comparative proteomic analysis of broad mite (Polyphagotarsonemus latus) infestation in chili (Capsicum frutescens).
• The dataset describes the proteomic landscape of plant-mite interaction. These findings can broadly aid ecological and agri-proteomic research to prevent and manage pest-induced crop damage.
• Altered signaling pathways and hormone regulation mechanisms can be explored further to study the pest-induced plant hyper responses.
• The role of differentially expressed proteins in plant defense mechanisms can be assessed to design novel, eco-friendly molecules to deal with plant pathogens and post-infestation damage.

Data Description
This dataset describes the proteins identified in Capsicum frutescens infested by broad mite (Polyphagotarsonemus latus). Fig. 1 represents the overall pipeline of the experimental design. TMT-labeled shotgun proteomics analysis led to the identification of 5,807 protein groups represented by 48,555 unique peptides and 1,279,655 peptide-spectrum matches. Individually, 5,186 proteins were detected in healthy leaf samples, 5,193 in infested leaf samples, 5,194 in healthy meristems, and 5,196 in infested meristems. A list of all proteins, including the differentially regulated proteins, is presented in supplementary Table 1. Among the total proteins, we identified a large number of regulatory proteins such as kinases, phosphatases, and transcription factors. The proteins categorized according to their activities are presented in supplementary Table 2. Data provided in this article were analyzed and represented in detail in the article "Plant-Pathogen Interactions: Broad Mite (Polyphagotarsonemus latus)-Induced Proteomic Changes in Chili Pepper Plant (Capsicum frutescens)" [1]. Identified proteins were subjected to reciprocal BLAST (Basic Local Alignment Search Tool; blastp) against the Arabidopsis thaliana proteome to fetch their respective counterparts in Arabidopsis thaliana (supplementary Table 3). The Arabidopsis protein IDs were further analyzed by Gene Ontology analysis, as represented in Fig. 2. The SUBA4 tool [2] was used to fetch the subcellular location of the proteins. Among the identified proteins, the largest group was of extracellular origin. Fig. 3 and supplementary Table 4 present the output data obtained from protein localization prediction [3]. Fig. 4 illustrates the protein-protein interactions of extracellular proteins as predicted by the STRING tool (Search Tool for the Retrieval of Interacting Genes/Proteins) [4]. The output data of the STRING analysis are presented in supplementary Table 5.
The interactive network diagram was created using Cytoscape [5] and is attached as supplementary file 6. The list of identified Capsicum proteins mapped to KEGG Orthology, together with the proteins mapped to KEGG pathways, KEGG BRITE, and KEGG modules, is presented in supplementary Table 7. A partial list of proteins identified in Capsicum frutescens leaf and meristem associated with plant-pathogen interaction is presented in Table 1.
Fig. 1. Representative experimental workflow of the dataset described in this study. Broad mite (Polyphagotarsonemus latus)-infested bird's eye chili (Capsicum frutescens) samples were collected, and proteins were extracted and quantified using the BCA assay. Trypsin-digested peptides were labeled with tandem mass tags (TMT), fractionated by basic reverse-phase liquid chromatography, and analyzed in a high-resolution Orbitrap mass spectrometer. The acquired data were searched against the Capsicum annuum (Pepper Zunla 1 Ref_v1.0) dataset using the Sequest HT and MASCOT algorithms.
The data serve as a useful resource for scientists working on Capsicum frutescens and in the field of plant-pathogen interactions. Compared with our previously published article, here we extended the analysis to all proteins identified in the study to obtain an overview of the protein networks operative in Capsicum frutescens under control and infected conditions. The differentially regulated proteins identified may serve as targets for generating pathogen-resistant crop plants in the future.
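As an illustration of how differential regulation between infested and healthy tissue could be expressed from TMT reporter ion intensities, the following minimal sketch computes log2 ratios using the channel assignment stated in the labeling description of this article (126 infected leaf, 128 infected meristem, 129 healthy meristem, 130 healthy leaf). The intensities, the function names, and the 1.5-fold cutoff are hypothetical and are not taken from the dataset or from the analysis settings used by the authors.

```python
import math

# Channel assignment as stated in the labeling description of this article.
CHANNELS = {
    "126": "infected leaf",
    "128": "infected meristem",
    "129": "healthy meristem",
    "130": "healthy leaf",
}


def log2_ratio(intensities, infected_channel, healthy_channel):
    """Return log2(infected / healthy) for one protein."""
    return math.log2(intensities[infected_channel] / intensities[healthy_channel])


def classify(log2fc, fold_cutoff=1.5):
    """Label a protein using a hypothetical symmetric fold-change cutoff."""
    threshold = math.log2(fold_cutoff)
    if log2fc >= threshold:
        return "up"
    if log2fc <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical reporter ion intensities for a single protein.
example = {"126": 3000.0, "128": 1000.0, "129": 1000.0, "130": 1000.0}

leaf = log2_ratio(example, "126", "130")
meristem = log2_ratio(example, "128", "129")
print("leaf:", round(leaf, 3), classify(leaf))
print("meristem:", round(meristem, 3), classify(meristem))
```

In practice the ratios would be taken from the normalized abundances exported by the search software rather than from raw intensities.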

Sample collection
Samples were collected from a broad mite-infested field. Plants grew at an ambient temperature of 27 °C and a relative humidity of 65-85%. Leaves and apical meristems of healthy and infested 12-week-old plants were collected, snap-frozen in liquid nitrogen, and stored at −80 °C until further analysis.

Trypsin digestion
Based on the BCA assay, 100 μg of protein in 600 μL of triethylammonium bicarbonate (TEABC) buffer was taken from each sample for reduction, alkylation, and trypsin digestion. Disulfide bonds were reduced using 5 mM dithiothreitol (DTT) at 60 °C for 45 min and alkylated using 5 mM iodoacetamide (IAA) in the dark at room temperature for 30 min. Samples were subjected to enzymatic digestion using sequencing-grade TPCK (L-(tosylamido-2-phenyl) ethyl chloromethyl ketone)-treated trypsin (trypsin:protein ratio of 1:20) overnight at 37 °C. Completion of trypsin digestion was checked by running pre-digest and post-digest samples on SDS-PAGE. Tryptic peptides were vacuum dried and resuspended in 50 mM TEABC buffer.
TMT (Thermo Fisher Scientific, catalog number: 90061) labeling was carried out as per the supplier's protocol. TMT-sixplex kit labels 126, 128, 129, and 130 were used for the infected leaf, infected meristem, healthy meristem, and healthy leaf samples, respectively. Labeled peptides equivalent to 2 μg were pooled and analyzed in an Orbitrap mass spectrometer to check the labeling efficiency. After confirmation, TMT-labeled peptides were pooled from all conditions.
The pooled sample was fractionated into 96 fractions using basic reverse-phase liquid chromatography (bRPLC). These fractions were then pooled and concatenated into 12 fractions. The separation was carried out on an XBridge C18, 4.6 × 250 mm column (Waters, Milford, USA; catalog number: 186003117). A 0-100% gradient from solvent A (10 mM TEABC in water, pH 8.5) to solvent B [90% acetonitrile (ACN) in 10 mM TEABC] was run for 120 min. Separation was carried out using a Hitachi chromatography system connected to a UV-Visible detector set to 280 nm and attached to a fraction collector (Gilson FC 204).
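The pooling of 96 bRPLC fractions into 12 can be sketched as follows. The text does not state which fractions were combined, so this sketch assumes the commonly used interleaved scheme in which every 12th fraction goes into the same pool; the mapping is illustrative rather than the confirmed layout, and the function name `concatenate_fractions` is hypothetical.

```python
def concatenate_fractions(n_fractions=96, n_pools=12):
    """Assign 1-based fraction numbers to pools by interleaving.

    Assumption: fraction i goes to pool ((i - 1) % n_pools) + 1, so each
    pool mixes early-, mid-, and late-eluting peptides.
    """
    pools = {pool: [] for pool in range(1, n_pools + 1)}
    for fraction in range(1, n_fractions + 1):
        pools[(fraction - 1) % n_pools + 1].append(fraction)
    return pools


pools = concatenate_fractions()
for pool, members in pools.items():
    print(f"pool {pool:2d}: fractions {members}")
```

Under this assumption each of the 12 pools contains 8 of the original fractions, spaced 12 fractions apart across the gradient.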

Desalting and cleanup
The fractionated peptides were desalted using SCX (strong cation exchange) stage tips (3M™ Empore™ Discs). STAGE (STop And Go Extraction) tips were prepared using 250 μL pipette tips. Peptides were resuspended in 0.1% trifluoroacetic acid (TFA). SCX columns were equilibrated with 70 μL of 70% ACN followed by three washes with 70 μL of 0.1% TFA. The columns were loaded with peptides and the flow-through was discarded. Peptides bound to the resin were washed with 0.2% TFA and then eluted using 70 μL of 50 mM ammonium acetate in 50% acetonitrile and 0.5% formic acid.

Mass spectrometry analysis
After the label check run, pooled TMT-labeled peptides were sequenced on a nano-LC-MS (liquid chromatography-mass spectrometry) system consisting of an Easy-nLC-1200 nanoflow liquid chromatography system (Thermo Fisher Scientific) connected to an Orbitrap Fusion Tribrid mass spectrometer. Data were acquired in three technical replicates. Desalted, pooled, labeled peptides were reconstituted in 0.1% formic acid and loaded onto a 2 cm trap column (NanoViper, 3 μm C18 Aq, Thermo Fisher Scientific). Peptides were separated on a 15 cm analytical column (NanoViper, 75 μm silica capillary, 2 μm C18 Aq) at a flow rate of 300 nL/min, using a 5-35% gradient over 120 min from solvent A (0.1% formic acid) to solvent B (80% acetonitrile in 0.1% formic acid). The global MS survey scan range was set to 400-1600 m/z (120,000 mass resolution at 200 m/z) in data-dependent mode using the Orbitrap mass analyzer. The maximum injection time was 5 ms. Peptides with a charge state of 2-6 were considered for the analysis. The dynamic exclusion was set to 30 ms. For MS/MS analysis, data acquisition was carried out in top speed mode with 3 s cycles, and precursors were subjected to higher-energy collisional dissociation (HCD) with 34% normalized collision energy. MS/MS scans were acquired over a range of 100-1600 m/z using the Orbitrap mass analyzer at a resolution of 60,000 at 200 m/z. The maximum injection time was 120 ms. Internal calibration was carried out using the lock mass option (m/z 445.1200025) from ambient air.
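To make the interplay between the survey scan range (400-1600 m/z) and the accepted charge states (2-6) concrete, the following minimal sketch computes the m/z of a peptide at each charge state and lists the charge states that fall inside the scan window. The peptide mass is a hypothetical example, the function names are illustrative, and the mass added by TMT labels is ignored for simplicity.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz(neutral_mass, charge):
    """Return the m/z of a peptide ion carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def observable_charges(neutral_mass, charges=range(2, 7),
                       scan_range=(400.0, 1600.0)):
    """Return the charge states whose m/z lies within the survey scan range."""
    low, high = scan_range
    return [z for z in charges if low <= mz(neutral_mass, z) <= high]


# Hypothetical peptide with a neutral monoisotopic mass of 1500 Da.
example_mass = 1500.0
for z in range(2, 7):
    print(f"z={z}: m/z={mz(example_mass, z):.4f}")
print("within survey scan range:", observable_charges(example_mass))
```

For this example mass, only the doubly and triply charged ions fall inside the survey window; higher charge states drop below 400 m/z.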

Data analysis
Raw mass spectrometry data were searched against the NCBI (National Center for Biotechnology Information) Capsicum annuum protein database (Pepper Zunla 1 Ref_v1.0) using the Proteome Discoverer V2.4 platform (Thermo Scientific) with the SequestHT and MASCOT algorithms to obtain protein lists. The search parameters were set as follows: precursor mass tolerance, 20 ppm; fragment mass tolerance, 0.05 Da. TMT label at the peptide N terminus and Lys residues, acetylation at the protein N terminus, and oxidation of methionine were set as variable modifications. Carbamidomethylation of cysteine was set as a fixed modification. Other search parameters included two missed cleavages by trypsin and a 1% false discovery rate (FDR); the maximum and minimum peptide lengths were set to 144 and 7, respectively. Since Capsicum sp. protein IDs are not supported by the Gene Ontology and STRING analysis tools, the protein list obtained from the Proteome Discoverer platform was subjected to reciprocal BLAST (blastp) against the Arabidopsis thaliana data, with the settings 60% identity and 80% query coverage, to obtain the respective counterparts in the Arabidopsis thaliana proteome. Gene Ontology analysis was carried out using PANTHER [7]. The protein-protein interaction analysis was carried out using the STRING V.11.0 tool. Protein subcellular localization annotation was carried out using the SUBA4 tool. Arabidopsis thaliana protein IDs obtained from reciprocal BLAST were used as input for these analyses. KEGG analysis was carried out using KEGG mapper version 4.3 [8]. Capsicum gene IDs were used to fetch KEGG pathways, KEGG BRITE, and KEGG modules. The UniProt Retrieve/ID mapping [9] and NCBI Datasets [10] tools were used to fetch the corresponding Capsicum IDs required for data analysis.
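The reciprocal BLAST filtering described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes blastp tabular output with the columns qseqid, sseqid, pident, qcovs, evalue, and bitscore, applies the stated thresholds (60% identity, 80% query coverage), and keeps pairs that are each other's best hit by bit score. The protein IDs and scores in the example are hypothetical.

```python
import csv
import io

# Assumed column order of the tabular BLAST output.
COLUMNS = ["qseqid", "sseqid", "pident", "qcovs", "evalue", "bitscore"]


def best_hits(tsv_text, min_identity=60.0, min_coverage=80.0):
    """Return {query: best subject} among hits passing both thresholds."""
    best = {}
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=COLUMNS,
                            delimiter="\t")
    for row in reader:
        if float(row["pident"]) < min_identity:
            continue
        if float(row["qcovs"]) < min_coverage:
            continue
        score = float(row["bitscore"])
        query = row["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (row["sseqid"], score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_tsv, reverse_tsv):
    """Keep forward pairs whose subject maps back to the same query."""
    forward = best_hits(forward_tsv)
    reverse = best_hits(reverse_tsv)
    return {q: s for q, s in forward.items() if reverse.get(s) == q}


# Hypothetical hits: Capsicum queries against Arabidopsis, and the reverse.
forward = "\n".join([
    "CA_0001\tAT1G01010.1\t72.5\t91\t1e-80\t250",
    "CA_0001\tAT2G02020.1\t65.0\t85\t1e-60\t200",
    "CA_0002\tAT3G03030.1\t55.0\t95\t1e-40\t150",  # fails identity
    "CA_0003\tAT4G04040.1\t80.0\t70\t1e-90\t300",  # fails coverage
])
reverse = "CA_PLACEHOLDER".replace(
    "CA_PLACEHOLDER", "AT1G01010.1\tCA_0001\t72.5\t90\t1e-80\t250")

pairs = reciprocal_best_hits(forward, reverse)
print(pairs)
```

The resulting Arabidopsis IDs are the kind of input that the Gene Ontology, STRING, and SUBA4 steps described above operate on.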

Data availability
The mass spectrometry raw data and the Proteome Discoverer output MSF files obtained from this study are publicly available at the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) through the PRIDE partner repository (Project accession: PXD018653).

Supplementary Materials
Supplementary material associated with this article can be found in the online version at doi: 10.1016/j.dib.2021.107095.