Proteome dataset of Hemileia vastatrix by LC–MS/MS label-free identification

Here we describe the proteome of the fungus Hemileia vastatrix obtained by label-free mass spectrometry (LC–MS/MS). H. vastatrix is the causal agent of coffee rust disease, which causes great economic losses in this crop. The objective of our work was to identify H. vastatrix proteins potentially involved in host colonization and infection using a shotgun proteomics approach. A total of 742 proteins were identified and are associated with several crucial molecular functions, biological processes, and cellular components. The proteins identified contribute to a better understanding of the metabolism of the fungus and may help identify target proteins for the development of specific drugs to control coffee rust disease. All data can be accessed at the Centre for Computational Mass Spectrometry, MassIVE, under accession MSV000087665: https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=cc71ad75f767451abe72dd1ce0019387



Value of the Data
• The proteins identified in this study contribute to a better understanding of the metabolism of Hemileia vastatrix.
• The data obtained can help researchers and agricultural industries identify target proteins for the development of specific drugs to control coffee rust disease.
• The dataset of H. vastatrix proteins represents valuable information that contributes to the Pucciniales proteome repertoire.

Data Description
The dataset described here was obtained from the proteome analysis of the fungus Hemileia vastatrix Berkeley and Broome (Basidiomycota, Pucciniales). A total of 742 proteins of H. vastatrix were identified using the PEAKS software, and the proteins were deposited in the MassIVE repository under the ID MSV000087665. The files available in the MassIVE repository include a file in the MZID (mzIdentML) format, containing the identification results exported by the Peaks software, based on the search of the peak list against the database (both also used for validation in the MassIVE workflow).

Search_engine_files
All files generated by the Peaks software, in which the complete set of spectra and protein identifications were analyzed. The tables of identified peptides and proteins can be found in the "export" subfolder.
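As an illustration of how the mzIdentML file could be inspected, the following is a minimal sketch that collects the accessions of all DBSequence elements using only the Python standard library. The function name db_sequence_accessions is hypothetical, and the sketch assumes only that the file follows the mzIdentML schema, in which DBSequence elements carry an accession attribute. Depending on the exporter, DBSequence entries may not correspond one-to-one to the 742 reported proteins, so the count should be treated as a sanity check rather than a reproduction of the result.

```python
import xml.etree.ElementTree as ET


def db_sequence_accessions(source):
    """Return the set of DBSequence accessions found in an mzIdentML file.

    `source` may be a file path or a binary file-like object.
    The XML namespace is ignored so the sketch does not depend on a
    specific mzIdentML schema version.
    """
    accessions = set()
    # iterparse streams the document, so large files are not loaded at once
    for _event, elem in ET.iterparse(source, events=("end",)):
        local_name = elem.tag.rsplit("}", 1)[-1]  # strip "{namespace}"
        if local_name == "DBSequence":
            accession = elem.get("accession")
            if accession:
                accessions.add(accession)
            elem.clear()  # release memory held by the parsed element
    return accessions
```

Calling len(db_sequence_accessions(path)) on the deposited MZID file, where path is the local location of the download, would give the number of distinct database sequences referenced by the identification results.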

Supplementary_files
The iterative sequence of mappings (UniProt, DB2DB on bioDBnet and BlastKOALA) and their results, described in Table 1.

The proteins identified in H. vastatrix germinating urediniospores (Fig. 1) were classified according to their molecular function, cellular component, and biological process categories (Fig. 2).

Experimental Design, Materials and Methods
The urediniospores of H. vastatrix (race II, isolate Hv01) were collected from artificially infected leaves of C. arabica (var. Catuaí Amarelo) plants grown in a greenhouse ( Fig. 1 ). Approximately 10 mg of urediniospores were spread in 10 mL distilled water and allowed to germinate in Petri dishes kept in the dark at 24 °C.
Germinated spores (>80%) and non-germinated spores, together referred to as the germinating urediniospores (gU) sample, were harvested after 24 hours by centrifugation at 12,000 rpm for 2 min. The gU from five Petri dishes were pooled into a single tube to form the H. vastatrix gU sample used for protein extraction as described by Ribeiro et al. [1]. For tryptic digestion, the sample was solubilized with 60 μL of 50 mM ammonium bicarbonate (NH4HCO3, pH 8.5), then 25 μL of RapiGest™ SF (Waters) (0.2% v/v) was added. The sample was reduced with dithiothreitol (100 mM), alkylated with iodoacetamide (300 mM), and proteins were digested using 200 ng of trypsin at 37 °C for 19 h.
The LTQ Orbitrap Elite mass spectrometer was operated in data-dependent acquisition (DDA) mode, generating MS1 spectra in the Orbitrap analyzer (with a resolution of 120,000 FWHM at 400 m/z) over the range 300–1650 m/z, with dynamic exclusion of 10 ppm. For each MS1 spectrum, the 15 most intense ions with charges higher than two were selected automatically and directed toward higher-energy collisional dissociation (HCD). The configuration for HCD was: 2.0 m/z isolation window, automatic gain control (AGC) of 1 × 10^6, maximum fill time of 100 ms, normalized collision energy of 35%, and a selection threshold of 3000.
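The precursor selection rule described above can be sketched in a few lines. This is an illustrative approximation, not the instrument's acquisition logic: the Peak type and the select_precursors function are hypothetical names, the selection threshold is assumed to be a minimum intensity, "charges higher than two" is read literally as charge > 2, and dynamic exclusion is not modeled.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    mz: float
    intensity: float
    charge: int


def select_precursors(peaks, top_n=15, min_intensity=3000.0,
                      charge_above=2, mz_range=(300.0, 1650.0)):
    """Pick up to `top_n` most intense eligible ions from one MS1 spectrum.

    Assumptions (not confirmed by the source): the threshold of 3000 is a
    minimum intensity, and only ions with charge strictly greater than
    `charge_above` are eligible.
    """
    low, high = mz_range
    eligible = [
        p for p in peaks
        if low <= p.mz <= high
        and p.intensity >= min_intensity
        and p.charge > charge_above
    ]
    # Most intense first, then keep the top N
    return sorted(eligible, key=lambda p: p.intensity, reverse=True)[:top_n]
```

For example, a doubly charged ion is skipped even if it is the most intense peak in the scan, and an ion outside 300–1650 m/z is never selected.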
Alignment of spectra and quantification of peptides were performed using Progenesis® QI for proteomics v.1.0 software [3], and proteins were identified using Peaks® 7.0 software [4]. The sequences were deduced from the fragmentation information, and the search was performed in May 2021 against the UniProt (Universal Protein Resource) repository, filtered to the order Pucciniales (Taxon ID: 5258). The search was based on de novo sequencing and peptide-spectrum matching (PSM) with the following parameters: precursor mass tolerance of 10 ppm, fragment mass tolerance of 0.05 Da, up to 2 missed cleavages, carbamidomethylation of cysteines as a fixed modification, and methionine oxidation as a variable modification. Protein identifications were considered reliable at FDR < 1% with at least two unique peptides. Finally, the identified proteins were functionally annotated using Blast2GO software [5].
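To make the search parameters concrete, the sketch below shows how they translate into simple computations: in-silico tryptic digestion with up to 2 missed cleavages, peptide mass with fixed carbamidomethylation and optional methionine oxidation, a 10 ppm precursor check, and the protein acceptance rule. It is not the Peaks algorithm. All function names are hypothetical, and the cleavage rule (after K or R, not before P) is a common convention assumed here rather than one stated in the source.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
CARBAMIDOMETHYL = 57.021464  # fixed modification on every cysteine
OXIDATION = 15.994915        # variable modification on methionine


def tryptic_peptides(sequence, max_missed=2):
    """Digest after K/R (assumed: not before P), allowing missed cleavages."""
    ends = [
        i + 1 for i, aa in enumerate(sequence)
        if aa in "KR" and (i + 1 == len(sequence) or sequence[i + 1] != "P")
    ]
    if not ends or ends[-1] != len(sequence):
        ends.append(len(sequence))  # C-terminal peptide
    starts = [0] + ends[:-1]
    peptides = []
    for i, start in enumerate(starts):
        for j in range(i, min(i + max_missed + 1, len(ends))):
            peptides.append(sequence[start:ends[j]])
    return peptides


def peptide_mass(peptide, oxidized_met=0):
    """Neutral monoisotopic mass with all cysteines carbamidomethylated."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    mass += CARBAMIDOMETHYL * peptide.count("C")
    return mass + OXIDATION * oxidized_met


def within_ppm(observed, theoretical, tol_ppm=10.0):
    """True if the observed mass is within the precursor tolerance."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


def is_reliable(fdr, unique_peptides):
    """Acceptance rule from the text: FDR < 1% and >= 2 unique peptides."""
    return fdr < 0.01 and unique_peptides >= 2
```

For a toy sequence such as "MKACRPGK", the R–P bond is not cleaved under the assumed rule, so digestion with no missed cleavages yields only "MK" and "ACRPGK".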

Ethics Statements
This research involved neither human participants nor animals.

Declaration of Competing Interest
The authors declare that they have no known financial interests and personal relationships that could inappropriately influence the work reported in this paper.

Data Availability
MassIVE