Data for proteomic profiling of anthers from a photosensitive male sterile mutant and wild-type cotton (Gossypium hirsutum L.)

Cotton is an important economic crop, used mainly for the production of textile fiber. Using a space mutation breeding technique, a novel photosensitive genetic male sterile mutant, CCRI9106, was isolated from the wild-type upland cotton cultivar CCRI040029. To study the male sterility mechanisms of CCRI9106, histological and iTRAQ-facilitated proteomic analyses of anthers were performed. This data article contains data related to the research article titled "iTRAQ-Facilitated Proteomic Profiling of Anthers From a Photosensitive Male Sterile Mutant and Wild-type Cotton (Gossypium hirsutum L.)" [1]. That research article describes the iTRAQ-facilitated proteomic analysis of the wild type and a photosensitive male sterile mutant in cotton, and reports that a defect in exine formation is the key reason for male sterility in the mutant plant. The information presented here comprises the tables and figures that detail the processing of the raw data obtained from the iTRAQ analysis.


Specifications
Total anther protein was extracted from mutant and wild-type plants in triplicate using a TCA-acetone method. Three replicate iTRAQ-facilitated proteomic analyses were conducted for protein identification and quantification. Any protein with a ≥1.5-fold difference and a p-value ≤ 0.05 in at least two replicates was considered a significant differentially expressed protein (DEP) in our data.
Data source location: Cotton anther samples were collected in Anyang, Henan Province, China. iTRAQ-facilitated proteomic analyses were conducted at the Beijing Genomics Institute, Shenzhen, Guangdong Province, China.

Data accessibility
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD002209. The reviewer account: username, reviewer23539@ebi.ac.uk; password: 3ts0ERFU.

Value of the data
An iTRAQ-based proteomic analysis was performed in cotton anthers. A total of 6,121 high-confidence proteins were identified in cotton anthers. A total of 325 proteins show differential expression patterns between WT and MT. The data enrich the understanding of the molecular regulatory mechanisms of male sterility.

Experimental design
Using a space mutation breeding technique, a novel photosensitive genetic male sterile mutant CCRI9106 was isolated from the wild-type upland cotton cultivar CCRI040029. Histological and iTRAQ-facilitated proteomic analyses of anthers were performed to explore male sterility mechanisms of the mutant.

Plant growth and anther collection
Two G. hirsutum L. genotypes, a photosensitive genetic male sterile (PGMS) mutant, CCRI9106, and its wild-type (WT) line, CCRI040029, were used in this study. CCRI040029 is an elite upland variety bred in our lab, and the mutant line, CCRI9106, was created by space mutation in 2010 [2]. They were grown in an agronomic field in Anyang (Henan, China) from April to October (Fig. S1), and in Sanya (Hainan, China) from October to early April (Fig. S2). Thirty rows (8 m in length × 0.8 m in width) were prepared for each genotype, and every 10 rows formed one replicate. To test pollen fertility, anthers were stained with Alexander's solution. Additionally, anthers from both the mutant (MT) and WT at different developmental stages were collected for further analysis.

Scanning electron microscopy
For scanning electron microscopy (SEM; Fig. S3), anthers were infiltrated with 2.5% (v/v) glutaraldehyde in phosphate buffer (0.1 M, pH 7.2), dehydrated in a graded ethanol series (from 30% to 100%), treated in acetone for 15 min, and transferred to isoamyl acetate for 20 min. The samples were then dried with a CO2 critical-point drying system (HITACHI HCP-2, Japan). Subsequently, pollen grains were coated with gold:palladium and imaged using a scanning electron microscope (HITACHI S-530, Japan).

Protein extraction and quantification
For protein extraction, a trichloroacetic acid (TCA)-acetone method [3] was used, following Pang et al. with minor modifications [4]. In brief, ~1.5 g of frozen anther was ground with 10% polyvinyl polypyrrolidone (w/w) in liquid nitrogen using a mortar and pestle. The resulting fine powder was mixed with 10% (w/v) TCA in cold acetone containing 0.07% (w/v) 2-mercaptoethanol for at least 2 h and subsequently centrifuged at 12,000 × g for 1 h at 4 °C. The pellet was washed first with cold acetone containing 0.07% (w/v) 2-mercaptoethanol and then with 80% cold acetone, and was finally suspended in lysis buffer (7 M urea, 2 M thiourea, 4% CHAPS, 20 mM dithiothreitol, 2% EDTA-free protease inhibitor). The supernatant was centrifuged at 120,000 × g for 90 min at 4 °C and used for further assays. Next, the purified proteins underwent a reductive alkylation reaction. The concentration of the protein solution was determined with the 2-D Quant Kit (GE Healthcare, USA) with bovine serum albumin as a standard. The supernatants were stored at −80 °C until required.

LC-MS/MS analysis
A splitless nanoACQuity (Waters, USA) system coupled with a Triple TOF was used for analytical separation. The system uses microfluidic traps packed with Symmetry C18 (5 μm, 180 μm × 20 mm) for online trapping and desalting, and nanofluidic columns packed with BEH130 C18 (1.7 μm, 100 μm × 100 mm) for analytical separations. Solvents were purchased from Thermo Fisher Scientific and consisted of water/acetonitrile/formic acid (A: 98/2/0.1%; B: 2/98/0.1%). A 2.25 μg (9 μL) portion of sample was loaded, and trapping and desalting were carried out at 2 μL/min for 15 min with 99% mobile phase A. At a flow rate of 300 nL/min, analytical separation was established by maintaining 5% B for 1 min. In the following 64 min, a linear gradient to 35% B occurred in 40 min. Following the peptide elution window, the gradient was increased to 80% B in 5 min and maintained for 5 min. Initial chromatographic conditions were restored in 2 min.
Data acquisition was performed with the AB SCIEX Triple TOF 5600 System (Concord, USA) fitted with a Nanospray III source (Concord, USA) and a pulled quartz tip as the emitter (New Objectives, Woburn, USA). Data were acquired using an ion spray voltage of 2.5 kV, curtain gas of 30 PSI, nebulizer gas of 15 PSI, and an interface heater temperature of 150 °C. The MS was operated with a resolving power (RP) greater than or equal to 30,000 FWHM for TOF MS scans. For information-dependent acquisition (IDA), survey scans were acquired in 250 ms, and as many as 30 product ion scans were collected for precursors exceeding a threshold of 120 counts per second (counts/s) with a 2+ to 5+ charge state. Total cycle time was fixed to 3.3 s. The Q2 transmission window was 100 Da for 100%. Four time bins were summed for each scan at a pulser frequency of 11 kHz through monitoring of the 40 GHz multichannel TDC detector with four-anode/channel detection. A sweeping collision energy setting of 35 ± 5 eV coupled with iTRAQ adjust rolling collision energy was applied to all precursor ions for collision-induced dissociation. Dynamic exclusion was set for 1/2 of peak width (18 s), after which the precursor was refreshed off the exclusion list.
The table lists the cut-off points (variation) and the corresponding coverage (%) of quantified proteins. a. "Cut off at" is the variation between the fold change and 1, where the fold change is calculated between two samples in the three experiments. b. "Number" is the number of proteins that meet the cut-off value. c. "Total" is the total number of proteins quantified in at least two experiments. d. "Coverage (%)" is calculated as "Number" divided by "Total"; higher coverage at a smaller cut-off value indicates better repeatability.
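The coverage calculation described in the table note can be sketched as follows. This is a minimal illustration only, assuming that the variation is the absolute difference |fold change − 1| and that a protein "meets" a cut-off when its variation does not exceed it; the function name and the example ratios are hypothetical and are not taken from the dataset.

```python
def coverage_at_cutoffs(fold_changes, cutoffs):
    """Return {cutoff: coverage in %} for replicate-vs-replicate fold changes.

    fold_changes: one ratio per quantified protein, computed between two
    replicate experiments (1.0 means perfect agreement). Hypothetical input.
    """
    total = len(fold_changes)  # "Total" in the table note
    coverage = {}
    for cutoff in cutoffs:
        # "Number": proteins whose variation |fold change - 1| is within the cut-off
        number = sum(1 for fc in fold_changes if abs(fc - 1.0) <= cutoff)
        coverage[cutoff] = 100.0 * number / total if total else 0.0
    return coverage


# Hypothetical replicate ratios for five proteins
example_ratios = [1.05, 0.92, 1.25, 0.65, 1.15]
print(coverage_at_cutoffs(example_ratios, [0.1, 0.3, 0.5]))
```

With real data, a coverage that is already high at a small cut-off would indicate that most proteins were quantified consistently across replicates.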

Database search and quantification
Protein identification and quantification were performed simultaneously using the Mascot 2.3.02 software (Matrix Science, Boston, USA). Searches were made against our cotton_AD_nr database, which includes 38,460 sequences from the G. raimondii genome [5] and 43,097 from the G. arboreum genome [6], the putative contributors of the D and A subgenomes, respectively, of the G. hirsutum L. genome (AADD). The search parameters were set as follows: trypsin was chosen as the enzyme with one missed cleavage allowed; carbamidomethylation of Cys was set as a fixed modification and oxidation of Met as a variable modification; peptide tolerance was set to 0.05 Da and MS/MS tolerance to 0.1 Da. The peptide charge was set as Mr, and monoisotopic mass was chosen. An automatic decoy database search strategy was employed to estimate the false discovery rate (FDR). The FDR was calculated as the false positive matches divided by the total matches. In the final search results, the FDR was less than 1.5%. The iTRAQ 8-plex was chosen for quantification during the search. For protein identification, only peptides with significant scores (≥20) at the 99% confidence interval were used, and each confident protein included at least one unique peptide. For protein quantitation, "median" was chosen as the protein ratio type, and only unique peptides were used to quantify proteins. Normalization was based on median intensities. We assigned the 6,121 proteins detected in at least two replicates as the final identified proteins in this study (Table S1).
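The two calculations named above, the decoy-based FDR and the median protein ratio, can be illustrated with a short sketch. It assumes that decoy hits stand in for false positive matches and that each protein has a list of ratios from its unique peptides; the function names and numbers are hypothetical and do not reproduce Mascot's internal implementation.

```python
from statistics import median


def decoy_fdr(false_positive_matches, total_matches):
    """FDR = false positive matches / total matches, as stated in the text."""
    if total_matches <= 0:
        raise ValueError("total_matches must be positive")
    return false_positive_matches / total_matches


def median_protein_ratio(unique_peptide_ratios):
    """Protein ratio taken as the median of its unique-peptide ratios."""
    if not unique_peptide_ratios:
        raise ValueError("at least one unique peptide ratio is required")
    return median(unique_peptide_ratios)


# Hypothetical numbers: 12 decoy hits among 1,000 matches
fdr = decoy_fdr(12, 1000)
print(f"FDR = {fdr:.1%}")  # below the 1.5% bound reported for the final results

# Hypothetical ratios from three unique peptides of one protein
print(median_protein_ratio([1.2, 1.6, 1.4]))
```

Using the median rather than the mean keeps a single outlying peptide ratio from dominating the protein-level estimate.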
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository [7] with the dataset identifier PXD002209.
DEPs are classified into three GO categories: biological process, molecular function and cellular component. "Cluster Frequency" is the number of DEPs in the list. "P-value" indicates the reliability of each term; only terms with P-value < 0.05 are shown. "Proteins" are the DEPs annotated to the term.
We analyzed biological replicates at each stage. The average CV of each stage ranges from 0.19 to 0.24, indicating high repeatability of our data (Table 1). Any protein with a ≥1.5-fold difference and a p-value ≤ 0.05 in at least two replicates was considered a significant DEP in our data (Table S3).
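The DEP criterion can be expressed as a small filter. The sketch below is illustrative only: it assumes that each replicate provides a positive MT/WT ratio and a p-value, and that a "≥1.5-fold difference" covers both increases (ratio ≥ 1.5) and decreases (ratio ≤ 1/1.5). That symmetric reading, the function name and the example values are assumptions, not details given in the text.

```python
def is_significant_dep(replicates, fold_cutoff=1.5, p_cutoff=0.05, min_replicates=2):
    """Apply the >=1.5-fold, p <= 0.05, at-least-two-replicates rule.

    replicates: list with one entry per replicate, either a (ratio, p_value)
    tuple or None when the protein was not quantified in that replicate.
    """
    passing = 0
    for entry in replicates:
        if entry is None:
            continue  # protein missing in this replicate
        ratio, p_value = entry
        # Assumed symmetric threshold: up- or down-regulated by the cut-off
        changed = ratio >= fold_cutoff or ratio <= 1.0 / fold_cutoff
        if changed and p_value <= p_cutoff:
            passing += 1
    return passing >= min_replicates


# Hypothetical protein: passes in two of three replicates
print(is_significant_dep([(1.8, 0.01), (1.6, 0.03), (1.2, 0.20)]))  # True
# Hypothetical protein: passes in only one replicate
print(is_significant_dep([(1.8, 0.01), (1.1, 0.03), None]))         # False
```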

Primer name, gene name and primer sequences
GhUB7: TAGAGTCCGCTTCTACCTT
GhUB7-R1: ACGATTACGGAAAATCAAAGCC

this study were blasted for the closest Arabidopsis homolog with E-value ≤ 10^−10 (Table S5). After a survey of the literature, we updated a previously published list [11] of genes affecting pollen development or pollen tube growth in Arabidopsis from 215 to 323 genes (Table S6).

RNA extraction and quantitative real-time PCR (qPCR)
To verify whether the differences in protein abundance were reflected at the transcriptional level, and to confirm the authenticity and accuracy of the proteomic analysis, 12 genes, one randomly selected from each cluster, were analyzed by qPCR at all three stages in WT and MT plants (Fig. S5). Total RNA was extracted from anther samples using the RN38-EASYspin Plus Plant RNA Kit (Aidlab, China) according to the manufacturer's protocol. Approximately 1 μg of RNA was reverse transcribed to cDNA using SuperScript III (Invitrogen, USA) following the manufacturer's protocol. qPCR was carried out using SYBR Green PCR Master Mix (Roche Applied Science, Germany) on an ABI 7500 real-time PCR system (Applied Biosystems, USA) with three replicates. Data were processed using the 2^−ΔΔCt method; GhUBQ7 (GhUBIQUITIN7, DQ116441) was used as the endogenous reference gene, and stage 1 was set as the reference sample for data normalization. All the primer pairs used are shown in Table 3.
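The 2^−ΔΔCt calculation can be sketched as follows, with GhUBQ7 as the reference gene and stage 1 as the reference (calibrator) sample. The function name and all Ct values are hypothetical, and the sketch assumes roughly 100% amplification efficiency for both primer pairs, as the standard method does.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the 2^-ddCt method.

    ct_target / ct_reference: Ct of the target gene and of the reference
    gene (here GhUBQ7) in the sample of interest.
    ct_target_cal / ct_reference_cal: the same two Ct values in the
    calibrator sample (here stage 1).
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target gene at a later stage versus stage 1
print(relative_expression(24.0, 18.0, 26.0, 18.0))  # 4.0, i.e. 4-fold higher than stage 1
# The calibrator compared with itself always gives 1.0
print(relative_expression(26.0, 18.0, 26.0, 18.0))  # 1.0
```

In practice, the Ct values passed in would be the means of the three technical replicates for each gene and stage.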