A collection of transcriptomic and proteomic datasets from sesame in response to salt stress

Soil salinity is a major abiotic factor affecting the growth and development of important crops such as sesame (Sesamum indicum L.). To understand the molecular mechanisms by which this oilseed crop responds to salt stress, we examined the transcriptome and proteome profiles of two sesame varieties with contrasting tolerances to salinity. Here, RNA sequencing and quantitative proteomic analyses of 30 samples from salt-tolerant and salt-sensitive sesame seedlings under salt stress were carried out. These data can be used for differential gene expression and protein accumulation analyses, based on a genetic aberration or phenotypic differences in sesame responses to salt stress. Our dataset provides an extensive resource for understanding the molecular mechanisms underlying the adaptation of sesame to salt stress, and may constitute a useful resource for increasing the tolerance of major crop plants to raised salinity levels in soils.


Specifications
Subject: Agricultural and Biological Sciences
Specific subject area: Plant transcriptomics
Type of data: Table, Image and Figure
How data were acquired: Illumina HiSeq™ 4000 sequencing platform
Data format: Raw and analyzed
Parameters for data collection: 30 samples of 14-day-old seedlings prepared from the WZM3063 and ZZM4028 varieties, which have contrasting tolerances to salt. Shoot samples were collected at 0 (control), 2, 6, 12, and 24 h after salt treatment for RNA and protein extraction, cDNA library preparation and sequencing, iTRAQ labeling and LC-MS/MS analysis.
Description of data collection: The RNA-seq dataset was collected by paired-end sequencing of sesame cDNA libraries on the Illumina HiSeq X Ten platform with 2 × 150 bp reads. The raw reads were recorded in FASTQ files. Raw reads were filtered to remove reads containing adapters and reads of low quality, and clean reads were mapped to the sesame genome v.1.0 [1]. The iTRAQ dataset was collected using an AB SCIEX nanoLC-MS/MS system (Triple TOF 6600). The unique peptides were mapped to the sesame protein database (assembly S_indicum_v1.0) [2].
Data source location: City: Wuhan; Country: China
Data accessibility: The RNA-Seq and iTRAQ raw data were deposited in the Sequence Read Archive of NCBI, under accession number SRP186970, and in the ProteomeXchange with identifier PXD013013.
Direct URL to data: https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP186970 ; http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD013013

Value of the data
• These RNA-seq and iTRAQ data, obtained from two selected sesame varieties, represent the first complete set of transcriptomic and proteomic data generated from sesame varieties with contrasting tolerances to salt.
• These datasets permit comparative transcriptomics and proteomics between salt-tolerant and salt-sensitive sesame varieties. Differential gene and protein expression profiles between varieties could help in understanding the salinity response and tolerance mechanisms of sesame, which in turn could help plant breeders develop traditional breeding and biotechnological approaches to improve stress resistance in sesame.
• These datasets will be of value for future characterization of functional genes and proteins involved in salt stress responses in sesame.
• These datasets are also expected to provide valuable information for the study of molecular mechanisms underlying salt tolerance in other plants.

Data description
This dataset provides transcriptomic and proteomic profiles of 30 samples from salt-tolerant and salt-sensitive sesame varieties. Fig. 1 provides an overview of our study design. In this work, 30 RNA libraries were sequenced using the Illumina HiSeq X Ten platform and 150 bp paired-end reads were generated. Approximately 55 million RNA-seq reads were generated for each sample. After filtering, clean reads were mapped to the sesame genome v.1.0, resulting in 26,620 genes. Using weighted gene co-expression network analysis (WGCNA), 11 co-expression gene modules involved in responses to salt stress were identified in sesame (Fig. 2A and B). In parallel, 30 protein samples, labeled with iTRAQ tags, were analyzed using an AB SCIEX nanoLC-MS/MS system (Triple TOF 6600). In total, 405,606 spectra and 16,921 unique peptides were generated, and 6771 protein species were identified after mapping to the sesame protein database (assembly S_indicum_v1.0). Finally, the relationship between mRNA and protein expression levels of differentially expressed genes (proteins) at different salt stress time points was analyzed (Fig. 3). Stringent technical design at each experimental stage enabled the generation of high-quality RNA-seq and iTRAQ datasets, which will be of value for future characterization of genes and proteins expressed in sesame during salt stress responses. These datasets are also expected to provide valuable information for the study of molecular mechanisms underlying salt tolerance in other plants.

Plant materials and sample selection
The seeds of two sesame varieties were sown and germinated in a box containing half-strength Hoagland solution. The whole cultivation process was carried out in a growth chamber with a 16/8 h light/dark cycle at 28 °C [1]. Fourteen-day-old seedlings of the salt-tolerant WZM3063 (ST) and salt-sensitive ZZM4028 (SS) varieties were used for this study. Plants were subjected to salt treatment (150 mM NaCl) for different durations. We collected shoot samples at 0 (control), 2, 6, 12, and 24 h after salt treatment, for RNA and protein extraction. Each variety and time point was represented by three independent biological replicates; samples were immediately frozen in liquid nitrogen and stored at −80 °C until use.
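The sampling design described here (two varieties, five time points, three biological replicates) accounts for the 30 samples. As a minimal sketch, the design can be enumerated as a sample sheet. The sample-naming scheme and field names below are hypothetical and are not taken from the deposited records.

```python
from itertools import product

# Values taken from the text; the naming scheme below is hypothetical.
VARIETIES = {"ST": "WZM3063", "SS": "ZZM4028"}  # salt-tolerant, salt-sensitive
TIME_POINTS_H = [0, 2, 6, 12, 24]               # hours after salt treatment
REPLICATES = [1, 2, 3]                          # independent biological replicates


def build_sample_sheet():
    """Return one record per sample: variety x time point x replicate."""
    sheet = []
    for (group, variety), hours, rep in product(
        VARIETIES.items(), TIME_POINTS_H, REPLICATES
    ):
        sheet.append(
            {
                "sample_id": f"{group}_{hours}h_R{rep}",  # hypothetical ID format
                "group": group,
                "variety": variety,
                "hours_after_salt": hours,
                "replicate": rep,
                # The 0 h time point is the untreated control.
                "condition": "control" if hours == 0 else "salt_150mM_NaCl",
            }
        )
    return sheet


samples = build_sample_sheet()
print(len(samples))  # 2 varieties x 5 time points x 3 replicates
```

A sheet of this kind is a convenient way to join the RNA-seq and iTRAQ records on a shared sample key when comparing the two data types.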

RNA extraction, library preparation and sequencing
For each sample, an EASYspin Plus kit (Aidlab, Beijing, China) was used to extract RNA following the manufacturer's recommendations. The RNA concentration was measured using a Qubit® RNA Assay Kit and Qubit® 2.0 Fluorometer (Life Technologies, CA, USA), and the RNA integrity number (RIN) was assessed using the RNA Nano 6000 Assay Kit for the Bioanalyzer 2100 system (Agilent Technologies, CA, USA). RNA libraries were prepared from 3 μg RNA per sample using a NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (NEB, USA), following the manufacturer's instructions. Library preparations were sequenced on an Illumina HiSeq X Ten platform at the Novogene Corporation (Beijing, China) and 150 bp paired-end reads were generated, using methods described previously [3].

RNA-seq data analysis
The raw data (Data Citation 1: NCBI Sequence Read Archive SRP186970) were filtered using Fastq clean v2.0, and clean reads were obtained by removing low-quality reads and reads containing adapters or poly-N sequences, according to parameters previously reported [4]. At the same time, the Q20, Q30 and GC contents of the clean data were calculated; all downstream analyses were based on these clean, high-quality data. An index of the sesame genome was built using Bowtie v2.2.3 and paired-end clean reads were aligned to the reference genome using TopHat v2.0.12. HTSeq v0.6.1 was used to count the reads mapped to each gene, and the FPKM (fragments per kilobase of transcript per million fragments mapped) of each gene was then calculated from the gene length and read count. Correlation analysis among biological replicates was performed using R (version 3.4.3). The relationships among gene clusters, based on normalized read counts, were analyzed using the WGCNA package (version 1.68) in R [5]. Genes corresponding to the different co-expression modules are listed in Table S1. Differential expression analysis between the two groups was performed using the DESeq R package (version 1.18). Genes with an adjusted P value < 0.05 were considered significantly differentially expressed.
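The FPKM normalization mentioned above can be illustrated with a short sketch. It uses the standard definition (fragments per kilobase of gene length per million mapped fragments). The function names are hypothetical, and taking the library size as the sum of gene-level counts is an assumption; the original pipeline may have used a different denominator.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """FPKM = count * 1e9 / (gene length in bp * total mapped fragments)."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


def fpkm_table(counts, lengths):
    """Compute FPKM for every gene in one sample.

    counts  : dict mapping gene ID -> fragment count (e.g. from a counting tool)
    lengths : dict mapping gene ID -> gene length in bp
    Assumption: library size is the sum of all gene-level counts.
    """
    library_size = sum(counts.values())
    return {
        gene: fpkm(count, lengths[gene], library_size)
        for gene, count in counts.items()
    }


# Hypothetical toy input: two genes with the same coverage per base.
toy_counts = {"geneA": 500, "geneB": 1500}
toy_lengths = {"geneA": 1000, "geneB": 3000}
toy_fpkm = fpkm_table(toy_counts, toy_lengths)
print(toy_fpkm)
```

In the toy input, geneB has three times the count of geneA but is also three times as long, so both genes receive the same FPKM; this is the length correction that makes values comparable across genes.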

Protein extraction, iTRAQ labeling and LC-MS/MS
Protein was extracted from each sample using methods described previously [6]. Protein concentrations were measured using a Bradford assay and protein quality was analyzed by SDS-PAGE. The supernatant from each sample, containing 0.1 mg of protein, was reduced with DTT, alkylated with iodoacetamide and digested with Trypsin Gold (Promega, Madison, WI) at 37 °C for 16 h. After digestion, peptides were applied to a C18 cartridge to remove urea; desalted peptides were then dried by vacuum centrifugation. Desalted peptides were labeled with iTRAQ reagent (iTRAQ® Reagent-8PLEX Multiplex Kit, Sigma) following the manufacturer's instructions. Differently labeled peptides were mixed in equal amounts and then desalted on 100 mg SCX columns. The iTRAQ-labeled peptide mix was fractionated using a C18 column (Waters BEH C18, 4.6 × 250 mm, 5 μm) on a Rigol L3000 HPLC operating at 1 ml/min and subsequently analyzed on an AB SCIEX nanoLC-MS/MS system (Triple TOF 6600) at Novogene Genetics, Beijing, China.

iTRAQ data analysis
The raw LC-MS/MS data (Data Citation 2: ProteomeXchange PXD013013) were analyzed using Proteome Discoverer 2.2 software (PD 2.2, Thermo). Search parameters included a mass tolerance of 10 ppm for the precursor ion scans and a mass tolerance of 0.02 Da for the product ion scans. Carbamidomethyl was specified in PD 2.2 as a fixed modification. The oxidation of methionine, acetylation of the N-terminus and iTRAQ 8-plex of tyrosine and lysine were specified in PD 2.2 as variable modifications. A maximum of two missed cleavage sites was allowed. Protein identification and relative abundance quantitation were carried out based on the sesame genome annotation database ( https://www.ncbi.nlm.nih.gov/genome/?term=sesamum ) as previously reported [7]. For protein identification, proteins with at least one unique peptide were identified at a false discovery rate of < 1.0% at the peptide and protein levels. Proteins containing similar peptides that could not be distinguished based on MS/MS analysis were grouped separately. Reporter quantification (iTRAQ 8-plex) was used for iTRAQ quantification as described previously [8]. Protein quantification results were statistically analyzed using the Mann-Whitney test, and significant ratios, defined as P value < 0.05 and fold-change > 1.5 or < 0.67, were used to screen differentially expressed proteins (DEPs) [2]. Correlation analysis of biological replicate samples was performed using the IBM SPSS Statistics package version 22 and a heatmap was generated using the Morpheus web server ( https://software.broadinstitute.org/morpheus/ ). Finally, R software version 3.4.3 was used to analyze the relationship between mRNA and protein expression levels of selected genes or proteins.
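The DEP screening rule stated above (P value < 0.05 and fold-change > 1.5 or < 0.67) can be sketched as a simple classifier. This is an illustration of the thresholds only; it assumes the fold-change and P value for each protein have already been computed, and the function names and protein IDs are hypothetical.

```python
def classify_protein(fold_change, p_value, up=1.5, down=0.67, alpha=0.05):
    """Label a protein 'up', 'down' or 'unchanged' using the stated cut-offs."""
    if p_value >= alpha:
        return "unchanged"  # not statistically significant
    if fold_change > up:
        return "up"
    if fold_change < down:
        return "down"
    return "unchanged"      # significant, but the change is too small


def screen_deps(results):
    """Keep only differentially expressed proteins.

    results : dict mapping protein ID -> (fold_change, p_value)
    """
    calls = {pid: classify_protein(fc, p) for pid, (fc, p) in results.items()}
    return {pid: call for pid, call in calls.items() if call != "unchanged"}


# Hypothetical values for illustration only; not taken from the dataset.
example = {
    "protein_A": (2.10, 0.010),  # large increase, significant
    "protein_B": (0.50, 0.030),  # large decrease, significant
    "protein_C": (1.80, 0.200),  # large increase, not significant
    "protein_D": (1.20, 0.001),  # significant, but below the ratio cut-off
}
deps = screen_deps(example)
print(deps)
```

Both conditions must hold for a protein to be retained, so protein_C and protein_D in the toy input are excluded for different reasons.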

Data records
The RNA-Seq and iTRAQ raw data were deposited in the Sequence Read Archive (SRA) of NCBI, under accession number SRP186970 (Data Citation 1), and in the ProteomeXchange with identifier PXD013013 (Data Citation 2). Detailed descriptions of the raw data in the SRA and ProteomeXchange are provided in Tables 1 and 2, respectively. In addition, RPKM gene expression and protein relative quantification data of the different samples are included in Tables S2 and S3, respectively.

Quality control of RNA and protein
RIN is positively correlated with the proportion of uniquely mapped reads in RNA-Seq; only RNA samples with Agilent Bioanalyzer RIN scores above 6.3 were used to construct RNA libraries. Protein quality was analyzed by SDS-PAGE and all protein samples used for this study showed high quality (Fig. S1). Quality values for RNA and protein samples are listed in Tables 3 and 4, respectively.

Quality evaluation of RNA-seq and iTRAQ data
The quality of the RNA-seq data was assessed and all samples in this study were deemed to be of high quality (Table 3). For each sample, the clean reads had a Q20 rate between 97.29 and 98.31% and a Q30 rate between 93.39 and 95.66%, and over 87.47% of them were mapped to unique locations in the sesame genome (Table 3). Correlation analysis of the biological replicates showed that the correlation between replicates was high (R² > 0.91, Table S4).
In this study, 30 protein samples, labeled with iTRAQ tags, were divided into five run groups (Table 4). To evaluate the quality of the iTRAQ data, the length distribution of peptides, distribution of the precursor ion tolerance, distribution of the unique peptide number, distribution of protein sequence coverage and protein mass distribution were analyzed for each run group (Fig. 4). To evaluate the reliability of the protein quantification data, the correlation coefficient of protein expression among the 30 samples was measured and a high correlation between biological replicates was recorded (R² > 0.88, Fig. S2).
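The replicate agreement reported above can be checked with a correlation calculation of the following kind. This sketch assumes that R² denotes the squared Pearson correlation coefficient between the expression values of two replicates, which the text does not state explicitly; the function names and example values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def r_squared(x, y):
    """Squared Pearson correlation (assumed meaning of R² in the text)."""
    return pearson_r(x, y) ** 2


# Hypothetical expression values for the same genes in two replicates.
replicate_1 = [10.0, 25.0, 3.0, 80.0, 41.0]
replicate_2 = [12.0, 23.0, 4.0, 77.0, 45.0]
print(round(r_squared(replicate_1, replicate_2), 4))
```

Applied to each pair of replicates within a variety and time point, values above the reported thresholds (0.91 for RNA-seq, 0.88 for iTRAQ) would be consistent with the quality assessment described here.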

Declaration of Competing Interest
The authors declare no conflict of interest.