Dataset for transcriptome and physiological response of mature tomato seed tissues to light and heat during fruit ripening

Seed vigor is an estimate of how successfully a seed lot will establish seedlings under a wide range of environmental conditions, with both the embryo and the surrounding endosperm playing distinct roles in germination behaviour. Germination and seedling establishment are essential for crop production to be both sustainable and profitable. Seed vigor traits are acquired sequentially during development via genetic programs that are poorly understood but known to be strongly influenced by environmental conditions. To investigate how light and temperature affect the molecular mechanisms governing seed vigor at harvest, RNA sequencing was performed on Solanum lycopersicum cv. Moneymaker seed tissues (i.e. embryo and endosperm) dissected from fruits that were exposed to standard or high temperature and/or standard or dim light. The dataset comprises a total of 26.5 Gb of raw data from mature embryo and endosperm tissue transcriptomes. The raw reads and the reads mapped on build SL4.0 with annotation ITAG4.0 are available under accession GSE158641 in the NCBI Gene Expression Omnibus (GEO) database. Data on seed vigor characteristics are presented together with the differentially expressed gene transcripts. GO and Mapman annotations were generated on ITAG4.0 to analyse this dataset and are provided for mining future datasets.


© 2020 Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Specifications Table
Subject: Biological sciences
Specific subject area: Omics: Transcriptomics; Plant Science: Plant Physiology

Value of the Data
• This is a tissue-specific seed transcriptome dataset for tomato obtained from fruits that were ripened ex planta under four environmental conditions (high/standard temperature and standard/low light intensity).
• These data are a useful resource for the scientific community studying the developmental programs of various seed tissues and working on the effect of maternal environments on seed vigor. Annotation files in Gene Ontology or Mapman format are provided for the recently published SL4.0 genome version and can be used for enrichment analysis and data mining.
• These data provide new insights into tissue-specific molecular processes that are affected by heat and light and lead to defects in seed vigor. They allow the identification of candidate genes as well as molecular markers that might predict seed vigor in tomato.
• The dissection of seeds into embryo and endosperm will help decipher the underlying molecular events that determine seed vigor in the different tissues.

Data Description
This article presents an mRNA sequencing transcriptome dataset from embryo and endosperm tissues of mature tomato (Solanum lycopersicum cv. Moneymaker) seeds. The seeds were isolated from fruits that were ripened ex planta from breaker stage onwards under different temperature and light conditions. Vigor characteristics of these seeds are given in Table 1. Table 2 summarizes the quality of the transcriptome data and the read mapping to the reference tomato transcriptome (build SL4.0, annotation ITAG4.0), which is available from the Solgenomics website (ftp://ftp.solgenomics.net/tomato_genome/annotation/ITAG4.0_release/ [1]). On average, 21 million of the 25 million reads sequenced per sample were mapped on the reference transcriptome (Table 2). Sequencing quality was checked using FastQC mean quality scores (Fig. 2). All samples displayed high quality, with Phred scores around 35.

Table 1. Vigor characteristics of seeds extracted from fruits that ripened ex planta at different temperature and light conditions. After drying, seeds were imbibed in the dark at 20 °C in water, 71 mM NaCl or a PEG 8000 solution corresponding to −0.3 MPa. Data are the mean of three replicates of 50 seeds. Values in brackets represent standard deviation. DL, dim light; HL, high light; HT, high temperature; ST, standard temperature; t50, time of imbibition necessary to reach 50% germination.

The numbers of up- or down-regulated differentially expressed genes (DEGs) between the standard maturation condition (standard temperature (ST) + high light (HL)) and the stressful environments (ST + dim light (DL), high temperature (HT) + HL and HT + DL) are shown in Table 3. GO and Mapman annotations of ITAG4.0 were generated and are provided in Table S1 and Table S2, respectively. Enriched GO terms for up- and down-regulated DEGs between the standard ripening condition and the stressful environments are shown in Fig. 3 for the embryo and Fig. 4 for the endosperm.

Plant material and growth conditions
Plants of Solanum lycopersicum cv. Moneymaker were grown under controlled greenhouse conditions in 10 L pots containing substrate (Irish peat, perlite, coconut fiber; 50/40/10; v/v/v), watered with a nutrient solution and supplemented with 16 h of light at 250 μmol m⁻² s⁻¹. The day and night temperatures were maintained at 23 °C and 20 °C, respectively. Breaker fruits (i.e. 63 DAF) were collected from the 3rd to 6th trusses and transferred to a growth chamber for 7 days under 4 different environments: standard temperature (ST, 23 °C day/20 °C night) + high light (HL, 16 h photoperiod, 300 μE m⁻² s⁻¹), ST + dim light (DL, 16 h photoperiod, 25 μE m⁻² s⁻¹), high temperature (HT, 32 °C day/26 °C night) + HL, or HT + DL. For seed vigor analyses, seeds were collected and incubated for 1 h in a 0.4 g/L pectolytic enzyme solution (Lafazym CL®, Laffort, France), followed by extensive washing with water. Seeds were then blotted dry on filter paper, rapidly dried at 43% RH under airflow at room temperature for 2 days, and stored in hermetically sealed bags at 4 °C prior to seed vigor tests. For RNA extraction, three replicates of 10 seeds were collected, each from the equatorial section of 2 different fruits. Embryo and endosperm were hand-dissected, then immediately frozen in liquid nitrogen and stored at −80 °C.

Seed vigor tests
To assess the final percentage of germination, triplicates of 50 dried seeds were imbibed on filter paper (Whatman No. 1) in 9 cm diameter Petri dishes at 20 °C in the dark, either in water for 8 days, in −0.3 MPa polyethylene glycol (PEG 8000, Sigma) solution or in 71 mM NaCl solution (equivalent to −0.3 MPa) for 15 days. Seeds were considered germinated when the radicle had protruded 1 mm from the seed coat. Germination speed in water was determined by daily scoring of germinated seeds and expressed as the time for the seed lot to reach 50% germination (t50), estimated from the fit of a three-parameter log-logistic model.
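As an illustration of how a t50 value follows from such a fit, the sketch below inverts a three-parameter log-logistic curve. The parameterization (slope b, plateau d, midpoint e), the function names and the parameter values are assumptions made for this example; the text above does not state which software or parameterization was used, nor whether 50% refers to all seeds sown or to the fitted plateau. Here the target is taken as 50% of all seeds sown.

```python
def germination(t, b, d, e):
    """Cumulative germination (%) at time t (days) for a three-parameter
    log-logistic curve: slope b > 0, plateau d (%), midpoint e (days).
    This parameterization is an assumption, not taken from the article."""
    if t <= 0:
        return 0.0
    return d / (1.0 + (e / t) ** b)


def time_to_reach(target, b, d, e):
    """Invert the curve: time at which germination reaches `target` percent."""
    if not 0 < target < d:
        raise ValueError("target must lie strictly between 0 and the plateau d")
    return e / (d / target - 1.0) ** (1.0 / b)


# Hypothetical fitted parameters (not values from this dataset).
b, d, e = 5.0, 96.0, 3.2

# Time to reach 50% germination of all seeds sown.
t50 = time_to_reach(50.0, b, d, e)
print(f"t50 = {t50:.2f} days")
```

With this parameterization, e is the time to reach half of the plateau d, so t50 equals e only when the final germination is 100%.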

RNA analysis and functional annotation
After quality control of the fastq files using FastQC [2], high-quality reads were mapped onto the reference tomato transcriptome build SL4.0 [1] and transcript abundances were quantified with Salmon (version 0.14.1) [3] using the quasi-mapping mode and the '--validateMappings' and '--seqBias' options. Before mapping, the reference transcriptome was indexed with Salmon using k-mers of length 31. Coverage estimates and read mapping statistics are presented in Table 2. Differential expression of transcripts was calculated with DESeq2 [4]. Transcripts were considered differentially expressed if the log2 fold change (FC) was above 1 or below −1 and the Benjamini-Hochberg adjusted p-value was below 0.05. Data on total counts and differential gene expression can be found at https://data.mendeley.com/datasets/6h44fvz8x9/1. Gene set enrichment analysis of GO terms was performed with a hypergeometric test using the clusterProfiler package (v3.10.1) in R [5]. GO terms were considered enriched if the Bonferroni-adjusted p-value was below 0.05. Gene Ontology (GO) annotation on SL4.0 was generated using OmicsBox (https://www.biobam.com/omicsbox/, [6]) (Table S1) and Mapman annotation was generated using Mercator v4 [7] (Table S2). The complete gene enrichment analysis can be found at https://data.mendeley.com/datasets/6h44fvz8x9/1.
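The selection criteria described above can be sketched in plain Python. This is not the authors' pipeline, which used DESeq2 and clusterProfiler in R; it only illustrates the stated thresholds and a one-sided hypergeometric enrichment test with Bonferroni correction. All gene names, values and function names are hypothetical.

```python
from math import comb


def is_deg(log2_fc, padj, lfc_cutoff=1.0, alpha=0.05):
    """Thresholds from the text: |log2 FC| > 1 and adjusted p-value < 0.05."""
    return abs(log2_fc) > lfc_cutoff and padj < alpha


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes among n DEGs
    when K of the N background genes carry the GO term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Hypothetical results: (gene, log2 FC, BH-adjusted p-value).
results = [
    ("geneA", 2.3, 0.001),
    ("geneB", -1.5, 0.2),
    ("geneC", 0.4, 0.0001),
    ("geneD", -3.0, 0.01),
]
up = [g for g, lfc, padj in results if is_deg(lfc, padj) and lfc > 0]
down = [g for g, lfc, padj in results if is_deg(lfc, padj) and lfc < 0]
print("up:", up, "down:", down)

# Hypothetical GO term: 3 of 10 background genes annotated, 2 of 4 DEGs hit.
p_term = hypergeom_pvalue(k=2, n=4, K=3, N=10)
adjusted = bonferroni([p_term, 0.004])
enriched = [p < 0.05 for p in adjusted]
print(p_term, adjusted, enriched)
```

Exact integer binomial coefficients keep the sketch dependency-free; for genome-scale backgrounds a statistics library would normally be preferred.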

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.