RNA sequencing data for heat stress response in isolated medicago truncatula seed tissues

Legumes are important crop species as they produce highly nutritious seeds for human food and animal feed. In grain legumes, sub-optimal conditions affect seed developmental timing leading to impairment of seed quality traits acquired during seed maturation. To understand the molecular mechanisms of heat stress response in legume seeds, we analysed transcriptome changes of three seed tissues (i.e. em bryo, endosperm and seed coat) at four developmental stages, during seed maturation, from seed filling to mature dry seeds, collected under optimal and heat stress conditions in the model legume, Medicago truncatula (reference genotype A17). The total RNA sequencing generated a dataset of 48 samples, representing more than 57 Gb fastq raw data. Mapping, quantification and annotation of the data were based on fifth release of Medicago truncatula genome and provided expression profiles of 44,473 transcripts in seed tissues at different developmental stages and under optimal and stress conditions. Time-course and pairwise comparisons between optimal and stress conditions showed that 9182, 8315 and 3481 genes were differentially expressed due to heat stress in embryo, endosperm and seed coat respectively. Moreover, it highlighted a common set of 975 genes that were differentially expressed in all the seed tissues.


a b s t r a c t
Legumes are important crop species as they produce highly nutritious seeds for human food and animal feed. In grain legumes, sub-optimal conditions affect seed developmental timing leading to impairment of seed quality traits acquired during seed maturation. To understand the molecular mechanisms of heat stress response in legume seeds, we analysed transcriptome changes of three seed tissues (i.e. em bryo, endosperm and seed coat) at four developmental stages, during seed maturation, from seed filling to mature dry seeds, collected under optimal and heat stress conditions in the model legume, Medicago truncatula (reference genotype A17). The total RNA sequencing generated a dataset of 48 samples, representing more than 57 Gb fastq raw data. Mapping, quantification and annotation of the data were based on fifth release of Medicago truncatula genome and provided expression profiles of 44,473 transcripts in seed tissues at different developmental stages and under optimal and stress conditions. Time-course and pairwise comparisons between optimal and stress conditions showed that 9182, 8315 and 3481 genes were differentially expressed due to heat stress in embryo, endosperm and seed coat respectively. Moreover, it highlighted a common set of 975 genes that were differentially expressed in all the seed tissues.
© . After mRNA enrichment by rRNA depletion and oligo dT selection, RNA was fragmented and reverse transcribed to double-strand cDNA (dscDNA) by N6 random primer. The synthesized cDNA was subjected to end-repair and then was 3 adenylated. Adaptors were ligated to the ends of these 3 adenylated cDNA fragments. The ligation products were purified and many rounds of PCR amplification were performed to enrich the purified cDNA template using PCR primer, splint oligo and DNA ligase, followed by sequencing on BGISEQ-500 platform, generating an average 24 M reads of 50 bp per sample. Data format Filtered raw reads (FASTQ) Analysed RNA-seq data files (counts and DEG lists) Parameters for data collection Total RNAs were extracted from isolated embryo, endosperm and seed coat of four developmental stages of Medicago truncatula (A17) seeds that were harvested during maturation phase before and after acquisition of desiccation tolerance (respectively S1 and S2) and at the onset and after longevity acquisition (respectively S3 and S4) under standard temperature (20 °C day/ 18 °C night) and under high temperature (26 °C/24 °C). Description of data collection RNA-seq dataset was collected from single-end sequencing of cDNA libraries using BGISeq500 platform with 50 bp reads. Raw reads were filtered to remove adapters and low-quality reads, then mapped to Medicago truncatula reference transcriptome (version 5). Total mapped reads and number of transcripts (counts and TPM) were estimated using Salmon algorithm. Differential expressions of genes between standard and heat stress conditions were calculated using ImpulseDE2 and DEseq2 algorithms.

Value of the Data
• These data represent valuable seed transcriptome dataset of heat stress in the model legume Medicago truncatula because it has been generated from isolated seed tissues (embryo, endosperm and seed coat) along the whole seed maturation. • These data are useful resources for scientific communities working on seed and legume seed quality but also on plant stress biology to understand specific and common stress response pathways.
• These data provide new insights about molecular processes affected during heat stress in different seed tissues and candidate genes and molecular markers to predict seed quality in sub-optimal conditions.

Data Description
This manuscript presents a transcriptomic dataset obtained from dissected seed tissues (embryo, endosperm and seed coat) at four developmental stages of Medicago truncatula (A17) seeds during maturation phase [i.e. before and after desiccation tolerance (respectively S1 and S2) and at the onset and after longevity acquisition (respectively S3 and S4)] produced under standard temperature (20 °C day/ 18 °C night) and heat stress (26 °C/24 °C). Table 1 displays all sample names with information regarding seed tissues, treatments, developmental stages and numbers of 50 bp reads sequenced per sample. Figure S1 provides sequence quality histograms obtained from FastQC [1] and merged into a single graphic using MultiQC [2] . All 48 samples displayed Phred quality scores about 35, which corresponds to a base calling accuracy of about 99.95%. Salmon algorithm [3] was used to map and quantify raw reads to the reference Medicago truncatula transcriptome version 5 and corresponding count table is provided as Table S1 . After mapping and quantification, these 48 samples were, then, hierarchically clustered based on their dissimilarity scores to validate reproducibility of replicates ( Fig. 1 ). From this count table, differentially expressed genes were identified using time-course comparisons of the four developmental stages for embryo ( Table S2 ) and endosperm ( Table S3 ); and using pairwise comparisons of the two developmental stages for seed coat ( Table S4 ). Fig. 2 summarizes the differentially expressed genes (DEGs) in the three seed tissues between optimal and heat stress conditions (False Discovery Rate, FDR < 1%) and highlights a common set of 975 DEGs within the three seed tissues. Table 1 Summary of sample files with corresponding information related to seed tissues, seed developmental stages, growth conditions and numbers of cleaned 50 bp reads used for RNA-seq mapping.

Plant growth conditions and seed tissue sampling
Medicago plants were grown under standard conditions (20 °C/18 °C, 16 h photoperiod) in growth chamber. At flowering time, half of plants were kept at same optimal conditions (20 °C/18 °C, 16 h photoperiod) and half were grown under heat stress condition (26 °C/24 °C, 16 h photoperiod). Developing and mature seeds were collected from standard and stress conditions, seed tissues were quickly dissected then immediately frozen in liquid nitrogen before RNA extraction. According to our previous study [4] , four seed developmental stages were collected under standard conditions at 17 days after pollination (DAP), 26 DAP, 36DAP and 44 DAP and under heat stress conditions at corresponding developmental stages at 14 DAP, 17 DAP, 22 DAP and 28 DAP. These four developmental stages were shown to be the beginning of seed maturation and before acquisition of desiccation tolerance (S1), after the acquisition of desiccation tolerance and during seed filling (S2), at the onset of the acquisition of longevity and at the end of seed filling (S3) and mature seed (S4) according to our previous study [4] .

RNA isolation and sequencing
Total RNAs were extracted from embryo and endosperm that were collected at four stages during seed maturation under standard and heat stress conditions. RNAs were only extracted from S1 and S2 for seed coat, as during late maturation stage seed coat is dying and does not provide good quality RNA. All RNA extractions were performed using MACHEREY-NAGEL Nucle-oSpin® RNA Plus kit. RNA quality was checked using a nanodrop spectrophotometer ND-10 0 0 (NanoDrop Technologies) and a 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). All samples with good qualities (260/280 and 260/230 absorbance ratio > 2; RNA Integrity Number, RIN > 8; 28S/18S > 1.7) were sent to Beijing Genomics Institute ( https://www.bgi.com ) (Hong Kong) for library preparation and sequencing on BGISEQ-500 platform, generating an average of 20 M reads of 50 bp per sample.

RNA-seq data analyses
After quality control of fastq files using FastQC [1] , high-quality reads were mapped onto the reference Medicago A17 transcriptome version 5 ( https://medicago.toulouse.inra.fr/MtrunA17r5. 0-ANR/downloads/1.7/MtrunA17r5.0-ANR-EGN-r1.7.fastaFiles.zip ) ( Table S5 ) [5] and transcript abundances were quantified with Salmon algorithm (version 0.14.1) [3] using the quasi-mapping mode and the '-validateMappings', '-useVBOpt' and '-seqBias' options. Reproducibility of replicates was validated by a hierarchical cluster analysis based on dissimilarity scores obtained from the normalized raw count values of the 48 samples using the 'cpm', 'dis' and 'hclust' functions in R. Differentially expressed genes (DEGs) were identified by time-course comparisons of the four developmental stages of embryo and endosperm using ImpulseDE2 [6] and by pair-wise comparisons of the two developmental stages of seed coat using DESeq2 [7] . All transcripts with change in expression showing adjusted p-values below 1% (no fold change cutoff) were considered as differentially expressed between standard and heat stress conditions and displayed on a Venn Diagram performed from the website: http://bioinformatics.psb.ugent.be/webtools/Venn/ .

Ethics Statement
This work does not contain any studies with human or animal subjects.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.