Reading between the Lines: RNA-seq Data Mining Reveals the Alternative Message of the Rice Leaf Transcriptome in Response to Heat Stress

Rice (Oryza sativa L.) is a major food crop but heat stress affects its yield and grain quality. To identify mechanistic solutions to improve rice yield under rising temperatures, molecular responses of thermotolerance must be understood. Transcriptional and post-transcriptional controls are involved in a wide range of plant environmental responses. Alternative splicing (AS), in particular, is a widespread mechanism impacting the stress defence in plants but it has been completely overlooked in rice genome-wide heat stress studies. In this context, we carried out a robust data mining of publicly available RNA-seq datasets to investigate the extension of heat-induced AS in rice leaves. For this, datasets of interest were subjected to filtering and quality control, followed by accurate transcript-specific quantifications. Powerful differential gene expression (DE) and differential AS (DAS) identified 17,143 and 2162 heat response genes, respectively, many of which are novel. Detailed analysis of DAS genes coding for key regulators of gene expression suggests that AS helps shape transcriptome and proteome diversity in response to heat. The knowledge resulting from this study confirmed a widespread transcriptional and post-transcriptional response to heat stress in plants, and it provided novel candidates for rapidly advancing rice breeding in response to climate change.


Introduction
Global food security is being threatened by the rapid population growth in a changing environment. One of the most devastating factors of climate change for agriculture is the increase in average surface air temperatures, which not only causes a reduction in yield but also compromises grain quality [1,2]. Rice (Oryza sativa L.) is particularly susceptible to heat stress [3]. It is one of the most important crops in the world and it plays a major role in the human diet, accounting for 9% of the global crop production [4]. Heat stress is detrimental during both rice vegetative and, mostly, reproductive phases [3,5]. Every 1 • C rise in the minimum growing temperature reduces yield by at least 10%, also resulting in damaged/chalky grains [6,7]. Nevertheless, many rice-producing regions are in agricultural land predicted to face increases in average temperature of between 2.0 to 4.9 • C by 2100, posing a significant threat to rice production [8,9]. To react and adapt to the threat of ever-increasing average temperatures, society needs to invest in new rice cultivars that maintain high productivity despite these arduous environments [10]. From a crop improvement perspective, it is possible to develop increasingly heat-resistant rice cultivars. For this, it is important to understand the responses that allow rice to tolerate heat [11].
Plants depend on the perception and appropriate response to temperature for their survival. The thermal perception system(s) responsible for transducing temperature information to the plant cellular processes is one of the great unknowns in plant science. Two events have been hypothesised as the primary temperature sensing events: alteration of membrane of temperature memory, allowing plants to survive subsequent and otherwise lethal heat stress conditions [54,59]. Therefore, it has become clear that heat stress impacts splicing regulation and, consequently, the transcriptome [52,60,61]. However, transcriptome studies analysing heat stress in rice have completely overlooked at AS. Therefore, the present work aimed at carrying out an RNA-seq meta-analysis to identify AS regulation in response to high-temperature stress in rice leaves.

Two-Thirds of the Rice Transcriptome Is Affected by Heat Stress at the Transcriptional and Post-Transcriptional Levels
To explore RNA-seq studies that contain data on rice leaf transcriptome in response to heat stress, we searched the Sequencing Read Archive (SRA) repository, available at the National Centre for Biotechnology Information (NCBI) database. This search returned 5 SRA studies of interest ( Table 1). The experimental designs and sequencing strategies of each study follow the needs of the original studies and, therefore, differ in several aspects such as cultivar and read-length (Table 1 and Supplementary Table S1). Heat stress treatment, in particular, varied in terms of heat intensity (from 6 • C to 20 • C increase) and duration of stress (from 10 min to 12 days). To ensure that heat stress response was the only factor being tested in our study, a subset of RNA-seq datasets were extracted from these 5 SRA studies (SRP071314, SRP101342, SRP190858, SRP110860 and SRP065945) and are hereinafter referred to as experiments 1 to 5, respectively (Table 1 and Supplementary  Table S2; more details in the Methods section).  1 After the heat stress, some plants were returned to the 30 • C chamber (heat shock recovery).
After download and pre-processing of the RNA-seq datasets of interest, transcriptspecific expression data was generated using Salmon [63] and the rice reference transcriptome available at the MSU Rice Genome Annotation Project Database [64]. This transcriptome contains transcripts from 55,548 genes, of which 6463 have two or more transcript isoforms (Supplementary Figure S1), allowing for AS studies in rice.
Differential gene expression and differential AS analysis were carried out with 3D RNA-seq [65] on an experiment-by-experiment basis, where we ensured that heat was the only factor being tested in our study. First, low expressed genes and transcripts were removed and expression data were normalised across samples. This identified a total of 26,834 genes expressed in at least one of the five heat stress experiments (Table 2), of which 60.6% were expressed in all experiments (Supplementary Figure S2A). The large overlap of expressed genes among the experiments confirms not only the similarity of the leaf/shoot transcriptomes from 2-to 5-week-old rice plants but also the power of the RNA-seq technology and analysis methods in detecting gene expressions across different sequencing strategies. Next, contrast groups were set up to compare the equivalent time-points with or without heat stress to control for time-of-day variation in expression due to photoperiodic and circadian changes. Differential gene expression (DE) analysis identified 17,143 genes that were significantly differentially expressed in at least one of the five heat stress experiments when compared with control samples (Table 2 and Supplementary Table S3). Experiments 3 and 5 greatly contributed to the identification of DE genes. This is probably due to increased statistical power mainly resulting from the higher number of biological replicates in experiment 3, and the higher number of RNA-seq datasets analysed in experiment 5. On average,~40% of DE genes were upregulated and~60% downregulated (Supplementary Figure S3). Next, we studied the overlap between the sets of DE genes from each experiment. Figure 1A shows a very small overlap among all experiments, reflecting differences in experimental designs of the original studies, mainly the different heat stress treatments and cultivars. To detect differentially alternatively spliced (DAS) genes, the expression changes between individual transcripts were compared to the gene level between conditions. We identified differential alternative splicing in 2162 genes in at least one of the five heat stress experiments ( Table 2 and Supplementary Table S3), of which 1,355 were also DE genes (regulated by both transcription and AS) in at least one of the five experiments and 807 genes that were not DE (regulated by AS only) in any of the five experiments. Experiment 3 showed the highest number of DAS genes, which were overrepresented for the GO terms 'gene expression' and 'RNA processing', among others (Supplementary Figure S4). The high number of DAS genes in Experiment 3 is most likely due to the high average read length (Supplementary Table S1), which increases quantification accuracy of transcript-specific expression, allowing for a more comprehensive investigation on AS [66]. Next, we studied the overlap between the sets of DAS genes. For figure clarity, experiments 1 and 2, which identified only 3 and 2 DAS genes, respectively, were not included. Figure 1B shows a small overlap among all experiments, reflecting differences in experimental designs of the original studies, as observed for the overlap of DE genes. In summary, of the 26,834 genes expressed in rice leaves/shoots, 66.9% exhibited significantly altered levels of differential gene and/ or transcript-level expression, suggesting a widespread transcriptional and post-transcriptional response to heat stresses.

Novel Putative Heat Stress Response Genes in Rice
The identification of DE and DAS genes raised the question of whether these have been described previously as heat-response genes. Of the five studies analysed here, only two have carried out a differential gene expression analysis that identified heat response genes [19,22] (Supplementary Table S4A). An additional two RNA-seq studies analysed the rice heat response while taking into account time-of-day variations in gene expression. Gonzáles-Schain et al. [67] have identified 630 DE genes in pollinated pistils and anthers of rice plants (cultivar N22) after 6 h at 38 • C, compared to 29 • C controls. Liu et al. [68] have identified 7573 DE genes in WT rice seedlings (cultivar Nipponbare) after 3 h at 45 • C, compared to 29 • C controls. Both studies have disclosed DE gene lists using MSU gene IDs (Supplementary Table S4A), which allowed a comparison with the sets of DE and DAS genes identified here. As expected, 99% of the heat response DE genes identified in Dossa et al. [19] and Wilkins et al. [22] were also identified here, while there was only a 69% overlap with DE genes identified by Gonzáles-Schain et al. [67] and Liu et al. [68] ( Figure 2), reflecting differences in experimental designs among the studies, mainly heat stress treatment and tissue sample. Crucially, we identified an additional 10,602 novel DE-only heat stress-response genes (Supplementary Table S4B).
The list of novel DE-only genes included 51 epigenetic proteins, 317 transcription factors, 27 heat-shock proteins, and 90 splicing-related factors (Supplementary Table S4C). The analysis here also identified 1,690 novel DAS heat stress-response genes ( Figure 2). Therefore, in addition to the previously identified heat-response genes in rice, we provide a substantial contribution of novel genetic components underlying the response to temperature increase in rice leaves/shoots.

Key Regulators of Gene Expression Are Regulated by Heat-Induced AS
To understand the importance of heat-induced differential alternative splicing in rice, we asked if AS was potentially affecting the levels or function of key DAS genes. We focused our analysis on putative regulators of gene expression from four different categories: epigenetic proteins, transcription factors, heat-shock proteins and splicingrelated factors. We observed heat-induced DAS in 9, 104, 15 and 45 genes from these categories, respectively (Table 3). Below we report important gene examples of each category and provide detailed evidence that heat stress-dependent AS is an important further layer of regulation that controls the function of genes related to the heat stress response in rice leaves.

Example of DAS Genes Encoding Epigenetic Proteins
OsHDAC6 (LOC_Os06g37420), a novel DAS-only gene, is a histone deacetylase (H3-K14 specific) localised in chloroplasts [69] and it is involved in drought and salt stress responses [70]. Upon heat stress, OsHDAC6 undergoes a rapid differential alternative splicing only (not DE) of two transcript isoforms that differ by an alternative 3 splice site of exon 11 ( Figure 3A). Isoform LOC_Os06g37420.1 codes for a protein containing the full histone deacetylase (Hist_deacetyl) domain (PFAM accession PF00850.12), whereas isoform LOC_Os06g37420.2 has a premature termination codon (PTC), which is probably recognised by the nonsense-mediated RNA decay (NMD) pathway, preventing its translation. The proportion of LOC_Os06g37420.1 transcripts decreased from 74% to 43% of the total transcripts (average ∆PS = 0.31) when rice plants were transferred from 30 • C to 40 • C for 90 min, respectively ( Figure 3A). In summary, OsHDAC6 expression is rapidly altered post-transcriptionally upon heat stress, and we speculate it might affect its protein levels, consequently impacting the epigenome.
LOC_Os01g40670, a novel DE&DAS gene, encodes a member of the Single Myb Histone (SMH) family protein, a plant-specific group of proteins involved in telomere structure and metabolism [71]. In control conditions, this gene produces two transcript isoforms, in equal proportions, that differ by removal or retention of the fifth intron ( Figure 3B). The AS alters the C-terminal end of the predicted proteins, near a domain involved in stabilising protein dimer formation, known as the coiled-coil domain [71], suggesting the AS might affect protein function. We observed a significant up-regulation of this gene upon 24 h of heat stress (DE), where the proportion of intron-5-containing transcripts increased to 96% of the total transcripts (DAS; Figure 3B), which might impact telomere function.

Example of DAS Genes Encoding Transcription Factors
LOC_Os08g05510, a novel DAS-only gene, encodes a member of the MYB transcription factor family found in the nucleus [72]. This gene produces two transcript isoforms that differ by skipping or including the fifth exon ( Figure 4A). The AS alters the C-terminal end of the predicted proteins, suggesting the AS might affect protein function. The relative abundance of these transcripts is reversed in response to 24 h of heat stress ( Figure 4A), which suggests this gene is involved in the stress response of rice. LOC_Os10g10990, a novel DAS-only gene, encodes a transcription initiation factor IIF alpha-subunit-domain-containing protein (TIFII-alpha). This gene produces three transcript isoforms that differ at alternative 5 and 3 splice sites of exon 7, which contains the coding sequence for the TFIIF_alpha domain (PFAM accession PF05793.5) ( Figure 4B). As a result, AS can putatively affect the DNA binding function of LOC_Os10g10990, such as affecting the number of its DNA target sites. We observed an isoform switch event between transcripts LOC_Os10g10990.1 and LOC_Os10g10990.3 upon 30 min (Supplementary Figure S5A) and 24 h of heat stress ( Figure 4B) when compared to controls, which suggests different targets are controlled by LOC_Os10g10990 in different temperature conditions.
OsGAMYB (LOC_Os01g59660), a novel DE&DAS gene, encodes a member of the MYB transcription factor family involved in gibberellins-regulated events, such as floral organ and pollen development [73]. In rice leaves/shoots, this gene produces three transcript isoforms that differ by removal or retention of the third and/or the fourth introns ( Figure 4C). The I3R event shortens the C-terminal end of the predicted proteins when compared to the other isoforms, suggesting altered protein function. In control conditions, I3R isoforms represent only 12% of the total transcripts, while 10 min of heat stress increases its relative and absolute abundance to 66% of the total transcripts (average ∆PS = −0.54, Figure 4C), despite downregulation at the gene level, suggesting an involvement of OsGAMYB in early heat stress response of rice.
LOC_Os03g06630, a DE&DAS gene, is a heat stress transcription factor involved in the heat stress response of rice leaves, as identified previously [22,68] and confirmed here. This gene produces two transcript isoforms that either include or skip exon 2, which contains the coding sequence for the DNA binding domain HSF_DNA-bind (PFAM accession PF00447.10) ( Figure 4D). As a result, AS can putatively either impair DNA binding function or increase the number of DNA target sites of LOC_Os03g06630. In control conditions, both isoforms are equally abundant, but upon heat stress, exon-2-skipping isoform (LOC_Os03g06630.2) increases both its relative and absolute abundances ( Figure 4D), suggesting a differential control by LOC_Os03g06630 on downstream targets.

Example of DAS Genes Encoding Heat-Shock-Related Proteins
LOC_Os01g32870, a novel DAS-only gene, is a heat shock protein containing a DnaJ domain and it is actively regulated at the protein level during rice embryogenesis [74]. This gene shows temperature-dependent DAS-only, altering the relative levels of two isoforms that differ by two AS events: (1) retention or removal of the first intron, which alters the 5 untranslated region (UTR); and (2) an alternative 3 splice site of exon 4, which adds or removes, respectively, 3 amino acid residues of the predicted proteins, near the DnaJ domain (PFAM accession PF00226.24) ( Figure 5A). Post-transcriptional regulation by intron retention (IR) within the 5 -UTR can affect transport, stability and translatability of mRNAs [75], whereas changes in the protein primary structure can impact its function. These results suggest that AS of LOC_Os01g32870 in response to heat stress might impact its protein levels and function.
LOC_Os08g36150, a DE&DAS gene, encodes a member of the activator of the Hsp90 ATPase homolog 1-like family protein. Upon heat stress, this gene is upregulated at the gene level, as identified previously [22,68] and confirmed here. Our careful investigation at the transcript level demonstrates that LOC_Os08g36150 also undergoes a rapid heat-induced DAS. For example, in experiment 4, intron-1-retained (I1R) isoform LOC_Os09g31482.2 changes from 5% of the total transcripts in 22 • C control conditions up to 40% of the total transcripts after a 10-min-treatment at 42 • C (average ∆PS = −0.35) ( Figure 5B). The roles of both the fully spliced (LOC_Os09g31482.1) and the I1R transcripts are unknown but we can propose three hypotheses. First, both isoforms are translated, generating proteins with a complete and an incomplete, respectively, AHSA1 domain (PFAM accession PF08327.4), required for full functionality. Second, the PTC of I1R transcripts is recognised by the NMD pathway and it is degraded, which prevents its translation. Lastly, the I1R transcript is retained in the nucleus, as observed for many intron-containing transcripts in animals and plants [76][77][78], where it can be post-transcriptionally spliced, generating the fully spliced isoform. Therefore, it is likely the heat-induced AS in LOC_Os08g36150 is an important mechanism that controls its protein levels.

Example of DAS Genes Encoding Splicing-Related Factors
OsC3H60 (LOC_Os09g31482), a DE&DAS gene, encodes a U2 snRNP auxiliary factor small (35 kDa) subunit A and it is required in diverse biological roles by associating with RNA molecules [79]. In addition to differential gene expression due to heat stress, identified previously [22,68] and confirmed here, OsC3H60 also undergoes significant heat-induced AS. For example, in experiment 3, isoform LOC_Os09g31482.1 changes from 50% of the total transcripts in 28 • C control conditions down to 11% after a 24 h-treatment at 45 • C (average ∆PS = 0.39) (Supplementary Figure S5B). In experiment 5, isoform LOC_Os09g31482.3 changes from <1% of the total transcripts in 30 • C control conditions up to 50% after 60 min at 40 • C (average ∆PS = −0.5) ( Figure 6A). The two isoforms differ by an alternative 3 splice site of exon 2 that alters the 5 untranslated region (UTR) (Figure 6A), which can impact transcript stability and/or translatability, for instance adding/removing upstream ORFs [80], putatively controlling protein levels in a fast molecular response to temperature increase.
LOC_Os12g36740, a novel DAS-only gene, encodes a putative member of the serine/ arginine-rich (SR) splicing factor family, a group of proteins involved in pre-mRNA splicing [81]. In control conditions, this gene produces two transcript isoforms, in almost equal proportions, that differ by an alternative 3 splice site of exon 4, which either introduces or not a PTC that is likely recognised by the NMD pathway ( Figure 6B). Therefore, LOC_Os12g36740 might produce only one protein isoform, from transcript LOC_Os12g36740.1. We observed that LOC_Os12g36740 undergoes a DAS change only (not DE) upon 24 h of heat stress, where the proportion of PTC-containing transcripts increased to 75% of the total transcripts (average ∆PS = −0.27) (Figure 6B), which likely results in decreased protein levels.

Discussion
We carried out a robust meta-analysis of publicly available RNA-seq datasets to explore the rice heat response and identified 15,818 DE-only genes and 2162 DAS genes. These results go in line with the widespread transcriptional and post-transcriptional response to heat stress observed in other plant species [48,[51][52][53][54][55][56][57]. Notably, 10,602 DEonly and 1690 DAS genes were not identified in previous gene-level RNA-seq analyses [19,22,65,66] and are novel heat response genes, and included many key regulators of gene expression, namely heat-shock-related protein genes, transcription factors, epigenetic proteins and splicing-related factors, among others. Crucially, DAS of regulators of gene expression suggests further effects of AS at the epigenetic, transcriptional, post-transcriptional and post-translational levels.
Our detailed analysis on heat-induced AS of key regulators of gene expression suggests that the primary consequence and function of AS is to affect protein expression. We observe that this could be achieved qualitatively and/or quantitatively, as reported by other studies [24,25]. Regarding qualitative control, heat-induced AS likely increased the possible number of protein isoforms, which can have a different activity, localisation, stability and ability to interact with other proteins and substrates [25]. Concerning quantitative control, heat-induced AS could regulate protein expression levels by determining an mRNA cellular localisation, modifying how well an mRNA is translated, and/or decreasing mRNA stability and targeting it for degradation [25,82]. Such AS is especially crucial when altered protein levels are needed despite opposing transcriptional behaviour [83]. Therefore, the extensive AS information identified here demonstrates a much higher degree of complexity of gene regulation in rice in response to heat that has been significantly underestimated by analysis of RNA-seq data at the differential gene expression level only.
RNA-seq is a powerful tool to analyse the complex molecular mechanisms underlying plant response to environmental cues. It has the potential to detect genome-wide gene expression at the level of both genes and transcripts, providing a means to study post-transcriptional processes such as AS. Heat stress studies of rice plants using RNA-seq have only focused on differential gene expression analysis [19,22,67,68], overlooking at AS. This is partially explained by the fact that (1) accurate genome-wide quantification of transcript variants generated by AS and (2) the cost and complexity in analysing AS are neither trivial. We are now in a position where both of these issues have been solved and we can now readily unlock the discovery potential of underexplored publicly available RNA-seq data [65]. Here, we used Salmon [63] and the 3D RNA-seq [65] tools to analyse a subset of RNA-seq datasets extracted from 5 SRA studies. Validation of this pipeline-including choice of software, parameters and methods-for accurate quantification of transcript-specific abundances for AS analyses was demonstrated previously using data of the Arabidopsis cold-response [49,84]. Therefore, we are confident that our analysis allowed the generation of robust quantitative transcript-specific expression data and full differential expression and differential AS analyses, shedding light on the important problem of heat-induced AS in rice leaves.
We confirmed two important caveats on DAS analysis while carrying out our studies. Firstly, RNA-seq read length has a direct effect on the identification of DAS genes. Longer reads are generally better for (pseudo-)mapping because they are more likely to differentiate alternative transcripts, improving isoform-specific detection and quantification [66]. For example, experiment 3 analysed here had the highest average read length and it retrieved the highest number of DAS genes. Therefore, the longest reads (≥100 bp) should be used if alternative splicing analysis is the biological question one seeks to answer. Secondly, a high-quality reference transcript dataset (RTD) is crucial for a comprehensive alternative splicing study using the robust RNA-seq analysis carried out here [84,85]. The rice reference transcriptome we used [64] contains 6,463 genes that have two or more transcript isoforms. Although this dataset allowed the identification of DAS genes reported here, it is an underrepresentation of alternative splicing in rice genes [36]. Therefore, frequent improvements of the rice RTD holds the key if we are to continually expand our understanding of the regulations of gene expression in rice.
One final question that heat-sensitive AS raises is how variation in transcript isoforms is regulated by temperature changes. Temperature might have a direct effect on the rate of biochemical reactions that potentially affect transcription and splicing. Splicing is largely co-transcriptional and slower rates of RNA polymerase II elongation promote selection of alternative splice sites [86] and epigenetic modifications affect AS patterns in response to heat [87,88]. Additionally, heat is known thermodynamically to unfold mRNAs at AU-rich regions located in both 5 and 3 UTRs, which in turn facilitates RNA degradation [14]. Another possibility is that heat-affected transcript isoforms might involve miRNA regulation mechanisms [89]. Lastly, alternative pre-mRNA splicing is highly influenced by exonic and intronic regulatory cis-sequences as they serve as binding sites for trans-acting factors such as heterogeneous nuclear ribonucleoproteins and SR proteins, which then interact with the spliceosome [24]. The level and activity of hundreds of splicing factors change in response to temperature due to the transcriptional and post-transcriptional regulation, and the changes in cellular localisation and post-translational modifications, suggesting their involvement in thermal stress AS regulation [49,51,52,[90][91][92]. In our work, we identified DE and/or DAS genes coding for epigenetic proteins and splicing-related factors, suggesting they are potential candidates involved in the differential AS regulation in response to heat stress in rice.
Novel molecular candidates involved in rice stress tolerance is widely recognised as critical. Despite the current knowledge on rice heat-tolerance [3,7], breeding approaches to develop heat tolerance in rice has had limitations [3,93] and challenging factors are currently bringing further pressure onto breeding better crops for the future [94]. Therefore, RNA-seq data mining and DAS analysis, such as the ones carried out here, greatly contribute to expanding our knowledge on the effect of environmental stresses in plants.
In terms of future directions, molecular experiments might be used to further confirm our results and to explore this novel AS regulation in different HS conditions. Moreover, the construction of splicing and transcriptional networks from RNA-seq data on rice heat stress will further explain the roles of AS and transcriptional responses to thermotolerance, providing additional insight and strategies to rapidly advance rice breeding in response to changing environments.

Selection and Pre-Processing of RNA-seq Datasets
The flow chart of the meta-analysis carried out here is shown in Figure 7. The RNAseq studies analysed in this work have been described previously [14,19,20,22,60] and were selected from the SRA repository according to the following criteria: (1) studies should contain data from leaf samples of wild-type (non-genetically modified) rice plants for both control and heat stress treatments; (2) reads should have ≥ 50 bp, to increase the performance of transcriptome analysis focused on alternative splicing. For each SRA study identified, additional filtering was carried out to reflect the objectives of our study. Firstly, RNA-seq datasets containing data from abiotic stresses not related to heat, namely drought and salt stresses, were not included in our analysis. Additionally, in two of the studies [19,20], the heat stress treatment included pathogen-and mock-inoculated tissues. Only samples of healthy (mock-inoculated) tissues were analysed here. In the work of Luo et al. [62], the heat stress treatment was applied for 1, 3, 6, 12, and 24 h after the 0 h control. To take full account of the time-of-day variations in gene expression due to photoperiod [95] and circadian control [96], only time points 0 h (control) and 24 h (heat stress) were analysed. Aside from their temperature difference, time point 24 h is virtually identical to 0h control in terms of their zeitgeber, the only difference being they are 24 h apart, such that few time-of-day variations in transcript expression were expected when comparing these time points. Taking full account of the time-of-day in our analysis is especially important because the transcriptome dynamics of rice leaves is widely affected by the entrained circadian clock and temperature [97]. Lastly, in the work of Su et al. [14], RNA-seq libraries were generated from 14-day-old rice shoot tissue after a brief (10 min) treatment at 22 • C (control) or 42 • C (heat shock), which were the only datasets analysed here. The authors carried out additional RNA-seq experiments that quantified transcript abundance change over a longer time course post-heat shock, which they call it 'recovery period'. In this recovery period, plants that have undergone 10 min of 42 • C or 22 • C treatments were transferred to 30 • C conditions for 10 min, 50 min, 1 h 50 min and 9 h 50 min. Neither of those additional RNA-seq datasets was analysed here because the temperature increase in control samples (from 22 • C to 30 • C) compared against temperature decrease in heat shock samples (from 42 • C to 30 • C) adds additional covariates in the heat stress response, which is not the interest of our work. In summary, 345/637 RNA-seq datasets were selected from 5 SRA studies (Supplementary Table S2), which are referred to as experiments 1 to 5 (Table 1).
Datasets were downloaded using an in-house script. First, fastq-dump, available in the SRA-Toolkit version 2.10.9 http://ncbi.github.io/sra-tools/ (accessed on 8 January 2021), was run with the options "-gzip" and "-split-spot". Residual adaptor sequences at both 5 and 3 ends were removed from raw reads using Trimmomatic version 0.39 [98] with quality score threshold set at 33 and a minimum length of the trimmed read kept at 36.

Quantification of Transcript-Specific Expression
To deliver accurate quantitative transcript-specific expression data from 345 RNAseq datasets (Supplementary Table S2

Data-Exploratory Analysis with 3D RNA-seq
To perform a full differential gene expression and differential AS analysis for each study we used the 3D RNA-seq app [65]. The program's pipeline can be divided into three modules: data generation, data pre-processing and 3D analysis.
For the data generation module, time (when applicable), cultivar (when applicable), treatment, and biological replicates were set as factors. Then, read count and Transcript per Million reads (TPM) information were generated from each Salmon output file using the lengthScaledTPM method.
For the data pre-processing module, low expressed genes and transcripts were removed. This removal was carried out independently for each study, where an expressed transcript should have a minimum count per million (CPM) of 1, in at least m samples. Each cut-off setting (m) was chosen based on its effect on the mean-variance trend for the RNA-seq log 2 -transformed read counts, which should follow the negative binomial distribution. Next, to reduce sequencing biases, normalisation of read counts was carried out across the libraries for each study using the weighted Trimmed Mean of M-values (TMM) method.
For the 3D analysis module, contrast groups of simple pair-wise analyses were defined, where corresponding time-points from the heat stress samples were compared to those of the control samples. For the work of Wilkins et al. [22], not only control samples but also 'recovery from heat shock' samples were compared, in parallel, to heat shock samples. This approach would allow for the identification of variations in gene expression due to the introduction and withdrawal of the heat stress. Next, we applied the limma voomWeights pipeline for differential expression analysis at both gene and transcript levels, setting the following thresholds: absolute log 2 -fold change (L2FC) ≥ 1, adjusted p-value (FDR) < 0.05, and absolute change in percentage spliced (∆PS) ≥ 0.1 (F-test). The ∆PS is calculated as the percentage change in the abundance of a transcript compared to the total expression level of its gene.

Gene Functional Annotations
Venn diagrams were generated at http://bioinformatics.psb.ugent.be/webtools/ Venn/ (accessed on 19 March 2021). Putative gene functions and gene lists were obtained from the MSU Rice Genome Annotation Project Database [64] and Oryzabase [99] on 25 March 2021. Schematic diagrams of gene structures were made with the help of the Exon-Intron Graphic Maker program http://wormweb.org/exonintron (accessed on 22 February 2021). Gene Ontology statistical overrepresentation test was carried out with Panther version 16.0 http://pantherdb.org/ (accessed on 8 April 2021), where we chose the annotations set 'GO biological process complete' and the binomial test with no correction.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/plants10081647/s1, Figure S1: Distribution of the number of genes (y-axis) per number of transcripts per gene (x-axis) before (red) and after (blue) filtering low expressed transcripts and genes using the RNA-seq data extracted from Luo et al. [62]; Figure S2: Venn diagram showing the overlap of expressed genes from each experiment analysed here; Figure S3: Number of up-and down-regulated genes; Figure S4: GO analysis of DAS genes identified in Experiment 3; Figure S5: Heat-induced changes in transcript expression of two DAS genes; Table S1: Sequencing information for each study analysed here; Table S2: Meta-data table containing experimental design information of 345 RNA-seq datasets analysed here; Table S3: DE and DAS genes IDs and functions, organised by experiment; Table S4 [14,19,20,22,62]. The script used to carry out the data mining analysis is available at https://github.com/charbavito/pub_ project_rice_data_mining/blob/master/FTQ_script_v0.1.py. (accessed on 8 April 2021).