Identification, evolution and alternative splicing profile analysis of Serine/Arginine-Rich Protein Splicing Factors (SR Proteins) in poplar, Arabidopsis, grape, and papaya

Alternative splicing (AS) regulates gene expression and produces proteome diversity. Serine/Arginine-Rich Protein Splicing Factors (SR Proteins) are important splicing factors that play significant roles in spliceosome assembly and splicing regulation, and also in plant stress responses. In this report, we analyzed 30 SR genes in Populus trichocarpa, 18 genes in Arabidopsis thaliana, 14 genes in Vitis vinifera and 9 genes in Carica papaya. The SR proteins contained conserved RRM and RS domains and, based on their domain organization, were divided into six subfamilies (SR, SC, SCL, RS, RSZ and RS2Z). Gene duplication analysis revealed 94 paralogs and 408 orthologs in the four species, and the SR genes had undergone strong purifying selection. A number of stress-related cis-elements (ABRE, LTR, MBS and TC-rich repeats) were identified in the promoters of the SR genes. Microarray and RNA-seq data showed that SR gene expression in different tissues of the four species responded differently to abiotic stress. Poplar, Arabidopsis and grape SR genes had many splice isoforms. Moreover, 26 of 30 poplar SR genes had intron retention (IR) events, and the relative IR rates of 27 intron sites in the poplar SR genes changed significantly under cold, heat, drought and salt stress conditions. This study provides valuable resources for the gene structure, function, and evolution of poplar SR proteins.

Similarly, OsRSp29 and OsRSZp23 play a role in pre-mRNA splicing [17]. AtSC35 plays a key role in regulating the flowering locus C, which regulates flowering in Arabidopsis [9].
SR genes are themselves extensively alternatively spliced. In Arabidopsis, Palusa et al. found that 15 Arabidopsis SR genes produce 95 transcripts under hormone or abiotic stresses.
The cold, heat and hormone stresses significantly alter the AS of several Arabidopsis SR genes, indicating that they are involved in response to environmental stresses [18]. At the seedling stage, 13 Arabidopsis SR genes produce 75 transcripts, 53 of which contain premature termination codons (PTCs) [19]. In maize and sorghum, 92 and 62 SR transcripts are detected respectively [20]. Arabidopsis SR34b is involved in cadmium resistance [21]. However, the detailed function of most plant SR genes and the importance of a large number of AS events remain unclear.
Poplar, Arabidopsis, grape, and papaya are all dicots. Poplar, grape and papaya are economically important species, and Arabidopsis is a classic model plant. In this report, we analyzed 30 SR genes in Populus trichocarpa (poplar), 18 genes in Arabidopsis thaliana (Arabidopsis), 14 genes in Vitis vinifera (grape) and 9 genes in Carica papaya (papaya).
Then we conducted comprehensive analysis of the SR genes from poplar, Arabidopsis, grape, and papaya, including basic characterization, phylogenetic analysis, conserved motifs, gene duplication analysis, promoter analysis, expression patterns, and splice isoforms. To analyze the relationship between AS and abiotic stress, we analyzed IR rates and the relative IR rates in poplar SR genes under cold, heat, drought and salt stresses.
This study provides valuable resources for the gene structure, evolution, and function of poplar SR proteins.

Phylogenetic tree and conserved motifs analysis in poplar, Arabidopsis, grape, and papaya
The amino acid sequences of the aforementioned 71 SR genes were used for phylogenetic analysis. For genes with different transcripts, the primary transcript specified by phytozome and grape genome database was selected. Multiple sequence alignments of all SR proteins were carried out in ClustalX 1.83 with default settings [28]. A phylogenetic tree for all of the complete SR protein sequences was built using MEGA 7.0 with the neighbor-joining (NJ) method and 1000 bootstrap replicates [29]. In addition, the full-length protein sequences were submitted to Multiple Expectation Maximization for Motif Elicitation (MEME) (http://meme.sdsc.edu/meme/itro.html) [30] with the following parameters: an optimum width of 6-200 residues and a maximum number of 10 motifs.

Gene structure analysis in poplar, Arabidopsis, grape, and papaya and AS profile analysis in poplar, Arabidopsis, and grape
Exon/intron position information for all available alternative transcripts and proteins of the SR genes was obtained from the phytozome database (http://www.phytozome.net) [22], and grape genome database (http://genomes.cribi.unipd.it/grape/) [23]. The exon/intron structures were determined with the Gene Structures Display Server (GSDS) (http://gsds.cbi.pku.edu.cn/) [31]. The protein sequences of all available alternative transcripts were submitted to the MEME website (http://meme-suite.org/tools/meme) [30] to analyze the conserved motifs. Identical protein isoforms were shown only once in the conserved motif analysis.

Paralogs and orthologs of SR genes in poplar, Arabidopsis, grape, and papaya
To identify paralogs and orthologs in the four dicots, a local all-against-all BLASTP search was first performed with the protein sequences of the four dicots to find all potential duplicated gene pairs (E < 1e-10, top 5 matches and m8 format output). As a result, orthologs and paralogs were obtained. The paralogs were then classified using MCScanX-transposed into whole-genome duplication (WGD), tandem duplication (TD), proximal duplication (PD), transposed duplication (TRD), and dispersed duplication (DSD). PAML 4.0 [32] was used to calculate the rates of nonsynonymous substitution (Ka) and synonymous substitution (Ks) for duplicated gene pairs. The duplicated genes were then divided into three types according to the Ka/Ks ratio: pairs with Ka/Ks > 1 were regarded as positively selected genes (PSGs), pairs with Ka/Ks = 1 as neutral genes, and pairs with Ka/Ks < 1 as negatively (or purifying) selected genes (NSGs).
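The three-way Ka/Ks rule described above can be illustrated with a minimal Python sketch. The function name, the tolerance used to decide that a ratio equals 1, and the example values are assumptions made for illustration; they are not taken from PAML output or from Table S3.

```python
import math


def classify_selection(ka: float, ks: float, tol: float = 1e-9) -> str:
    """Classify a duplicated gene pair by its Ka/Ks ratio.

    Returns "positive" (Ka/Ks > 1), "neutral" (Ka/Ks = 1 within tol)
    or "purifying" (Ka/Ks < 1).
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a Ka/Ks ratio")
    ratio = ka / ks
    if math.isclose(ratio, 1.0, abs_tol=tol):
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"


# Hypothetical (Ka, Ks) values for illustration only.
pairs = {"pairA": (0.05, 0.50), "pairB": (0.60, 0.40), "pairC": (0.30, 0.30)}
labels = {name: classify_selection(ka, ks) for name, (ka, ks) in pairs.items()}
print(labels)
```

In practice an exact ratio of 1 is rare, so the neutral class is usually decided by a statistical test rather than a fixed tolerance; the tolerance here only keeps the sketch faithful to the rule as stated.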

Cis-acting element analysis in poplar, Arabidopsis, grape, and papaya
The 2.0-kbp sequences upstream of the transcriptional initiation sites of the SR genes of the four dicots were obtained from the phytozome database (http://www.phytozome.net/populus.php) [22], and grape genome database (http://genomes.cribi.unipd.it/grape/) [23]. The cis-acting regulatory elements were then predicted among the sequences of putative promoter regions using PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) [33]. Motifs with specific functions associated with abiotic stress were selected in this study.
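For readers who extract promoter regions from a local genome file rather than from the database interface, the following Python sketch shows one way to take the region upstream of a transcription start site (TSS) in a strand-aware manner. It is an illustration under stated assumptions, not the procedure used in this study: coordinates are assumed to be 1-based and inclusive, and the function names and the toy sequence are hypothetical.

```python
# Translation table for complementing DNA (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def promoter_interval(tss: int, strand: str, chrom_len: int, length: int = 2000):
    """Return the 1-based inclusive (start, end) of the region upstream of the TSS.

    On the '+' strand, upstream means smaller coordinates; on the '-' strand,
    upstream means larger coordinates. The interval is clipped to the chromosome.
    """
    if strand == "+":
        return max(1, tss - length), tss - 1
    if strand == "-":
        return tss + 1, min(chrom_len, tss + length)
    raise ValueError("strand must be '+' or '-'")


def promoter_sequence(chrom_seq: str, tss: int, strand: str, length: int = 2000) -> str:
    """Extract the upstream region, reverse-complemented for '-' strand genes."""
    start, end = promoter_interval(tss, strand, len(chrom_seq), length)
    seq = chrom_seq[start - 1:end]
    if strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq


# Toy 10-bp "chromosome" and a 3-bp window, for illustration only.
toy = "ACGTTGCAAT"
print(promoter_sequence(toy, tss=5, strand="+", length=3))
print(promoter_sequence(toy, tss=5, strand="-", length=3))
```

The default window of 2000 bp matches the 2.0-kbp region described above; genes close to a chromosome end yield a shorter sequence because the interval is clipped.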

IR in poplar SR genes
We obtained IR information for poplar SR genes under heat, cold, drought and salt stress in leaf, root and xylem from RNA-seq data [34]. Under a given stress, if the relative IR rate (stress treated samples vs. untreated control samples) (IRR_ratio_diff) of an intron site of an SR gene changed by more than 30%, we mapped the relative IR rate of this intron site into a heat map using R (R is a free software environment for statistical computing and graphics) (Fig. 8).
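The 30% filter described above can be sketched in Python as follows. This is a minimal illustration, assuming that IRR_ratio_diff is available as a signed fraction per intron site and condition (for example 0.42 for a 42% increase); the exact definition in the RNA-seq resource is not restated here, and the site names and values below are hypothetical.

```python
def significant_ir_sites(irr_ratio_diff: dict, threshold: float = 0.30) -> dict:
    """Keep (intron site, condition) entries whose |IRR_ratio_diff| exceeds the threshold."""
    return {key: diff for key, diff in irr_ratio_diff.items() if abs(diff) > threshold}


# Hypothetical relative IR rates keyed by (intron site, stress); not data from Table S5.
example = {
    ("siteA-1", "cold"): 0.42,
    ("siteA-1", "heat"): -0.10,
    ("siteB-2", "heat"): -0.35,
}
selected = significant_ir_sites(example)
print(selected)
```

Only the entries retained by such a filter would be passed on to the heat map; both increases and decreases are kept because the absolute value is tested.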

Expression pattern analysis of SR genes in poplar, Arabidopsis, grape, and papaya
To explore the expression level characteristics of SR genes under different abiotic stresses in different tissues, we obtained the expression level data of the poplar and papaya SR genes from RNA-seq data [34,35], and the Arabidopsis and grape chip data GSE5620, GSE5621, GSE5623, GSE5624, GSE5628 and GSE31594 from the NCBI GEO database. The log2-transformed fold-change values were used for creating the heatmap.
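As an illustration of the log2-transformed fold-change values mentioned above, the following Python sketch computes them from paired control and stress expression values. The pseudocount of 1 is an assumption added to avoid division by zero for unexpressed genes; the source does not state whether one was used, and the gene names and values are hypothetical.

```python
import math


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """Return log2((treated + pseudocount) / (control + pseudocount))."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical expression values (gene -> (control, stress)); not data from this study.
expression = {"geneA": (1.0, 7.0), "geneB": (15.0, 3.0), "geneC": (9.0, 9.0)}
heatmap_values = {
    gene: log2_fold_change(stress, control)
    for gene, (control, stress) in expression.items()
}
print(heatmap_values)
```

Positive values indicate higher expression under stress than in the control, negative values indicate lower expression, and 0 indicates no change.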

Statistical analyses
The Pearson correlation test was used to determine the correlation between two variables. A one-way ANOVA test was performed to assess the significance in the correlation analysis. For all statistical tests, a P-value of < 0.05 was considered significant, and a P-value of < 0.01 was considered extremely significant. If the correlation coefficient was < 0.1, the variables were considered uncorrelated, even if the correlation was significant. Note: * indicated significance at P < 0.05; ** indicated significance at P < 0.01.
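The decision rule above can be made concrete with a short Python sketch. It computes the Pearson coefficient directly from its definition and then applies the thresholds stated in this section; the P-value is taken as an input rather than computed. Treating the 0.1 cutoff as applying to the absolute value of the coefficient is an assumption, and the function names are illustrative.

```python
import math


def pearson_r(x, y) -> float:
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def interpret(r: float, p_value: float) -> str:
    """Apply the thresholds of this section to a coefficient and its P-value."""
    if abs(r) < 0.1:          # assumed to refer to the absolute value of r
        return "no correlation"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "not significant"


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(interpret(0.05, 0.001), interpret(0.6, 0.03), interpret(-0.7, 0.005))
```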

Results
Identification and characterization of SR genes in poplar, Arabidopsis, grape, and papaya
According to the method described by Barta and Kalyna [10], 18 Arabidopsis and 22 rice SR protein sequences were used as queries in a BLASTp search (e-value cutoff = 1e-10) to identify poplar, grape, and papaya SR protein sequences from the phytozome database (http://www.phytozome.net) [22], and grape genome database (http://genomes.cribi.unipd.it/grape/) [23]. SMART (http://smart.embl-heidelberg.de) [24] and PFAM (http://pfam.janelia.org/) [25] were used to confirm whether each candidate sequence had one or two N-terminal RRMs (RBD; PF00076), and the sequences were manually confirmed to have at least 50 amino acids in the downstream sequence with 20% RS or SR dipeptides [10]. Finally, 30 genes in Populus trichocarpa, 14 genes in Vitis vinifera and 9 genes in Carica papaya were identified (Table S1). Basic information describing the primary transcript of each gene from poplar, Arabidopsis, grape, and papaya is listed in Table S1, including gene ID, NCBI accession number, gene length, CDS length, coding protein length, pI, MW and the predicted subcellular location. In poplar, the SR genes were located on 14 of 19 chromosomes, with none on chromosomes 4, 7, 9, 11 and 17. Chromosome 2 carried the most poplar SR genes. Arabidopsis SR genes were distributed on all chromosomes.
Grape SR genes were distributed on 11 of 19 chromosomes (i.e., 1, 4, 6, 7, 8, 12, 13, 14, 15, 16 and 18) (Table S1). VvSCL34 had the longest gene length (18119 bp). CpRS50 coded for the longest protein (447 aa), whereas PtRSZ20 coded for the shortest protein (180 aa) among the 71 SR proteins identified. The pI values of six of the 71 SR proteins were less than 10, whereas those of the remaining proteins were more than 10. Furthermore, the MW of these proteins ranged from 20.46 to 50.42 kDa with an average of 30.46 kDa. Predicted subcellular localization indicated that most of the SR proteins were located in the nucleus, which was consistent with their putative roles as splicing factors.
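The RS-domain criterion used above to screen candidates (at least 50 amino acids downstream of the RRM with 20% RS or SR dipeptides) was checked manually in this study. A possible automated version is sketched below in Python. How the 20% is counted is an assumption: here it is the fraction of downstream residues that belong to at least one RS or SR dipeptide. The argument rrm_end (the 0-based position just after the last RRM, which would come from the SMART or PFAM annotation) and the toy sequences are hypothetical.

```python
def rs_content(region: str) -> float:
    """Fraction of residues that are part of at least one RS or SR dipeptide."""
    region = region.upper()
    if not region:
        return 0.0
    covered = [False] * len(region)
    for i in range(len(region) - 1):
        if region[i:i + 2] in ("RS", "SR"):
            covered[i] = covered[i + 1] = True
    return sum(covered) / len(region)


def has_rs_domain(protein: str, rrm_end: int,
                  min_len: int = 50, min_fraction: float = 0.20) -> bool:
    """Check the downstream region against the length and RS/SR content thresholds."""
    downstream = protein[rrm_end:]
    return len(downstream) >= min_len and rs_content(downstream) >= min_fraction


# Toy sequences for illustration only: an 80-residue placeholder followed by a tail.
candidate = "M" * 80 + "RS" * 10 + "A" * 40   # 60-residue tail, 20 residues in RS/SR
too_short = "M" * 80 + "RS" * 10              # tail shorter than 50 residues
print(has_rs_domain(candidate, rrm_end=80), has_rs_domain(too_short, rrm_end=80))
```

Other counting schemes (for example, counting non-overlapping dipeptides) would give different fractions, so the thresholds should be interpreted together with the chosen definition.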

Phylogenetic tree and conserved motifs analysis in poplar, Arabidopsis, grape, and papaya
The SR proteins were divided into six subfamilies based on their domain architectures (Figure S1). The SC subfamily contained proteins with a single RRM followed by an RS domain. The SR subfamily proteins had two RRMs with an evolutionarily conserved SWQDLKD motif in their second RRM followed by an RS domain with a characteristic SR dipeptide. The RSZ subfamily consisted of SR proteins with one Zn knuckle. The plant-specific SCL subfamily (SC35-like) was similar to the SC subfamily but had an N-terminal extension that is rich in Arg, Pro, Ser, Gly and Tyr residues. Proteins of the plant-specific RS2Z subfamily had two Zn knuckles and an additional SP-rich region after the RS domain. The plant-specific RS subfamily contained two RRMs (without the SWQDLKD motif) followed by an RS-rich region with many RS dipeptides [10].
The protein sequences of the 71 putative SR genes were used to construct a phylogenetic tree to study the evolutionary relationship between these proteins (Fig. 1a).
For genes with different transcripts, the primary transcript specified by phytozome and grape genome database was selected. The phylogenetic tree showed a similar pattern of evolutionary divergence among poplar, Arabidopsis, papaya and grape, which reflected conserved evolution and function of the different SR genes. The SCL subfamily constituted the largest clade, containing 18 members and accounting for 25.4% of the total SR genes, whereas RS2Z had the fewest members (seven) and accounted for 9.8% of the total SR genes. SR genes in poplar, Arabidopsis, grape and papaya were distributed in all six groups. In poplar, seven genes belonged to the SR subfamily, eight to the SCL subfamily, six to the RS subfamily, three to the RSZ subfamily, two to the RS2Z subfamily and four to the SC subfamily.
We analyzed the conserved motifs of SR proteins using the online MEME software, as shown in Fig. 1b. Ten specific motifs were detected, and detailed information is provided in Table S2. Each putative motif was annotated by searching PFAM and SMART. Motifs 1, 2, and 5 were found to encode the N-terminal RRMs (RBD; PF00076), whereas motif 3 encoded the RRM with an evolutionarily conserved SWQDLKD motif. Since the RRM containing the SWQDLKD motif occurs only in the SR subfamily, motif 3 was found only in the SR subfamily. Motifs 7 and 9 corresponded to the RS domain of each SR gene. The other motifs identified had unknown functions. As expected, members of the same subfamily had highly similar motifs; for example, motifs 3, 6, and 10 existed only in the SR subfamily, and motif 5 existed only in the RS subfamily. This suggested that SR proteins of the same subfamily had functional similarities.

Gene structure in poplar, Arabidopsis, grape, and papaya and AS profile analysis in poplar, Arabidopsis, and grape
In the intron/exon structure analysis (Fig. 2b, Fig. 2c), we found that the intron number of SR genes was generally 4 to 13. The SR subfamily had the most introns, ranging from 10 to 13, followed by the SC subfamily with 6 to 9 introns. We found that 10, 7 and 5 SR genes had 5 introns in the RS, SCL and RSZ subfamilies, respectively, accounting for 76.9%, 38.8%, and 55.5% of all genes in the respective subfamily. There were 5 genes in the RS2Z subfamily with 6 introns, accounting for 71.4% of all RS2Z subfamily genes. It was worth noting that some genes contained introns in their 5' or 3' untranslated regions (UTRs). An extreme example was AtSR34b, which contained 4 introns in the 3' UTR.
Previous studies have reported that AS is common in SR genes [18]. For each gene, the primary transcript was placed at the top and the other alternative transcripts were listed below, based on the phytozome database and grape genome database annotation files.
Identical protein isoforms were shown only once in the conserved motif analysis (Fig. 3, Figure S2, Figure S3). Since papaya had no annotated multi-transcript information, it was not analyzed here. The results showed that a total of 79 transcripts were detected in 30 genes from poplar (Fig. 2, Fig. 3). PtSR34b and PtSCL29a had the most transcripts (five each).
A total of 68 transcripts were detected in 18 genes from Arabidopsis thaliana, with AtRS40 having the highest number of transcripts, i.e., 8 (Figure S2). A total of 65 transcripts were detected in 14 genes in grape (Figure S2). VvSR28 was found to have the highest number (16) of transcripts (Figure S3). Multiple transcripts of a gene could produce multiple different protein isoforms. For example, AtRS41 had seven transcripts that could produce four protein isoforms (Figure S2), and VvRS26 had ten transcripts that could produce five protein isoforms (Figure S3). PtRS29b had three transcripts that could produce two protein isoforms (Fig. 3). Two transcripts of PtRS29b had undergone AS in the 5' untranslated region (UTR), suggesting that these variants might affect transcriptional or translational efficiency (Fig. 2, Fig. 3).

Paralogs and orthologs of SR genes in poplar, Arabidopsis, grape, and papaya
To further investigate the evolution of the SR family, duplicated gene pairs (paralogs and orthologs) were analyzed to investigate gene duplication events within poplar, Arabidopsis, grape, and papaya. We identified 94 paralogs in the four dicots (Fig. 4, Table S3).
Specifically, there were 51 paralogs within poplar, 22 within Arabidopsis, 15 within grape, and 6 within papaya. In addition, 68 SR genes, including 30 poplar SR genes, 14 grape SR genes, 18 Arabidopsis SR genes and six papaya SR genes, were shown to have a duplicated relationship, accounting for 95.8% of all SR gene family members. Transposed duplication and dispersed duplication were found in the paralogs of all four dicots, and whole-genome duplication (WGD) was found in the paralogs of poplar, grape, and Arabidopsis. Tandem duplication and proximal duplication were not found in the four dicots (Fig. 4, Table S3).
We next identified 408 orthologs among four dicots (Table S4). Specifically, there were 147 orthologs between Arabidopsis and poplar, 66 between Arabidopsis and grape, 44 between Arabidopsis and papaya, 67 between poplar and grape, 45 between poplar and papaya, and 39 between grape and papaya.
To study the evolutionary selection process, the Ka values, Ks values, and Ka/Ks ratios of all duplicated gene pairs were listed in Table S3 and Table S4. The Ks values of poplar paralogs varied from 0.0155 to 4.9695 (Table S3). The Ks values of Arabidopsis, grape and papaya paralogs varied from 0.6421 to 2.9420, 0.9462 to 4.9368 and 1.3729 to 2.4849, respectively. The Ka/Ks ratios of poplar, Arabidopsis, grape and papaya duplicated gene pairs varied from 0.0555 to 1.4837, 0.0863 to 0.3738, 0.0644 to 0.4607 and 0.1128 to 0.4715, respectively. All Ka/Ks ratios were smaller than 0.5 except that of the PtSR34/PtSR33 pair, which was greater than 1, indicating that the SR genes had undergone strong purifying selection. In addition, all Ka/Ks ratios of the orthologs were less than 1, indicating that the orthologs had also undergone purifying selection (Table S4).

To further analyze the evolutionary relationship of the SR genes, we used the Ks values of the duplicated gene pairs (the Ks value can be used to estimate the separation time of duplicated genes). In poplar, according to Tang et al., the median Ks of the duplicated genes associated with the γ triplication event was 1.54, and the total Ks associated with P-WGD was 0.27 [36]. We detected 22 WGDs associated with amplification of the SR genes in the poplar genome. The Ks values of these WGDs fell into two different ranges (Table S3): 0.2249-0.4493 and 0.9350-1.7387. The duplicated genes in the former Ks range might have come from the P-WGD event, and those in the latter Ks range from the γ triplication event. In Arabidopsis, the median Ks value associated with the β-WGD and the γ triplication event was close to the saturation median Ks value of 2.00 [36].
Therefore, based on the Ks value, the β-WGD and the γ triplication event were indistinguishable.
The median Ks associated with α-WGD was reported to be 0.86 [36]. The Ks values of the Arabidopsis WGDs fell into a single range (Table S3): 0.6421-1.040. This might indicate that these Arabidopsis WGDs arose only from the α-WGD. In grape and papaya, the overall median Ks values of the γ triplication event were 1.76 and 1.22, respectively [36]. Then, according to the Ks values of the orthologous genes, the paralogous genes could be divided into those formed before polyploidization and those formed after polyploidization. On this basis, we mapped the evolutionary relationship of the SR genes of poplar, Arabidopsis, grape, and papaya (Fig. 5). SR genes that were not in the same line had separated before the γ triplication event. Within each row, the SR genes of the corresponding species were produced by different WGDs. The results showed that most of the SR genes were lost after WGD. For example, in the γ triplication event, CpSC30 did not produce two other corresponding SR genes. The presence of two SR genes in a box indicated that the two genes were produced by other duplication types after the WGD. PtSR29 and PtSR34b were SR genes produced by other duplication types after WGD.
In addition, we found that the duplicated genes generated by WGD and the duplicated genes generated by other recent duplicated types had highly similar gene structures, for example PtSR33, PtSR34, PtSR34c, which had the same number of transcripts, protein numbers, and similar intron numbers.
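The assignment of poplar WGD pairs to polyploidization events by Ks range, as described earlier in this section, can be sketched in Python as follows. The two ranges are the ones reported above for poplar (Table S3); the function name and the "unassigned" label for values outside both ranges are assumptions made for illustration.

```python
# Ks ranges observed for poplar WGD pairs in this study (Table S3).
POPLAR_KS_RANGES = {
    "P-WGD": (0.2249, 0.4493),
    "gamma triplication": (0.9350, 1.7387),
}


def assign_event(ks: float, ranges: dict = POPLAR_KS_RANGES) -> str:
    """Return the event whose Ks range contains ks, or 'unassigned' if none does."""
    for event, (low, high) in ranges.items():
        if low <= ks <= high:
            return event
    return "unassigned"


# The reported medians for the two events fall inside the corresponding ranges.
print(assign_event(0.27), assign_event(1.54), assign_event(0.70))
```

A range-based rule of this kind works only when the events are well separated in Ks, as for P-WGD and the γ triplication in poplar; it cannot separate events whose Ks values approach saturation, such as the β-WGD and the γ triplication in Arabidopsis.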

Cis-acting elements analysis in poplar, Arabidopsis, grape, and papaya
Cis-acting elements in the promoter region play key roles in tissue-specific or stress-responsive expression patterns. In this study, we identified four types of cis-acting elements related to environmental stress: the ABA response element (ABRE), the cis-acting element involved in defense and stress responsiveness (TC-rich repeats), the low temperature responsive element (LTR) and the MYB binding site involved in drought-inducibility (MBS). In poplar, six genes contained ABRE cis-acting elements, 17 contained LTR cis-acting elements, 15 contained MBS cis-acting elements and nine contained TC-rich repeats cis-acting elements. In Arabidopsis, papaya and grape, 32 genes contained ABRE cis-acting elements, 20 contained LTR cis-acting elements, 16 contained MBS cis-acting elements and 14 contained TC-rich repeats cis-acting elements (Fig. 6). These findings could aid further investigations into the stress-regulatory mechanisms of SR genes in plants.

Expression pattern analysis of SR genes in poplar, Arabidopsis, grape, and papaya
Since abiotic stress may adversely affect plant growth and development, studies of stress-tolerance genes in plants are important. We obtained microarray (Arabidopsis and grape) and RNA-seq (poplar and papaya) data under various stress conditions from different tissues of the four dicots [34,35]. The results showed that all SR genes except CpRSZ20 were expressed (Fig. 7). In poplar, three SR genes (PtSCL25, PtSR34a, PtSCL23) were significantly down-regulated under cold stress (absolute log2 fold-change larger than 2), whereas under heat stress three poplar SR genes (PtSR35, PtSCL25, PtSCL23) were up-regulated (log2 fold-change larger than 2) and the expression levels of two poplar SR genes (PtSR33, PtSR34) declined. No significant changes were observed under drought and salt stress conditions (Fig. 7). Two papaya genes (CpSCL25, CpSCL35a) were significantly up-regulated under drought stress. In Arabidopsis, the expression value of AtSCL28 was much lower than that of the other Arabidopsis SR genes, but it was significantly up-regulated under heat stress; the other Arabidopsis SR genes did not change significantly (significance being defined as an absolute log2 fold-change larger than 2). In grape, no significant changes were observed in the expression levels of SR genes (Fig. 7).
In addition, we studied the correlation between transcript number, protein number, intron number, the four cis-acting elements, and the expression values of SR genes (Figure S4). The results showed that no correlation was found in Arabidopsis, grape, and papaya. In poplar, transcript number, protein number and intron number showed a negative correlation with expression under some heat stress conditions. The four cis-acting elements showed no correlation with expression.

Intron retention in poplar SR genes
In humans, exon skipping (ES) accounts for 35.2% of AS, whereas IR represents only 0.01% of AS [37]. In contrast, IR is the most common type of AS in plants [38]. We obtained information on the IR events of the poplar SR genes from the RNA-seq data (Table S5). The results showed that 26 poplar SR genes underwent IR events, with seven IR sites in PtSR34b, six IR sites in PtSR34c and five IR sites each in PtRS29, PtRS29a, PtRS2Z33 and PtRS2Z34 (Fig. 3, Table S5).
These IR events greatly increased the complexity of the SR genes. In addition, PtSCL29b-2, PtSR34c-4, PtSCL29c-4 and PtRS29a-5 showed no IR under standard conditions, whereas they had IR events under heat and cold stress conditions (Table S5). Under cold stress, the relative IR rates (treated tissues vs. untreated control samples) of PtSR29-1, PtSR29-1, PtSR29-3, PtSCL23-1, PtSCL23-2, PtSCL25-1 and PtSCL25-2 increased significantly (a significant change meant that the relative IR rate changed by more than 30% under a given stress), whereas the relative IR rates of PtSR34b-3, PtRS29a-1 and PtRS29a-2 decreased significantly (Fig. 8, Table S5). Under heat stress, the relative IR rates of PtSC27-1 and PtSCL23-1 decreased significantly, whereas the relative IR rates of PtSR34b-3, PtSR34-2 and PtSR34c-2 increased significantly. Under drought stress, the relative IR rate of PtSCL31-1 increased in the leaf, whereas it decreased in xylem and root. Under salt stress, the relative IR rate of PtRS29-2 decreased in the leaf, whereas it increased in the roots (Fig. 8, Table S5).

Discussion
In the previous study by Richardson et al., 20 poplar SR genes, 18 Arabidopsis SR genes and 9 grape SR genes were identified [39]. Here, we analyzed 71 SR genes from poplar, Arabidopsis, grape, and papaya (30 poplar SR genes, 18 Arabidopsis SR genes, 14 grape SR genes and 9 papaya SR genes) using bioinformatic approaches. The number of poplar SR genes in our study was 10 more than in the previous study, and the number of grape SR genes was 5 more. Previous studies had shown that there were no SC subfamily genes in poplar, whereas we found that the SC subfamily had four genes in poplar (Table S1). Previous research used the old version (v2) of P. trichocarpa and V. vinifera genome information [39], while our research used version v3.1 of the P. trichocarpa genome information [22] and version v2 of the V. vinifera genome information [23]. The update of genome information might be the cause of the change in the number of SR genes.

Evolutionary assessment of SR genes in poplar, Arabidopsis, grape, and papaya
At present, the evolutionary history of these four dicots has been studied in detail, and they originate from a common ancestor. Previous studies have shown that Arabidopsis has experienced three rounds of polyploidization events (γ triplication event, α- and β-WGDs).
Poplar experienced two rounds of polyploidization events (γ triplication event and P-WGD) [40]. Papaya and grape only experienced the γ triplication event. Therefore, in theory, after these evolutionary events, the ratio of gene numbers in the four dicots should be 4 (Arabidopsis): 2 (poplar): 1 (papaya): 1 (grape) [41]. In this study, the observed ratio of the number of SR genes was 18 (Arabidopsis): 30 (poplar): 9 (papaya): 14 (grape), which differed from the theoretical ratio. We found that, with the exception of VvSR31, none of the four dicots produced a new SR gene after the γ triplication event, which might mean that the plants experienced large-scale gene loss after the γ triplication event (Fig. 5). For the recent P-WGD, we found that all poplar SR genes except PtSR29, PtSR34b, PtSC30, PtSC31, and PtRSZ20 were duplicated in the P-WGD and retained. After the P-WGD, PtSR33, PtSR29, and PtSC30 underwent other types of duplication events. In Arabidopsis, five SR genes did not produce a new SR gene in the recent α-WGD, and only AtSCL30a and AtSCL30 were produced by other duplication events. As a result, poplar ended up with more SR genes than Arabidopsis. The grape SR genes had a small number of gene duplications after the γ triplication event. Papaya had many SR genes lost before the γ triplication event, so in the end we only found 9 SR genes in papaya. It could be seen that the SR genes of the four dicots might have undergone different degrees of gene expansion or loss in their long-term evolution. This might be an important reason why the actual ratio of SR gene numbers was different from the theoretical ratio.

AS events in the SR genes
Whether transcripts produced by AS are functional has long been a subject of debate.
Here, we constructed AS profiles of the poplar, Arabidopsis, and grape SR genes from the phytozome database and grape genome database (Fig. 3, Figure S2, Figure S3). The results showed that 77.4% of the genes (48 of 62) had multiple splicing isoforms, and these 48 genes produced 198 transcripts. These transcripts tripled the transcriptome complexity of the poplar, Arabidopsis, and grape SR genes. In addition, AS of many transcripts occurred in the untranslated region and did not result in a change in the putative protein sequence. Such splicing isoforms might have RNA-level functions. The 48 SR genes with multiple splicing isoforms produced 114 putative protein sequences. In the conserved motif analysis, some transcripts produced truncated proteins, such as the fourth putative protein isoform of PtSCL29b, which lacked the RS domain; this might cause functional changes in the protein.
For the overall expression level of SR genes (Fig. 7), the results showed that the expression of only a small number of SR genes changed significantly under heat, cold, drought and salt stresses, and the expression of most SR genes was relatively stable. A previous study [18] found that, among the various hormonal and abiotic stresses tested on the Arabidopsis SR genes, temperature stresses (cold and heat) significantly changed the AS of the pre-mRNA of several SR genes, while the hormones changed the splicing of only three SR genes. In our study, since the poplar SR genes contained a large number of introns, and IR events were the most common type of AS in plants [38], we obtained information on the IR events of the poplar SR genes from the RNA-seq data. We found that 26 of 30 poplar SR genes had intron retention (IR) events. Different intron sites showed changes in the IR ratio under heat, cold, drought and salt stresses (Table S5), and the relative IR ratio changed most markedly under heat and cold stresses (Fig. 8). This suggested that SR genes might respond to stress mainly by changing their own AS, rather than by changing their overall expression level. In a sense, these intron-rich SR genes could provide raw materials for IR events and help plants to face heat and cold stresses. However, the transcripts produced by these IR events were mostly degraded by the nonsense-mediated mRNA decay (NMD) pathway because premature stop codons lead to incomplete translation. The functional significance of these splicing isoforms is currently unclear.
In summary, we compared the AS profiles of the poplar, Arabidopsis, and grape SR genes from the phytozome and grape genome databases. The IR events of the poplar SR genes under heat, cold, drought and salt stresses were also analyzed and compared using RNA-seq data.
However, our research was limited to currently available transcriptome data. In the future, loss-of-function mutants of SR genes should be constructed, for example in the model plants Arabidopsis and poplar, to examine their biological importance to plants.

Conclusions
We had identified 71 SR genes from poplar, Arabidopsis, grape, and papaya, and

Consent for publication
Not applicable.

Availability of supporting data
The datasets are obtained from XX.

Supplementary Information
Figure S1. Domain Architecture of the Arabidopsis, poplar, grape and papaya SR protein subfamilies. The SC subfamily contained proteins with a single RRM followed by an RS domain. The SR subfamily proteins had two RRMs with an evolutionarily conserved SWQDLKD motif in their second RRM followed by an RS domain. The RSZ subfamily consisted of SR proteins with one Zn knuckle. The plant-specific SCL subfamily (SC35-like) was similar to the SC subfamily but had an N-terminal extension that is rich in Arg, Pro, Ser, Gly and Tyr residues. Proteins of the plant-specific RS2Z subfamily had two Zn knuckles and an additional SP-rich region after the RS domain. The plant-specific RS subfamily contained two RRMs (without the SWQDLKD motif) followed by an RS-rich region with many RS dipeptides [10]. According to the method described by Barta
Table S2.
The paralogs analysis of SR genes within poplar, Arabidopsis, grape, and papaya. The whole chromosomes were shown in a circle. The red, black, and green lines represented the whole-genome duplication, transposed duplication, and dispersed duplication of the SR genes, respectively.
Panoramic picture to visualize the loss and expansion of the ancestral SR genes associated with paleopolyploidy events that had occurred in poplar, Arabidopsis, grape, and papaya. Note: The SR gene, which was not in the same line, indicated that the isolation of these genes was before the γ triplication event. For each row of SR genes of the corresponding species, they were produced by different WGDs. The presence of two SR genes in the box indicated that the two genes were genes produced by other duplicated types after the WGD.
Expression of SR genes under stress in poplar, Arabidopsis, grape, and papaya. The expression level data of the poplar and papaya SR genes was obtained from RNA-seq data [34,35], and Arabidopsis and grape chip data GSE5620, GSE5621,
The relative IR rates (stress treated samples vs. untreated control samples) (IRR_ratio_diff) of the SR genes changed significantly under cold, heat, drought, and salt in leaf, root and xylem. Significant change indicated that relative IR rates changed more than 30% under a certain stress. The blank indicated no difference in IR rates. The number to the right of the gene name indicated the intron site of the corresponding gene in Figure 3.

Supplementary Files
This is a list of supplementary files associated with the primary manuscript: Figure S1.tif, Figure S3.tif, Figure S4.tif, Table S1.docx, Table S2.docx, Table S3.docx, Table S4.xls, Table S5.xls.