Formation and Comparative Analysis of Full-Length Transcriptome Sequencing and Next Generation Sequencing In Medicago Sativa L. Roots Under Abiotic Stress

Background: Medicago sativa L. (M. sativa L.) is a legume with high salt tolerance and a major forage crop with high biomass production. However, the large-scale full-length cDNA sequences of M. sativa L. in response to abiotic stress remain unclear. Results: We provided the complete transcriptome for M. sativa L. roots under different abiotic stressors using a combination of single-molecule real-time sequencing and next generation sequencing. Our results indicated that there were 21.53 Gb clean reads, which consisted of 566,076 insert reads and 409,291 full-length non-chimeric reads. We obtained 194,286 consistent transcripts based on a cluster analysis of full-length reads, and 41,248 high quality transcript sequences based on non-full-length reads. After correction using second-generation data for third-generation low-quality data, we obtained 81,017 transcript sequences according to a cogent analysis. The sequence structural analysis acquired 33,058 simple sequence repeats and 42,725 complete coding sequence regions. In addition, 77,221 transcripts were annotated by eight functional databases; 3,043 lncRNAs were predicted and 4,971 alternative splicings were acquired. Moreover, we conrmed the levels of highly differentially expressed transcripts (ADH1, PEPC, MJG19.6, PCKA and GAPC1) in M. sativa L. roots under NaCl and polyethylene glycol stress. Conclusions: Therefore, we fully and massively exposed the full-length transcripts related to abiotic stress in M. sativa L., which will lay the foundation for understanding gene regulation in M. sativa L. under abiotic stress. puried Agencourt RNAClean rst-strand cDNA, the Strand Master was generate the second-strand cDNA. After end repairing, the cDNA fragments were amplied using the PCR Master Mix. the library was quantitatively determined with the Agilent 2100 bioanalyzer instrument and RT-qPCR. The Illumina HiSeq xten platform

enzyme activity on the cell membrane, and disrupt metabolism [10,11]. Besides, abiotic stress can reduce the photosynthetic rate, decrease the assimilation product, and change the respiration rate of crops [12,13]. For example, salt stress (high concentration of NaCl) will cause a loss of water in plants, ion toxicity, nutrient imbalance, oxidative stress, and so on [14]. Drought stress is a vital stress factor affecting plant photosynthesis and growth [15]. Drought stress can indirectly lead to the occurrence of leaf photoinhibition, reduce photosynthetic e ciency, and cause serious damage to crops [16]. Dark stress can result in the complex characters including leaf senescence, the elongation of the hypocotyl and petiole, and early owering [17]. Temperature is also a crucial ecological factor that restricts the growth and distribution of plant, and low temperature can markedly destroy the cell membrane system and reduce SOD activity [18]. Abscisic acid (ABA), as a kind of plant hormone, can alleviate the peroxidation of chlorophyll in plant leaves, contribute to the synthesis of chlorophyll [19]. At present, the effects of abiotic stress on crops have become a bottleneck in agricultural development in numerous areas [20]. Therefore, it is of signi cance to improve the tolerance of crops to abiotic stress by studying stress resistance in M. sativa L. However, the genome and transcriptome data of M. sativa L. roots under abiotic stressors have not been reported, which greatly hampers study of the underlying molecular mechanisms of abiotic stress during growth and development of M. sativa L.
A transcriptome is the complete set of transcripts for a certain type of cells or tissues in a speci c developmental stage or physiological condition [21,22]. A transcriptome analysis reveals the gene expression levels of organisms as well as structural variations and can be used to discover new genes [23]. The research methods and platforms for transcriptomes are undergoing rapid changes and bioinformatics analysis has also gradually improved. The single-molecule real-time (SMRT) sequencing technology from Paci c Biosciences (PacBio) quickly and accurately provides transcriptional information for organisms [24,25]. Therefore, this technology has become a better alternative for full-length cDNA molecular sequencing and has been widely applied for whole-transcriptome pro ling of humans and other species. The rapid development of next generation sequencing (NGS) over the past few years has also increased data throughput and read length, and simultaneously brought down sequencing costs [26].
This has allowed for new breakthroughs in the area of biology and ushered medical genetics into a new era.
In the present study, we utilized SMRT sequencing to generate and identify the full-length transcriptome of M. sativa L. roots under various abiotic stressors, such as darkness, low temperature (4 °C), NaCl (400 mM), polyethylene glycol (PEG) (25%) and abscisic acid (ABA, 100 µM). According to the transcriptome data, we analyzed the full-length transcriptome sequences by alternative splicing, simple sequence repeat (SSR), transcription factor, complete coding sequence (CDS), single nucleotide polymorphism (SNP), and transcript functional annotation. We also predicted the coding sequences and long non-coding RNAs (lncRNAs). In addition, we combined PacBio sequencing and NGS technology to further identify the differentially expressed transcripts and analyze the functional annotation in M. sativa L. roots under CK (control), 400 mM NaCl, and 25% PEG stress conditions. The results will be valuable for further investigations of M. sativa L. roots under abiotic stress.

Results
Overview of full-length transcriptome sequencing in abiotic stress-treated M. sativa L. roots. In this study, M. sativa L. roots were grown under different abiotic stressors (darkness, low temperature, 400 mM NaCl, 25% PEG, and 100 µM ABA) for 0, 3, 6, 12, 24, 48, and 72 h (Fig. 1A). Then, we extracted total RNA from the M. sativa L. roots, which were subjected to different abiotic stressors, equally mixed the RNAs together. After puri cation and repair, we constructed and examined the library using Agilent 2100 system (Agilent Technologies, Palo Alto, CA, USA). A full-length transcriptome sequencing experiment was carried out using PacBio Sequel (Fig. 1B). The obtained accurate full-length transcripts were analyzed using PacBio Single Molecule Real Time (SMRT™) DNA sequencing technology (Fig. 1C). A total of 21.53 Gb clean reads were obtained and utilized to correct the SMRT reads in M. sativa L. roots under abiotic stress. Based on full passes ≥ 0 and sequence accuracy > 0.80, we successfully obtained 566,076 readsof-insert (ROI) (total bases: 1,058,177,144) with mean length of 1,869 bp, mean read quality of 0.96, and 19 passes.
Summary of Pacbio RS sequencing base on sequencing by synthesis (SMRT). Besides, the ROI read length distribution of each size bin and the 1-6 k size bin were shown, and the ROI read length was mainly distributed around 2000 ( Fig. 2A and 2B). The number of full passes is related to the length of the cDNA, but generally decreases with the increase in cDNA length. The accuracy of the ROI sequence is affected by the number of full passes. The more the full passes contribute the higher the sequence accuracy. ROI quality values re ect the accuracy of the sequence. The distribution of full passes of ROI sequences was shown in Fig. 2C, and the distribution of ROI quality values was high (Fig. 2D). Besides, through the full-length transcriptome sequences, we obtained 34,871 ltered short reads, 116,022 nonfull-length reads, 415,183 full-length reads, and 409,291 full-length non-chimeric reads. The average fulllength non-chimeric read length (FLNC) was 1,916; the full-length percentage was 73.34% (Table S2). The FLNC length distributions of each size bin and the 1-6 k size bin were what was expected, and the FLNC length also mainly distributed around 2000 ( Fig. 2E and 2F). According to the ROI classi cation for the 1-6 k size, the percentage of ltered short reads was 6.2%; the percentage of full-length (chimeric) was 1%; the percentage of full-length (non-chimeric) was 73.2%; the percentage of non-full-length (no poly-A) was 11.6%; and the percentage of the non-full-length (no primer) was 8.9% (Fig. 2G).
Alternative splicing and SSR analyses and the prediction of coding sequences. In our results, there were 194,286 consensus isoforms (average consensus isoform read length of 2,077), 41,248 polished highquality isoforms, 152,636 polished low-quality isoforms; the percent of polished high-quality isoforms was 21.23%. The consensus isoform length distribution of each size bin was exhibited, and the consensus isoform length was also mainly distributed around 2000 (Fig. 3A). In addition, BUSCO was used to evaluate transcriptome integrity, and the results indicated that the number of complete and single-copies was 418, the number of complete and duplicated was 536, the number of fragmented transcriptomes was 29, and the number of missing transcriptomes was 457 (Fig. 3B). Meanwhile, BLAST software was applied to predict the candidate events of alternative splicing through pairwise comparison of the unreferenced transcriptomes in the three generations of after redundantion [27]. If the comparison results satisfy the conditions in the previous research, they are considered as candidate alternative splicing events. And we also exhibited the alternative splicing events in Table S3. Furthermore, we analyzed SSRs using the MIcroSAtellite identi cation tool (MISA). Here, the detection of SSRs was part of the genetic analysis for M. sativa L. roots under abiotic stress. And the distribution of SSR types was displayed. There were 3944 compound SSR (c), 35 compound SSR with overlapping bases between two SSRs (c*), 13133 Mono-nucleotide (p1), 3535 Di nucleotide (p2), 6825 Tri nucleotide (p3), 349 Tetra nucleotide (p4), 117 Penta nucleotide (p5) and 113 Hexa nucleotide (p6) ( Table S4). The coding region sequences and the corresponding amino acid sequences of the transcripts were predicted using TransDecoder software, and we also successfully obtained a total of 75,596 ORFs, including 42,725 complete ORFs. The complete length distribution of the predicted CDS coding proteins was shown in LncRNA prediction and transcription factor analysis. Meanwhile, the numbers of lncRNA transcripts were presented in Fig. 4A, which was predicted by CPC, CNCI, CPAT and a pfam protein structure domain analysis. We discovered 3043 lncRNAs through the conjoint analysis of the CPC, CNCI, CPAT and pfam. Next, the target genes of the predicted 3,043 lncRNA sequences were also predicted using LncTar, and the predicted target genes were exhibited in Table S5. In addition, the transcription factors were predicted and analyzed using iTAK software; 8,336 transcription factors were predicted, and the distribution of the different transcription factors was shown in Fig. 4B. We discovered that the transcription factors mainly  (Table S6). And the integrated_function annotation of all identi ed transcripts was also displayed in Table S7. According to the NR database, in the homologous species distribution, Medicago truncatula accounted for 80.03%, Fusarium verticilliides accounted for 3.92%, Cicer arietinum accounted for 2.66%, and Medicago sativa accounted for 2.66% (Fig. 5A). The GO annotation system was divided into biological processes (BP), molecular functions (MF), and cellular components (CC). We discovered that these differentially expressed transcripts under abiotic stresses were mainly enriched in CC terms (cell part, cell, organelle, membrane, organelle part, membrane part, macromolecular complex and cell junction, etc), MF terms (catalytic activity, binding, transporter activity, structural molecule activity, nucleic acid binding transcription factor activity, electron carrier activity, molecular transducer activity and enzyme regulator activity, etc) and BP terms (metabolic process, cellular process, single-organism process, response to stimulus, biological regulation, localization and cellular component organization or biogenesis, etc) (Fig. 5B). The COG database can be utilized to directly classify homologous gene products. Our results exhibited that general function prediction only accounted for 18.84%, replication, recombination and repair accounted for 10.4%, transcription accounted for 10.28%, and signal transduction mechanisms accounted for 9.84% (Fig. 5C). The eggNOG database serves as an annotation of functional descriptions and classi cations for directly homologous groups. Our data revealed that the eggnog function classi cation under abiotic stresses mainly included posttranslational modi cation, protein turnover, chaperones (7.89%), signal transduction mechanisms (6.87%), transcription (5.11%), and carbohydrate transport and metabolism (4.92%), etc (Fig. 5D). The KOG database was used to divide homologous genes from different species into different orthologous clusters. Our data displayed that the KOG function classi cation under abiotic stresses mainly contained general function prediction only, posttranslational modi cation, protein turnover, chaperones, signal transduction mechanisms, translation, ribosomal structure and biogenesis, and carbohydrate transport and metabolism, etc (Fig. 5E). The KEGG database is a collection of various pathways, representing molecular interactions and reaction networks. Our data from KEGG analysis disclosed that the enrichment pathways of the transcripts under abiotic stresses mainly included circadian rhythm-plant (ko04712), ubiquitin mediated proteolysis (ko04120), plantpathogen interaction (ko04626), peroxisome (ko04146), endocytosis (ko04144), eyc. Our data exhibited the ubiquitin mediated proteolysis (Fig. 5F). We also exhibited all KEGG pathways and the related genes (Table S8).
Analysis of SNP and the expression of transcripts. The number of SNPs in HomoSNP, HeteSNP, and AllSNP was exhibited in Table S9. We also provide the SNP-related data in the Table S10. The distribution of SNP density was high in the 0-1 kb (Fig. 6A). In addition, the non-redundant transcripts obtained by third-generation sequencing (single-molecule real-time sequencing) were used as a reference for sequence alignment and subsequent analysis. STAR was used to conduct sequence alignment between Clean Reads and transcripts to obtain location information on transcripts. And the comparison of nonredundant transcripts between second-generation and third-generation sequences was shown in Table   S11. The total distribution of expression for the transcriptomes was displayed in Fig. 6B. To further examine the degree of dispersion in transcript expression, a boxplot of fragments per kilobase of exon per million (FPKM) in each sample was prepared to intuitively compare the expression levels of the transcripts in different samples (Fig. 6C). The correlation of biological replicates can not only test the reproducibility of biological experiments, but also assess the reliability of differentially expressed genes.
Pearson's correlation coe cient analysis was used as the evaluation index of biological repeatability and correlation. We discovered that the three biological replicates in each group were relatively good according to the heatmap of the pairwise comparisons (Fig. 6D).
Identi cation of differentially expressed transcripts in M. sativa L. roots under CK, NaCl, and PEG stresses. Furthermore, based on the full-length transcriptome sequencing, the differentially expressed transcripts were then identi ed by NGS technology in M. sativa L. roots under CK, 400 mM NaCl, and 25% PEG stress. The differentially expressed transcripts were screened by DESeq, and the correlation plots of the M. sativa L. transcripts under CK, 400 mM NaCl, and 25% PEG stress were exhibited in Fig. 7A. In addition, the volcano plot and hierarchical clustering analysis revealed the differentially expressed transcripts in M. sativa L. transcripts in response to the three stressors. There were 4,080 differentially expressed transcripts (2,609 downregulated and 1,471 upregulated) in the CK stress group compared to the 400 mM NaCl stress group; 5,854 transcripts were differentially expressed (3,241 downregulated and 2,613 upregulated) in the CK stress group compared to the 25% PEG stress group; 8463 transcripts were differentially expressed (3,896 downregulated and 4,567 upregulated) in the 400 mM NaCl stress group compared to the 400 mM NaCl stress group ( Fig. 7B and 7C, Table 1).  8A). The COG function classi cation in M. sativa L. roots under CK, NaCl, and PEG stresses mainly included general function prediction only, signal transduction mechanisms, carbohydrate transport and metabolism, and transcription, etc (Fig. 8B). The eggNOG function classi cation in M. sativa L. roots under CK, NaCl, and PEG stresses mainly included carbohydrate transport and metabolism, osttranslational modi cation, protein turnover, chaperones, energy production and conversion, and amino acid transport and metabolism, etc (Fig. 8C). The KOG function classi cation in M. sativa L. roots under CK, NaCl, and PEG stresses mainly included General function prediction only, signal transduction mechanisms, posttranslational modi cation, protein turnover, chaperones, and translation, ribosomal structure and biogenesis, etc (Fig. 8D). In addition, the data from KEGG analysis presented that the enrichment pathways of differentially expressed transcripts between CK and NaCl stresses mainly included carbon metabolism, biosynthesis of amino acids, glycolysis/gluconeogenesis, starch and sucrose metabolism, ribosome and protein processing in endoplasmic reticulum, etc; the enrichment pathways of differentially expressed transcripts in PEG stress mainly included carbon metabolism, biosynthesis of amino acids, starch and sucrose metabolism, glycolysis/gluconeogenesis, phenylpropanoid biosynthesis, plant hormone signal transduction and plantpathogen interaction, etc; the enrichment pathways of differentially expressed transcripts between NaCl and PEG stresses mainly included biosynthesis of amino acids, carbon metabolism, glycolysis/gluconeogenesis, ribosome, starch and sucrose metabolism, protein processing in endoplasmic reticulum and RNA transport, etc (Fig. 8E).Therefore, the differentially expressed transcripts in M. sativa L. roots under CK, NaCl, and PEG stress were mainly enriched in carbon metabolism, biosynthesis of amino acids, glycolysis/gluconeogenesis pathways.
Screening and identi cation of differentially expressed transcripts. Based on the KEGG analysis, we selected the highly differentially expressed transcripts, which were related to carbon metabolism, biosynthesis of amino acids, glycolysis/gluconeogenesis pathways among the CK, NaCl, and PEG stress groups. As shown in Fig. 9A, hierarchical clustering exhibited the distributions of the Top10 up-regulated and Top10 down-regulated transcripts in Medicago sativa L. roots between NaCl and CK groups (Left), between PEG and CK groups (Middle), and between PEG and NaCl groups (Right). Subsequently, we also selected 5 upregulated and 5 downregulated transcripts between the three groups for veri cation. The results of the RT-qPCR assay disclosed that ADH1 was upregulated, PEPC and MJG19.6 were downregulated in the NaCl stress group relative to the CK stress group; PCKA and GAPC1 were upregulated, PEPC was downregulated in the PEG stress group compared to the CK stress group; BCAT2, PCKA and GAPC1 were upregulated, ADH1 and PEPC were downregulated in the PEG stress group compared to the NaCl stress group (Fig. 9B and 9C).

Discussion
Agricultural production is "open" large-scale production in a natural environment [28,29]. Growth and development of crops are often limited due to the changeable natural environment [30]. Poor environmental conditions are often encountered during agricultural production, such as water loss or waterlogging damage; too low or too high temperature; and cold or heat damage [14,31,32]. These disasters caused by a poor environment are a tremendous threat to world crop production and a major problem that needs to be solved [33]. Abiotic stress is a commonly encountered environmental factor in nature that seriously affects plant growth and development [34]. Abiotic stress mainly includes low temperature, high salt, and drought conditions [31]. Low temperature stress decreases crop yield and quality [35]. Under low temperature stress, the photosynthetic capacity of plants decreases and excess light energy increases, leading to massive accumulation of reactive oxygen species (ROS) [36]. Excessive ROS attack proteins, nucleic acids, lipids, and other biomolecules, leading to cell death and tissue destruction [37]. Calcium ions are an important signaling molecule during plant stress [38]. Studies have shown that salt stress rapidly increases Ca 2+ in plant cells [39][40][41]. PEG is an inert non-ionic long chain polymer that is soluble in water. PEG is the best material for studying drought stress in plants [42]. ABA is a small lipophilic plant hormone that plays an important role in the regulation of the plant stress response [43]. Some studies have suggested that ABA levels change signi cantly in plants under drought, salinity, and other stressors [44]. Light is an important environmental factor for plant growth [45], and darkness is often used to study light stress in plants [46,47]. M. sativa L. has become a promising crop for use as a bioenergy feedstock; therefore, it is of signi cance to explore the mechanisms of abiotic stress and improve the stress resistance of M. sativa L. during agricultural production.
At present, several studies have reported the underlying relevant mechanisms of abiotic stress in M. sativa L. roots [48][49][50]. For instance, osmotic pressure and salt stress change the microtubule system of interphase cells in M. sativa L. roots by changing salt ions and cations [48]; silicon priming signi cantly improves the tolerance of M. sativa L. to high alkaline stress [49]; the co-transformation of bar and CsLEA enhances the tolerance of M. sativa L. to drought and salt stress [50]. In addition, high-throughput sequencing platforms have also been applied to investigate the transcriptome changes in M. sativa L. under abiotic stress, including studying the potential mechanisms of the M. sativa L. response to cold stress [51]; transcriptome analysis of lead stress in M. sativa L. roots [52]; and metabolomics analysis of root-symbiotic rhizobia of M. sativa L. under alkali stress [53]. However, information from these studies has been insu cient.
Transcriptome research is a necessary tool to understand life processes. However, RNA-Seq2.0 technology based on the NGS high-throughput platform cannot accurately obtain or assemble complete transcript information, and cannot recognize the transcripts of isoforms, homologous genes, supergene families, or alleles, making it di cult to understand the meaning of life processes at a deeper level. Fulllength transcriptome sequencing based on PacBio SMRT single-molecule real-time sequencing technology does not interrupt RNA fragments, and full-length cDNA obtained by reverse transcription using rapid ampli cation of cDNA ends technology can be used. The ultra-long reads (median 10 kb) of the platform contain a single complete transcript sequence. Furthermore, no assembly is required in the post-analysis, as what is measured is the result [54,55]. The advantages of SMRT sequencing have been comprehensively studied [56]. SMRT sequencing produces full-length transcripts compared with shortread sequencing [57]. In addition, SMRT sequencing can be used to analyze alternative splicing, primaryprecursor-mature RNA structures, and RNA processing [58]. In this study, M. sativa L. roots were treated with different stress conditions including dark, low temperature, 400 mM NaCl, 25% PEG, or 100 µM ABA for 0, 3, 6, 12, 24, 48, and 72 h. The full-length transcriptome was fully tested and analyzed by full-length transcriptome sequencing. We obtained a total of 21.53 Gb of clean data, including 566,076 ROI and 409,291 FLNC reads. A total of 194,286 consensus isoforms were identi ed by transcript clustering analysis of the FLNC reads. After removing the redundant sequences, we obtained 81,017 transcripts. We also predicted 4,971 alternative splicing events. In addition, we identi ed 33,058 SSR and 42,725 CDS regions. Furthermore, 77,221 transcripts were annotated and 3,043 lncRNAs were predicted. And all of these raw sequencing data has been successfully uploaded to the BioProject (Accession code: PRJNA531296; link: https://www.ncbi.nlm.nih.gov/sra/PRJNA531296)).
Based on the data and a combination of NGS and SMRT sequencing, we also screened the differentially expressed transcripts in M. sativa L. roots under CK (control), high salt (400 mM NaCl) and drought (25% PEG) conditions by NGS. In addition, the differentially expressed transcripts were documented by GO, COG, eggNOG, KOG, and KEGG pathway analyses. GO analysis, as a gene functional classi cation system, can comprehensively describe the genetic characteristics and gene products of organisms [59]. In our study, GO analysis uncovered that differentially expressed transcripts in M. sativa L. roots under NaCl and PEG stresses may be involved in the metabolic process, cellular process, single-organism process, response to stimulus, biological regulation. Meanwhile, we discovered that catalytic activity, binding, transporter activity, nucleic acid binding transcription factor activity might be associated with the NaCl and PEG stresses in M. sativa L. roots. Besides, our data exhibited that the differentially expressed transcripts were also mainly enriched in cell, organelle, and membrane. Therefore, we suggested that these differentially expressed transcripts in cells have made signi cant contributions in the cellular metabolic process of M. sativa L. after NaCl and PEG stress through regulating the catalysis, binding, transport, or transcriptional activity.
Moreover, based on the KEGG analysis, we discovered that these differentially expressed transcripts in M. sativa L. roots under NaCl and PEG stresses were mainly enriched in carbon metabolism, biosynthesis of amino acids, glycolysis/gluconeogenesis pathways. Carbon metabolism is the uptake, transport, storage and utilization of carbon by plants [60]. Carbon metabolism can affect the root growth and water absorption of plants by changing physiological and hydraulic functions, thus affecting the stress resistance of plants [61]. The biosynthesis of amino acids has also been demonstrated to enhance plant adaptability to abiotic stress by participating in the alteration of certain physiological metabolism, the regulation of related gene expression and the activity of key enzymes in plants [62,63]. Besides, it has been con rmed that the glycolysis/gluconeogenesis pathway mainly responds to the abiotic stress by regulating the ATP supply of plants [64]. Therefore, these three pathways might be in connection with the NaCl and PEG stresses of M. sativa L.
In our study, we further screened the Top10 upregulated and Top10 downregulated transcripts in M. sativa L. roots under NaCl stress vs CK, PEG stress vs CK, and NaCl stress vs. PEG stress based on the carbon metabolism, biosynthesis of amino acids, glycolysis/gluconeogenesis pathways. We demonstrated that ADH1 was upregulated, PEPC and MJG19.6 were downregulated in M. sativa L. roots under NaCl stress compared to that under CK; PCKA and GAPC1 were upregulated, PEPC was downregulated in M. sativa L. roots under PEG stress compared to that under CK; meanwhile, BCAT2, PCKA and GAPC1 were upregulated, ADH1 and PEPC were downregulated in M. sativa L. roots under PEG stress compared to that under NaCl stress. Therefore, we proved that ADH1, PEPC and MJG19.6 were associated with NaCl stress in M. sativa L. roots; PCKA, GAPC1 and PEPC were associated with PEG stress in M. sativa L. roots; PCKA, GAPC1, ADH1 and PEPC were also differently expressed between NaCl stress and PEG stress. Among them, ADH1 (MF: alcohol dehydrogenase activity, zinc ion binding; CC: cytoplasm; BP: oxidation-reduction process) was connected with glycolysis/gluconeogenesis, fatty acid degradation, tyrosine metabolism, alpha-Linolenic acid metabolism pathways. PEPC (MF: phosphoenolpyruvate carboxylase activity; CC: cytosol; BP: carbon xation, tricarboxylic acid cycle and photosynthesis) was connected with pyruvate metabolism, carbon xation in photosynthetic organisms and carbon metabolism pathways. MJG19.6 (MF: histidine dehydrogenase activity, zinc ion binding, NAD binding; BP: histidine biosynthetic process, spermidine biosynthetic process, response to UV, pollen development, oxidation-reduction process; CC: chloroplast stroma) was related to histidine metabolism and biosynthesis of amino acids pathways. PCKA (MF: phosphoenolpyruvate carboxykinase (ATP) activity, ATP binding, kinase activity; BP: gluconeogenesis, phosphorylation) was related to glycolysis/gluconeogenesis, citrate cycle (TCA cycle), pyruvate metabolism, carbon xation in photosynthetic organisms, carbon metabolism. GAPC1 (MF: NAD + activity, NADP binding, NAD binding; CC: cytoplasm; BP: glycolytic process, oxidation-reduction process) was in connection with glycolysis/gluconeogenesis, carbon xation in photosynthetic organisms, carbon metabolism and biosynthesis of amino acids. Therefore, we suggested that these transcripts might provide novel insight into stress-response mechanisms in M. sativa L. roots. In future research, we will also dig into a large amount of experimental data, and focus on the veri cation of relevant transcripts.

Conclusions
In this study, we comprehensively identi ed the full-length transcriptome sequences in M. sativa L. roots under abiotic stress (low temperature, darkness, high salt, drought, and ABA) for the rst time. We also identi ed the differentially expressed transcripts and analyzed the annotations of the differentially expressed transcripts in M. sativa L. roots under high salt and drought conditions by combining PacBio sequencing and NGS technologies. Besides, we also screened and identi ed the aberrantly expressed transcripts (ADH1, PEPC, MJG19.6, PCKA and GAPC1), which might be potentially involved in the in uence process of M. sativa L. roots under NaCl and PEPC stresses. Therefore, ADH1, PEPC, MJG19.6, PCKA and GAPC1 might serve as potential biomarkers for the vigorous growth of M. sativa L. On the whole, our study systematically investigated the full-length transcriptome sequences in M. sativa L. roots under abiotic stress. This study provides useful information on the abiotic stress-mediated response in M. sativa L. as well as other plants. However, studies are needed to further validate the enormous amount of biological information in our study. Furthermore, it will be necessary to further explore functions and mechanisms of transcripts which we have identi ed.

Materials And Methods
Plant cultivation and stress treatment. M. sativa L. was provided by Institute of Animal Husbandry and Veterinary Science, Beijing Academy of Agricultural Sciences, Beijing, China. The plants were grown in soil in a greenhouse under a light/ (16 h)-dark (8 h) regime, temperature of 25 ± 1 °C, and relative humidity of 80 ± 5% and were watered twice weekly. For NaCl treatment, M. sativa L. seedlings were cultured by increasing the concentration of NaCl (1/4 (100 mM) NaCl, 1/2 (200 mM) NaCl and 400 mM NaCl) in medium to avoid shock effect; For PEG treatment, M. sativa L. seedlings were also cultured by increasing the concentration of PEG (Final concentration was 25% PEG) in medium; For darkness treatment, M. sativa L. seedlings were cultured in a completely dark incubator; For low temperature (4 °C) treatment, we adopted light incubator and temperature-controlled refrigerator to simulate the low-temperature environment, M. sativa L. seedlings were moved into the controlled temperature refrigerator (the cooling speed was 2 °C/h, and the nal constant temperature was 4 °C). For ABA treatment, M. sativa L.
seedlings were moved into the medium including100 µM ABA. After collection, the root samples were carefully washed with the corresponding temperature of sterile water and dried, and stored in liquid nitrogen to reduce damage.
RNA sample preparation, Library preparation and SMRT sequencing. The M. sativa L. roots were exposed to different stress conditions, including darkness, low temperature (4 °C), 400 mM NaCl, 25% PEG and 100 µM ABA for 0, 3, 6, 12, 24, 48, and 72 h. Total RNA was isolated using TRIzol (Invitrogen, Carlsbad, CA, USA) following the manufacturer's instructions. Electrophoresis on a 1% agarose gel was used to detect RNA degradation and contamination. The NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Rockland, DE, USA) was used to analyze the purity and concentration of RNA (OD260/280). RNA integrity was evaluated by the RNA Nano 6000 Assay Kit on the Agilent Bioanalyzer 2100 system (Agilent Technologies, Palo Alto, CA, USA). All total RNA samples were equally mixed together for the following experiments. Then the poly (T) oligo-attached magnetic beads were utilized to purify the mRNA from the mixed RNAs (5 µg RNAs  Isoform sequence clustering. Iterative sequence clustering was analyzed by SMRT analysis software with the iterative clustering for error correction algorithm. Similar sequences were clustered, each of which yielded a consensus isoform. The consistent sequences in each cluster were corrected by applying quiver. Finally, we obtained high-quality transcripts (HQ, high-quality isoforms) with > 99% accuracy.
Transcriptome integrity assessment. BUSCO was used to evaluate the integrity of the transcriptome after removing the redundancies [65].
Alternative splicing analysis. Pre-mRNA can be spliced in a variety of ways. Different exons are selected to produce different mature mRNAs, which are translated into different proteins and constitute the diversity of biological characters. This kind of post-transcriptional mRNA processing is called alternative splicing. Alternative splicing can be predicted based on the transcripts of three generations of noreference transcriptome after de-redundancy.
Simple sequence repeat (SSR) analysis. Transcripts > 500 bp were screened, and the SSRs were analyzed using the MIcroSAtellite identi cation tool.
Prediction of coding sequences. Coding sequence (CDS) sequences were predicted using TransDecoder software based on the open reading frame (ORF) length, amino acid sequence, Pfam database protein structure domain, and log-likelihood score [66].
SNP analysis. STAR software was adopted to compare the Reads of each sample with the Unigene sequence [67], and SNP sites were identi ed using GATK based on the SNP Calling process of RNAsequence [68]. These SNP sites can be applied to analyze whether they can affect gene expression levels or the types of protein products. The identi cation criteria of SNP include: (1) No more than 3 consecutive single base mismatches occurred in the range of 35 bp; (2) The standardized SNP quality value is greater than 2.0.
LncRNA prediction from PacBio sequences. It is necessary to screen the coding potential of the transcripts to determine whether they have coding potential and to lter out the transcripts with coding potential to obtain the predicted lncRNAs. Based on the previous studies [69,70], the lncRNAs were then predicted using the coding-non-coding index (CNCI) [27], pfam protein structure domain analysis, coding potential assessment tool (CPAT), and the coding potential calculator (CPC) [71].
Prediction of lncRNA targeted transcripts. Target genes were predicted for the 3,043 predicted lncRNA sequences. LncRNA and mRNA acted on each other due to the complementary pairing of the bases. LncRNA target genes were mainly predicted using the LncTar target gene prediction tool.
Transcription factor analysis. A transcription factor refers to the sequence of nucleotides in upstream gene-speci c proteins. These proteins regulate the combination of RNA polymerase and the DNA template, and then regulate gene transcription. ITAK software was used to predict the plant transcription factors [72].  [80]. The sequences of the gene primers are displayed in Table S1.