High RNA quality extracted from the tolerant crop Cyamopsis tetragonoloba (L.) despite possession of low RNA integrity number

Abstract Salinity is one of the most limiting constraints on crop production. Nevertheless, highly tolerant crop plants (e.g. ‘Guar’, Cyamopsis tetragonoloba) can thrive under saline conditions. RNA-Seq plays a central role in identifying genes involved in the response mechanisms to salt stress in plant species. High-quality RNA, free of genomic DNA, is a critical determinant of success in downstream experiments that shed light on tolerance mechanisms. In this research, two different total RNA isolation protocols were compared for preparing highly pure and intact RNA with good yield from Guar tissue under both high-salinity and control conditions. The extracted RNA appeared in all quality metrics to be of adequate quantity and quality for most downstream applications, including high-throughput transcriptome sequencing and expression analysis. However, the outcomes showed that the most widely used RNA quality measurement, the RNA Integrity Number (RIN), is a poor indicator with a weak correlation to the integrity of RNA from this plant. In contrast to human and animal cells, plants have various ribosomal RNA species. Consequently, whenever the RNA was separated on an agarose gel, several RNA bands appeared, which complicates the interpretation of the Bioanalyzer integrity results. Guar RNA samples with low RIN values were nevertheless processed further by RNA-seq. RNA-seq analysis via Illumina’s paired-end method was carried out to trace back and evaluate the extracted total RNA, resulting in high-quality data for subsequent stress tolerance research.


Introduction
Soil salinization is a major concern to the environment and worldwide crop production. Studies examining the salinity tolerance of plants are important for the sustainable economic growth of food production in saline lands. Towards this aim, salt-tolerant species are vital to salt stress investigation [1,2]. The salinity tolerance under salt stress is determined by efficient regulation of different metabolic pathways, physiological processes and gene networks [3][4][5][6]. One example of this kind of species is Guar (Cyamopsis tetragonoloba (L.) Taub.). Guar is a summer annual legume, a potential alternative crop in semi-arid regions because of its high tolerance to drought [7] and salinity [8,9].
Identification of candidate genes in Guar for adaptive traits and molecular markers that are linked to variation under salinity requires a cost-effective and suitable approach. Improved sequencing and transcriptome analysis are new technologies that provided a means to monitor genome-wide changes in transcriptomes to study expression patterns under salinity [10]. Over the last decade, RNA-seq has been extensively used in researching the responses of plants to abiotic stresses, particularly salt stress [11][12][13][14] as an effective technique for investigating the expression of a considerable number of genes in a specific tissue at a particular interval of time. Such technology could uncover candidate genes and main metabolic pathways involved in salinity tolerance through examining the functional annotation of RNA-seq and differentially expressed genes (DEGs) [15]. However, only a set of RNA that truly represents cell transcription for a particular sample will provide accurate information on the characteristics and expression levels of the transcripts analyzed. Therefore, acquiring sufficient, intact, clear, and stable RNAs is a necessary precondition and a major determinant for the progress of many RNA-seq downstream approaches used to characterize such tolerance strategies [16].
Unfortunately, obtaining RNA from plants is more complicated and more difficult than obtaining it from humans and animals [17][18][19]. The first obstacle is to extract RNA of appropriate quality (i.e. undegraded and free of contaminants). RNA is very labile in comparison to DNA; it can degrade rapidly owing to many factors (cellular RNases, temperature or chemical treatment) [20]. An even greater problem is the abundance of complex primary and secondary plant metabolites (e.g. phenolics and polysaccharides) that differ drastically within and between species [21][22][23][24][25][26]; these can restrict the use of such technologies and may obstruct experiments that deal with RNA in particular [27]. A second obstacle is to determine RNA quality metrics that predict sequencing efficiency [28], validated here as the total number of long scaffolds (approximately 1000 bp) constructed from the Illumina sequence reads, which provides an estimate of the number of full-length transcripts in a sample [29].
Various studies on commercial RNA kits or extraction techniques have revealed variability in the amount, consistency and quality of the collected RNA, which could influence the outcomes of downstream applications [30][31][32]. These studies, which discuss the effectiveness of a specific RNA extraction protocol in a variety of species, suggest that the quantity and quality of extracted RNA can differ considerably between species and that RNA extraction protocols should be adapted for each species and type of tissue. Thus, the selection of a suitable technique for RNA extraction is crucial. Generally, the RNA Integrity Number (RIN) is the measurement of choice for evaluating the integrity and extent of degradation of RNA samples; it is obtained on the Agilent 2100 Bioanalyzer from microcapillary electrophoretic RNA separation. The RIN scale ranges from 1 (degraded) to 10 (intact), and RIN > 9 is generally required in most core facilities [33]. In contrast to RIN, the RNA Integrity Number equivalent (RINe) reflects the relative ratio of the signal in the fast zone to the 18S peak signal, which is determined automatically by the 2200 TapeStation software. The estimated values are displayed in the same manner as the RIN, from 1 to 10, with the maximum RNA quality being assigned a RINe value of 10 [34]. In this study, two distinct protocols for isolating total RNA from Guar were assessed. In addition, the various metrics intended to measure RNA quality were evaluated to provide a general empirical analysis of the key determinants of sequencing performance.

Plant material
The guar accession "BWp 5595", previously characterized for its contrasting tolerance to salinity [35], was subjected to salt stress in a pot experiment. The seeds were surface sterilized using 5% chlorine bleach (sodium hypochlorite), followed by three rinses with autoclaved distilled water for 30 min. The seeds were planted in culture plates in a growth chamber (16 h/8 h light/dark cycle at 21° C) until the radicles developed. The seedlings were transferred into pots with a diameter of 10.5 inches containing a mixture of equal parts peat moss, perlite and soil, under a 16 h photoperiod and a temperature of 25° C (day) and 22° C (night). The salt treatment was applied in a completely randomized design (CRD) once the seedlings had six mature leaves (5 weeks) [35]. A concentration gradient of sodium chloride solution was applied to the treatment pots to slow the onset of salt damage: 50 mmol/L on the first day, then raised daily by 50 mmol/L and eventually maintained at 200 mmol/L until the end of the experiment, while the control plants were periodically watered. After 120 h, fresh leaves (three biological replicates) from NaCl-treated and control plants were collected and directly immersed in liquid nitrogen.
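The stepwise salt treatment described above can be written out as a small schedule. The following is only an illustrative sketch: the function name `nacl_schedule` is hypothetical, and treating the 120 h treatment period as five daily applications is an assumption made for the example, not something stated in the protocol.

```python
def nacl_schedule(days, step_mmol=50, max_mmol=200):
    """Return the NaCl concentration (mmol/L) applied on each treatment day.

    Day 1 receives `step_mmol`; the concentration rises by `step_mmol` per day
    and is then held at `max_mmol`.
    """
    return [min(step_mmol * day, max_mmol) for day in range(1, days + 1)]


# Assumption for illustration: 120 h of treatment corresponds to 5 daily steps.
for day, conc in enumerate(nacl_schedule(5), start=1):
    print(f"day {day}: {conc} mmol/L NaCl")
```

Under these assumptions the plants reach the final 200 mmol/L concentration on the fourth day and remain at that level afterwards.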

RNA extraction and quality assessment
No single technique is optimal for extracting total RNA from all tissues or species. Two distinct methods of RNA extraction were evaluated to identify any impact of the RNA purification process on the quality of RNA (Table 1). Total RNA was extracted with a commercial column-based system (QIAGEN RNeasy ® Plant Mini Kit), following the manufacturer's instructions, and with a non-commercial phenol-chloroform-based RNA extraction protocol [36]. DNA was eliminated by DNase digestion (DNase I, bovine pancreas, > 1800 U, RNase-free; Biomatik), enabling us to continue with DNA-free RNA. RNA purity and yield were measured using a NanoDrop 8000 Spectrophotometer (Thermo Fisher Scientific), while RNA integrity was assessed using three metrics: (1) inspection by denaturing formaldehyde agarose gel electrophoresis, (2) the ratio (26S/18S) of ribosomal RNA subunits and (3) the RNA integrity number (RIN) and its counterpart, the RNA integrity number equivalent (RINe), which are recognized as the "standard" for the integrity and quality of RNA [34,37,38]. The RIN was evaluated on the Agilent 2100 Bioanalyzer with its Plant RNA Nano chip assay under the plant RNA configuration option available in release B.02.07 of the Bioanalyzer software, whereas RINe was evaluated on the Agilent 2200 TapeStation with its RNA ScreenTape assay following the manufacturer's protocol. The dried total RNA samples (RNAstable, Biomatrica) were sent to Macrogen Inc. (Seoul, Korea) for RNA-seq analysis.

Library preparation and RNA sequencing
A total of 12 RNA-Seq libraries (2 conditions × 2 extraction methods × 3 biological replicates) were constructed using the TruSeq Stranded mRNA LT Sample Prep Kit (Illumina) for paired-end application (Table 1). The mRNA was fragmented chemically, followed by cDNA synthesis and adapter ligation. The stranded library molecules were analyzed on a 2200 TapeStation (Agilent Technologies) for peak size selection and region molarity calculations. For each sample, transcriptome sequencing was carried out on the Illumina HiSeq 2500 system.

Transcriptome assembly
The quality of the sequence data was inspected using FastQC v0.11.5 [39]. The raw fastq sequences were trimmed for adapters and read length using Trimmomatic v0.36. The de novo transcriptome was assembled using Trinity v2.4.0 [40,41] with the k-mer size set to 32. The number of transcripts ≥1000 bp was then calculated; this sequencing success metric serves as an estimate of the number of complete transcripts and can therefore be used to trace back and confirm the RNA integrity of a sample [28].
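To make the long-transcript metric concrete, the sketch below computes the number of transcripts ≥1000 bp and the N50 from a FASTA file of assembled transcripts. It is a minimal illustration rather than the pipeline used in this study; the function names and the toy sequences are hypothetical.

```python
from typing import Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of every record in FASTA-formatted lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def count_long(lengths: List[int], min_len: int = 1000) -> int:
    """Count transcripts whose length is at least `min_len` bp."""
    return sum(1 for n in lengths if n >= min_len)


def n50(lengths: List[int]) -> int:
    """Length L such that transcripts of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Toy assembly (hypothetical sequences) standing in for a Trinity FASTA output.
toy_fasta = [
    ">t1", "A" * 600, "A" * 600,   # 1200 bp, split over two lines
    ">t2", "C" * 400,
    ">t3", "G" * 2000,
]
toy_lengths = fasta_lengths(toy_fasta)
print(count_long(toy_lengths), n50(toy_lengths))
```

For a real assembly, the lines could be read from the transcript file with `open(path)` and passed to `fasta_lengths` in the same way.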

Data analysis
The data were statistically analyzed using the IBM SPSS Statistics software package, Version 25 (SPSS Inc./IBM Group, Armonk, NY, USA). The mean comparison was conducted with the Duncan test (P value <0.05).

Results and discussion
C. tetragonoloba is a recently domesticated annual legume crop species and the only cultivated crop species among the three species of the genus Cyamopsis [42]. Its considerably higher tolerance to most abiotic stress conditions is widely recognized [35] and reflects a valuable reserve of genetic diversity. In order to minimize economic harm and ensure crop productivity and food security, it is necessary to generate and make available information on its tolerance mechanisms, which would further contribute to overall abiotic stress tolerance [43]. The recent developments in high-throughput sequencing technologies, including RNA-seq, would certainly help in examining the Guar transcriptome and discovering the mechanisms influencing its tolerance [44].
The objective of RNA-seq is to portray the transcriptome at the time the tissues were collected; therefore, acquiring a suitable amount of high-quality, undamaged RNA is the first vital step in investigations based on next-generation sequencing platforms [45]. Different RNA extraction methods or commercial kits differ in yield as well as in the integrity and quality of the collected RNA, which could influence the outcomes of downstream applications [32]. Hence, the choice and assessment of a suitable RNA extraction method are crucial.
The most popular approaches developed so far for RNA extraction from plants are generally based on the following reagents: urea [46]; detergents such as cetyl trimethyl ammonium bromide (CTAB) [47][48][49]; acid phenol [50]; guanidinium/guanidine salts [51][52][53]; or sodium dodecyl sulfate [54,55]. Such methods, including the widely used commercial kits (Ambion RNAqueous Kit, Invitrogen TRIzol Reagent, and Qiagen RNeasy Plant Kit), may be sufficient for specific plant species and tissues. This study aimed to compare two RNA extraction methods, the column-based Qiagen RNeasy mini kit and an organic solvent-based extraction method (phenol/chloroform), to resolve any effect of these methods on RNA integrity and to test their efficacy in extracting good-quality RNA appropriate for functional genomic research such as RNA-seq.
The success of total RNA extraction should be verified by testing the quantity, quality and integrity of the RNA. RNA quality was judged here in the following ways. Firstly, agarose gel electrophoresis should show intact, bright bands representing the ribosomal subunits of undegraded total RNA [20][56][57][58]. Secondly, absorbance was measured: the A260/230 and A260/280 ratios were used to detect polysaccharide/polyphenolic contaminants, protein contaminants and high salt [33,59]. Pure RNA is expected to have an A260/280 value of 2.0-2.1 and an A260/230 value of around 2 [27,60,61]. Thirdly, an instrument such as the Agilent Bioanalyzer reports RIN values on a scale from 1 to 10, with 1 being the most degraded and 10 the most intact RNA sample; this scale is based on the size distribution of RNA molecules measured by microcapillary electrophoresis [58,62,63]. Finally, the efficiency of total RNA extraction was demonstrated by the success of the sequencing and assembly process [28], measured as the number of successfully assembled long scaffolds (≥ 1000 bp) from the Illumina sequence reads, which provides an estimate of the total number of full-length transcripts in the sample. Collectively, samples are considered adequate for sequencing if they meet the following minimum criteria: OD 260/230 ≥ 1.5, OD 260/280 ≥ 1.9, 26S/18S rRNA ratio ≥ 1, RIN ≥ 7 and total RNA ≥ 3 μg.
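The minimum criteria listed above can be expressed as a simple screening function. This is a sketch under stated assumptions: the class and function names are hypothetical, and the example measurements are illustrative values chosen to resemble the ranges reported in this study, not actual sample data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RnaSample:
    name: str
    a260_230: float    # OD 260/230 ratio
    a260_280: float    # OD 260/280 ratio
    rrna_ratio: float  # 26S/18S rRNA ratio
    rin: float         # RNA Integrity Number
    total_ug: float    # total RNA in micrograms


# Minimum values taken from the criteria stated in the text.
MINIMUMS = {
    "a260_230": 1.5,
    "a260_280": 1.9,
    "rrna_ratio": 1.0,
    "rin": 7.0,
    "total_ug": 3.0,
}


def failed_criteria(sample: RnaSample) -> List[str]:
    """Return the names of all criteria the sample does not meet."""
    return [
        key for key, minimum in MINIMUMS.items()
        if getattr(sample, key) < minimum
    ]


# Illustrative (not measured) values for a column-purified leaf RNA sample.
example = RnaSample("example_leaf", a260_230=2.1, a260_280=2.05,
                    rrna_ratio=1.5, rin=6.2, total_ug=9.0)
print(example.name, failed_criteria(example))
```

A sample like this one passes every purity and yield threshold yet fails on RIN alone, which is the situation examined in the remainder of this section.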

RNA concentration and quality assessment
The integrity, quality and quantity of the isolated RNA were verified by three separate means. Electrophoresis on a 1% formaldehyde denaturing agarose gel allowed the separated bands of the various RNA types to be easily visualized for the two protocols examined in this study. Plant leaf tissue is widely used in RNA purification and usually produces several RNA bands owing to the different species of ribosomal RNA in the large and small subunits. The gel showed intact, sharp 25S, 24S, 23S, 18.5S, 18S, 16S, 5.8S and 5S rRNA bands, which should be present in an RNA sample of high quality and integrity intended for downstream applications [64]. The respective densities of the 23S, 24S and 25S rRNA bands were around 1.5- to 2-fold those of the corresponding 16S, 18S or 18.5S rRNA bands, which indicates high RNA integrity. Moreover, there were no noticeable bands in or near the loading wells, suggesting no contamination with genomic DNA (Figure 1). DNase digestion of the purified RNA with RNase-free DNase I (Biomatik) therefore proved efficient.
The ratios between the absorbance of RNA at 260 nm and 280 nm (260/280 ratio) ranged from 2.02 ± 0.06 to 2.07 ± 0.07 for RNeasy extractions and from 1.89 ± 0.09 to 1.91 ± 0.06 for phenol-based extractions (Table 2). The ratios between the absorbance at 230 nm and 260 nm (230/260 ratio) for phenol-based extraction ranged between 0.40 ± 0.03 and 0.57 ± 0.09 in all samples, which suggested a high salt concentration. The RNeasy A260/A230 ratio was above 2.0 for all samples, implying low levels of contaminating proteins, salts, polysaccharides or polyphenols.
The yield of total RNA extracted by the phenol-based method was higher (18.21 to 22.13 μg) than that of the RNeasy Kit (8.44 to 10.59 μg), while the commercial kit gave higher RNA quality (Table 2). The NanoDrop results suggested that the RNAs were suitable for RNA-Seq. The concentration and purity were also checked on the Agilent 2100 Bioanalyzer, which confirmed the absence of genomic DNA in the RNA samples. The protocol yielded total leaf RNA of 3 µg up to 12 µg (Table 2). The minimum input for library construction for RNA sequencing (transcriptome analysis) on the Illumina platform is ≥0.1 μg with the Illumina TruSeq Stranded mRNA Kit. The differences in RNA concentration measured with different instruments may have been due to technical differences between the measurement methods. Pipetting errors may also be a factor, as 1 μL of sample is measured on the Agilent 2100 instrument while 2 μL of sample was measured on the NanoDrop instrument.
RIN values were calculated on the Agilent 2100 Bioanalyzer employing the plant RNA module [65]. The highest-quality RNA (RIN 6.1-6.3) was obtained using the RNeasy Kit (Qiagen) (Figures 2 and 3).
With the phenol-based method, the RIN obtained was roughly 0.4-0.7 lower than with the RNeasy Kit, although no evidence of degradation was observed on the RNA gel. The Agilent TapeStation, with its independent quality score for total RNA, the RNA Integrity Number equivalent (RINe), confirmed the RIN values, with an average RINe of 6.4 for the RNeasy Kit and 5.6-5.7 for the phenol-based method. RIN values span from 1 (degraded) to 10 (intact), and RIN ≥ 7 is usually deemed adequate for RNA sequencing technologies [66]. Values below this are considered to indicate RNA degradation that will affect transcript quantification. However, the RIN measurement was originally intended to examine the integrity of RNA isolated from mammalian cells [63]. In contrast to human and animal cells, plant tissues contain chloroplast ribosomes in addition to mitochondrial and cytosolic ribosomes (Table 3) [67]. Moreover, green tissue may contain additional rRNAs compared with non-green tissues such as roots [65]. Therefore, when the RNA was separated on an RNA gel, multiple RNA bands typically appeared, which confounds the interpretation of the integrity results.
Unfortunately, no assay is available to properly determine the RIN for leafy plant tissues, and this cannot be corrected with the algorithm designed by Agilent; the beta version of the software with plant-specific RNA analysis configurations is still being tested [68]. There are two concerns with plant total RNA. Firstly, plant rRNAs display different peaks from those of animal rRNAs. Secondly, the peaks of chloroplast rRNA are treated as degradation byproducts in the analysis and therefore lead to lower RIN values. The short peaks beside the 18S/28S peaks were much sharper and are likely a type of chloroplast rRNA, which, regrettably, cannot be examined by the Agilent system.
Owing to those RNA bands, the Agilent Bioanalyzer could not adequately analyze intact total RNA, leading to low RIN values (less than 8) or even no RIN values [69]. Moreover, the RNA extraction method did not substantially affect the RIN value. Although extraction via the RNeasy mini kit resulted in a higher RIN than the phenol-based method, this is probably due to the increase in gel background. Comparing the RIN with the gel made it obvious that the RIN did not properly represent the integrity of the RNA. Hence, RIN values should be verified by visualizing the integrity of the RNA bands on an RNA gel.

Library construction, sequencing, and raw data retrieval
In order to ensure the highest precision of the sequencing data, it is necessary to check the quality and precisely calculate the concentration of the cDNA libraries prior to sequencing. It is also essential to evaluate the fragment size distribution of the RNA-seq libraries. The fragment sizes must remain within the expected size range (which depends entirely on the protocol), and libraries must be devoid of ligation/PCR artifacts (typically small compared to the target molecules). The size of the fragments can be measured through electrophoresis, ideally by using a reliable system like the Agilent Bioanalyzer.
The fragmentation quality and size were checked at the end of library construction (Table 4; Figure 4). The RNA amount in each library was quantified with the Agilent 2200 TapeStation. All the generated libraries contained similar amounts of fragmented RNA (37.91 ± 1.4 ng/μL) and had similar mean insert sizes, ranging from 298 to 299 bp. This result shows that the libraries generated with both extraction protocols and under both conditions are comparable in yield and quality. Twelve cDNA libraries (2 extraction methods × 2 conditions × 3 biological replicates) were constructed from both treatment and control total RNA and sequenced on the Illumina HiSeq platform. A total of 447,248,338 raw reads were acquired, corresponding to about 37 million paired-end reads per sample with a length of 101 bp; these were submitted under NCBI BioProject accession number PRJNA644633. The basic statistics of the total number of bases, reads, GC (%), Q20 (%) and Q30 (%) were calculated for the six samples and are summarized in Table 5. The Q20 scores were 97.58 to 98.41% and 96.91 to 98.23% for the RNeasy and phenol-based methods, respectively, which demonstrates excellent sequence quality.
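The per-sample read depth follows directly from the totals reported above. The short calculation below is a sketch for checking the arithmetic; the variable names are hypothetical, and the total base count assumes that every counted raw read is 101 bp long.

```python
TOTAL_RAW_READS = 447_248_338   # total raw reads reported for the experiment
N_LIBRARIES = 2 * 2 * 3         # extraction methods x conditions x replicates
READ_LENGTH_BP = 101            # reported read length

mean_reads_per_library = TOTAL_RAW_READS / N_LIBRARIES
# Assumption: each counted read contributes 101 bases.
total_bases = TOTAL_RAW_READS * READ_LENGTH_BP

print(f"libraries: {N_LIBRARIES}")
print(f"mean reads per library: {mean_reads_per_library / 1e6:.2f} million")
print(f"approximate total yield: {total_bases / 1e9:.2f} Gb")
```

The mean of roughly 37 million reads per library agrees with the per-sample figure given in the text.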

RNA-Seq de novo assembly
Stringent settings, including a simple correction step, were applied to ensure a high-quality assembly. Upon elimination of low-quality bases and adapters, roughly 92.34% of the reads passed quality trimming, and a total of 412,970,860 clean reads were obtained with a high quality score of Q30 > 96.21% (Table 6; Figure 5). The transcriptome was assembled using Trinity v2.4.0 with a k-mer value of 32. The RNeasy-based RNA samples generated slightly better results, with a maximum of 68,148 transcripts assembled into ~295 Mbp, a greatest transcript length of 11,676 bp and an N50 of 3354 bp (Table 7).
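The trimming retention rate can be reproduced from the raw and clean read totals reported in this study. The snippet below is a small arithmetic check; the variable names are hypothetical.

```python
RAW_READS = 447_248_338     # raw reads before trimming
CLEAN_READS = 412_970_860   # reads remaining after adapter and quality trimming

retained_pct = 100 * CLEAN_READS / RAW_READS
removed_reads = RAW_READS - CLEAN_READS

print(f"retained: {retained_pct:.2f}% of raw reads")
print(f"removed:  {removed_reads:,} reads")
```

The computed retention matches the roughly 92.34% stated in the text.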
This paper focused on the total count of the "longest" scaffolds (≥1000 bp), which can be used to evaluate the general effectiveness of the extracted total RNA. This is primarily because degraded RNA restricts the reconstruction of long scaffolds and thereby reduces sequencing performance. Likewise, our findings suggest that contaminants in RNA samples interfere with cDNA synthesis from mRNA. A total of 51,381 transcripts were ≥ 1000 bp in length, which is comparable to the numbers generated from most of the plant species in an extensive evaluation of methods for isolating total RNA [29].

Conclusions
The complexity of isolating high-quality RNAs from plant cells containing plentiful complex secondary metabolites generally hampers research projects that deal with RNA.
The results from this study indicate that the commercial RNeasy mini kit is a slightly better and suitable protocol for isolating intact, high-yield and uncontaminated RNA from guar leaves that is suitable for most downstream transcriptome processes. Differences in the quality, quantity and integrity of the extracted total RNA between control and treated samples were less pronounced with both protocols. These outcomes were further supported by the quality of both the constructed cDNA libraries and the assembled transcriptome. However, almost all plant species possess rRNAs of variable sizes, which complicates the reading of the RIN. Even with the recent upgrade of the Agilent 2100 Expert software, a lower RIN value than normal (< 6-7) is still obtained. The data suggest that the issue lies in the Bioanalyzer algorithm for evaluating plant RNA containing several RNA bands rather than in RNA integrity. Moreover, our results demonstrate that the most widely employed RNA quality metric (RIN) is not a strong predictor of plant RNA integrity. These outcomes offer new insights into the most effective methods for extracting high-quality RNA for sequencing and assembling plant transcriptomes. The techniques and suggestions given above could improve the cost and efficiency of RNA sequencing for genome centers and individual labs.

Disclosure statement
No potential conflict of interest was reported by the authors.