Reference genes selection for Calotropis procera under different salt stress conditions

Calotropis procera is a perennial Asian shrub with significant adaptation to adverse climate conditions and poor soils. Given its increased salt and drought stress tolerance, C. procera stands out as a powerful candidate to provide alternative genetic resources for biotechnological approaches. The qPCR (real-time quantitative polymerase chain reaction), widely recognized among the most accurate methods for quantifying gene expression, demands suitable reference genes (RGs) to avoid over- or underestimations of the relative expression and incorrect interpretation. This study aimed at evaluating the stability of ten RGs for normalization of gene expression of root and leaf of C. procera under different salt stress conditions and different collection times. The selected RGs were used on expression analysis of three target genes. Three independent experiments were carried out in greenhouse with young plants: i) Leaf100 = leaf samples collected 30 min, 2 h, 8 h and 45 days after NaCl-stress (100 mM NaCl); ii) Root50 and iii) Root200 = root samples collected 30 min, 2 h, 8 h and 1day after NaCl-stress (50 and 200 mM NaCl, respectively). Stability rank among the three algorithms used showed high agreement for the four most stable RGs. The four most stable RGs showed high congruence among all combination of collection time, for each software studied, with minor disagreements. CYP23 was the best RG (rank of top four) for all experimental conditions (Leaf100, Root50, and Root200). Using appropriated RGs, we validated the relative expression level of three differentially expressed target genes (NAC78, CNBL4, and ND1) in Leaf100 and Root200 samples. This study provides the first selection of stable reference genes for C. procera under salinity. Our results emphasize the need for caution when evaluating the stability RGs under different amplitude of variable factors.


Introduction
Calotropis procera (Aiton) W. T. Aiton (Apocynaceae) is an evergreen shrub highly tolerant to drought and salt stresses with remarkable invasive ability in arid and semiarid regions [1]. Due to its pharmacognostic features, this shrub has been used in traditional medicine for the treatment of various diseases [1]. Ecophysiological studies have emphasized the superior physiology a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 in 0.5% (v/v) sodium hypochlorite solution for 5 min. Seeds were germinated in Petri dishes with wet filter paper and kept in a growth chamber (at 25˚C, 12 h photoperiod, and 70% relative humidity). After radicle emergence, seedlings were transferred to pots containing 7 kg of sandy soil and maintained in a greenhouse for three months. Plants were distributed in three independent experiments (S1 Table): i) Leaf 100 -young plants watered every day with NaCl (100 mM), during 45 days. At 30 min, 2 h, 8 h and 45 days of salt stress, youngest fully expanded leaves were collected; ii) Root 50 and iii) Root200-young plants watered every day with NaCl (50 or 200 mM for Root 50 and Root 200 , respectively). At 30 min, 2 h, 8 h and 1 day of salt stress, root tissue samples were collected. Control samples were watered daily with distilled water and collected for each salt stress time, respectively in each experiment. All samples were collected from three plant replicates. Samples were frozen in liquid nitrogen and stored at -80˚C until RNA isolation.

Total RNA isolation and cDNA synthesis
Total RNA was isolated from samples (leaf and root tissues) using the SV Total RNA Isolation System (Promega, Fitchburg WI, USA) by following the manufacturer's instructions. RNA integrity was checked in 1.5% (w/v) agarose gel electrophoresis, stained with blue-green loading dye I (LGC Biotecnologia, SP, Brazil) and the quantity and quality of RNA samples were evaluated by fluorometry (Qubit, Oregon, USA). Reverse transcription reaction was carried out with 1 μg of total RNA, using the GoScript Reverse Transcription System Kit by (Promega, Fitchburg WI, USA) according to manufacturer's instructions (Promega) and stored at -20˚C.

RNA-Seq libraries: Synthesis, sequencing, and analysis
We also performed transcriptome sequencing of C. procera leaves samples (NCBI Sequence Read Archive identification: PRJNA508417) exposed to NaCl (100 mM) at 30 min, 2 h, 8 h and 45 days after salt stress, including not stressed control samples (0 h and 45 days), according to the Leaf 100 experiment description (S1 Table and S1 Fig). Each of the six RNA-Seq libraries was composed by a bulk combining equimolar RNA amounts of the three biological replicates were sequenced using Illumina paired-end sequencing technology on Illumina Hi-Seq TM 2500 platform (S1 Fig). After cleaning the raw reads and discarding low-quality reads, we ran Trinity [26] to assemble the clean reads into transcripts as described in Haas et al. [27].
Transcript quantification for RNA-Seq reads was performed with RSEM based on mapping the RNA-Seq reads of each experimental library (treatments 30 min, 2 h, 8 h compared to 0 h control and 45 days after stress imposition x 45 days control), against the assembled transcriptome [28] (S1 Fig). To estimate differential gene expression between our libraries, we used the edgeR tool [29], implemented in the Bioconductor package [30], requiring R software for statistical computing. The differentially expressed transcripts [log 2 Fold-Change (FC) > 2.0 or < -2.0, and P-value < 0.05] were identified based on comparisons between experimental libraries and respective controls, using the normalized number of fragments mapping on each library. The 'Fold-Change' (FC) term afore-mentioned is a measure describing how much a quantity changes as compared with an initial (control) to a final value (treatment).

RNA-Seq libraries
Ten RGs were selected based on promising candidate genes according to previously published papers for other plant species [31][32][33], besides Log 2 FC between +1.0 and -1.0 and Pvalue > 0.05 for all the RNA-Seq expression contrasts (Table 1).
We selected, additionally, three target genes (TGs) related to salt stress response from the C. procera leaf transcriptome (RNA-Seq) to be used in qPCR gene expression analyses. TGs choice was based on two factors: (i) on their up-regulation (Log 2 FC > 2.0 and P-value < 0.05), at least one RNA-Seq expression contrast; and (ii) reported participation in the plant response to saline stress. The following TGs were scrutinized: ND1 (NADH dehydrogenase subunit 1 [34], CNBL4 (Calcineurin B-like protein 4 [35,36], and NAC78 (NAC domain-containing protein 78-like [37,38] (Table 1).

qPCR setup
The qPCR reactions were performed on PCR LineGene 9600 (Bioer, Hangzhou, China) using GoTaq qPCR Master Mix (Promega, Fitchburg WI, USA). Briefly, a 10 μL reaction mixture consisted of 5 μL SYBR Green Super Mix (Applied Biosystems, Foster City CA, USA), 2 μL of diluted cDNA (1/10), 0.3 μL for each primer (5 μM) and 2.4 μL ddH 2 O. Non-template controls were also included for each primer pair. Reactions were carried out under the following conditions: 95˚C for 2 min, followed by 40 cycles of 95˚C for 15 s and 62˚C for 1 min. The melting curve was generated by varying the amplification temperature from 65-95˚C. All qPCR reactions were carried out in triplicate (biological and technical) [25]. The amplification efficiency (E) was determined from a standard curve generated by serial dilutions of cDNA (1/10, 1/100, 1/1000, and 1/10000) for each primer, in triplicate, and calculated by using the equation:  [39]. Slopes in the range of -3.58 to -3.10 were considered acceptable for the qPCR assay [40]. These slope values correlated to amplification efficiencies between 90% (E = 1.9) and 110% (E = 2.1).

Analysis of the reference genes expression stability
Three of the most notorious softwares available-geNorm v 3.5 [21], NormFinder v. 0.953 [22], and BestKeeper [23]-were used to evaluate the expression stability of ten candidate RGs: ACT104, ACT, CYP23, FBOX, MAPK2, UBQ11, UBP25, PPR, r40S and TBB4 (Table 2). For geNorm and NormFinder, the raw Cq-values were transformed into relative quantities-Q = E ΔCq , where E represents the average efficiency for each gene, ΔCq is the difference between the lowest quantification cycle (Cq-value) of a sample of a particular gene and the Cq-value of each sample in a dataset [41]. In geNorm, the expression stability value (M) was calculated based on the average of the pairwise variation (V) for a candidate RG with all other genes tested, the default limit M � 1.5. Genes with the lowest M-value have the most stable expression [21]. The average M of all genes together is then calculated by stepwise exclusion of the least stable gene until the two most stable genes in the remaining set cannot be ranked any further. Besides, geNorm also Table 2. Primer pair of the candidate reference genes and target genes used in this study. allows estimating the optimal number of RGs that must be used for normalization process. Normalization factor (NF) is calculated based on the geometric mean of the expression of the two most stable RGs and then the NF n+1 with the next most stable gene. To determine the number of genes to be used for accurate normalization, the pairwise variation (V n/n+1 ) was determined out of two sequential NFs (NF n and NF n+1 ) [21]. Vandesompele et al. proposed V � 0.15 as a cut-off, below which the inclusion of an additional RG is not required [21]. NormFinder calculates the stability value using mathematical modeling algorithm to consider the intra-and inter-group variation of the candidate RGs. The lower stability value represents the highest stability. The fundamental principle is that a stable RG should have minimal variation across experimental groups and subgroups [22].

Reference genes MAPK2
In BestKeeper, the raw Cq-values were used to calculate the Pearson correlation coefficient (r), which was obtained by the pairwise comparison between the BestKeeper index generated by the algorithm and the candidate RGs. Pearson correlation was determined as an indicator of expression stability, in which genes with higher r-value and P-value < 0.05 were more stable [23]. Samples with SD-value (standard deviation) > 1 were excluded from analysis [23]. Data from geNorm (M-values), NormFinder (stability values) and BestKeeper (r-value and SD) were used to generate rankings. The

Evaluation of target genes expression by qPCR
The expression pattern of three TGs (Table 1) Target) / E (ΔCq RG) , where E represents the average efficiency for each gene, ΔCq is the difference between mean Cq-value of a control sample and the mean Cq-value of treated sample. The REST bases its performance on pairwise comparisons (between RGs and TGs, control and treatment samples) using randomization and bootstrapping techniques (Pairwise Fixed Reallocation Randomization Test [42]. Hypothesis testing (P < 0.05) was used to determine whether the difference in expression between the control and treatment conditions was significant.

Reference genes (RGs) and target genes (TGs) qPCR amplification
Ten candidates RGs were selected across C. procera RNA-Seq data, evaluated by qPCR and used to study the transcriptional modulation of three TGs. Products of these genes were associated with known functions involved in basal or vital cellular processes ( Table 2). The specificity of PCR products was confirmed by the presence of a single amplicon with the expected size, with no amplicon visualized in non-template controls, as confirmed by 2% agarose gel electrophoresis (S2 Fig). The specificity of qPCR products was also confirmed by melting curves, each showing a single peak (S3 Fig).

Global analysis of expression stability
Considering all collection times together (global analysis) ( (Tables 3 and S2).
The geNorm algorithm showed M-value < 1.5 for all candidate RGs in all treatments (S2 Table). The four most stable RGs for NaCl-stressed leaves (Leaf 100 ) were CYP23, ACT, PPR, and r40S, while CYP23, UBP25, ACT104, and ACT were most stable for NaCl-stressed roots (both Root 50 and Root 200 ) ( Table 3). On the other hand, the less stable RGs were UBP25 and UBQ11 for Leaf 100 ; FBOX and TBB4 for Root 50 ; FBOX and TBB4 for Root 200 were the less stable RGs (S2 Table).
According to the NormFinder algorithm, the four most stable RGs were ACT, TBB4, PPR and r40S in Leaf 100 ; CYP23, UBP25, ACT104 and UBQ11 in Root 50 and UBP25, CYP23, ACT104 and r40S in Root 200 ( Table 3). The less stable RGs in Leaf 100 were UBQ11 and UBP25; for Root 50 were FBOX and TBB4, and for Root 200 were FBOX and TBB4 (S2 Table).
Although each software has its own statistical method to provide a stability rank, there is a certain degree of congruence among their results. In the current study, the congruence among geNorm, NormFinder, and BestKeeper is presented, concerning the four top-ranked RGs using all collect times together (global analysis) (Fig 2). For Leaf 100 samples, we observed 75%, 75% and 50% congruence between geNorm vs. NormFinder, NormFinder vs. BestKeeper and geNorm vs. BestKeeper, respectively (Fig 2A). In turn, we had congruence for root samples (Root 50 and Root 200 ) between geNorm vs. NormFinder, NormFinder vs. BestKeeper and geNorm vs. BestKeeper, corresponding to 75%, 75%, and 100%, respectively (Fig 2B and 2C).
The RGs choice to use in qPCR analysis was determined according to geNorm, which also provided high congruence between other softwares studied and presented the optimal number of RGs required for reliable normalization according to V-value � 0. 15

Analysis of the expression stability considering factorial time combination
We also evaluated expression stability of the RGs per factorial combination from all collection times for each experiment, totaling 15-time combinations (S2 Table). Comparing all 15-time combinations in each algorithm revealed that the four most stable RGs are not strictly preserved for Leaf 100 , Root 50, and Root 200 (Fig 3 and S2 Table). In this context, we averaged the congruence of all different collection times compared to global collection time ( Table). On the other hand, concerning to global time combinations for Root 50 , average congruence for geNorm, NormFinder and BestKeeper was 71%, 77%, and 77%, respectively (Fig 3D-3F and S2 Table). Moreover, we observed average congruence of 73%, 73%, and 79% for geNorm, NormFinder, and BestKeeper, respectively, concerning to global time combinations for Root 200 (Fig 3G-3I and S2 Table).
Isolating the first hours (30 min, 2 h, 8 h) for each experiment and their factorial combinations (totaling seven time combinations), revealed the more frequent RGs among the rank top four out of seven time combinations (Fig 4 and S2 Table). For Leaf 100 , the more frequent RG was PPR, according to geNorm and NormFinder. PPR, TBB4 and MAPK2, according to Best-Keeper (Fig 4A and S2 Table). For Root 50, the more frequent RGs was CYP23, according to geNorm; UBP25, according to NormFinder; CYP23, UBP25 and ACT104, according to Best-Keeper (Fig 4B and S2 Table). For Root 200 , the more frequent RGs were CYP23, UBP25 and PPR according to geNorm; ACT104 according to NormFinder; ACT104 and UBP25 according to BestKeeper (Fig 4C and S2 Table).

Reference genes of Calotropis procera under salt stress
In short-term salt stress (2 h), gene expression analysis via qPCR revealed that most target genes exhibited constitutive expression in both salt-stressed tissues (Leaf 100 and Root 200 ) ( Fig  5). The only exception was NAC78, which was up-regulated in Leaf 100 (Fig 5C). Interestingly, the gene expression modulation occurred, preferentially, at 8 h of salt-stress, with up-regulation in Leaf 100 (Fig 5A-5C) and down-regulation in Root 200 (Fig 5D-5F) of all TGs tested. On the other hand, in the last treatment times after salt stress (i.e., 45 days and 1 day, in Leaf 100 and Root 200 , respectively) we observed constitutive expression for three target genes. The exception occurred for ND1 at 1 day (in Root 200 ), in which the expression was down-regulated ( Fig 5D).
Additionally, we compared the qPCR/ Leaf 100 relative expression and Leaf 100 RNA-Seq data. Our results revealed that CNBL4, NAC78 qPCR results in Leaf 100 2 h, 8 h, and 45 d ( Fig  5B and 5C) were according to the respective RNA-Seq data ( Table 1). For ND1, the qPCR ( Fig  5A) and RNA-Seq (Table 1) gene expression results converged in the 2 h and 45 days treatments.

Discussion
The advent of high-throughput next-generation DNA sequencing (NGS) platforms has provided more comprehensive and maximized studies on diverse genomes, including non-model plant species [7,43,44]. At the same time, advances in RNA sequencing (RNA-Seq) methods have effectively aided in characterization and quantification of transcriptomes (even without a reference genome). They contributed to the understanding of genes expression regulation under different experimental conditions [6,45,46]. However, due to the existence of potential errors during the preparation, synthesis, sequencing and analysis of gene expression libraries (including RNA-Seq), a second method is required to validate the results indicated by the first. The qPCR is currently the most appropriated method for such purpose [12,47], and quality control measures are necessary to mitigate potential errors in qPCR results. Thus, the selection of suitable reference genes is a fundamental requisite. The use of inappropriate RGs may overestimate or underestimate the relative expression of the target genes and lead to [18].
In this study, transcripts of C. procera (RNA-Seq), identified statistically as constitutively expressed (considering log 2 FC and P-value), were used as a source for candidate reference genes screening. The expression levels and stability analysis of ten RGs were evaluated in leaf and root samples of C. procera under different salt concentrations (NaCl). Using geNorm, NormFinder and BestKeeper software allowed us to analyze the expression stability of RGs in salt concentrations individually and factorial of time combinations.
According to the Cq-value and stability expression analysis, discrepancies were observed among candidate RGs under all conditions studied (including different tissues, salt concentrations and collection time combinations), indicating the importance of studies on RGs stability under different experimental conditions. Although several works have reported the use of traditional RGs as suitable in qPCR assays [48][49][50], recent studies have shown expression stability for many of these genes may be affected in different plant species under experimental conditions [16,20]. These reports, consistent with our results, support the careful evaluation of candidate RGs under given experimental conditions [16,51]. The RGs stability rankings suggested by different softwares were not often entirely identical for the same experimental conditions, as distinct statistical algorithms and analytical procedures are applied [52]. Despite the high degree of similarities, we found less congruence between results of geNorm vs. BestKeeper, for Leaf 100 (30 min, 2 h, 8 h, and 45 days) (Fig 2A). Such a relative divergence between BestKeeper and other softwares was also reported by other authors. According to Zhang et al. [33], in an experiment conducted on Halostachys caspica under salt stress, 25% congruence between BestKeeper vs. geNorm and 100% between geNorm vs. NormFinder were found. Similarly, de Andrade et al. [53] found high correlation among geNorm vs. NormFinder. However, geNorm vs. BestKeeper showed the lowest correlation. In this context, the choice of RGs to use in the qPCR analysis was determined by geNorm and confirmed by other softwares. The geNorm is one of the most widely used for gene expression stability analysis, besides informing the optimal number of RGs necessary to validate the TGs [16,17].
In spite of RGs specificity for each time combination, we found CYP23, a Peptidyl-prolyl cis-trans isomerase involved in key processes of protein folding [54], as the most frequent among the four most stable RGs, considering all experimental conditions studied. Similarly, Singh et al. [32] found CYP as the most stable RGs for wounding, heat, methyl jasmonate and biotic stress, for different tissues and combined stress samples. Based on this scenario, CYP23 is a powerful RG candidate to be further tested on expression analysis of C. procera under different experimental conditions, especially under salinity.
Analyzing all time combinations on Leaf 100 experiment, we found ACT, a cytoskeletal protein associated with plant cell growth [55], as the most frequent RG among the four most stable RGs. Previously, actins were identified as stable RGs in salt, drought, cold and heat stress [31,52]. Furthermore, UBP25 was most frequent RG among the four most stable for Root 50 and Root 200 experiments. On the other hand, UBP25 was the less stable for Leaf 100 submitted to prolonged period of salinity (45 days). Interestingly, all time combinations containing time 45 days for Leaf 100 showed UBP25 as one of the less stable RGs, considering all softwares studied. However, UBP25 was among four most stable RGs for most of the first hours combinations (excluding 45 days) on Leaf 100 . UBP25 participates in ubiquitin-proteasome system (UPS) for maintenance of homeostasis and modulation of the stability proteins under salinity and other abiotic stresses [56,57], inducing the less stability under the high salt stress (45 days, Leaf 100 ) compared to the other candidate RGs.
The following target genes, related to salt-stress response, had their expression analyzed by RNA-Seq and qPCR: ND1 (NADH dehydrogenase subunit 1), which acts on the mitochondrial electron transport chain and is involved with rapid systemic signaling triggered by salinity and other abiotic stresses [34]; CNBL4 (Calcineurin B-like protein) involved on SOS pathway as calcium sensors, working in combination with kinases and ion channels to exclude cytosolic salt [35,36]; and NAC78 (NAC domain-containing protein 78-like) that belongs to NAC transcription factor family (NAC-TFs) involved in regulating plant growth, development processes and abiotic stress responses, including drought and salinity [37,38]. The qPCR data of ND1 (exception for 8 h treatment), CNBL4, and NAC78 for Leaf 100 experiment are in agreement with RNA-Seq expression results. The convergence of the results between these two approaches (that is, data validation) increases the robustness of our gene expression data, since the qPCR is considered a gold standard validation method for expression analysis. The up-regulation of the gene expression in response to salt stress as CNBL4, NAC78 and ND1 in leaf tissue, contribute to the establishment in Calotropis procera to high salinity adverse environments.
Regarding the root expression of target genes (qPCR / Root 200 experiment), they were not up-regulated in any of the treatments (showing up down-regulation or constitutive expression). When compared to Leaf 100 experiment results, this suggests: CNBL4, NAC78 and ND1 participate, more actively, in leaf response to salt stress (that is, tissue-specific transcriptional modulation); and /or the transcriptional modulation of the referred targets is dependent on the NaCl concentration. To determine the cause associated with those results, further inquiries are required. However, this gene sample already suggests the complexity of the molecular physiology of C. procera under stress, highlighting the capacity of adaptation of its transcriptome to different conditions and/or to the demand of different organs.
Our study provides, powerful background about ten candidate RGs for the first time, which can be used in C. procera studies under salt stress and can provide great potential to be tested in other experimental conditions. We indicate the most reliable RGs for 15-time combinations under three different experimental conditions, including two plant tissues and three NaCl concentrations. The CYP23 is a powerful RG candidate for expression normalization of C. procera under different experimental conditions. In addition, UBP25 should be avoided as RG for longlasting salt stress in C. procera's leaf. Finally, our findings emphasize the need for caution when evaluating the RGs stability in a set of samples under high amplitude of variant factors. The use of more than one software supported a reliable way to select the best RGs to validate TGs on qPCR.