Identification of differentially expressed genes in sunflower (Helianthus annuus) leaves and roots under drought stress by RNA sequencing

Sunflower is recognized as one of the most important oil plants with strong tolerance to drought in the world. In order to study the response mechanisms of sunflower plants to drought stress, gene expression profiling using high throughput sequencing was performed for seedling leaves and roots (sunflower inbred line R5) after 24 h of drought stress (15% PEG 6000). The transcriptome assembled using sequences of 12 samples was used as a reference. 805 and 198 genes were identified that were differentially expressed in leaves and roots, respectively. Another 71 genes were differentially expressed in both organs, in which more genes were up-regulated than down-regulated. In agreement with results obtained for other crops or from previous sunflower studies, we also observed that nine genes may be associated with the response of sunflower to drought. The results of this study may provide new information regarding the sunflower drought response, as well as add to the number of known genes associated with drought tolerance.


Background
Drought is one of the most serious ecological problems challenging mankind. Statistically, of all natural insults, drought stress has the highest impact on crop yields worldwide and its impact may equal the total impact of all other abiotic stresses combined (Chen et al. 2009). Due to the magnitude of drought's influence, improving crop drought tolerance is an urgent priority. Modern molecular biological technologies have greatly facilitated elucidation of plant response mechanisms to water stress, which in turn are relevant to understanding the genetic basis of drought tolerance of plants. Ultimately, identification of drought tolerance genes should inform breeding strategies for eventual creation of drought resistant sunflower hybrids.
Sunflower (Helianthus annuus) is the fourth largest source of vegetable oil in the world. In 2016, the global planting area of this crop was greater than 24,970,000 hm 2 and total output approached 45,750,000 tons (the Food and Agriculture Organization of the United Nations, FAO). China is one of the most important producers and consumers of sunflower in the world. Since sunflower plants are somewhat tolerant to drought, poor soil conditions and salt, sunflower is suitable for cultivation in most areas of Heilongjiang Province. Consequently, improved sunflower drought tolerance would allow cultivation of this crop across an even larger area than currently utilized, extending to the mid-western arid and semi-arid middle-and low-yielding fields of this province. By first studying sunflower varieties most resistant to drought, with characterization of drought resistance genes and molecular mechanisms underlying drought tolerance, improved drought-tolerant sunflower might then be effectively developed.
In recent years, researchers have paid increasing attention to the molecular biology underlying drought tolerance mechanisms in order to improve sunflower drought tolerance. For example, Liu et al. (2003) analyzed the expression of five drought-induced genes in roots, stems and leaves of sunflower using differential display PCR and real-time quantitative PCR. Using a different approach, Diaz-Martin et al. (2005) cloned a drought-related DREB transcription factor, HaDREB2, in sunflower using gene interaction analyses. Meanwhile, Roche et al. (2007) analyzed the expression of genes associated with metabolism and signal transduction in leaves and immature embryos of sunflower using a cDNA array. Subsequently, 409 differentially expressed genes (DEGs) were identified, of which 82 were organ specific and induced by drought stress. In addition, Kane and Risesberg (2007) identified candidate sunflower drought and salt tolerance genes using selective screening and discovered 17 genes that were induced by each of these stresses. Subsequently, Cheng et al. (2009) transformed the drought and salt resistance gene P5CS into sunflower and obtained six transformed buds which were resistant to Kan, thus showing that successful gene transfer and expression had occurred; however, no fertile transgenic plants were obtained. Soon thereafter, Sauca et al. (2011) introduced the same drought resistance gene into cultivated sunflower inbred lines using an embryo rescue technique. More recently, Yi et al. (2013) demonstrated that a betaine aldehyde dehydrogenase (BADH) gene is induced by drought and salt stress in sunflower.
Moreover, due to the lack of a complete sunflower transcriptome, the study of molecular mechanisms underlying drought resistance has mainly focused on several independent genes instead of focusing on broader stressinduced genetic response networks. Because tolerance to drought is a complex phenotypic trait controlled by polygenes, it would not be useful to research any single gene or single class of genes to understand molecular mechanisms underlying drought tolerance. Fortunately, transcriptome profiling and digital gene expression (DGE) analysis using high throughput sequence technology have facilitated discovery of DEGs under drought stress without relying on genome sequence information. Although annotation of DEGs is challenging without a genome sequence, the information acquired from DGE profiling could nonetheless describe the stress response mechanism to a certain extent.
In this study, DGE profiling of leaves and roots of sunflower (inbred line R5) in response to drought stress was conducted using Illumina high throughput sequencing technology. The DEGs that were subsequently described here should lay the foundation for research of molecular mechanisms of sunflower drought tolerance. Moreover cloning and expression analysis of DEGs identified in this study may provide several candidate genes and information to improve sunflower drought stress tolerance using genetic engineering technology.

Plant growth and stress treatments
The sunflower line R5 was used in this study. R5 seeds were provided by the Industrial Crops Institute of Heilongjiang Academy of Agricultural Sciences. Sunflower seeds were rinsed with sterilized distilled water three times and allowed to germinate in Petri dishes for 72 h at 28 °C in darkness. The germinated seeds were transferred into pots containing nutrients, soil and vermiculite (1:1). Seedlings were grown in an artificial climate chamber with 16 h light and 8 h darkness (22 °C) with 70% relative humidity. Seedlings at the four-leaf stage showing appropriate growth status were rinsed with sterilized distilled water and placed into culture bottles filled with 15% PEG in water for 24 h to serve as drought stressed plants. Control plants (distilled water only) were grown in parallel and collected at the same time points. The process was repeated in triplicate, where each replicate was biologically and temporally independent.

Sample preparation and RNA isolation
The leaves and roots of drought-stressed and control groups were harvested for gene expression analysis. The tissue samples were immediately frozen in liquid nitrogen and stored at − 80 °C. Total RNA was isolated separately from the leaves and roots of sunflower seedlings using TRIzol Reagent (Tiangen, Beijing, China) following the manufacturer's protocols. Yield and quality of the RNA samples were determined using a NANODROP 2000 spectrophotometer (Thermo Fisher Scientific, USA). Since no genome sequence has yet been generated for sunflower, a portion of the total RNA from each of 12 samples studied here was pooled for construction of a library for use as a reference to validate transcriptome sequencing results presented here.

Library preparation and sequencing
Total RNA preparations from 13 samples (including one pool of all samples and three samples from each of four sample types: leaves from PEG-treated plants, roots from PEG-treated plants, leaves from control plants, roots from control plants) were used for mRNA isolation and concentration using magnetic oligo (dT) beads. The mRNA was broken into short segments by adding fragmentation buffer. With mRNA as the template, first-strand cDNA was synthesized by adding random hexamers and second-strand cDNA was synthesized by adding buffer, dNTPs, DNA polymerase I and RNase H. Double-stranded cDNA was purified using AMPure XP beads. The ends of purified double-stranded cDNA were repaired, A-tails were added and overlapping sequences were joined. Next, AMPure XP beads were used for fragment size selection. Finally, the cDNA libraries were constructed using PCR amplification and products were purified using AMPure XP beads. Qubit 2.0 was used to perform a preliminary quantitation of the cDNA library. Each library was diluted to 1.5 ng/µL and an Agilent 2100 Bioanalyzer was used to analyze the insert size range of the cDNA library. Quantitative PCR (qPCR) was used to analyze the effective concentration of each library. In order to ensure the quality of each library, the effective concentration of library must be > 2 nM. According to the effective concentration and data quantification, the various libraries were pooled and sequenced using Illumina HiSeq/MiSeq.

Sequencing data analysis
In order to ensure the quality of downstream analyses, the raw reads obtained from sequencing were filtered. Clean reads were obtained after removing from the raw sequence data the reads containing only adapters, reads with a content of N > 10% and low quality reads. All downstream analyses were based on clean reads with high quality.
Trinity software (v2012-10-05) was used for splicing of transcripts (Grabherr et al. 2011). The databases used for gene annotation in this study included: nr (NCBI non-redundant protein sequences), nt (NCBI nucleotide sequences), Pfam (Protein family), KOG/COG (euKaryotic Ortholog Groups/Clusters of Orthologous Groups of proteins), Swiss-Prot (a manually annotated and reviewed protein sequence database), KEGG (Kyoto Encyclopedia of Genes and Genomes) an GO (Gene Ontology). MISA (1.0) software was used to detect simple sequence repeats (SSRs) in the unigenes and Primer 3 (2.3.5) was used to design primers to characterize SSR markers. Clean DEG sequence reads were mapped to the transcripts by RSEM (Li and Dewey 2011) and were used to obtain the read count for each sample that mapped to each gene. Finally, the read count number was converted to FPKM (expected number of Fragments Per Kilobase of transcript sequence per Millions base pairs sequenced) (Trapnell et al. 2010). DESeq (Anders et al. 2010) was used to analyze differential gene expression. The screening threshold was padj < 0.05. This method was based on a negative binomial distribution model; the read count of gene i in the sample of j was designated K ij . The P values were adjusted using the Benjamini and Hochberg method. The corrected P value of 0.005 and log2 (fold change) of 1 were set as thresholds for significance for scoring of differential expression. The method of GO enrichment analysis in this study was GOseq (Young et al. 2010). KOBAS (2.0) software was used to test the statistical enrichment of differential expression of genes in the KEGG pathway (Mao et al. 2005).

Quantitative real-time PCR (qRT-PCR) analysis
Five genes (comp9755_c0, comp11967_c0, comp29815_ c0, comp36138_c0 and comp72325_c0) which have been identified may be related to drought stress response were validated by quantitative real-time PCR (qRT-PCR) using the same RNA samples as in the DGE library construction. Leaves and roots were removed from the freezer and ground in liquid nitrogen. Six pairs of gene-specific primers (five candidate genes and one actin gene) were designed based on target gene sequences using Primer Express software (Additional file 1: Table S1). Firststrand cDNA synthesis and qRT-PCR were carried out using a PrimeScript ™ RT reagent Kit with gDNA Eraser (TaKaRa), SYBR Premix Ex Taq ™ II (Tli RNaseH Plus) and ROX plus (TaKaRa). Each PCR (18 µL) contained 10 µL 2 × Real-Time PCR Mix, 5 µM of each primer and diluted cDNA. The thermal cycling conditions were 95 °C for 30 s followed by 40 cycles of 5 s at 95 °C and 40 s at 60 °C. All reactions were performed in triplicate, including the non-template controls. The relative expression levels were calculated as 2 −(△Ct of treatment − △Ct of control) .

Results of sunflower transcriptome sequencing
To identify genes that are differentially expressed in response to drought stress, 12 samples [leaves of plants treated with PEG (three samples), roots of plants treated with PEG (three samples), leaves of control plants (three samples) and roots of control plants (three samples)] were mixed for the construction of a library for transcriptome sequencing to serve as a reference DGE sequence library. In this study, a total of 62,252 unigenes were obtained by merging overlapping transcript fragments for each locus. By analyzing the length of transcripts and unigenes, we found that the numbers of unigenes of lengths of 200-500 bp, 0.5-1 kb, 1-2 kb or greater than 2 kb were 37,645, 12,190, 8714 and 3703, respectively. The average length of all gene fragments was 390 bp. All unigenes were annotated using NR, NT, KO, Swiss-Proto, PFAM, GO, KOG databases. The results demonstrated that 39,356 unigenes could be annotated using at least one database, accounting for 62.33% of the total unigenes observed. Of these, 4152 unigenes could be annotated using all seven databases, which accounted for 6.66% of the total number of unigenes (Table 1), the detailed results of annotation is listed in Additional file 2: Table  S2. After annotation using the GO database, successfully annotated genes could be categorized into 47 sub-groups from three main categories (biological process, cellular component and molecular function) (Additional file 3: Table S3). The results of this study should provide abundant gene resources for future studies focusing on the molecular biology of sunflower.
In this study, 4,283,598 bp nucleotides belonging to 62,252 unigenes were analyzed for SSR calling. Subsequently, 8808 SSR sequences were found in 7441 unigenes of which 1120 unigenes contained more than one SSR sequence and 507 SSR sequences were present in compound formations. The primers of these SSR sequences are listed in Additional file 4: Table S4.

Identification of DEGs in sunflower leaves and roots in response to drought stress
The results of correlation analysis of three replicates for each treatment showed that the correlation index between the replicates ranged from 0.786 to 0.920 (Fig. 1), which indicated that the sequencing data were reliable and could be used for further analysis.
In this study, 876 and 269 genes were differentially expressed between control and drought-stressed plants in their leaves and roots, respectively. Next, gene expression profiles were compared between leaves and roots under drought stress. Under drought stress, 805 and 198 DEGs were identified to be leaf-specific and rootspecific, respectively, and 71 of these genes were differentially expressed both in leaves and roots. Among these DEGs, 616 and 172 genes were up-regulated during drought stress in leaves and roots, respectively, while 260 and 97 genes were down-regulated in leaves and roots, respectively (Fig. 2). Of the 616 up-regulated genes in leaves, 26 and 415 genes could be annotated in sunflower and other plant species, respectively, while 175 genes could not be annotated. Of the 260 down-regulated genes in leaves, 14 and 191 genes could be annotated in sunflower and other plant species, respectively. In the 172 up-regulated genes in roots, 5 and 110 genes could be annotated in sunflower and other plant species, respectively, while 57 genes could not be annotated. Of the 97 down-regulated genes in roots, three and 60 genes could be annotated in sunflower and other plant species, respectively, while 34 genes could not be annotated.
In response to drought, both leaves and roots showed a greater number of up-regulated genes than down-regulated genes. Some stress-induced genes were regulated in a tissue-specific manner, which indicates that these genes may play roles as part of distinct mechanisms for coping with drought stress. Another 71 genes exhibited the same pattern of expression in leaves versus roots, indicating that overlaps exist at the transcriptional level between leaves and roots in their response to drought stress.

Functional categorization of stress-regulated genes
Differentially expressed genes from leaves and roots were annotated using the GO database. Clearly annotated genes belonged to three main categories (cellular component, molecular function and biological process) and the distributions of gene numbers for each category were shown in Fig. 3. Notably, the "cellular component" category was not found within the top 15 categories in leaves.
In this study, we focused on a group of genes known to be involved in stress responses (Table 2). Of 17 genes involved in responses to abiotic stimuli, nine genes are known to be involved to responses to water stimuli and 16 genes were differentially expressed in leaves; only one gene, designated Comp11967_c0, was differentially expressed in both leaves and roots. We also found that each gene responded to more than one abiotic stress. This result indicated that relatively conserved mechanisms may be shared by sunflower responses to various abiotic stresses.
To identify the biological pathways represented within the DGE libraries, all annotated genes were mapped using the KEGG database to identify significantly enriched genes involved in metabolic or signal transduction pathways. Subsequently, 409 and 119 DEGs were assigned to 105 and 47 KEGG pathways in leaves and roots for drought, respectively. The top 20 pathways among all of the KEGG annotations of leaves and roots are listed in Table 3. The top KEGG pathways which contained the highest number of DEGs in leaves and roots were metabolic pathways, biosynthesis of secondary metabolites, plant hormone signal transduction, ribosome and RNA transport.

Validation of DGE data by qRT-PCR
To evaluate the validity of DGE sequencing and to further analyze the patterns of differential gene expression, five genes were selected for detection using real-time PCR with gene-specific primers (Additional file 1: Table S1) and the data were compared to DEG sequence data. The results demonstrate that both in leaves and roots, four genes, comp9755_c0, comp11967_c0, comp29815_c0, comp36138_c0, were up-regulated after PEG treatment, but the expression of comp72325_c0 was significantly decreased during drought stress. The results of qRT-PCR are consistent with the trend observed for DEG sequencing (Fig. 4). Comparison of data obtained from DEG sequencing analysis methods to data from qRT-PCR confirmed the reliability of the DEG sequencing method.

Discussion
Numerous studies have shown that adaptation mechanisms of plants in response to drought make up a complex metabolic regulatory network based on the expression of many genes. Although response mechanisms to various stresses are mainly conserved, specific plants often possess some of their own unique response mechanisms. Therefore, expression profiling studies of sunflower at the transcriptional level may benefit from knowledge of molecular drought tolerance mechanisms for other plants.
Before the large-scale application of high-throughput sequencing, cDNA microarrays were the traditional method for conducting gene expression profiling. Using cDNA microarrays, a number of drought response genes have been identified in Arabidopsis thaliana (Shinozaki et al. 2007), rice (Wang et al. 2007), maize (Zheng et al. 2006) and other crops. However, accurate expression profiling could only be conducted using a clear genome sequence to guide comparison of samples with identical genetic backgrounds for various conditions. Since no complete genome sequence exists for sunflower, cDNA microarray analysis could only be performed using the genome sequence of another crop. However, differing Fig. 1 Pearson correlation between samples. C represents the control group; P represents the PEG-treated group. R 2 represents the correlation coefficient. The darker the blue background, the greater the correlation coefficient genetic backgrounds between sunflower and reference crops often caused high analytical uncertainty, generating erroneous results. In contrast, the sequences of unigenes obtained using high-throughput sequencing reflected objective changes in actual gene expression, but later annotation of DEGs was limited by the number of available reference sequences. The results of this study showed that DGE sequencing can achieve near complete coverage of the entire genome sequence with reliable and reproducible sequencing results under various conditions. However, we also found that the lack of genome sequence information may limit gene annotation, subsequently influencing in-depth mining and selection of candidate genes. Of the total 1074 DEGs identified in this study, nine unigenes were associated with water stress, as demonstrated using annotation through sequence alignment. These DEGs included three dehydrin-like genes, one NAC transcription factor, two galactosidase family genes, one aldehyde dehydrogenase family gene, one CBL-interacting serine/threonine-protein kinase gene and one glycosyltransferase gene. Many researchers have observed a positive correlation between the expression and accumulation of plant dehydrin genes and plant resistance to abiotic stress (Hu et al. 2010). Meanwhile, the NAC transcription factor (Fang et al. 2015), galactosidase family gene (Santos et al. 2015), aldehyde dehydrogenase family gene (Singh et al. 2013), calcineurin-B-like (CBL)-interacting serine/threonine protein kinase gene (Chen et al. 2012) and glycosyltransferase gene (Tognetti et al. 2010) have all been shown to play important roles in response to abiotic and biotic stresses. Although 321 DEGs identified in this work could not be annotated, this study provides a rich set of candidate genes for further study of sunflower responses to drought stress. Although the functions and molecular mechanisms of these stress genes have not yet been fully elucidated, powerful genetic engineering methodologies, such as RNA interference, gene overexpression and transgenic technologies hold promise for defining their roles in sunflower drought tolerance.

Conclusions
A total of 62,252 unigenes were obtained by transcript profiling sequencing in this study. Under drought stress, 876 and 269 DEGs were identified by DGE sequencing in sunflower leaves and roots, respectively. A greater number of up-regulated genes than down-regulated genes were identified in both leaves and roots. By analyzing and annotating DEGs, we found 17 genes that play roles in sunflower response to abiotic stimuli and thus may have relevance to drought tolerance. Of these, nine genes may be associated with responses to water-related stimuli. To our knowledge, this study is the first high throughput sequencing study for gene expression profiling analysis of sunflower under drought stress. The results of this study demonstrate that the drought stress response in sunflower leaves and roots comprises a complex mechanism involving multiple metabolic pathways. These findings have enriched our understanding of sunflower drought tolerance and provide many candidate genes associated with drought resistance for future development of sunflower varieties better suited to growth in marginal lands.  Comparison of the expression profile of comp11967_c0 in leaves and roots determined by qRT-PCR and DEG sequencing, respectively. C, c Comparison of the expression profile of comp29815_c0 in leaves and roots determined by qRT-PCR and DEG sequencing, respectively. D, d Comparison of the expression profile of comp36138_c0 in leaves and roots determined by qRT-PCR and DEG sequencing, respectively. E, e Comparison of the expression profile of comp72325_c0 in leaves and roots determined by qRT-PCR and DEG sequencing., respectively