The RNA ‐ Seq ‐ based high resolution gene expression atlas of chickpea ( Cicer arietinum L.) reveals dynamic spatio ‐ temporal changes associated with growth and development

Chickpea is one of the world's largest cultivated food legumes and is an excellent source of high ‐ quality protein to the human diet. Plant growth and development are controlled by programmed expression of a suite of genes at the given time, stage, and tissue. Understanding how the underlying genome sequence translates into specific plant phenotypes at key developmental stages, information on gene expression patterns is crucial. Here, we present a comprehensive Cicer arietinum Gene Expression Atlas (CaGEA) across different plant developmental stages and organs covering the entire life cycle of chickpea. One of the widely used drought tolerant cultivars, ICC 4958 has been used to generate RNA ‐ Seq data from 27 samples at 5 major developmental stages of the plant. A total of 816 million raw reads were generated and of these, 794 million filtered reads after quality control (QC) were subjected to downstream analysis. A total of 15,947 unique number of differentially expressed genes across different pairwise tissue combinations were identified. Significant differences in gene expression patterns contributing in the process of flowering, nodulation, and seed and root development were inferred in this study. Furthermore, differentially expressed candidate genes from “ QTL ‐ hotspot ” region associated with drought stress response in chickpea were validated.

During last five decades, the area under chickpea cultivation and the productivity continue to remain the same in spite of its important role in human health. Chickpea production is far below the present demand and has not achieved its potential yield owing to major constraints such as several abiotic (drought, heat, salinity, etc.) and biotic (ascochyta blight, fusarium wilt, etc.) stresses. Seed morphology in chickpea is of two types: kabuli (bold pale brown) and desi (with small dark brown) seeds. Recently reported draft genome assemblies of both the chickpea types, kabuli (CDC Frontier; Varshney et al., 2013) and desi (ICC 4958;Jain et al., 2013) provide a powerful resource for chickpea functional genomics research. Furthermore, genomic and transcriptomic resources were also rapidly developed in chickpea (Agarwal et al., 2016;Hiremath et al., 2011;Kudapa et al., 2014;Nayak et al., 2010;Thudi et al., 2011;Varshney et al., 2009). These resources have been widely used in chickpea research to investigate tolerance/resistance to abiotic (Kale et al., 2015;Pushpavalli et al., 2015;Thudi et al., 2014;Varshney et al., 2014) and biotic (Sabbavarapu et al., 2013) stresses. As part of the effort to better understand legume biology, several transcriptomic studies have been carried out on different tissues under a variety of experimental/stress conditions in chickpea and identified hundreds of differentially expressed genes (Garg, Patel, Tyagi, & Jain, 2011;Hiremath et al., 2011;Kaashyap et al., 2018;Kudapa et al., 2014;Varshney et al., 2009).
Understanding of gene expression has changed dramatically with the evolution of new technologies. High-throughput technologies such as microarrays and next-generation sequencing have been comprehensively applied to generate large amounts of genome-wide gene expression data and understand key genetic bottlenecks that limit chickpea yield under different stress conditions Hiremath et al., 2011;Kudapa et al., 2014;Varshney et al., 2009).
However, no single study has yet brought together genome-wide data on gene expression profiles for all major plant organs at different developmental stages of chickpea, leaving a large gap between the genome sequence and phenotype. To understand how the underlying genome sequence results in specific plant phenotypes, information on gene expression patterns is crucial. Gene expression atlases could predict cluster of genes expressed in each tissue at different plant developmental stages. Technologies such as Affymetrix GeneChip (Benedito et al., 2008) and RNA-Seq (O'Rourke et al., 2014;Severin et al., 2010) have been employed in investigating gene expression atlas in several crop species. In legumes, gene expression atlas has been developed for model legumes, Medicago (Benedito et al., 2008) and Lotus (Verdier et al., 2013), and recently in crop legumes where genome sequence is available, for example, in soybean (Libault et al., 2010;Severin et al., 2010), common bean (O'Rourke et al., 2014), pea (Alves-Carvalho et al., 2015), cowpea (Yao et al., 2016), peanut (Clevenger, Chu, Scheffler, & Ozias-Akins, 2016), and pigeonpea (Pazhamala et al., 2017). Expression profiling and systematic analysis of annotated patterns of gene expression will complement genomic information and help predict a gene's function.
In the present study, C. arietinum (chickpea) Gene Expression Atlas (CaGEA) has been developed from 27 chickpea tissues across five developmental stages, namely, germination, seedling, vegetative, reproductive, and senescence, of a chickpea breeding cultivar, ICC 4958. The genotype, ICC 4958, used in the present study is one of the most drought tolerant and widely used cultivar in chickpea breeding programme. The atlas data generated utilizing RNA-Seq technology were analysed to disclose expression profiles of all expressed genes in different tissues/stages of chickpea. Differential gene expressions that are accompanied by changes in the expression of key regulatory genes such as transcription factors were examined.
The gene expression atlas of chickpea would not only provide a global view of gene expression patterns in all major tissues and organ systems but also will serve as valuable resource for functional genomics and accelerate gene discovery in legumes.

| Plant growth and sample collection
Drought tolerant chickpea breeding cultivar "ICC 4958" was used in the present study. Plant tissue samples were collected at five major developmental stages of the plant, namely, germination and seedling, vegetative, reproductive, and senescence. These samples were grouped into four sets for further analysis based on their developmental stages. Tissues from germination (three tissues) and seedling (two tissues) stages were grouped under Set I (total of five tissues); vegetative to Set II (four tissues); reproductive to Set III (nine tissues); and senescence to Set IV (nine tissues; Figure 1). For germination stage, the surface sterilized seeds were soaked in distilled water for 6 hr and were allowed to germinate in petriplates using Whatman No. 1 filter paper soaked with 10 ml of distilled water for 24 to 48 hr. Three petriplates with 10 seeds per replicate were used for each sampling. For seedling stage, the seeds were sown in paper cups and allowed to grow for 8-10 days after germination (DAG) under glasshouse conditions. Seeds were sown in pots and maintained in glasshouse for vegetative (25-30 DAG), reproductive (40-50 DAG), and senescence (90-110 DAG) stages. Three biological replicates were maintained at each stage for sample/tissue collection. Biological replicates were constituted by at least three plants sown at different intervals and located randomly in the glasshouse. A total of 27 samples (Table 1) from all stages (Set I-IV) were targeted for collection in three biological replications planted on separate days. After harvest, tissues were washed thoroughly with 0.1% DEPC water, flash frozen in liquid nitrogen, and stored at −80°C for RNA extraction.

| RNA extraction and cDNA library construction
The total RNA was extracted from the harvested tissues using Nucleospin RNA plant kit (Macherey-Nagel, Duren, Germany) as per manufacturer's protocol (http://www.macherey-nagel.com/). RNA was purified using the RNeasy MinElute cleanup kit (Qiagen, www. Qiagen.com) according to manufacturer's instructions. Purified RNA was quantified using a NanoDrop ND-100 spectrophotometer (Thermo Scientific), and its integrity was evaluated, using Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA). The assessment of RNA integrity is a critical first step in obtaining meaningful gene expression data. Equal amounts of RNA samples from three independent biological replicates (of each tissue) having RNA Integrity Number (algorithm for assigning integrity values to RNA quality measurement from Agilent Technologies) value ≥8 were pooled prior to library preparation and subsequent sequencing. The cDNA libraries were prepared using Illumina TruSeq RNA library protocol outlined in "TruSeq RNA Sample Preparation Guide" (Part no. 15008136).

| Illumina sequencing and data preprocessing
Paired-end sequencing was performed in two sets: a set of 22 samples (nos.1 to 22) were sequenced using Illumina HiSeq 2000 at Genotypic Technology Pvt. Ltd., India, and a second set of 5 samples (nos. 23 to 27) were sequenced in-house (Sequencing and Informatics Services Unit, ICRISAT) using Illumina HiSeq 2500 sequencing system. The raw reads generated from all 27 samples were subjected to quality filtering using NGS-QCBOX (Katta, Khan, Doddamani, Thudi, & Varshney, 2015) and Trimmomatic v0.35 (Bolger, Lohse, & Usadel, 2014). The low quality reads (Phred score < 20; read length < 50 bases) and reads with adapter contamination were removed to generate a set of high quality reads termed as clean data thereafter. The complete downstream analysis was based on clean data.

| Global gene expression analysis, clustering, and visualization
The RNA-Seq data were analysed using Tuxedo pipeline (Trapnell et al., 2012), an open source pipeline. The filtered reads from all samples were aligned on the chickpea genome  using Tophat v2.1.0 (Kim et al., 2013) with default parameters. The resulting alignment reads from each sample were then used to create a RABT (Reference Annotation Based Transcript) assembly using Cufflinks (v 2.1.1) (Trapnell et al., 2010). The assemblies generated were then merged into one consensus assembly using cuffmerge which was further used to quantify transcript abundance. Transcript abundance was estimated based on fragments per kilobase of transcript per million mapped reads (FPKM). Transcripts with abundance >1 FPKM and quantification status as "OK" were considered for further analysis. Identification of tissue-specific genes was performed on genes with FPKM ≥1 in at least one of 27 samples, by calculating tissue-specificity index (τ; Yanai et al., 2005) using the equation: where N is the number of samples and x i is the expression value of a gene normalized by maximum value across all samples. The value of τ range from 0 to 1, where higher the value more likely the gene is specifically expressed in that stage. For this study, genes with τ = 1 were considered tissue-specific.
For the analysis of the global gene expression patterns, hierarchical clustering was plotted using the "pheatmap" function by R software. Genes with FPKM >1 were log2 + 1 transformed and hierarchical clustering was performed. Samples were further clustered based on their pair wise correlations.

| Differential gene expression analysis
Identification of differentially expressed genes (DEGs) was done using cuffdiff (Trapnell et al., 2010). The DEGs with log2 fold change ≥2 (induced) and/ or ≤ −2 (repressed) and an FPKM ≥2 for either of the sample in each pair wise comparison were considered to be significantly differentially expressed. Log2 transformed FPKM values of the DEGs were further subjected to K-means clustering using Pearson correlation with an optimal number of clusters set at 20 in "pheatmap" of R software.

| Gene annotations, (gene ontology) GO term, and pathway analysis
The identified DEGs were subjected to Blastx similarity (E-value cut-

| Quantitative real-time polymerase chain reaction
Slow drought was imposed on ICC 4958 under greenhouse conditions as described by Ray and Sinclair (1998 (Rozen & Skaletsky, 2000). PCR was carried out as described by Mir et al. (2014). The data from different PCR runs or cDNA samples were compared by using the mean of the Ct values of the three biological replicates that were normalized to the mean Ct values of the endogenous gene. The relative expression ratios were determined using the 2 −ΔΔCt method and Student's t test was used to calculate significance (Livak & Schmittgen, 2001). Relative transcription levels are presented graphically.

| Reference guided assembly
The mapped reads (755 M) from all the 27 tissue samples (representing five developmental stages) were used to generate reference guided assembly, global gene expression profiles, and differential gene expression profiles. A reference-guided assembly of the whole dataset was generated using Cufflinks-Cuffmerge pipeline. This assembly generated 105,609 transcripts representing 32,873 gene loci. This gene loci number is considerably higher than the number of genes reported (28,269) in the chickpea genome assembly by Varshney et al. (2013). The comparison of the reference-guided assembly with the chickpea genome led to the identification of 4,604 (14%) novel genes that are not annotated in the genome. The results from reference-guided assembly indicate the potential of RNA-Seq in identification of novel genes.  (154). The lowest number of tissue-specific genes was present in Ger_Plumule (7), Sen_Leaf-yellow (7), Sen_Leaf (8), and so forth.

| Global gene expression patterns
These tissue-specific genes represented significant information for targeted gene expression and provided insights into the specialized process occurring in these tissues. In order to identify the tissuespecific biological processes, GO enrichment analysis on tissue-specific genes was carried out. As expected, over-representation of predictable specialized processes in different plant tissues such as "pollination" and "multi cellular organism development" in Rep_Buds, "reproduction" and "cell wall" in Rep_Pods, "transport" and "kinase activity" in Sen_Root, and "cell differentiation" in Veg_Root was observed. The total genes expressed were grouped into three categories based on expression in 27 samples. These include genes with low/moderate, high, and tissue-specific expression (Table 2).

| Differential gene expression
Pairwise analysis identified DEGs between any two combinations of the tissues at the same or different developmental stage studied. A total of 15,947 unique genes were identified to be significantly differentially expressed with ≤ -2 and ≥ 2 fold change in any of the combinations studied (Supporting Information- The pairwise combinations showing the highest and lowest numbers of induced genes also showed the highest and lowest numbers of repressed genes (Figure 3). In addition, unique set of genes exhibiting expression when compared with similar/related tissues across different developmental stages were identified. For example, Ger_Plumule

| Expression trends across the different developmental stages and identification of tissuespecific genes
To study clusters of genes with similar expression trends across differ-   namely, 5, 6, 7, 9, 12, 13, 14, 17, and 18 did not exhibit any unique gene expression patterns across tissues. Overall, cluster analysis and selected clusters showing distinct preferential gene expression patterns with respect to specific tissues is depicted in Figure 6.

| Pathways triggered during plant growth and development
Pathway analysis was performed to investigate pathways activated during chickpea growth and developmental process. A total of 121 pathways representing 640 enzymes involved in biosynthesis, metabolism, signalling, and cell differentiation were found to be activated.

| Expression trends in the drought "QTL-hotspot" region
To investigate the overall patterns of gene expression in the "QTLhotspot" region for drought tolerance present on Ca4 pseudomolecule in chickpea, 26 genes (15 "QTL-hotspot_a" and 11 "QTL-hotspot_b") identified in the region from a different study were considered for analysis (Kale et al., 2015). Major QTLs in this region were identified for 11 traits such as root length density (cm/cm 3 ), root dry weight/ total plant dry weight ratio (%), shoot dry weight (g), plant height (cm), primary branches, days to 50% flowering, days to maturity, pods/plant, 100-seed weight (g), harvest index (%), and delta carbon ratio. Gene expression patterns of the 26 candidate genes (identified by Kale et al., 2015) in select tissues specific for the traits of interest were analysed to ensure the role of key genes responsible for regulation of drought tolerance in chickpea. These tissues include all root, flower, and seed tissues (Ger_Radicle, Sed_Primary Root, Veg_Root, Rep_Root, Rep_Buds, Rep_Flower, Rep_Immature seeds, Rep_Pods, Sen_Root, Sen_Immature seeds, Sen_Mature seeds) at different developmental stages. Of the 26 genes described above, 17 genes (Ca_04550,Ca_04551,Ca_04552,Ca_04555,Ca_04556,Ca_04557,Ca_04558,Ca_04560,Ca_04561,Ca_04563,Ca_04564,Ca_04566,Ca_04567,Ca_04568,Ca_04569,Ca_04570,and Ca_04571) showed differential gene expression in tissues studied across five developmental stages. Majority of these genes were found to be induced in root, flower, and seed tissues. Kale et al. (2015) identified a subset of 12 common genes out of 26 genes using two different analysis, namely, high density QTL mapping and gene enrichment analysis.
In the present study, nine genes (Ca_04561,Ca_04563,Ca_04564,Ca_04566,Ca_04567,Ca_04568,Ca_04569,Ca_04570, and Of the nine genes, three genes (Ca_04561, Ca_04566, and Ca_04568) were common for all the tissues studied. Upregulation of these genes in selected tissues (mentioned above) which were specific for drought tolerance related traits revealed the important role of key candidates in flower, root, and seed developmental process under drought stress. An overview of gene expression profiles across different developmental stages highlighting Ca4 containing "QTL-hotspot" has been depicted in Circos plot and heatmap ( Figure 8).
3.10 | Validation of candidate genes identified from drought tolerance "QTL-hotspot" region The RNA-Seq data were analysed for global gene expression patterns, pairwise comparisons for identifying DEGs in all possible combinations, gene clustering, and pathway analysis and also to identify key candidates from drought "QTL-hotspot" region. The 32,873 genes identified in the current global gene expression analysis is considerably higher than the number of genes reported (28,269) in the chickpea genome assembly by Varshney et al. (2013). This may be due to incomplete (approximately 74%) chickpea genome sequence available.
Significantly higher number of genes than the number of genes annotated in the chickpea genome were also observed in earlier studies . The comparison of the reference-guided assembly with the available chickpea genome led to identification of 4,604 (14%) novel gene loci that are not yet annotated in the chickpea genome. Overall, these results demonstrate the potential use of RNA-Seq approach in discovery of novel genes/transcripts in the sequenced genomes as well. Similarly, novel genes were identified using RNA-Seq in other crops such as maize (Stelpflug et al., 2016) and potato (Shan et al., 2013). Furthermore, these observations confirm that the tissues and developmental stages used in this study triggered a majority of genes in the genome and provided a comprehensive resource of genome-wide gene expression patterns in chickpea. Interestingly, hierarchical clustering of all expressed genes indicated shared transcriptional patterns among spatially related tissues across different developmental stages. This could be due to shared genetic reinforcements and morphological origins (Eveland et al., 2014;Thompson & Hake, 2009;Vollbrecht & Schmidt, 2009).
Altogether, the data indicate that different samples exhibit TFs such as protein kinases have been found to regulate the photosynthetic machinery and associated metabolic pathways during plant development (Gururani, Venkatesh, & Tran, 2015;Saibo, Lourenço, & Oliveira, 2008). In the current dataset, significant number of TFs were identified to be differentially expressed. It has been previously reported that TFs of various gene families perform a crucial function in growth and development via gene regulatory networks (Garg, Bhattacharjee, & Jain, 2015;Shinozaki & Yamaguchi-Shinozaki, 2007 (Miyashita & Good, 2008). Cysteine and methionine metabolism regulates plant growth and development and are also involved in seed development process. Methionine is glucogenic and an essential amino acid. It serves as a precursor for the synthesis of cystine and cysteine which are non-essential amino acids. Cystine and cysteine are interconvertible. S-adenosylmethionine is the precursor for the synthesis of plant hormone, ethylene, which regulates plant growth and development. It is reported that a member of this pathway, S-adenosylmethionine, is the precursor for the synthesis of plant hormone and ethylene, which is involved in ripening of fruits (Tan, Li, & Wang, 2013). The acquisition of sulphur by plants has become a major concern for the agriculture due to the decreasing trends of S-emissions from industrial sources and the consequent limitation of inputs from deposition. Cysteine is the first committed molecule in plant metabolism that contains both sulphur and nitrogen, and the regulation of its biosynthesis is of very important for the synthesis of a number of essential metabolites in plant metabolic pathways. Several studies reported that plants adapt inorganic sulphur and metabolize it further to organic sulphur compounds essential for plant growth, development, and stress mitigation (Kopriva et al., 2016). In the present study, sulphur metabolism has been identified as one of the important metabolic pathways triggered involving several enzymes such as nucleotidase, sulphur transferase, synthase, and kinase. Overall, pathway analysis results suggest the involvement of complex transcriptional regulation of various pathways governing many aspects of growth and development of the plant. Integration of these transcriptome data with other omics and genetics data can help in further selection and pin down the important candidate genes for functional analysis.
In chickpea, drought tolerance mechanism and yield enhancement under drought conditions has been well studied (Kale et al., 2015;Varshney et al., 2014). A large number of QTLs for several drought tolerance related traits (e.g., root depth, root biomass, and root length density) and yield-related traits (e.g., 100-seed weight, pods per plant, and seeds per pod) were identified by Varshney et al. (2014). In this context, identification and validation of expression patterns of genes involved in root and seed development was considered in the present study. WRKY (Ca_00420) and MYB (Ca_19393, Ca_03266); glutathione S-amino-terminal domain (Ca_03442, Ca_08920, Ca_14581, Ca_08920); AP2-domain class TFs (Ca_06032, Ca_06034, Ca_21773); and LRR receptor-like (Ca_22105, Ca_22057, Ca_26423) showed increased expression in root tissues across different developmental stages-Ger_Radicle, Sed_Primary Root, Veg_Root, Rep_Root, Sen_Root, Rep_Nodules, and Sen_Nodules. These genes were identified to be involved in root developmental process. Similar studies involving LRR and AP2-domain class have been reported earlier (see Mai et al., 2014). In addition, 26 candidate genes reported from the refined "QTL-hotspot" region responsible for drought tolerance using high density QTL mapping and gene enrichment analysis by Kale et al. (2015) were also considered in the present study to observe expression patterns across different developmental stages. QTLs identified in the earlier study represent root and yield related traits and hence tissues representing flower, bud, seed, and root at different developmental stages of the plant were studied. A subset of 12 genes from these 26 genes that are in common from high density QTL mapping and gene enrichment analysis were prioritized (Kale et al., 2015).
Among the subset of 12 genes, nine genes showing high differential expression in the present study were validated using qRT-PCR. Four genes, E3 ubiquitin-protein ligase, LRX 2, kinase interacting (KIP1-like) family, and homocysteine S-methyltransferase showed induced expression in root tissues under stress conditions at different developmental stages (vegetative, reproductive, and senescence) thus validating the RNA-Seq results emphasizing their important role in drought tolerance mechanism. Similar expression patterns of genes were evidenced by Kale et al. (2015). The Ca_04561 gene encoding E3 ubiquitin-protein ligase is known to be important post translational modification to regulate growth and development in all eukaryotes (Pokhilko et al., 2011;Thomann et al., 2005). LRX proteins are known to modify cell wall composition and influence plant growth (Draeger et al., 2015). Earlier studies demonstrated that protein kinases play a crucial role in abiotic stresses such as drought and salinity (Yang et al., 2008). Induced expression and the role of homocysteine Smethyltransferase under drought stress is well studied (Kale et al., 2015;Mohammadi, Moieni, Hiraga, & Komatsu, 2012;Wang, Cai, Xu, Wang, & Dai, 2016). The role of key genes in growth and development identified in the present study such as E3 ubiquitin-protein ligase, LRX, and protein kinases were supported by earlier studies in different plant species such as Arabidopsis and rice (Draeger et al., 2015;Pokhilko et al., 2011;Wang et al., 2016).
It has long been understood that cell growth and differentiation in plants are under genetic control. The availability of new tools and technologies allowed biologists to recognize genes that control various aspects of growth and development. In summary, the gene expression atlas provides a robust assessment of gene expression patterns across a wide developmental series of chickpea. The CaGEA facilitated identification of candidate genes of agronomic importance for possible deployment in chickpea breeding programme. This has been demonstrated by identification of key candidates involved in floral development, seed development, root development, and nodulation process providing potential genes for future studies.
Furthermore, this resource could also be utilized to identify and validate drought tolerance responsive genes crucial for GAB in chickpea.
Thus, this could be a valuable resource for basic and applied research for crop improvement in chickpea and other food legumes to understand key genes and genetic regulatory mechanisms involved in growth and development.