Association of variation in the sugarcane transcriptome with sugar content

Sugarcane is a major crop of the tropics cultivated mainly for its high sucrose content. The crop is genetically less explored due to its complex polyploid genome. Sucrose synthesis and accumulation are complex processes influenced by physiological, biochemical and genetic factors, and the growth environment. The recent focus on the crop for fibre and biofuel has led to a renewed interest in understanding the molecular basis of sucrose and biomass traits. This transcriptome study aimed to identify genes that are associated with and differentially regulated during sucrose synthesis and accumulation in the mature stage of sugarcane. Patterns of gene expression in high and low sugar genotypes as well as mature and immature culm tissues were studied using RNA-Seq of culm transcriptomes. In this study, 28 RNA-Seq libraries from 14 genotypes of sugarcane differing in their sucrose content were used for studying the transcriptional basis of sucrose accumulation. Differential gene expression studies were performed using SoGI (Saccharum officinarum Gene Index, 3.0), SAS (sugarcane assembled sequences) of the sugarcane EST database (SUCEST) and SUGIT, a sugarcane Iso-Seq transcriptome database. In total, 34,476 genes were found to be differentially expressed between high and low sugar genotypes with the SoGI database, 20,487 genes with the SAS database and 18,543 genes with the SUGIT database at FDR < 0.01, using Baggerly's test. Further, differential gene expression analyses were conducted between immature (top) and mature (bottom) tissues of the culm. The DEGs were functionally annotated using GO classification and the genes consistently associated with sucrose accumulation were identified. The large number of DEGs may be due to the large number of genes that influence sucrose content or are regulated by sucrose content.
These results indicate that apart from being a primary metabolite and a storage and transport sugar, sucrose may serve as a signalling molecule that regulates many aspects of growth and development in sugarcane. Further studies are needed to confirm whether sucrose regulates the expression of the identified DEGs or vice versa. The DEGs identified in this study may lead to identification of genes/pathways regulating sucrose accumulation and/or regulated by sucrose levels in sugarcane. We propose identifying the master regulators of sucrose, if any, in future work.


Background
Among the domesticated grasses, sugarcane and sweet sorghum have undergone extensive selection for high accumulation of sucrose, which serves as the primary source of sugars for human and animal consumption, as well as ethanol production for fuel [1]. The maturing sugarcane culm represents both an economically important and physiologically interesting experimental system to study the dynamics of carbohydrate partitioning and metabolism associated with the accumulation of high concentrations of sucrose. A distinctive feature of sugarcane is that high levels of sucrose storage occur only in the culm parenchyma cells, whereas in other plants storage of sugar or other storage molecules occurs in terminal sink organs such as tubers, grains, or fleshy fruits. The sucrose concentration that peaks in the sugarcane culm at the end of the vegetative cycle (called ripening) is utilized for the sexual reproductive phase, and the remaining reserve is re-mobilized to produce new vegetative structures, unlike the pattern in monocarpic annuals where there is a single cycle of storage and utilization for the reproductive phase [2]. In addition, sucrose is the only major form in which reduced carbon is exported from the source and hence all cellular processes outside the source are dependent on the mobilisation and utilisation of sucrose. Sucrose is the dominant storage reserve in sugarcane, in contrast to most other plant stems that store polysaccharides such as starch or fructans with a low concentration of sucrose. As sugarcane matures, there is a shift in carbon partitioning from insoluble and respiratory components towards the osmotically active sucrose [3].
Although sugarcane stores the highest concentration (reaching about 0.7 M) of sucrose in the plant kingdom, studies on the physiological, biochemical and genetic basis of sucrose synthesis and accumulation have been limited compared to those in model plants like Arabidopsis or rice that do not accumulate high levels of sucrose. There are very few studies of sucrose accumulation primarily focusing on the sugarcane culm. Often these studies in sugarcane have reported a network of genes related to cell wall metabolism, carbohydrate metabolism, stress responses and regulatory processes [4][5][6][7][8][9][10][11]. Microarray analysis of sugarcane genotypes that varied in sucrose content revealed that many of the genes associated with high sucrose content showed overlap with drought data sets, but appeared to be mostly independent from abscisic acid signalling [12]. A large expressed sequence tag (EST) study of the sugarcane transcriptome and physiological, developmental and tissue-specific gene regulation was initiated in Brazil [13]. Sugarcane cultivars differing in both maximum sucrose accumulation (in Brix) capacity and accumulation dynamics during growth and culm maturation were studied using cDNA microarrays, and developmentally regulated genes related to hormone signalling, stress response, sugar transport, lignin biosynthesis and fibre content were identified [12]. Expression of a set of genes associated with sucrose accumulation was profiled using quantitative real time reverse transcription PCR (qRT-PCR) in 13 genotypes of sugarcane and its progenitor species including S. officinarum, S. spontaneum and the related species Erianthus arundinaceus [14]. High Brix genotypes exhibited increased expression of sucrose nonfermenting related kinases and cellulose synthases in an expression study comparing high and low Brix genotypes of sugarcane using qRT-PCR [15].
In another transcriptome study using next generation sequencing (NGS) [16], enrichment of transcripts involved in a network of sucrose synthesis, accumulation, storage and retention in relation to the agronomic characteristics of the genotypes contrasting for rust resistance was observed. Casu et al. [9] proposed that sucrose accumulation may be regulated by a network of genes induced during culm maturation, which included clusters of genes with roles that contribute to key physiological processes including sugar translocation and transport, fibre synthesis, membrane transport, vacuole development and function, and abiotic stress tolerance. These studies show that the sugarcane culm is a composite organ associated with numerous diverse functions other than sucrose storage. A gene networking pattern involving genes associated with culm maturation and sucrose accumulation, sugar transport, vacuole development, lignification, suberisation and abiotic-stress tolerance can be inferred from these studies. The present study aimed to identify transcripts that were associated with sucrose accumulation using a set of seven high sugar and seven low sugar genotypes by expression profiling of mature and immature culm tissues and bioinformatic analyses of culm transcriptomes. The upregulation of several thousand transcripts associated with sucrose biosynthesis was demonstrated in the high sugar and maturing culm of sugarcane. This is the first transcriptome study showing the association of expression of a large number of genes with sucrose synthesis and accumulation in the sugarcane culm tissue.

Plant material and phenotypic data collection
Fully grown, disease-free, 12-month-old plants grown in the field in a randomized complete block design were selected for analysis. The genotypes were derived from a sugarcane population provided by Sugar Research Australia (SRA), Brisbane, Australia, previously described in [17]. Sugar content measured as Brix (a measure of the soluble solids in sugarcane juice) was used for classifying the genotypes as high and low sugar genotypes (Table 1). The low sugar genotypes had a Brix range of 17-18.4 while the high sugar genotypes had a Brix range of 19.4-21.4. The Brix at the point of collection was used for defining the high and low sugar groupings. These genotypes may have high or low sugar content in other environments. A wide variation in sugar content was not obtained as these genotypes were commercial cultivars and introgression lines in the breeding pipeline with a sugar content above 16 and fibre content below 15 on a fresh weight basis. Culm samples (from both top and bottom tissues, the 4th internode from the top and the 3rd internode from the bottom of the cane) were collected from four representative stalks and pooled for each internode sample. All samples were collected between 10 am and 2 pm to limit the diurnal fluctuations in the transcriptome. After collection, the samples were immediately flash frozen in liquid nitrogen and stored at −80°C until RNA extraction. In addition, HPLC (high performance liquid chromatography) and NIR (near infrared spectroscopy) were used to measure the sugar composition and fibre content on a fresh weight basis. A sub-sample of each genotype was processed through a mechanical grinder, a component of the SpectraCane system (Biolab, Australia), and scanned by NIR for fibre content, Brix and sugar content (commercial cane sugar, CCS). For details see Additional file 1: Tables S1a, S2, S3; Figure S1.

Sample collection and preparation for RNA-Seq
The frozen sugarcane samples were pulverized using a Retsch TissueLyser (Retsch, Haan, Germany) at a frequency of 30/s for 1 min 30 s and about 1 g of ground sample powder was used for RNA extraction. RNA extractions were conducted as described by Furtado et al. (2014) [18] employing a Trizol kit (Invitrogen) and a Qiagen RNeasy Plant minikit (#74134, Qiagen, Valencia, CA, USA). For RNA quality and quantity assessment, a NanoDrop 8000 spectrophotometer (ThermoFisher Scientific, Wilmington, DE, USA) and an Agilent Bioanalyser 2100 with the Agilent RNA 6000 Nano kit (Agilent Technologies, Santa Clara, CA, USA) were used. Only RNA samples with a RIN value of >7.5 were chosen for library preparations. About 3 μg each of 28 internodal RNA samples was used for indexed-library preparation (average insert size of 200 bp) with a TruSeq stranded with Ribo-Zero Plant Library Prep Kit for preparing total RNA library (Illumina Inc.) as described in [19]. The library was subjected to sequencing in two lanes (equimolar) using an Illumina HiSeq4000 instrument to obtain paired-end (PE) reads of 150 bp. The library preparation and sequencing were conducted at the Translational Research Institute, The University of Queensland, Australia.

RNA-Seq data processing
Read adapter and quality trimming were performed in CLC Genomics Workbench v9.0 (CLC-GWB, CLC Bio-Qiagen, Aarhus, Denmark) with a quality score limit of <0.01 (equivalent to Phred Q score ≥ 20), and allowing a maximum of two ambiguous nucleotides. Only PE reads with a length ≥ 35 bp were kept for further analyses. Further information on the RNA Bioanalyser profiles, raw RNA-Seq reads, trimming, and quality parameters including size distribution and GC content is given in [19] and in Additional file 1: Table S1b. Table 2 gives the details of reads from each genotype (top and bottom internode tissues) after quality trimming.
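The stated equivalence between the trimming limit and the Phred score follows from Q = −10·log10(p), where p is the base-call error probability. The following is a minimal sketch of that conversion and of the length filter; the function names are illustrative and are not part of CLC-GWB, and requiring both mates of a pair to pass the length filter is an assumption.

```python
import math


def phred_from_error_probability(p):
    """Convert a base-call error probability to a Phred quality score."""
    return -10.0 * math.log10(p)


def keep_read_pair(read1, read2, min_length=35):
    """Keep a trimmed pair only if both mates reach the minimum length.

    Assumption: the >= 35 bp filter is applied to both mates of a pair.
    """
    return len(read1) >= min_length and len(read2) >= min_length


# Error-probability limit used for quality trimming in the text
limit_q = phred_from_error_probability(0.01)
```

With p = 0.01 the conversion gives Q = 20, which matches the threshold quoted for the trimming step.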

Differential gene expression (DGE) analyses
Using the CLC-GWB v9.5.1 software, RNA-Seq experiments were performed with a minimum length fraction of 0.9 and a minimum similarity fraction of 0.8. The  [20]. The CLC-GWB provides a comprehensive RNA-Seq tool for differential gene expression accompanied by statistical analyses. Baggerly's test, which was used in this case [21], is a proportion-based statistical analysis that uses raw count data (untransformed and not previously normalized) as input for setting up the experiment and uses total or unique gene/exon reads for calculating the differentially expressed genes. This test compares counts by considering the proportions that the counts for each gene make up of the total sum of counts in each group. That is, it takes into account the proportion in every genotype in a group for a gene to be considered as differentially expressed. When the Edge test [22] in CLC-GWB (an equivalent tool to EdgeR available in R Package v3.4.0) was used, this consistency (of differential expression across all genotypes in a group) was not observed. Similarly, the Differential Expression for RNA-Seq tool available in the more recent CLC-GWB v10.1.1 gave a different set of results for the DGE experiments (data not shown here) and hence was not included for further analyses. As sugarcane genotypes differ genetically from each other, the criterion was to select only those genes that were differentially expressed despite the genetic differences inherent to the genotypes. For example, a gene was considered differentially expressed only when it was consistently differentially expressed in all the seven genotypes in one group in comparison with all the seven genotypes in the other group. Further, Baggerly's test also corrects for the differences in the sample sizes (within and between library variations) by comparing the expression levels at the level of proportions rather than raw counts ([21]; CLC manual).
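The idea of comparing proportions rather than raw counts, and of requiring consistency across all genotypes in a group, can be illustrated with a small sketch. This is not an implementation of the proportion-based test in CLC-GWB; it only converts counts to per-library proportions and applies one assumed reading of the consistency criterion (the proportions of the two groups do not overlap). All counts and library sizes below are hypothetical.

```python
def proportions(gene_counts, library_totals):
    """Express the raw counts of one gene as a fraction of each library's total."""
    return [count / total for count, total in zip(gene_counts, library_totals)]


def consistently_different(group_a, group_b):
    """Return True if the proportions of the two groups do not overlap.

    Assumed reading of the consistency criterion: every sample in one
    group lies above every sample in the other group.
    """
    return min(group_a) > max(group_b) or max(group_a) < min(group_b)


# Hypothetical counts for one gene in seven high and seven low sugar libraries
high_counts = [120, 135, 150, 110, 140, 160, 125]
low_counts = [40, 55, 35, 60, 45, 50, 30]
high_totals = [1_000_000] * 7  # hypothetical library sizes
low_totals = [1_200_000] * 7

high_prop = proportions(high_counts, high_totals)
low_prop = proportions(low_counts, low_totals)
is_consistent = consistently_different(high_prop, low_prop)
```

Working with proportions means that a library sequenced more deeply does not, by itself, make a gene appear more highly expressed.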
The reads for each genotype in the high and low sugar groups were separately mapped against three reference databases: the Saccharum officinarum gene indices (SoGI), the sugarcane Iso-Seq transcriptome database (SUGIT, TSA accession number GFHJ01000000) and the sugarcane assembled sequences (SAS) from the sugarcane expressed sequence tags database (SUCEST). The SoGI database, which had adequate gene or protein function descriptions, was downloaded from the DFCI gene indices [23]. In the present case, the SoGI dataset represented 282,683 ESTs that resulted in 121,342 unique sequences after clustering. A collection of ∼240,000 ESTs generated by the SUCEST project from 26 cDNA libraries from different sugarcane tissues sampled at various developmental stages [24] was assembled into 43,141 distinct contigs using CAP3 [25]. This set of 43,141 contigs makes up the SAS database. The SAS database was not annotated, and the annotation was performed using BLASTX against the nr protein database with an e-value of 10⁻⁵ for 100 hits, using the high-performance computing (HPC) facility at The University of Queensland, Australia. In addition, we used the newly constructed SUGIT, a sugarcane long-read database described in [26]. In brief, the database was derived from a pooled RNA sample collection including those genotypes used in this study, plus leaf and root tissue samples of 22 commercial and introgressed sugarcane genotypes. The basic descriptions of the databases are given in Table 3 and the methodology is summarised in Fig. 1.

Identification of differentially expressed transcripts
For all the RNA-Seq experiments involving high and low sugar groups, the low sugar samples were used as the reference; for comparisons of top and bottom internodes (immature and mature culm tissues, respectively), the bottom internode sample was used as the reference for identifying DEGs that were up-regulated or down-regulated. This means that if a transcript was up-regulated in the reference group, it was down-regulated in the group being compared, and vice versa. Proportion-based statistical analysis (Baggerly's test) and a volcano plot were used to compare gene expression levels in the two groups that were considered for differential gene expression (high and low sugar, top and bottom internode samples) in terms of the log2 fold change (at FDR 0.01). The DEGs were further sorted and selected at three different fold change levels, i.e., 2-fold and above, 10-fold and above, and below 2-fold, to identify transcripts expressed at high levels and those expressed at low levels.
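The fold change levels and the reference-group convention described above can be sketched as follows. This is a minimal illustration under stated assumptions: expression values are non-zero, each DEG is assigned to the highest level it reaches (in the text the 2-fold level also contains the 10-fold DEGs), and the function names and example values are hypothetical.

```python
import math


def log2_fold_change(test_value, reference_value):
    """log2 of the expression ratio, test group relative to reference group."""
    return math.log2(test_value / reference_value)


def fold_change_magnitude(test_value, reference_value):
    """Unsigned fold change, so that 3-fold down and 3-fold up both give about 3."""
    ratio = test_value / reference_value
    return ratio if ratio >= 1 else 1 / ratio


def direction(test_value, reference_value):
    """Direction of change in the test group relative to the reference group."""
    return "up" if test_value > reference_value else "down"


def fold_change_bin(magnitude):
    """Assign a DEG to the highest fold change level it reaches."""
    if magnitude >= 10:
        return ">=10"
    if magnitude >= 2:
        return ">=2"
    return "<2"


# Hypothetical expression values: HSB (test) against LSB (reference)
example_bin = fold_change_bin(fold_change_magnitude(50.0, 4.0))
example_direction = direction(50.0, 4.0)
```

A transcript reported as "up" in the test group is, by the same convention, "down" in the reference group.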

Functional annotation of identified differentially expressed transcripts
Functional annotation of the transcripts was performed using MapMan categories [27] using BlastX (e-value ≤ 10⁻⁵, with a cut off value of 80% similarity) against Arabidopsis thaliana and Oryza sp. and SwissProt/UniProt Plant Proteins. In addition, Blast2GO [28] followed by KEGG pathway mapping analyses were performed for the DEGs.

Validation of gene expression using quantitative real-time PCR (qPCR)
In addition, a correlation analysis was performed to validate the expression levels of eight selected transcripts from the RNA-Seq analyses in this study using qPCR expression values of the same transcripts extracted from a separate study [29]. The RPKM values obtained for four tissue samples (two top and two bottom internodes) of two genotypes (QC02-402 and QN05-803) were correlated against the respective qPCR expression values (Cq qPCR normalised gene expression), using Microsoft Excel 2013.
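For readers unfamiliar with the unit, RPKM (reads per kilobase of transcript per million mapped reads) is conventionally computed as 10^9 × C / (N × L), where C is the number of reads mapped to the transcript, N the total number of mapped reads in the library and L the transcript length in bp. The sketch below shows this standard definition with hypothetical numbers; it is not taken from the CLC-GWB implementation.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical transcript: 500 reads, 2000 bp long, 25 million mapped reads
example_rpkm = rpkm(read_count=500, transcript_length_bp=2000,
                    total_mapped_reads=25_000_000)
```

For these hypothetical values the result is 10 RPKM, since 500 reads over 2 kb in a library of 25 million reads corresponds to 10 reads per kilobase per million.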

RNA-Seq analyses and identification of differentially expressed genes
The mapping of reads to each reference database is shown in the   Table 5 for details). However, in the SAS-DGE, there were more DEGs in the HST vs HSB comparison, with 2826 DEGs. This comparison resulted in only 21 and 31 DEGs with the SUGIT and SoGI DGEs, respectively (Fig. 3). In addition, the common and unique transcripts among the three comparisons in the three different DGEs were identified (Figs. 4 and 5). For additional information on the experimental set up and statistical analyses, see Additional file 2: Tables S4-S6 (complete list of DEGs in the SoGI-DGE), Additional file 3: Tables S7-S9 (complete list of DEGs in the SUGIT-DGE), and Additional file 4: Tables S10-S12 (complete list of DEGs in the SAS-DGE).
The qPCR expression values were significantly correlated with the RPKM values for the selected genes (r = 0.629, p < 0.001, n = 32, df = 30). The details of the qPCR validation analysis are provided in Additional file 5: Table S13 and Figure S2.
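The reported significance can be checked from r and n alone, since for a Pearson correlation t = r·sqrt(df / (1 − r²)) with df = n − 2. The following sketch performs that check; n = 32 is consistent with eight transcripts measured in four tissue samples, and the critical value is taken from a standard two-tailed t table rather than from the study.

```python
import math


def t_statistic(r, n):
    """t statistic and degrees of freedom for a Pearson correlation r over n pairs."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r)), df


n_pairs = 8 * 4  # eight transcripts x four tissue samples
t_value, df = t_statistic(0.629, n_pairs)

# Two-tailed critical t for p = 0.001 at df = 30, from standard t tables
T_CRITICAL = 3.646
significant = t_value > T_CRITICAL
```

The resulting t is about 4.43 with 30 degrees of freedom, which exceeds the critical value and agrees with the reported p < 0.001.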

Identification of consistently differentially expressed transcripts between high and low sugar genotypes
The results of the DGE analyses are given in Table 5. The DEGs at different fold change cut off values, i.e. ≥2, ≥10 and <2 fold changes, were identified. This resulted in the identification of DEGs that are expressed at high levels (10 fold and above) and low levels (<2), apart from those at the cut off of 2 and above (Table 6). In addition, because the DEGs were numerous, the DGE experiment files were filtered using "sucrose" and "sugar" as key words to check for specific sucrose/sugar related transcripts. Although some transcripts related to sucrose/sugar may have been missed, this approach helped in screening the large number of DEGs. At the fold change value of 2 and above, the sucrose and sugar related genes were 63, 68 and 49 in HSB vs LSB and 75, 74, and 60 in LST vs LSB using SoGI-DGE, SUGIT-   Tables S14-22 and  some are listed in the Tables 7, 8 and 9. At the fold change value of 10 and above, the sucrose/sugar related transcripts were very few in number and included sucrose synthase (SuSy), sucrose transporter (SuT), sucrose phosphate synthase (SPS) and a SWEET transporter (Table 6). Further, SuSy2 and SuT3 were consistently present in all three sets of DEGs for LST vs LSB, at the maximum fold change value of 10 and above, showing upregulation in LST.
In HSB vs LSB, SuSy 2 and SuT 2 were observed in SoGI-DGE and SUGIT-DGE, upregulated in HSB, whereas no sucrose/sugar related transcripts were present in SAS-DGE at this fold change. At the fold change value of below 2, there were no sucrose/sugar related transcripts for these two comparisons in any of the DGEs. In HST vs HSB, sucrose/sugar related transcripts were not found in SoGI- and SUGIT-DGEs, however, at the fold change cut off value of  (Table 6). Overall, genes specific to sucrose synthesis and accumulation were enriched in the HSB vs LSB and LST vs LSB experiments, while genes for secondary metabolites were found to be enriched in the case of the HST vs HSB comparison. There were no DEGs in the HST vs LST experiment in the three DGE analyses.
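The key word screening described in this section amounts to a case-insensitive substring search over the annotation text of each DEG. A minimal sketch is given below; the transcript identifiers are hypothetical and the descriptions simply reuse gene names mentioned in this paper.

```python
def filter_by_keywords(annotations, keywords=("sucrose", "sugar")):
    """Return the transcripts whose description contains any key word.

    The comparison is case-insensitive and matches substrings.
    """
    hits = {}
    for transcript_id, description in annotations.items():
        text = description.lower()
        if any(keyword in text for keyword in keywords):
            hits[transcript_id] = description
    return hits


# Hypothetical identifiers with descriptions taken from gene names in the text
example_annotations = {
    "transcript_1": "Sucrose synthase 2",
    "transcript_2": "Bidirectional sugar transporter SWEET4-like",
    "transcript_3": "Alkaline/neutral invertase",
    "transcript_4": "Fructose-bisphosphate aldolase cytoplasmic isozyme",
}
sugar_hits = filter_by_keywords(example_annotations)
```

In this example the invertase entry is not retrieved although it acts on sucrose, which illustrates why some sucrose/sugar related transcripts may be missed by key word filtering.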

Gene ontology annotation
The gene ontology annotation using MapMan resulted in grouping and classification of the DEGs into different functional categories. The DEGs obtained between LST and LSB were similar in number and composition to the DEGs obtained between HSB and LSB (Additional file 7: Figure S3, Table S23).

Upregulated transcripts in high sugar genotypes when compared with low sugar genotypes
The DGE analyses of HSB vs LSB and LST vs LSB showed a similar trend and the two sets of DEGs had an extensive overlap (see Additional file 7: Figure S3). About 89.3% (with SoGI) of the differentially expressed transcripts were the same in both comparisons (63% in SUGIT and 96.8% in the case of SAS). Hence only the DEGs of HSB vs LSB are considered for further discussion. Only a few transcripts are discussed here. For a complete list of DEGs of all the three DGE analyses, refer to Additional files 2, 3 and 4: Tables S4-S12. In addition, a list of unique and commonly expressed transcripts in each group was prepared (Additional File 8: Tables S24-28) for

Sucrose, starch and other sugar derivatives
In the SoGI-DGE, there were 71 sucrose related transcripts consisting of sucrose synthases 2 and 3, sucrose phosphate synthase (SPS) 2 and 3, sucrose phosphate phosphatase (SPP), sucrose non-fermenting related protein kinases, impaired sucrose induction 1-like protein and sucrose transporters (SuT) 2 and 4. About 22 transcripts were sugar related, including transport, efflux, and glycosyltransferases. Ten transcripts were related to alkaline/neutral invertases and three transcripts with homology to sucrase from Oryza sativa were found. There were ten high-glucose regulated protein 8-like transcripts. Forty-six transcripts were related to intermediary metabolism of fructose phosphates, the most expressed being fructose-bisphosphate aldolase cytoplasmic isozyme. Sixteen transcripts were related to xylose metabolism and β-glucosidase related transcripts were observed. Fifteen hexose related transcripts were transporters, while 18 transcripts were related to triose phosphate metabolism. Fifty-three UDP-related transcripts were found, of which six were UDP-glucose dehydrogenases. There were also UDP-sugar, arabinose, xylose and galactose transporter, epimerase and pyrophosphorylase related transcripts. Fucoses are hexose sugars, and the nine transcripts associated with them included fucosidases and fucosyltransferases. Thirteen mannose, trehalose and sorbitol related transcripts were found. Glucan metabolizing genes were another prominent group found to be highly expressed, with 45 transcripts including β-1,4 glucan synthases and endoglucanases. Nine transcripts related to alpha amylases were also upregulated in high sugar genotypes. In addition, 85 transcripts were found to be related to kinases including hexokinases, fructokinases (1, 2 and 3), phosphofructokinases, carbohydrate kinases and galactokinases.
In the SUGIT-DGE, there were 208 sucrose related transcripts. In addition to the transcripts observed in the SoGI-DGE, sugar transport 5 and 7, sugar transporter ERD6 like, bidirectional sugar transporter SWEET1 and 4 like, and an abundance of ABC transporters B, C, D, E, G, F, and I for sugar were found in the SUGIT-DGE. In the SAS-DGE, 75 transcripts were related to sucrose, consisting of galactinol-sucrose galactosyltransferase 1, 2 and 6, sucrose transporters SUC3 and its isoform X2, SUC4, SPP 2 and SPS 1, 3 and 4, bidirectional sugar transporter 2a, 4, 14 and 16, sugar transporter ERD6-like 5, 6 and 16, sugar transport 5 and its isoform X1, 7 and 9 transcripts for sugar phosphate/phosphate translocator. Interestingly, one transcript for an invertase inhibitor and one sulfofructose kinase like transcript, which were not detected in the other two DGEs, were found. Starch synthases II b and c, III, IV and starch branching and debranching (pullulanase and isoamylase) enzymes were found to be upregulated. The KEGG pathway map for starch and sucrose related DEGs is shown in Additional file 9: Figure S4. pyrophosphatase were found to be upregulated. A transcript was found to match the bacterial sugar transport system, probably due to contaminating sequences. An abundance of ABC transporters could be observed in all DGEs.

Organellar
Transcripts related to the chloroplast were found in the three DGEs, notably chloroplastic group IIB intron splicing facilitator CRS2, alpha-glucan water dikinase, rubisco large subunit alpha binding, chloroplast post-illumination chlorophyll fluorescence increase protein, starch synthases II b and c, III, IV and starch branching and debranching (pullulanase and isoamylase) enzymes, to name a few. The ribosomal proteins were among the most upregulated transcripts in all the DGEs, comprising nuclear, cytoplasmic, chloroplast and mitochondrial ribosome related functions, especially 30S, 40S, 50S, and 60S and acidic ribosomal transcripts.

Senescence/ripening/stress
Transcripts related to senescence, including senescence-inducible chloroplast stay-green protein and leaf senescence proteins, heat shock related transcripts of DNA and chloroplast, wound inducible protein, ripening ABA induced, autophagy, programmed cell death, cell death related protein, defender against cell death and vascular death associated transcripts, were found. Transcripts related to stress (light, water, heat, salt, ozone-responsive, bio-stress), pathogenesis related transcripts, hypersensitive induced response proteins, 22 kDa drought inducible proteins, dehydrins and transcripts related to proline were found to be upregulated.

Flowering
Flowering related transcripts including pistil, pollen, immature pollen, flowering-time protein isoforms, phytochrome and flowering time, flowering locus, GIGANTEA, OVA4 ovule abortion 4, and fertilization independent were upregulated in high sugar genotypes. Proteins related to the egg apparatus, seed maturation, shrunken seed and seed starch branching enzyme related transcripts were upregulated. HASTY 1 flower development, agamous-like MADS box AGL12, photoperiod-independent early flowering 1, early flowering 3, flowering time control FY, luminidependens are some of the flowering related transcripts found across the DEGs.

Signalling
Transcripts of signalling related to DNA damage, signal recognition, pollen, and integral membrane, as well as 14-3-3 like proteins, were found. Out of a large number of kinases, serine/threonine phosphatases appeared to have a dominant role during sucrose accumulation. Also, the pattern of gene expression upregulated in high sugar genotypes suggested that several signalling events may be interrelated (Additional file 9: Figure S5).

Fibre/cellulose
In

Uncharacterized
Interestingly, in SoGI-DGE, about 6552 transcripts were found to match the chromosomal regions of Vitis vinifera (SoGI annotation) which are whole genome shotgun sequences. In SUGIT-DGE, 243 transcripts were uncharacterized and in SAS-DGE, 320 transcripts were found to be uncharacterized.

Down regulated transcripts in high sugar genotypes
The transcripts down regulated in high sugar genotypes included 17S, 18S, 26S, ribosomal RNA genes, cytochrome P450, and photosystem I 700, a stem specific transcript and leaf specific transcript from Saccharum hybrid cultivar, rRNA intron encoded homing endonuclease, zinc finger protein and uncharacterized transcripts in the three DGEs.

Discussion
Two groups of genotypes, high sugar and low sugar, were formed based on the sugar content in terms of Brix, as in sugarcane most of the soluble solids in the juice (70-91%) correspond to sucrose [12,30]. Differential expression of genes was studied between the two groups and between top and bottom internodal samples (immature and mature) of the two groups. Therefore, gene expression changes were studied among high sugar top internode (HST), high sugar bottom internode (HSB), low sugar top internode (LST) and low sugar bottom internode (LSB) samples in various comparisons. Thus, HST vs LST and HSB vs LSB were comparisons between the high and low sugar genotypes, whereas HST vs HSB and LST vs LSB were comparisons between top and bottom internodal samples. For the DGE analyses, three databases were used as references individually, and a large number of DEGs were identified from each. The databases were chosen to be specific for sugarcane. SoGI and SAS are derived from 26 different cDNA libraries [24]; as a result, a large number of DEGs were obtained. The SUCEST database, which encompasses SoGI and SAS, is reported to cover >90% of the sugarcane genes [31]. The SUGIT database is essentially a long-read database sequenced using the latest Iso-Seq technology [17], which can further be used for refining the DEGs for isoform/allelic information. This database covers approximately 71% of the total predicted genes in sugarcane [17]. The common and unique transcripts from each database are not discussed further as the main objective of this paper was to find the DGE for sugar content. A subset of sucrose/sugar related DEGs was derived, which is interesting as several other studies on sucrose accumulation in sugarcane reported that sucrose related genes were less abundant or not expressed during the maturation stage [6,12,32]. There were approximately 70 transcripts related to sugar/sucrose in each DGE.
Sucrose synthase (SuSy) and sucrose transporters (SuTs) were consistently found to be highly expressed in high sugar genotypes. A similar association was reported in [6,8,14]. The exact isoforms of these two genes could not be identified due to the varying annotations of the three databases used, and this needs further study. SuSy is reported to contribute to increasing the sink capacity and to building cell wall materials and starch, while sucrose transporters facilitate transportation of sucrose that leads to a steady increase in sucrose content [30]. Further work on the isoform/allelic expression of these genes would be useful for understanding the finer details of their regulatory roles. The functioning of the two sucrose synthesis enzymes, SuSy and SPS, and their regulation have not yet been well demonstrated in sugarcane. SPS, sucrose non-fermenting related kinases, bidirectional sugar transporter SWEET, UDP-sugar pyrophosphorylase and impaired sucrose induction 1-like proteins were the other genes that were consistently present at lower fold changes. Interestingly, an invertase inhibitor gene was found to be highly expressed in LST (13-fold) in LST vs LSB in the SAS-DGE. Invertase inhibitors have been previously reported to be highly expressed during the sucrose accumulation stages in sugarcane [33]. In addition to the above genes, the gene expression pattern in our study reveals a clear association between different gene networks during sucrose accumulation, similar to earlier reports [9,12]. It is possible to draw a direct parallel between sucrose content and gene expression levels for almost all the DEGs, though the difference in sugar content between the two groups is very narrow.
Sucrose is a carbohydrate that was originally recognized only as an energy source for metabolism in plants but was recently shown to also function as a signalling molecule involved in the regulation of various physiological processes in plants such as root growth, fruit development and ripening, and hypocotyl elongation [34]. Sugars serve as key components reflecting the plant's energy status; the ability to continuously sense sugar levels and control energy status is therefore key to survival, and transcript levels of thousands of genes respond to changing sugar levels [34]. Further, different sugars can have different regulatory roles in physiological processes, and the developmental stage of the plant further determines the response to sugars [35][36][37]. Recently, it was observed that glucose facilitates the juvenile to adult phase change in Arabidopsis by repressing microRNA (miRNA) 156 expression [38][39][40]. Consequently, mutants in sugar signalling or starch metabolism display an altered juvenile phase [38]. At high concentrations, sugars can induce meristem quiescence, as observed in the arrest of development of seedlings germinated on high sugar levels [34]. Sugar induced quiescence of the stem can be seen through the expression of several transcripts for no apical meristem and indeterminate spikelet transcripts in addition to senescence related transcripts. Transcripts related to less abundant and lignocellulosic sugars identified in this study included xylose, trehalose, galactose, arabinose, fucose, mannose, taurine and the sugar alcohols inositol and sorbitol. The constant synthesis and breakdown of sucrose into its hexose components helps regulate various physiological events associated with these less abundant sugars and maintains a reserve for responding to any stimuli, including the accumulation of sugar in the form of sucrose.
It is possible that breeding programmes for high sucrose genotypes have resulted in selection for these sugars (total sugars, in addition to sucrose) and for gene expression changes in certain regulatory genes [15]. Diverse phenotypes may therefore stem from multiple effects of sucrose and other sugars as signal and storage compounds when they accumulate in various developmental and compartmental patterns resulting from differential gene expression and regulation.
The vacuole occupies as much as 90% of most mature cells; it can accumulate and store sucrose, glucose and fructose, and it serves as a primary pool of free calcium ions in plant cells. Furthermore, the space-filling function of the vacuole is essential for cell growth, as cell enlargement occurs mainly through expansion of the vacuole rather than of the cytoplasm [41]. A vast majority of the differentially expressed gene transcripts were vacuole related, including aquaporins, glucan-related transcripts, aspartic proteinases, endopeptidases, ABC transporters, TIPs, V-ATP synthases, vacuolar protein sorting proteins, proton pumps, Ca2+-ATPases and calmodulins, and these showed higher expression levels in high sugar genotypes. Further molecular characterization of vacuolar and tonoplast sugar transporters should advance our understanding of vacuole function, sugar transport and sugar accumulation in sugarcane.
Several transcripts related to plant defense, wounding, and disease were upregulated in high sugar genotypes together with the ripening and senescence related transcripts. Further, water stress and dehydration related gene transcripts were upregulated in the high sugar genotypes. Apart from the ripening and senescence related transcripts, which indicate the physiological state of the stem, the upregulation of transcripts encoding plant disease resistance proteins suggests that the defense system of sugarcane was activated by high sugar levels, which might help protect the plant from the extreme stress caused by high sucrose levels during the maturity stages. High sucrose levels may also create steep osmotic gradients between compartments with varying sucrose concentrations (more negative than −2.0 MPa during sucrose accumulation). The increased commitment to fibre synthesis in the maturing stem is evident in the upregulation of several fibre and cellulose related transcripts in the high sugar genotypes and highlights the need to maintain the structure of the stem in conjunction with sucrose accumulation. These structural components may act to restrict apoplastic movement of solutes between the vascular bundles and the sucrose-storage parenchyma cells [42]. Transcripts related to proline and glyoxalase were highly expressed in high sugar genotypes. The differential expression of genes related to fibre, cellulose and lignin synthesis suggests that osmotic regulation and structural maintenance are directed by the sugar levels. Though sucrose content in the sugarcane culm ranges from 14 to 42% of the culm dry weight [3], the majority of the carbohydrate in sugarcane is lignocellulose, a major component of the cell wall. As cell elongation and sucrose accumulation cease in the maturing sugarcane internodes, there is a major increase in cell wall thickening and lignification [43].
Cellulose accounts for about 42-43% in sugarcane and energy cane cultivars [44] and can be a prominent competing sink for carbon in sugarcane. Cellulose synthases 1, 2, 3, 4, 5, 6, 7, and 9, along with a novel transcript matching a cement protein-like gene, were upregulated in high sugar genotypes, indicating that several aspects of sugarcane cell wall composition remain to be explored [45]. S-adenosylmethionine (SAM) produced by SAM-synthase is required as the methyl donor in lignin and suberin biosynthesis and in secondary metabolism. It is also the substrate of SAM-decarboxylase, which was also upregulated and is important in polyamine synthesis, a response to osmotic stress. Elevated SPS activity is consistently correlated with high rates of cellulose synthesis and secondary wall deposition [46]. UDP-Glucose, apart from being the precursor for sucrose synthesis, is a nucleotide sugar central to diverse pathways of polysaccharide biosynthesis leading to starch, cellulose, hemicellulose and callose synthesis. About 10 major monosaccharides in cell wall polymers are derived from glucose through UDP-Glucose related interconversion pathways. All UDP-Glucose related transcripts, including UDP-Glucose dehydrogenases and pyrophosphorylases, were upregulated in high sugar genotypes, indicating a high correlation with sugar content. Ethylene is often related to the lignification of plant tissues by increasing the expression of genes involved in the phenylpropanoid pathway [47]. This may explain the parallel upregulation of cellulose synthases, ethylene related transcripts, and SPS in the DGEs. The mechanisms regulating cell wall biosynthesis and source-sink relations in sugarcane will be crucial to any effort to alter carbon partitioning between fibre and sugar in the culm.
In addition, the altered expression of cell wall biosynthesis genes in association with sucrose (Brix) content is an interesting indication of a correlation between these processes. Silencing or overexpression of some of these genes may lead to an altered cell wall or increased sucrose content. Interestingly, when comparing two genotypes contrasting for lignin content, Vicentini et al. [45] found that the correlation between lignin content and differential expression of lignin genes is not always straightforward, and most of the lignin biosynthetic genes did not show increased transcript levels in the high lignin genotype.
Sugar signals and the circadian clock are part of a complex network that controls floral transition. In sugarcane, sugar levels peak just before flowering induction. The signalling for senescence, arrest of apical growth, high sucrose levels and flowering induction are well coordinated. The upregulation of several flowering related genes, such as flowering locus D and pollen and pistil related transcripts, in the high sugar genotypes suggests that the crop had attained its maximum sugar levels and was in a transition state to flowering, even though many of the genotypes are commercial cultivars that do not flower or flower only rarely. Sugarcane has been selected for higher sugar content through strategies that involved delayed flowering and seed set, as a result of which a majority of sugarcane cultivars are now either sterile or have a reproductive cycle that is delayed or dormant for years [48]. Trehalose and its phosphate derivative trehalose-6-phosphate (T6P) have recently gained importance as signalling molecules involved in carbon partitioning and in linking sugar status and diurnal rhythm to floral transition in plants [49,50]. For example, high sucrose and T6P levels signal a cellular sugar abundance status [37,51]. The role of trehalose, in addition to other sugar forms, in sugarcane sucrose metabolism needs further study, as indicated by the upregulation of several transcripts for trehalose phosphate synthase and trehalases.
Light interception and the stay-green trait are considered major factors influencing the level of carbohydrates in the internodes [51]. Leaf angle is a genetic trait, and higher sucrose yield in sweet sorghum can be achieved by genetic adjustment of leaf angles for optimum light interception. In addition, stay-green varieties of sweet sorghum were found to have higher stem sugar concentrations than senescing lines [52]. This may be due to the reduced need for re-mobilizing stem sucrose in addition to prolonged photosynthetic capacity [53]. Similarly, the upregulation of stay-green gene transcripts in the high sugar genotypes indicates an association between high sugar levels and higher photosynthetic capacity, as the C4 enzymes are mainly localized in the chloroplasts. Further, high expression levels of photosynthetic, light harvesting, etiolation, starch and chlorophyll gene transcripts were observed in high sugar genotypes. In addition, transcripts related to non-photosynthetic NADP malic enzymes [54] were upregulated in high sugar genotypes; their functional significance in sugarcane is unknown. The rapid cycling of sugars in non-photosynthetic cells has been referred to as a 'futile cycle' [55] because of the continuous and simultaneous synthesis and degradation of sucrose. However, it is recognised that these cycles allow cells to respond in a highly sensitive manner to small changes in the balance between the supply of sucrose and the demand for carbon for respiration and biosynthesis, thus resulting in a strong sink [30]. This remobilisation of stored sucrose as a food supply results in rapid regrowth following stress or in germination of axillary buds of the internode [56]. Photosynthesis, growth and yield are strongly linked to N availability, especially in C4 crops [57]. The upregulation of N related transcripts in high sugar genotypes indicates that this is an ongoing process even when the crop has reached maturity.
Transcripts related to general cell functions and growth, organellar and nuclear functions, biosynthetic pathways of pigments, amino acids and metabolites, hormonal signalling, transcription factors, various other transporters, proteins of transposons, and root/stem/leaf related functions were upregulated in the high sugar genotypes. The functions enriched in genes that are differentially expressed between different tissues in each comparison are consistent with the physiological changes associated with the development of that tissue, mainly sucrose content (Figs. 6 and 7). The absence of DEGs in HST vs LST suggests that the top internodes are metabolically active irrespective of their sucrose contents (i.e. high or low sugar genotype). The absence of sucrose related DEGs in HST vs HSB, where the top and bottom internodes of high sugar genotypes show almost similar expression patterns, indicates homogeneity for sucrose content throughout the culm. Further, the high sugar genotypes seem to invest in  (Table 6). Meanwhile, the large number of sucrose related DEGs in LST vs LSB shows that a gradient for sucrose exists in the low sugar genotypes. This observation can be interpreted in two ways. One is that the low sugar genotypes have a more active top internode than bottom internode, leading to futile cycling of sucrose and hence less accumulation; the other is that the bottom internodes have slowed down metabolically over time, having reached their physiological threshold levels of sucrose. The former is unlikely, as no acid invertases (cell wall or vacuolar), which are involved in sucrose breakdown, were observed among the DEGs. The latter is the more likely explanation, and the bottom internodes play a major role in the sucrose content of the genotypes. Also, the bottom internode of high sugar genotypes shows high expression of sucrose related genes.
Feedback inhibition or post-translational regulation could be involved in the low sugar genotypes, which show higher expression of the sucrose related genes in the top internode and yet have a low sugar content. In addition, the low sugar genotypes could also be late maturing genotypes, as some of them are introgression lines (other than the commercial hybrids) that do not yet have an established sugar profile or maturity indices (e.g., fibre:sugar ratio). Many factors besides Brix, such as ratoonability, vigour, softness, several resistance mechanisms, secondary metabolites and starch, may also differ among the genotypes studied and remain to be evaluated. There were 7814 transcripts unique to HSB vs LSB, and 3667 transcripts unique to LST vs LSB (of the 34,476 DEGs in HSB vs LSB and 30,809 DEGs in LST vs LSB). These transcripts may indicate tissue specificity of the genes or their isoforms, which remains to be explored. When these unique transcripts were filtered for sucrose/sugar genes, SPS, SPP, SuSy, and sugar transporter genes were more specific to HSB vs LSB, whereas only some of the sugar transporter genes were specific to LST vs LSB (Additional file 8: Tables S24-28).
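The partition of DEG lists into common and comparison-specific transcripts, followed by filtering for sucrose/sugar genes, can be thought of as set operations on transcript identifiers. The following minimal Python sketch illustrates the idea only; it is not the pipeline used in this study, and the transcript IDs, annotations, keywords and function names are hypothetical.

```python
# Illustrative sketch only: all transcript IDs and annotations are made up.

def split_common_unique(degs_a, degs_b):
    """Return (common, unique_to_a, unique_to_b) for two collections of DEG IDs."""
    a, b = set(degs_a), set(degs_b)
    return a & b, a - b, b - a


def filter_by_keyword(ids, annotations, keywords):
    """Keep IDs whose annotation contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return sorted(
        t for t in ids
        if any(k in annotations.get(t, "").lower() for k in lowered)
    )


# Toy DEG lists standing in for the two comparisons.
hsb_vs_lsb = {"T1", "T2", "T3", "T4"}
lst_vs_lsb = {"T3", "T4", "T5"}

# Toy annotation table (transcript ID -> description).
annotations = {
    "T1": "sucrose phosphate synthase",
    "T2": "aquaporin",
    "T3": "sucrose synthase",
    "T4": "cellulose synthase",
    "T5": "sugar transporter",
}

common, unique_hsb, unique_lst = split_common_unique(hsb_vs_lsb, lst_vs_lsb)
sugar_hsb = filter_by_keyword(unique_hsb, annotations, ["sucrose", "sugar"])
sugar_lst = filter_by_keyword(unique_lst, annotations, ["sucrose", "sugar"])

print(sorted(common), sugar_hsb, sugar_lst)
```

In practice the keyword filter would be replaced by whatever annotation source is available for the reference database, since free-text matching can miss or mislabel transcripts.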
It was proposed that sucrose accumulation may be regulated by a network of genes induced during culm maturation that contribute to key physiological processes including sugar translocation/transport, fibre synthesis, membrane transport, vacuole development and function, and abiotic stress tolerance [9,12]. We found a similar trend in this profiling study, in addition to a very high number of differentially expressed sucrose and sugar related transcripts that might help bridge missing links between biosynthetic pathways and their regulatory factors. It remains to be studied whether sucrose regulates this large number of genes or whether a large number of genes is required to control this trait. As sucrose is emerging as a signalling molecule in recent studies [34,50], the all-pervasive nature of this sugar is likely to regulate the growth and developmental processes of the plant. The presence of master switches or major regulatory genes for this trait can be speculated upon and may be resolved as further genomic information is obtained. Many novel genes, such as caffeoyl shikimate esterase, which was recently discovered in Arabidopsis and reported to be absent in sugarcane [44], were found among the upregulated transcripts in our study. Further mining of the transcriptomes should lead to new targets and new aspects of sucrose synthesis and accumulation in sugarcane.

Conclusion
The data reported here provide a comprehensive resource for sucrose related as well as culm maturation related studies in sugarcane. Further studies on a large data set with different developmental time points for genotypes contrasting for sugar content, or for energy canes that do not accumulate high levels of sugars, should indicate targets for further biotechnological approaches. A dedicated analysis of transcription factors and regulatory elements will further help in understanding the complexity of the sugar network. Sucrose accumulation is very dynamic and, unlike fruiting organs, the sugarcane culm is continuously exposed to every possible stimulus in the crop, soil and water continuum, which results in a plethora of genes being expressed at any point in time (approximately 33,000). Although the present study identified more than 30,000 genes regulated and differentially expressed between high and low sugar genotypes, it is hard to pinpoint any particular gene or group of genes as responsible for sucrose content and its maintenance. Further, it is not possible for a gene to be lacking or not expressed in either of the groups, as sucrose is a primary metabolite and the principal transport sugar in sugarcane, which suggests that the trait is quantitative and under transcriptional control. The machinery for sucrose synthesis is conserved across species, and it is supposed that the complexity of the sugarcane genome must play an important role in the sucrose levels that are observed in sugarcane. Multiple forms of each enzyme, each with its own isoforms, various localizations, compartmentalized processes, the availability of large vacuoles and a unique stem morphology together contribute to the sucrose content of the sugarcane stem. Further, the availability of multiple isoforms or alleles gives the crop the advantage of buffering against any functional disruption, which is the main reason for the instability of transformation events in sugarcane [58]. With these challenges in the sugarcane crop, a multitude of strategies are required for any genetic manipulation or for identification of regulatory genes for important traits, particularly sucrose.

Fig. 7 Graphical representation of the expression pattern of some of SuSy transcripts in various comparisons. LST, low sugar top internode sample; LSB, low sugar bottom internode sample; HST, high sugar top internode sample; HSB, high sugar bottom internode sample; T, top tissue; B, bottom tissue. Shown here are the sucrose phosphate synthase III, sucrose non-fermented related protein kinase, sucrose transport protein, and impaired sucrose induction 1-like protein transcripts from the Saccharum officinarum gene indices (SoGI) database, showing differential expression in the top two comparisons, (a) HSB vs LSB and (b) LST vs LSB, while there is no differential expression in the lower two comparisons, (c) HST vs HSB and (d) HST vs LST, at FDR < 0.01. The X-axis shows the genotypes while the Y-axis represents RPKM values.
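For readers unfamiliar with the unit on the Y-axis of Fig. 7, RPKM (reads per kilobase of transcript per million mapped reads) normalises a raw read count by transcript length and library size. A minimal Python sketch with made-up numbers follows; the function name and the example values are illustrative assumptions, not data from this study.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = reads * 1e9 / (transcript length in bp * total mapped reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical transcript: 500 reads on a 2,000 bp transcript
# in a library of 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)
print(example_rpkm)
```

Because the value is scaled by both length and library size, RPKM values for the same transcript can be compared across libraries of different sequencing depth, which is what the per-genotype bars in such a figure rely on.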

Additional files
Additional file 1: Table S1a. Sugar profile of the genotypes taken for the study. Table S1b. Quality report of RNA-Seq reads of samples used in the study. Table S2. Genotypes and their transcriptome samples taken based on sugar content (Brix). Table S3. Additional information with regard to genotypes selected. Figure S1. Graphical representation of the sugar profiles of the genotypes selected for the study. (XLSX 2248 kb)
Additional file 2: Table S4. Complete list of DEGs in the SoGI-DGE.
Table S24. Common and unique transcripts between LST vs LSB and HSB vs LSB (SoGI-DGE).
Additional file 9: Figure S4. Blast2GO and KEGG mapping for DEGs in the SoGI-DGE with respect to starch and sucrose metabolism. Figure S5. Sucrose emerges as a signalling molecule regulating most of the interlinked plant functions in sugarcane. The gene expression pattern in culm tissue during sucrose accumulation in the genotypes studied reveals several networks of genes regulated by sucrose and correlating with the sucrose content of the genotypes studied. (XLSX 469 kb)